Finish and benchmark inline/mem2reg improvements. by eddyb · Pull Request #552 · Rust-GPU/rust-gpu

eddyb · 2026-04-07T17:21:10Z

Draft pull request until it can be rebased again, and measurements retaken.

Background for these inliner and mem2reg changes:

- "[Migrated] rustc_codegen_spirv taking a long time processing my (large) shader" (#113), in early 2022 (by @hatoo), was the first sign that our mem2reg didn't scale well.
- "[Migrated] Inliner->mem2reg can quadratically amplify already-exponential inlining (turning seconds into minutes)." (#63), in early 2024 (prompted by @schell), showed top-down inlining to be O(2ⁿ) (with mem2reg being quadratic on top of that, further amplifying the cost).
  - Most of the changes in this PR (those tagged [2024]) were developed in early 2024, and almost all of them landed a year later as part of "Linker: speedup and debug info preservation" (#21) (with WIP descriptions, and without the compile time impact being measured).
- "rustc hangs for a few minutes since recent Rust toolchain upgrade" (#546), more recently, still points to mem2reg performance leaving a lot to be desired (I have not been able to reproduce that instance, however, as you'll see below).
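
To illustrate why the inlining order matters, here is a toy model (a sketch only, not the actual linker code). It assumes a hypothetical acyclic call graph in which every function calls the next one twice, and it counts how many call sites each strategy has to expand. `CallGraph`, `doubling_chain` and both counting functions are made-up names for this illustration.

```rust
/// Toy call graph: `calls[f]` lists the callee of each call site in function `f`.
/// Assumed to be acyclic (no recursion).
type CallGraph = Vec<Vec<usize>>;

/// Top-down model: expand every call site in `f`, then keep expanding the
/// call sites that each freshly inlined copy of the callee brought along.
fn top_down_expansions(calls: &CallGraph, f: usize) -> u64 {
    calls[f]
        .iter()
        .map(|&callee| 1 + top_down_expansions(calls, callee))
        .sum()
}

/// Bottom-up model: fully inline callees first (once per function), so every
/// call site is expanded exactly once, using an already-inlined callee body.
fn bottom_up_expansions(calls: &CallGraph, root: usize) -> u64 {
    fn visit(calls: &CallGraph, f: usize, done: &mut [bool]) -> u64 {
        if done[f] {
            return 0; // this function was already fully inlined earlier
        }
        done[f] = true;
        let mut expansions = calls[f].len() as u64;
        for &callee in &calls[f] {
            expansions += visit(calls, callee, done);
        }
        expansions
    }
    let mut done = vec![false; calls.len()];
    visit(calls, root, &mut done)
}

/// Hypothetical worst case: function `i` calls function `i + 1` twice,
/// and the last function is a leaf.
fn doubling_chain(depth: usize) -> CallGraph {
    (0..=depth)
        .map(|f| if f < depth { vec![f + 1, f + 1] } else { vec![] })
        .collect()
}
```

For `doubling_chain(10)`, the top-down count from function 0 is 2046 expansions, while the bottom-up count is 20. The final inlined code is just as large either way in this model; what bottom-up avoids is redoing the same expansion work for every copy.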

In order to properly assess the impact of both already-landed changes, and newer ones,
this PR reverts commits from #21, for a "current main minus any improvements" baseline,
on top of which all changes are reapplied, allowing each step to be measured.
(this also results in a tiny effective diff, since most commits form revert+reapply pairs)

Sadly, @schell's renderling couldn't be included (due to indirect dependencies on specific glam versions), but the other old samples were simple enough to allow automating compatibility (with both Rust-GPU 0.9 and current main), and the measurement commands and raw output are available in a gist.

Commit	Description	rene@73e827b (inline/mem2reg)	rust-gpu-shadertoys@81a56fd (inline/mem2reg)	ab-proof-of-space-gpu@fc99af15 (inline/mem2reg)
`c2f98b6`	Last Rust-GPU release (v0.9)	`1.4s`/`323.5s`	`15.7s`/`2738.1s`
`15d2589`	main w/o inline/mem2reg changes (i.e. after all 5 reverts)	`1.6s`/`286.6s`	`16.2s`/`2265.1s`	`2.4s`/`354.9s`
`077eaaa`	↑+ inliner debuginfo fixes	`10s`/`313.1s`	`110.2s`/`2515s`	`13.5s`/`377.3s`
`e48c7f9`	↑+ bottom-up inliner	`0.09s`/`50.9s`	`0.4s`/`856.4s`	`1.1s`/`203.2s`
`9a0a9cc`	↑+ mem2reg label ID indexing	`0.09s`/`41.2s`	`0.5s`/`328.2s`	`1.1s`/`67.5s`
`d7ad912`	↑+ use phis to inline returns, instead of var+store+load [commit equivalent to main]	`0.07s`/`4.1s`	`0.4s`/`4.2s`	`1.1s`/`5.6s`
`6c34184`	↑+ apply rewrite rules less often in mem2reg	`0.07s`/`1.9s`	`0.4s`/`2.6s`	`1.1s`/`3.3s`
`d11677c`	↑+ remove_duplicate_debuginfo during inlining	`0.04s`/`1.3s`	`0.3s`/`1.8s`	`1.1s`/`3s`
`f842afe`	↑+ mem2reg during inlining	`0.07s`	`0.9s`	`2.4s`

While the changes with the largest impact have already landed in Rust-GPU, the last 3 commits still result in a combined ~4x reduction in inlining+mem2reg times for the tested shaders (except for rene, where it's more like a 20x reduction).

(and that's without being able to reproduce #546 - until then, it's unclear how much of #547 is subsumed by the last 3 changes in this PR)
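
For readers unfamiliar with the "mem2reg label ID indexing" step in the table above, the general idea can be sketched as follows. This is a simplified illustration, not the code from this PR: the `Block` struct and all function names are hypothetical, and the linear scan is only an assumed stand-in for a slower "before" lookup.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a SPIR-V basic block:
/// the ID of its label, plus the label IDs it can branch to.
struct Block {
    label_id: u32,
    successors: Vec<u32>,
}

/// Linear scan: O(number of blocks) for every single lookup.
fn find_block_linear(blocks: &[Block], label_id: u32) -> Option<usize> {
    blocks.iter().position(|b| b.label_id == label_id)
}

/// Build a label ID -> block index map once per function,
/// so that each later lookup is O(1) on average.
fn index_blocks_by_label(blocks: &[Block]) -> HashMap<u32, usize> {
    blocks
        .iter()
        .enumerate()
        .map(|(i, b)| (b.label_id, i))
        .collect()
}

/// Resolve the successors of one block to block indices via the prebuilt map.
/// Panics if a successor label is missing from the map (malformed input).
fn successor_indices(
    blocks: &[Block],
    index: &HashMap<u32, usize>,
    block: usize,
) -> Vec<usize> {
    blocks[block].successors.iter().map(|id| index[id]).collect()
}
```

Any pass that walks control-flow edges repeatedly (as a phi-insertion pass typically does) performs many such lookups per function, which is where a per-lookup scan can turn into quadratic behavior.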


LegNeato · 2026-04-07T22:31:05Z

I rebased this on current main. I also did a perf run with this PR and with this PR + cherry-picked #547. It is generally neutral overall, so this PR ate most of the benefit of that.

Details

Setup

Baseline: c665c62c5d (this PR)
Comparison: b92b3359ce (this PR + cherry-picked #547)

Median End-to-End Summary

workload	baseline total median (s)	`+#547` total median (s)	delta
`rene/rene-shader`	2.937	2.790	-5.0%
`rust-gpu-shadertoys/shaders`	19.124	19.303	+0.9%
`abundance/crates/farmer/ab-proof-of-space-gpu`	26.206	26.122	-0.3%
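
For reference, the delta column can be reproduced from the two medians. A minimal sketch, assuming the delta is the relative change against the baseline, rounded to one decimal (the function names are made up for this example):

```rust
/// Relative change of `comparison` against `baseline`, in percent.
/// Negative values mean the comparison run was faster.
fn percent_delta(baseline: f64, comparison: f64) -> f64 {
    (comparison - baseline) / baseline * 100.0
}

/// Formats the delta with an explicit sign and one decimal, e.g. "-5.0%".
fn format_delta(baseline: f64, comparison: f64) -> String {
    format!("{:+.1}%", percent_delta(baseline, comparison))
}
```

With the medians above, `format_delta(2.937, 2.790)` gives `-5.0%`, `format_delta(19.124, 19.303)` gives `+0.9%`, and `format_delta(26.206, 26.122)` gives `-0.3%`.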

Raw Results

`rene/rene-shader`

metric	baseline median	`+#547` median	median delta	mean delta
`link_inline`	0.070	0.066	-5.7%	-9.9%
`link_block_ordering_pass_and_mem2reg-after-inlining`	0.018	0.016	-11.1%	-16.0%
`link`	2.754	2.495	-9.4%	-3.9%
`total`	2.937	2.790	-5.0%	-3.3%

`rust-gpu-shadertoys/shaders`

metric	baseline median	`+#547` median	median delta	mean delta
`link_inline`	0.686	0.430	-37.3%	-27.5%
`link_block_ordering_pass_and_mem2reg-after-inlining`	0.097	0.040	-58.8%	-51.2%
`link`	18.506	18.680	+0.9%	+4.0%
`total`	19.124	19.303	+0.9%	+3.7%

`abundance/crates/farmer/ab-proof-of-space-gpu`

metric	baseline median	`+#547` median	median delta	mean delta
`link_inline`	2.342	2.176	-7.1%	-10.1%
`link_block_ordering_pass_and_mem2reg-after-inlining`	0.293	0.244	-16.7%	-18.0%
`link`	25.991	25.905	-0.3%	+8.9%
`total`	26.206	26.122	-0.3%	+8.8%

Firestar99

Looks good, one question:

In the last commit f842afe (mem2reg during inlining), your measurements only have one value, as if you moved mem2reg into inlining, but you just added mem2reg to inlining without removing it from wherever it was called previously.
So it should still have a (potentially small) runtime, or can it be removed from where it was called previously?

eddyb · 2026-04-08T09:07:34Z

> I rebased this on current main. I also did a perf run ... It is generally neutral overall, so this PR ate most of the benefit of that.

Thanks, that's good to know. When it comes to rebasing, I will need to force-push a version that locally has the right commit identity for jj, sorry in advance (but I won't do that before updating the numbers).

> In the last commit f842afe (mem2reg during inlining), your measurements only have one value, as if you moved mem2reg into inlining, but you just added mem2reg to inlining without removing it from wherever it was called previously. So it should still have a (potentially small) runtime, or can it be removed from where it was called previously?

In theory the separate invocations of mem2reg might have become redundant, but I didn't want to risk them being skipped in case e.g. the inliner only runs mem2reg if inlining actually happened.

I could modify the separate mem2reg invocation to assert that nothing changes (i.e. we're only paying for a redundant rescanning), and if that never gets hit in anything we have access to, we can remove it.
(I was mainly trying to get big wins from small diffs, with minimal/zero risk of breakage)
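
The "assert that nothing changes" idea could look roughly like the sketch below. It is generic on purpose and does not use any real rust-gpu types or functions: `is_noop` is a hypothetical helper, and `T` stands in for whatever represents a function's IR.

```rust
/// Runs `pass` on `ir` and reports whether it left `ir` unchanged.
/// Hypothetical helper: snapshots the IR, runs the pass, compares.
fn is_noop<T: Clone + PartialEq>(ir: &mut T, pass: impl FnOnce(&mut T)) -> bool {
    let before = ir.clone();
    pass(ir);
    *ir == before
}
```

A caller would then wrap the separate invocation in something like `assert!(is_noop(&mut func, run_separate_mem2reg))`, where both names are placeholders. Cloning and comparing the whole IR is only reasonable for a temporary check; a pass that reports a "changed" flag itself would be cheaper.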

Also see #547 (comment) for an update from @nazar-pc on the #546 mystery.
(I might re-measure on top of the Rust-GPU commit they were on, now that I have the automation for it)

LegNeato · 2026-04-08T17:40:50Z

Sure, force-push away. I guess I should probably switch to jj.

eddyb mentioned this pull request Apr 7, 2026

batch mem2reg to process all variables in a single pass #547

Open

eddyb added 13 commits April 7, 2026 12:59

Revert "linker/inline: use OpPhi instead of OpVariable for return values." (`eeffeaa`)
This reverts commit fcd1b1e.

Revert "WIP: mem2reg speedup" (`1ee7ca3`)
This reverts commit efbf694.

Revert "WIP: couple of inliner things that need to be disentangled" (`20a4e71`)
This reverts commit ea20ef3.

Revert "WIP: (TODO: finish bottom-up cleanups) bottom-up inlining" (`61cdafd`)
This reverts commit 41ec7ea.

Revert "linker/inline: fix OpVariable debuginfo collection and insertion." (`0c93d11`)
This reverts commit 355122d.

[2024] linker/inline: fix OpVariable debuginfo collection and insertion. (`aec6a02`)

[2024] linker/inline: use bottom-up inlining to minimize redundancy. (`3920ddc`)

[2024] linker/mem2reg: index SPIR-V blocks by their label IDs for O(1) lookup. (`19b7472`)

[2024] linker/inline: use OpPhi instead of OpVariable for return values. (`e3c8836`)

linker/inline: fix typos in comments. (`e4ac5c8`)

[2024] linker/mem2reg: apply rewrite rules only once per `insert_phis_all` invocation. (`045b98c`)

linker/inline: also run remove_duplicate_debuginfo on every fully-inlined function. (`3c3912e`)

linker/inline: also run mem2reg on every fully-inlined function. (`c665c62`)

LegNeato force-pushed the eddyb/faster-inline-mem2reg branch from f842afe to c665c62 on April 7, 2026 21:25

Update rebased compiletest expectations

0ae3cd0

Firestar99 approved these changes Apr 8, 2026

eddyb wants to merge 14 commits into main from eddyb/faster-inline-mem2reg
