Skip to content

Finish and benchmark inline/mem2reg improvements.#552

Draft
eddyb wants to merge 14 commits intomainfrom
eddyb/faster-inline-mem2reg
Draft

Finish and benchmark inline/mem2reg improvements.#552
eddyb wants to merge 14 commits intomainfrom
eddyb/faster-inline-mem2reg

Conversation

@eddyb
Copy link
Copy Markdown
Member

@eddyb eddyb commented Apr 7, 2026

Draft pull request until it can be rebased again, and measurements retaken.


Background for these inliner and mem2reg changes:


In order to properly assess the impact of both already-landed changes, and newer ones,
this PR reverts commits from #21, for a "current main minus any improvements" baseline,
on top of which all changes are reapplied, allowing each step to be measured.
(this also results in a tiny effective diff, since most commits form revert+reapply pairs)

Sadly, @schell's renderling couldn't be included (due to indirect dependencies on specific glam versions), but the other old samples were simple enough to allow automating compatibility (with both Rust-GPU 0.9 and current main), and the measurement commands and raw output are available in a gist.


Commit Description rene@73e827b
inline/mem2reg
rust-gpu-shadertoys@81a56fd
inline/mem2reg
ab-proof-of-space-gpu@fc99af15
inline/mem2reg
c2f98b6 Last Rust-GPU release (v0.9) 1.4s/323.5s 15.7s/2738.1s
15d2589 main w/o inline/mem2reg changes
(i.e. after all 5 reverts)
1.6s/286.6s 16.2s/2265.1s 2.4s/354.9s
077eaaa ↑+ inliner debuginfo fixes 10s/313.1s 110.2s/2515s 13.5s/377.3s
e48c7f9 ↑+ bottom-up inliner 0.09s/50.9s 0.4s/856.4s 1.1s/203.2s
9a0a9cc ↑+ mem2reg label ID indexing 0.09s/41.2s 0.5s/328.2s 1.1s/67.5s
d7ad912 ↑+ use phis to inline returns,
instead of var+store+load

[commit equivalent to main]
0.07s/4.1s 0.4s/4.2s 1.1s/5.6s
6c34184 ↑+ apply rewrite rules
less often in mem2reg
0.07s/1.9s 0.4s/2.6s 1.1s/3.3s
d11677c ↑+ remove_duplicate_debuginfo during inlining 0.04s/1.3s 0.3s/1.8s 1.1s/3s
f842afe ↑+ mem2reg during inlining 0.07s 0.9s 2.4s

While the changes with the largest impact have already landed in Rust-GPU, the last 3 commits still result in a combined ~4x reduction in inlining+mem2reg times for the tested shaders (except for rene, where it's more like a 20x reduction).

(and that's without being able to reproduce #546 - until then, it's unclear how much of #547 is subsumed by the last 3 changes in this PR)

@LegNeato LegNeato force-pushed the eddyb/faster-inline-mem2reg branch from f842afe to c665c62 Compare April 7, 2026 21:25
@LegNeato
Copy link
Copy Markdown
Collaborator

LegNeato commented Apr 7, 2026

I rebased this on current main. I also did a perf run with this and this PR + cherry-picked #547. It is generally neutral overall, so this PR ate most of the benefit of that.

Details

Setup

  • Baseline: c665c62c5d (this PR)
  • Comparison: b92b3359ce (this PR + cherry-picked #547)

Median End-to-End Summary

workload baseline total median (s) +#547 total median (s) delta
rene/rene-shader 2.937 2.790 -5.0%
rust-gpu-shadertoys/shaders 19.124 19.303 +0.9%
abundance/crates/farmer/ab-proof-of-space-gpu 26.206 26.122 -0.3%

Raw Results

rene/rene-shader

metric baseline median +#547 median median delta mean delta
link_inline 0.070 0.066 -5.7% -9.9%
link_block_ordering_pass_and_mem2reg-after-inlining 0.018 0.016 -11.1% -16.0%
link 2.754 2.495 -9.4% -3.9%
total 2.937 2.790 -5.0% -3.3%

rust-gpu-shadertoys/shaders

metric baseline median +#547 median median delta mean delta
link_inline 0.686 0.430 -37.3% -27.5%
link_block_ordering_pass_and_mem2reg-after-inlining 0.097 0.040 -58.8% -51.2%
link 18.506 18.680 +0.9% +4.0%
total 19.124 19.303 +0.9% +3.7%

abundance/crates/farmer/ab-proof-of-space-gpu

metric baseline median +#547 median median delta mean delta
link_inline 2.342 2.176 -7.1% -10.1%
link_block_ordering_pass_and_mem2reg-after-inlining 0.293 0.244 -16.7% -18.0%
link 25.991 25.905 -0.3% +8.9%
total 26.206 26.122 -0.3% +8.8%

Copy link
Copy Markdown
Member

@Firestar99 Firestar99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one question:

In the last commit f842afe mem2reg during inlining, you're measurements only have one value as if moved mem2reg to inlining, but you just added mem2reg to inlining without removing it whereever it was called previously.
So it should still have a (potentially small) runtime, or can it be removed from where it's called previously?

@eddyb
Copy link
Copy Markdown
Member Author

eddyb commented Apr 8, 2026

I rebased this on current main. I also did a perf run ... It is generally neutral overall, so this PR ate most of the benefit of that.

Thanks, that's good to know. When it comes to rebasing, I will need to force-push a version that locally has the right commit identity for jj, sorry in advance (but I won't do that before updating the numbers).


In the last commit f842afe mem2reg during inlining, you're measurements only have one value as if moved mem2reg to inlining, but you just added mem2reg to inlining without removing it whereever it was called previously. So it should still have a (potentially small) runtime, or can it be removed from where it's called previously?

In theory the separate invocations of mem2reg might have become redundant, but I didn't want to risk them being skipped in case e.g. the inliner only runs mem2reg if inlining actually happened.

I could modify the separate mem2reg invocation to assert that nothing changes (i.e. we're only paying for a redundant rescanning), and if that never gets hit in anything we have access to, we can remove it.
(I was mainly trying to get big wins from small diffs, with minimal/zero risk of breakage)


Also see #547 (comment) for an update from @nazar-pc on the #546 mystery.
(I might re-measure on top of the Rust-GPU commit they were on, now that I have the automation for it)

@LegNeato
Copy link
Copy Markdown
Collaborator

LegNeato commented Apr 8, 2026

Sure, force-push away. I guess I should probably switch to jj.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants