Finish and benchmark inline/mem2reg improvements. #552
f842afe to c665c62
I rebased this on current

Median End-to-End Summary

| metric | baseline median | +#547 median | median delta | mean delta |
|---|---|---|---|---|
| link_inline | 0.070 | 0.066 | -5.7% | -9.9% |
| link_block_ordering_pass_and_mem2reg-after-inlining | 0.018 | 0.016 | -11.1% | -16.0% |
| link | 2.754 | 2.495 | -9.4% | -3.9% |
| total | 2.937 | 2.790 | -5.0% | -3.3% |

rust-gpu-shadertoys/shaders
| metric | baseline median | +#547 median | median delta | mean delta |
|---|---|---|---|---|
| link_inline | 0.686 | 0.430 | -37.3% | -27.5% |
| link_block_ordering_pass_and_mem2reg-after-inlining | 0.097 | 0.040 | -58.8% | -51.2% |
| link | 18.506 | 18.680 | +0.9% | +4.0% |
| total | 19.124 | 19.303 | +0.9% | +3.7% |

abundance/crates/farmer/ab-proof-of-space-gpu
| metric | baseline median | +#547 median | median delta | mean delta |
|---|---|---|---|---|
| link_inline | 2.342 | 2.176 | -7.1% | -10.1% |
| link_block_ordering_pass_and_mem2reg-after-inlining | 0.293 | 0.244 | -16.7% | -18.0% |
| link | 25.991 | 25.905 | -0.3% | +8.9% |
| total | 26.206 | 26.122 | -0.3% | +8.8% |

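
The "median delta" column in the tables above is consistent with the relative change `(candidate - baseline) / baseline` applied to the two medians. The sketch below shows that arithmetic; the helper names `median` and `percent_delta` are illustrative and are not taken from the benchmark scripts, and how the "mean delta" column was derived is not stated here, so it is not reproduced.

```rust
/// Median of a set of timings; averages the two middle values for even lengths.
/// (Hypothetical helper, not part of the actual measurement tooling.)
fn median(samples: &[f64]) -> f64 {
    assert!(!samples.is_empty(), "need at least one sample");
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// Relative change from `baseline` to `candidate`, in percent (negative = faster).
fn percent_delta(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // Medians copied from the `link_inline` row of the first table.
    println!("{:+.1}%", percent_delta(0.070, 0.066)); // prints "-5.7%"

    // With raw per-run timings (made-up values here), take the median first.
    let baseline_runs = [0.071, 0.070, 0.069];
    let candidate_runs = [0.066, 0.067, 0.065];
    let delta = percent_delta(median(&baseline_runs), median(&candidate_runs));
    println!("{delta:+.1}%");
}
```

Comparing medians rather than means keeps a single slow outlier run from dominating the result, which may explain why the two delta columns sometimes disagree in sign.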
Firestar99
left a comment
Looks good, one question:
In the last commit f842afe mem2reg during inlining, your measurements only have one value, as if you had moved mem2reg to inlining, but you just added mem2reg to inlining without removing it wherever it was called previously.
So it should still have a (potentially small) runtime, or can it be removed from where it was called previously?
Thanks, that's good to know. When it comes to rebasing, I will need to force-push a version that locally has the right commit identity for
In theory the separate invocations of
I could modify the separate
Also see #547 (comment) for an update from @nazar-pc on the #546 mystery.

Sure, force-push away. I guess I should probably switch to
Draft pull request until it can be rebased again, and measurements retaken.
Background for these inliner and `mem2reg` changes:
- "`rustc_codegen_spirv` taking a long time processing my (large) shader" #113 in early 2022 (by @hatoo) was the first sign that our `mem2reg` didn't scale well
- "`mem2reg` can quadratically amplify already-exponential inlining (turning seconds into minutes)." #63 in early 2024 (prompted by @schell) showed top-down inlining to be O(2ⁿ) (with `mem2reg` being quadratic on top of that, further amplifying the cost)
- [2024]) were developed in early 2024, and almost all of them landed a year later as part of "Linker: speedup and debug info preservation" #21 (with WIP descriptions, and without the compile time impact being measured)
- "`rustc` hangs for a few minutes since recent Rust toolchain upgrade" #546 more recently, still points to `mem2reg` performance leaving a lot to be desired (I have not been able to reproduce that instance, however, as you'll see below)
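
To make the O(2ⁿ) claim concrete, here is a toy cost model rather than the actual Rust-GPU linker code. It assumes a hypothetical call chain in which `f_0` is a leaf and every `f_i` calls `f_{i-1}` exactly twice, and it only counts how many call sites get expanded; the function names are made up for illustration.

```rust
/// Top-down: expand the calls in the root first, then the calls exposed by
/// each pasted copy, so every copy of a callee is processed again from scratch.
fn top_down_expansions(depth: u32) -> u64 {
    if depth == 0 {
        return 0; // a leaf has no calls to expand
    }
    // Two call sites, each pasting a not-yet-inlined `f_{depth-1}` whose own
    // calls must then be expanded inside that copy.
    2 * (1 + top_down_expansions(depth - 1))
}

/// Bottom-up: callees are fully inlined before their callers, so each of the
/// `depth` non-leaf functions has its two call sites expanded exactly once.
fn bottom_up_expansions(depth: u32) -> u64 {
    2 * u64::from(depth)
}

fn main() {
    for depth in [1, 5, 10, 20] {
        println!(
            "depth {depth:>2}: top-down {:>8}, bottom-up {:>3}",
            top_down_expansions(depth),
            bottom_up_expansions(depth),
        );
    }
}
```

In this model the top-down count is 2ⁿ⁺¹ − 2 while the bottom-up count is 2n. The amount of code copied is still exponential for this particular chain either way; the model only captures the repeated per-call-site work, and any pass that runs per expansion (such as a quadratic `mem2reg`) would multiply that count further.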
In order to properly assess the impact of both already-landed changes, and newer ones, this PR reverts commits from #21, for a "current `main` minus any improvements" baseline, on top of which all changes are reapplied, allowing each step to be measured.
(this also results in a tiny effective diff, since most commits form revert+reapply pairs)
Sadly, @schell's `renderling` couldn't be included (due to indirect dependencies on specific `glam` versions), but the other old samples were simple enough to allow automating compatibility (with both Rust-GPU `0.9` and current `main`), and the measurement commands and raw output are available in a gist.

| | rene@73e827b inline/mem2reg | rust-gpu-shadertoys@81a56fd inline/mem2reg | ab-proof-of-space-gpu@fc99af15 inline/mem2reg |
|---|---|---|---|
| v0.9) | 1.4s/323.5s | 15.7s/2738.1s | |
| `main` w/o inline/mem2reg changes (i.e. after all 5 reverts) | 1.6s/286.6s | 16.2s/2265.1s | 2.4s/354.9s |
| | 10s/313.1s | 110.2s/2515s | 13.5s/377.3s |
| | 0.09s/50.9s | 0.4s/856.4s | 1.1s/203.2s |
| `mem2reg` label ID indexing | 0.09s/41.2s | 0.5s/328.2s | 1.1s/67.5s |
| instead of var+store+load [commit equivalent to `main`] | 0.07s/4.1s | 0.4s/4.2s | 1.1s/5.6s |
| less often in `mem2reg` | 0.07s/1.9s | 0.4s/2.6s | 1.1s/3.3s |
| `remove_duplicate_debuginfo` during inlining | 0.04s/1.3s | 0.3s/1.8s | 1.1s/3s |
| `mem2reg` during inlining | 0.07s | 0.9s | 2.4s |

While the changes with the largest impact have already landed in Rust-GPU, the last 3 commits still result in a combined ~4x reduction in inlining+`mem2reg` times for the tested shaders (except for `rene`, where it's more like a 20x reduction).
(and that's without being able to reproduce #546 - until then, it's unclear how much of #547 is subsumed by the last 3 changes in this PR)