A handful of optimizations for the DRC collector #12974
Open
fitzgen wants to merge 7 commits into bytecodealliance:main
Also add fast-path entry points that directly take a `u32` size that has already been rounded to the free list's alignment. Altogether, this shaves off ~309B instructions retired (48%) from the benchmark in bytecodealliance#11141.
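As a rough illustration of the idea (not the actual Wasmtime free list), the sketch below pairs a general entry point that validates and rounds the size with a fast-path entry point that trusts a pre-rounded `u32`. The names `FreeList`, `alloc`, `alloc_rounded`, and the `ALIGN` value are all assumptions made up for this example.

```rust
use std::collections::BTreeMap;

/// Assumed alignment for this toy free list (not Wasmtime's actual value).
const ALIGN: u32 = 16;

/// Toy first-fit free list mapping block start -> block length.
struct FreeList {
    free_blocks: BTreeMap<u32, u32>,
}

impl FreeList {
    fn new(capacity: u32) -> Self {
        let mut free_blocks = BTreeMap::new();
        // Keep only whole ALIGN-sized units of the capacity.
        free_blocks.insert(0, capacity - capacity % ALIGN);
        FreeList { free_blocks }
    }

    /// General entry point: validate and round the size, then defer to the
    /// fast path.
    fn alloc(&mut self, size: usize) -> Option<u32> {
        if size > u32::MAX as usize {
            return None;
        }
        // Round up to a nonzero multiple of ALIGN, failing on overflow.
        let rounded = (size as u32).max(1).checked_add(ALIGN - 1)? & !(ALIGN - 1);
        self.alloc_rounded(rounded)
    }

    /// Hypothetical fast path: the caller promises that `size` is already a
    /// nonzero multiple of ALIGN, so no conversion or rounding happens here.
    fn alloc_rounded(&mut self, size: u32) -> Option<u32> {
        debug_assert!(size != 0 && size % ALIGN == 0);
        // First fit: take the lowest-addressed block that is large enough.
        let (start, len) = self
            .free_blocks
            .iter()
            .map(|(s, l)| (*s, *l))
            .find(|&(_, l)| l >= size)?;
        self.free_blocks.remove(&start);
        if len > size {
            // Return the unused tail of the block to the free list.
            self.free_blocks.insert(start + size, len - size);
        }
        Some(start)
    }
}
```

A caller that already knows the rounded size (for example, because it was computed once per type) can call `alloc_rounded` directly and skip the per-allocation conversion and rounding work.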
Ideally we would just use a `SecondaryMap<VMSharedTypeIndex, TraceInfo>` here, but allocating `O(num engine types)` space inside a store that uses only a couple of types seems not great. So instead, we just have a fixed-size cache that is probably big enough for most things in practice.
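One way such a fixed-size cache could look is a small direct-mapped table keyed by type index. This is only a sketch under assumptions: `TypeIndex` and `TraceInfo` here are simplified stand-ins for the real types, and `CACHE_SLOTS`, `TraceInfoCache`, and the eviction policy are hypothetical, not what the PR implements.

```rust
/// Hypothetical stand-in for `VMSharedTypeIndex`.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct TypeIndex(u32);

/// Hypothetical stand-in for `TraceInfo`: offsets of the GC-ref fields.
#[derive(Clone, PartialEq, Eq, Debug)]
struct TraceInfo {
    gc_ref_offsets: Vec<u32>,
}

/// Assumed cache size; chosen arbitrarily for this sketch.
const CACHE_SLOTS: usize = 8;

/// Direct-mapped cache: each type index maps to exactly one slot, so memory
/// use is constant no matter how many types the engine has registered.
struct TraceInfoCache {
    slots: [Option<(TypeIndex, TraceInfo)>; CACHE_SLOTS],
    misses: usize,
}

impl TraceInfoCache {
    fn new() -> Self {
        TraceInfoCache {
            slots: std::array::from_fn(|_| None),
            misses: 0,
        }
    }

    /// Return the cached trace info for `ty`, computing it with `slow` (and
    /// evicting whatever occupied the slot) on a miss.
    fn get_or_insert_with(
        &mut self,
        ty: TypeIndex,
        slow: impl FnOnce(TypeIndex) -> TraceInfo,
    ) -> &TraceInfo {
        let slot = &mut self.slots[ty.0 as usize % CACHE_SLOTS];
        let hit = matches!(&*slot, Some((cached, _)) if *cached == ty);
        if !hit {
            self.misses += 1;
            *slot = Some((ty, slow(ty)));
        }
        &slot.as_ref().unwrap().1
    }
}
```

The trade-off is that two types whose indices collide modulo `CACHE_SLOTS` evict each other, which only costs a recomputation and is cheap when a store uses a handful of types.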
Inline `dec_ref`, `trace_gc_ref`, and `dealloc` into `dec_ref_and_maybe_dealloc`'s main loop so that we read the `VMDrcHeader` once per object to get `ref_count`, type index, and `object_size`, avoiding 3 separate GC heap accesses and bounds checks per freed object. For struct tracing, read `gc_ref` fields directly from the heap slice at known offsets instead of going through `gc_object_data` → `object_range` → `object_size`, which would re-read the `object_size` from the header. 301,333,979,721 -> 291,038,676,119 instructions (~3.4% improvement).
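The following sketch shows the general shape of "read the header once, then read fields at known offsets". The header layout (three little-endian `u32`s), `HEADER_SIZE`, and the helper names `read_u32`, `read_header`, and `gc_ref_fields` are assumptions for illustration and do not reflect the real `VMDrcHeader` layout.

```rust
/// Assumed header layout for this sketch: three little-endian `u32`s.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Header {
    ref_count: u32,
    type_index: u32,
    object_size: u32,
}

const HEADER_SIZE: usize = 12;

/// Read a little-endian `u32` at `at`, or `None` if it is out of bounds.
fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let b = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Take one slice of the heap covering the whole header and decode all three
/// fields from that 12-byte window, instead of going back to the heap once
/// per field.
fn read_header(heap: &[u8], obj: usize) -> Option<Header> {
    let bytes = heap.get(obj..obj.checked_add(HEADER_SIZE)?)?;
    Some(Header {
        ref_count: read_u32(bytes, 0)?,
        type_index: read_u32(bytes, 4)?,
        object_size: read_u32(bytes, 8)?,
    })
}

/// Read GC-ref fields at known offsets, reusing the `object_size` already
/// decoded into `header` rather than re-reading it from the heap.
fn gc_ref_fields(
    heap: &[u8],
    obj: usize,
    header: &Header,
    field_offsets: &[usize],
) -> Option<Vec<u32>> {
    let end = obj.checked_add(header.object_size as usize)?;
    let object = heap.get(obj..end)?;
    field_offsets
        .iter()
        .map(|&off| read_u32(object, off))
        .collect()
}
```

In a loop that frees many objects, the decoded `Header` value can be passed along to the ref-count decrement, tracing, and deallocation steps so that each of them works from the same copy.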
…exists
When the GC store is already initialized and the allocation succeeds, avoid async machinery entirely. This avoids the overhead of taking/restoring fiber async state pointers on every allocation. 291,038,676,119 -> 230,503,364,489 instructions (~20.8% improvement).
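The pattern described here is a synchronous fast path with a cold fallback. The sketch below models it without any real async code: `Store`, `BumpHeap`, `alloc`, and `alloc_slow` are hypothetical names, and the slow path merely stands in for the expensive work (lazy initialization, and in a real system possibly GC or heap growth through the async machinery).

```rust
/// Toy bump allocator standing in for an initialized GC heap.
struct BumpHeap {
    next: u32,
    capacity: u32,
}

impl BumpHeap {
    fn try_alloc(&mut self, size: u32) -> Option<u32> {
        let end = self.next.checked_add(size)?;
        if end > self.capacity {
            return None;
        }
        let start = self.next;
        self.next = end;
        Some(start)
    }
}

/// Hypothetical store; `slow_path_calls` exists only so the sketch can show
/// how often the fallback runs.
struct Store {
    gc_heap: Option<BumpHeap>,
    slow_path_calls: usize,
}

impl Store {
    fn alloc(&mut self, size: u32) -> Option<u32> {
        // Fast path: the heap already exists and the allocation succeeds, so
        // no expensive setup is needed at all.
        if let Some(heap) = self.gc_heap.as_mut() {
            if let Some(obj) = heap.try_alloc(size) {
                return Some(obj);
            }
        }
        self.alloc_slow(size)
    }

    /// Cold fallback. Here it only initializes the heap lazily and retries;
    /// this is an assumption of the sketch, not the real slow path.
    #[cold]
    #[inline(never)]
    fn alloc_slow(&mut self, size: u32) -> Option<u32> {
        self.slow_path_calls += 1;
        let heap = self.gc_heap.get_or_insert_with(|| BumpHeap {
            next: 0,
            capacity: 64,
        });
        heap.try_alloc(size)
    }
}
```

Only the first allocation and allocations that fail on the fast path pay for the fallback; every other call returns before touching it.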
Avoids converting `ModuleInternedTypeIndex` to `VMSharedTypeIndex` in host code, which requires lookups in the instance's module's `TypeCollection`. We already have helpers to do this conversion inline in JIT code. 230,503,364,489 -> 216,937,168,529 instructions (~5.9% improvement).
Moves the `externref` host data cleanup inside the `ty.is_none()` branch of `dec_ref_and_maybe_dealloc`, since only `externref`s have host data. Additionally, the type check is somewhat expensive, since it involves additional bounds-checked reads from the GC heap.
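A minimal sketch of the branch structure, assuming (as the message above implies) that an object whose header carries no type is an `externref`. The `Heap` struct, its fields, and the `dealloc` signature are hypothetical and exist only to make the control flow concrete.

```rust
use std::collections::HashMap;

/// Hypothetical heap bookkeeping for this sketch.
struct Heap {
    /// Host data keyed by object index; only `externref`s have entries.
    host_data: HashMap<u32, String>,
    /// Counts how often the host-data table is consulted.
    host_data_lookups: usize,
    /// Typed objects whose fields would be traced before freeing.
    traced: Vec<u32>,
}

impl Heap {
    /// `ty` is the type already read from the object's header; `None` means
    /// `externref` in this sketch.
    fn dealloc(&mut self, obj: u32, ty: Option<u32>) {
        match ty {
            None => {
                // Only this branch touches host data, so typed objects never
                // pay for the lookup or a separate "is this an externref?"
                // check.
                self.host_data_lookups += 1;
                self.host_data.remove(&obj);
            }
            Some(_ty) => {
                // Typed objects: trace children, no host data to clean up.
                self.traced.push(obj);
            }
        }
    }
}
```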
79013cf to 56a5b5a
Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "wasmtime:api", "wasmtime:ref-types"
Depends on #12969
See each commit message for details.
More coming soon after this.