Use bump allocation in DRC free list and other improvements #12969
fitzgen merged 2 commits into bytecodealliance:main
Conversation
Label bot: This pull request has been labeled "wasmtime:api", "wasmtime:ref-types"; cc @fitzgen.
With this approach, can't deallocation be quadratic w.r.t. the number of blocks? Insofar as this has removal/insertion from a sorted vector which is O(n), so deallocating N items could result in quadratic runtime?
Hm yeah, I suppose so. I can re-add the `BTreeMap`.
For the bump part though, is this just a tradeoff of RSS vs. throughput (or something like that)? I'm not sure how big GC heaps are by default, but if they're big enough this seems not great: it'll unconditionally fill up the entire linear memory before reusing anything that's been deallocated. Or is the initial size of linear memories typically smaller?
The initial size is one Wasm page. After that, right now, we always prefer growing the GC heap over collecting when possible, but that is changing in #12942.
Also add fast-path entry points that take a `u32` size directly that has already been rounded to the free list's alignment. Altogether, this shaves off ~309B instructions retired (48%) from the benchmark in bytecodealliance#11141.
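A minimal sketch of the two ideas in this commit (hypothetical types and names, not the actual Wasmtime API; `ALIGN = 16` is an assumed alignment): a bump pointer that just advances on allocation, plus a fast-path entry point whose caller guarantees the `u32` size is already rounded to the alignment, so the rounding work is skipped entirely.

```rust
const ALIGN: u32 = 16; // assumed free-list alignment, for illustration

struct BumpRegion {
    next: u32, // next unallocated offset
    end: u32,  // end of the bump region
}

impl BumpRegion {
    /// Slow path: rounds the requested size up to the alignment first.
    fn alloc(&mut self, size: u32) -> Option<u32> {
        let rounded = size.checked_add(ALIGN - 1)? & !(ALIGN - 1);
        self.alloc_prerounded(rounded)
    }

    /// Fast path: the caller guarantees `size` is already a multiple of
    /// ALIGN, so allocation is just a pointer bump plus a bounds check.
    fn alloc_prerounded(&mut self, size: u32) -> Option<u32> {
        debug_assert_eq!(size % ALIGN, 0);
        let offset = self.next;
        let new_next = offset.checked_add(size)?;
        if new_next > self.end {
            // Bump region exhausted; a real allocator would fall back to
            // searching the free list (or growing the heap) here.
            return None;
        }
        self.next = new_next;
        Some(offset)
    }
}

fn main() {
    let mut bump = BumpRegion { next: 0, end: 64 };
    assert_eq!(bump.alloc(10), Some(0)); // 10 rounds up to 16
    assert_eq!(bump.alloc_prerounded(32), Some(16));
    assert_eq!(bump.alloc(32), None); // only 16 bytes left
    println!("ok");
}
```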
(force-pushed from 14c182c to fc82c63)
Latest commit switched back to `BTreeMap`.
alexcrichton left a comment:
Personally I still sort of feel like the true way forward here will be to compile the allocation algorithm to wasm itself. We talked about this historically, and I realize it's far out of scope for this PR, but I feel like we're otherwise going to endlessly golf various forms of allocators here and there with heuristics and such, whereas putting it in wasm with a strict API would make it much easier to experiment and play with at least.
```rust
let (&block_index, &block_len) = self
    .free_block_index_to_len
    .iter()
    .find(|(_, len)| **len >= alloc_size)?;
```
Oh, well, uh, I guess this is quadratic as well... I realize this is preexisting, though, so sorry I didn't see this earlier :(
Mind tagging this with a FIXME and an issue? We presumably don't want to make this tier 1 with a quadratic-complexity allocator...
Edit: nevermind, with enough extra btrees we can make this work
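"Enough extra btrees" could look something like the following sketch (hypothetical code, not the PR's; `BySize` and its fields are invented names): keep a secondary index ordered by block size, so the linear `find` in the quoted snippet becomes an O(log n) range query for the smallest block that still fits.

```rust
use std::collections::BTreeSet;

// Secondary index over free blocks, ordered by (len, block_index), so the
// smallest block with len >= alloc_size is found in O(log n) rather than
// by a linear scan over every free block.
struct BySize {
    blocks: BTreeSet<(u32, u32)>, // (len, block_index)
}

impl BySize {
    fn insert(&mut self, index: u32, len: u32) {
        self.blocks.insert((len, index));
    }

    /// Best-fit: the first entry in the range `(alloc_size, 0)..` is the
    /// smallest free block whose length is >= alloc_size.
    fn take_best_fit(&mut self, alloc_size: u32) -> Option<(u32, u32)> {
        let &(len, index) = self.blocks.range((alloc_size, 0)..).next()?;
        self.blocks.remove(&(len, index));
        Some((index, len))
    }
}

fn main() {
    let mut by_size = BySize { blocks: BTreeSet::new() };
    by_size.insert(100, 8);
    by_size.insert(200, 64);
    by_size.insert(300, 32);
    // Requesting 20 bytes picks the 32-byte block, not the 64-byte one.
    assert_eq!(by_size.take_best_fit(20), Some((300, 32)));
    println!("ok");
}
```

The tradeoff, of course, is keeping the two indexes in sync on every allocation and deallocation, which is exactly the kind of heuristic golfing the comment above is wary of.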
I agree; I don't think we need to touch the free list again after this PR (plus the follow-up FIXME) until if/when we self-host the free list.
- Remove the `BTreeMap` and use a cache-friendly `Vec` instead.
- Add fast-path entry points that take a `u32` size directly that has already been rounded to the free list's alignment.
- Altogether, this shaves off ~309B instructions retired (48%) from the benchmark in #11141.