Conversation
MVCC PebbleDB changes:
- Invert version ordering so newest versions sort first per key, making historical lookups a direct SeekGE instead of an unbounded reverse seek.
- Introduce a per-request historical read session that holds one Pebble snapshot plus reusable iterators per store prefix, eliminating the per-read iterator create/destroy overhead.
- Add a request-scoped (module, key, version) read cache so repeated hot-key reads inside a single trace skip Pebble entirely.
- Add a sentinel-versioned latest-value fast path (math.MaxInt64 per key) usable as a bloom-filter-accelerated db.Get when tracing at or past the latest committed height.
- Harden batch writes: sort MVCC batches before commit, guard against empty store keys/keys to keep state-sync imports safe, and commit the latest-version metadata key separately so it never violates Pebble's strictly-increasing batch invariant.
- Extend iterator navigation to drive MVCC seeks by logical key under the new descending-version encoding, with forward/reverse helpers and defensive skips for the sentinel pointer.

Tracing hooks:
- Add ReadTraceCollector / ReadTraceEvent / TraceableStateStore types in db_engine/types so callers can optionally observe low-level reads. The MVCC Database implements TraceableStateStore as a no-op when no collector is attached, so this change is zero-cost in the default configuration; the actual consumers live in the follow-up profiling PR.

evmrpc:
- Add a block-level replay cache keyed by block hash + tx index so a debug trace request that replays txs [0..N) against the historical state reuses the post-tx context from earlier calls in the same block instead of replaying from scratch.

No migration is required: the new on-disk version encoding is applied on fresh state-synced data, and the latest-value sentinel is additive.

Made-with: Cursor
Codecov Report ❌

@@           Coverage Diff            @@
##             main    #3266    +/-   ##
========================================
- Coverage   59.30%   59.23%   -0.08%
========================================
  Files        2071     2073       +2
  Lines      169811   170250     +439
========================================
+ Hits       100713   100843     +130
- Misses      60327    60623     +296
- Partials     8771     8784      +13
ReplayTransactionTillIndex was storing sdk.Context values whose MultiStore field still pointed at the live replay state. The -1 checkpoint was cached before replay started but then advanced in place as transactions were applied, and later cache hits reused the stored checkpoint directly. Store a branched CacheMultiStore snapshot when caching a checkpoint, and branch again when resuming from one, so each replay starts from an immutable saved state instead of mutating the cached checkpoint itself. Made-with: Cursor
sei-db prune: the existing NextPrefix guard was written for ascending version order. Under the new descending encoding the newest version of a key is seen first, so whenever the newest version was above the prune height the iterator skipped to the next logical key and every older version (including ones below the prune height) was leaked on disk. Rewrite the loop around per-logical-key state: on entering a new key with newest > prune, seek straight to the first version <= prune, then keep one version below the prune height (when KeepLastVersion=true) and delete the rest. The fast-path seek preserves the old optimization of skipping above-prune versions.

evmrpc replay cache: replayStateCache grew one entry per (block, tx) forever with no eviction, each holding a branched CacheMultiStore. Wrap it in an expirable LRU bounded by block count (32) and TTL (10m); keep per-block mutexes so inner-map updates stay safe under concurrent traces.

Regression tests:
- sei-db/.../prune_test.go exercises the descending bug directly by reading raw pebble entries after Prune, covering keepLast true/false, mixed above/below-prune keys, and sentinel preservation. They fail on the pre-fix code.
- evmrpc/simulate_cache_test.go covers get/put, LRU eviction past the cap, and concurrent access under -race.
- Change replayStateCacheMu to *sync.Mutex so Backend copies stay safe under copylocks (Backend has value-receiver methods and tracers.go dereferences the pointer into a local).
- Annotate two int64->uint64 conversions on non-negative block heights with nolint:gosec, matching existing conventions in this package.
- Wrap recordReadTrace defer calls in closures so time.Since is evaluated at defer fire time, not at defer registration.
- Drop redundant Database selector from tracedDatabase iterator calls.
Drops the reserved math.MaxInt64 latest-pointer entry and returns the historical read path to the inverted-order SeekGE lookup plus the request-scoped read cache and reusable iterator session. Benchmarks showed the sentinel pointer made traces slower, not faster: the extra db.Get per read plus the write amplification from emitting a pointer entry per set outweighed the bloom-filter-accelerated hit. This keeps the other MVCC wins while removing the sentinel. Also drops the sentinel-skip branches in Prune/RawIterate/iterator and the corresponding sentinel-specific prune test.
- Drop unused batchOp.order field; sortBatchOps sorts by key only.
- Drop unused NewReadTraceEvent helper and its time import.
- Fix misleading Reverse:true trace events in getMVCCSlice and getMVCCSliceWithSession; these paths are forward SeekGE + First.
- Drop stale "reverse iteration is NOT supported" comment on iterator.
- Rename shadowed builtin cap to size in replay-cache eviction test.
- Move latestVersionKey Set into the pebble batch so data and version metadata commit atomically, restoring the invariant that crash recovery observes a consistent version pointer.
- Extract decodeMVCCEntry to share the post-seek validate/decode/clone logic between getMVCCSlice and getMVCCSliceWithSession.
- Drop the fine-grained per-step trace events on the Get hot path and route the remaining top-level timing through traceGetMVCCSlice, which short-circuits when no collector is attached (no slices.Clone in the default zero-collector path).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drops the ReadTrace types, per-op trace hooks, tracedDatabase wrapper, and historicalReadSession read-cache. This branch is scoped to MVCC perf changes; the tracing plumbing belongs with the debug trace-profile endpoint PR and can be re-introduced there once it lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reject legacy ascending-version databases on open. On a populated DB lacking the s/_mvcc_descending marker, return an error that points operators at state sync; on an empty DB, write the marker so future opens fast-path. Protects against silent version mis-decoding after an in-place upgrade to the new encoding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Archive nodes that cannot state-sync need to keep reading and writing
their existing ascending-version MVCC databases. Rather than forcing a
migration, detect the on-disk encoding at open time and dispatch to the
matching code path.
- OpenDB sets Database.descending from the s/_mvcc_descending sentinel:
present or empty DB -> descending (fast path, marker written on
fresh DBs); marker absent on a populated DB -> ascending legacy mode,
no error, no marker write.
- Public Get/Has/Iterator/ReverseIterator/Prune dispatch to *Descending
or *Ascending implementations. Batch threads the mode through writes
(NewBatchWithMode) so ApplyChangeset / Import / DeleteKeysAtVersion
persist in the DB's native encoding.
- File layout makes the split obvious for review:
- db_descending.go, iterator_descending.go -- perf fast path (this
branch's rewrite)
- db_ascending.go, iterator_ascending.go -- verbatim port of main's
legacy implementation, only adjusted to use MVCCEncodeAscending and
the ascendingIterator type
- comparator.go -- both encode/decode
variants plus a parameterized MVCCEncode(key, v, descending) for
shared call sites
- Tests cover all three open paths (fresh-writes-marker, legacy-opens-
ascending, marked-roundtrip) including a write+read+reopen cycle on a
seeded legacy DB to confirm the ascending path is end-to-end correct.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop three ornamental wrappers flagged by review:
- Batch.appendSet / Batch.appendDelete / RawBatch.appendSet were each five-line clone-and-append helpers called 1-3 times; inline at the call sites.
- NewBatch / NewRawBatch were zero-argument wrappers that always forwarded with descending=true to NewBatchWithMode. Drop the wrappers, rename NewBatchWithMode -> NewBatch and NewRawBatchWithMode -> NewRawBatch so the mode is required at the one kind of call site that exists.
- snapshotReplayState was a one-liner ctx.WithMultiStore(ctx.MultiStore().CacheMultiStore()) wrapper with two call sites. Inline it.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The block-level replay-state LRU only paid off when trace requests clustered on the same block in quick succession. evmrpc handles one call at a time and can't observe whether callers are clustering; workload-aware prefix reuse belongs above this layer (in a block explorer, indexer, or tracing harness that actually knows its access pattern). Baking it into the RPC added stateful fields on Backend, an LRU size / TTL we picked without data, and a double-lock put path that existed only to serve a workload the RPC can't verify.

Drop the cache entirely:
- blockReplayState, replayStateCache, replayStateCacheMu, replayStateCacheBlocks, replayStateCacheTTL from evmrpc/simulate.go
- getReplayState, putReplayState methods
- the cache check + two putReplayState calls in ReplayTransactionTillIndex; unconditionally replay from tx 0
- evmrpc/simulate_cache_test.go (no longer meaningful)
- the golang-lru/v2/expirable import (still used elsewhere in evmrpc)

Per-call perf is unaffected: the descending-version MVCC encoding still makes each replayed read cheap, which is the right layer for a per-call optimization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The three-way file split (db.go / db_descending.go / db_ascending.go)
implied two equal modes when the intent is one canonical path plus a
backstop. Fold the descending implementation back into db.go so the
canonical Database lives in one place, and keep db_ascending.go and
iterator_ascending.go as the clearly-labeled legacy carveouts. Rename
iterator_descending.go -> iterator.go for the same reason: it is the
iterator; iterator_ascending.go is the compat variant.
File layout after:
- db.go - Database, OpenDB, dispatchers, writes, descending
mode implementation, shared helpers, metrics
- db_ascending.go - legacy ascending-mode carveout for pre-migration DBs
- iterator.go - descending iterator (the iterator)
- iterator_ascending.go - legacy iterator carveout
No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two identical code paths were waiting for a single kernel:
- retrieveLatestVersion and retrieveEarliestVersion differed only in the metadata key name. Fold both into retrieveVersionKey(db, key).
- Batch.Write and RawBatch.Write share the otel metrics defer, the pebble batch open/close, sortBatchOps, and the ops-application loop. The only delta is the latestVersionKey stamp in Batch.Write. Extract writeBatchOps(storage, ops, beforeCommit) and pass the stamp as a hook from Batch.Write; RawBatch.Write passes nil.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Leftover from the earlier replay-cache revert — the one-line whitespace delta was the only difference from main in this file. Match main exactly so simulate.go drops out of the PR's diff stat. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yzang2019
approved these changes
Apr 17, 2026
Describe your changes and provide context
Roughly another ~2x speedup on debug_trace
Testing performed to validate your change