
feat: forest-dev export-state-tree [skip ci] #6885

Open
hanabi1224 wants to merge 3 commits into main from hm/export-state-tree-cmd

Conversation

hanabi1224 (Contributor) commented Apr 9, 2026

Summary of changes

This PR adds a dev tool for exporting state trees, together with messages, message receipts and events, for a tipset range:

forest-dev export-state-tree --chain mainnet --from 5915000 --to 5910000
-rw-------. 1 me me  56G Apr 10 14:46 statetree_mainnet_5910000_5915000.forest.car.zst

forest-dev export-state-tree --chain calibnet --from 3550000 --to 3540000
-rw-------. 1 me me 3.2G Apr 10 10:41 statetree_calibnet_3540000_3550000.forest.car.zst
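
The two listings above suggest a default output name of the form `statetree_<chain>_<to>_<from>.forest.car.zst`, with the lower epoch first. A minimal sketch of that naming, assuming the pattern is inferred correctly from the listings (`default_output_name` is a hypothetical helper, not a function from this PR):

```rust
/// Hypothetical helper: builds a file name matching the pattern seen in the
/// listings above. The real command may derive the name differently.
fn default_output_name(chain: &str, from: i64, to: i64) -> String {
    // The listings put the lower epoch first, so order the two bounds.
    let (lo, hi) = if from <= to { (from, to) } else { (to, from) };
    format!("statetree_{chain}_{lo}_{hi}.forest.car.zst")
}
```

Calling `default_output_name("mainnet", 5915000, 5910000)` reproduces the mainnet file name shown above.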

Changes introduced in this pull request:

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Outside contributions

  • I have read and agree to the CONTRIBUTING document.
  • I have read and agree to the AI Policy document. I understand that failure to comply with the guidelines will lead to rejection of the pull request.

Summary by CodeRabbit

  • New Features
    • Added export-state-tree CLI command enabling users to export parent state trees for specified tipset ranges, with customizable database and output paths
    • Added CLI documentation for new forest-dev subcommands including the state tree export functionality
    • Implemented IPLD block streaming utility to efficiently process and serialize blockchain state tree data

coderabbitai bot (Contributor) commented Apr 9, 2026

Walkthrough

This PR introduces a new export-state-tree CLI subcommand that exports consecutive parent state trees for a specified tipset range as a forest CAR file. The implementation includes the command definition, integration into the subcommand dispatcher, CLI documentation, and a new IpldStream utility for traversing IPLD blocks.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CLI Documentation<br/>`docs/docs/users/reference/cli.sh` | Added documentation sections for the new export-state-tree command via generate_markdown_section invocations. |
| Export State Tree Subcommand<br/>`src/dev/subcommands/export_state_tree_cmd.rs` | Implemented new ExportStateTreeCommand that computes a tipset range (from exclusive, to inclusive), iterates backward over tipsets, collects parent state roots, message receipts roots, block headers, and event roots, then streams the IPLD data as a forest CAR file to disk. |
| Subcommand Integration<br/>`src/dev/subcommands/mod.rs` | Added export_state_tree_cmd module and ExportStateTree enum variant with dispatch logic to invoke the command's run() method. |
| IPLD Stream Utility<br/>`src/ipld/util.rs` | Introduced IpldStream<DB> as a pinned async stream that yields CarBlock items by traversing IPLD DAG links from provided root CIDs, deduplicating visited nodes and filtering blocks via should_save_block_to_snapshot. |
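
The range semantics described for ExportStateTreeCommand (from exclusive, to inclusive, walking backward) can be sketched with plain epochs. This is an illustration only: `epochs_to_export` is a hypothetical name, and the real command resolves tipsets through the chain store rather than iterating over integers.

```rust
/// Stand-in for Forest's `ChainEpoch`, assumed here to be a signed integer.
type ChainEpoch = i64;

/// Epochs visited for a `--from`/`--to` pair: `from` is exclusive, `to` is
/// inclusive, and the walk goes from the highest epoch down to the lowest.
fn epochs_to_export(from: ChainEpoch, to: ChainEpoch) -> impl Iterator<Item = ChainEpoch> {
    // `to..from` is the half-open range [to, from); reversing it walks backward.
    // If `from <= to` the range is empty and nothing is exported.
    (to..from).rev()
}
```

With `--from 5 --to 2`, this sketch visits epochs 4, 3 and 2.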

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI Handler
    participant ChainStore
    participant DB as Database<br/>(ManyCar)
    participant IPLDStream as IpldStream
    participant CarEncoder as CAR Encoder
    participant FS as File System

    CLI->>ChainStore: Resolve tipsets in range [to, from]
    ChainStore->>DB: Query blocks by height
    DB-->>ChainStore: Return tipsets
    
    CLI->>ChainStore: Iterate backward over tipsets
    loop For each tipset in range
        ChainStore-->>CLI: Parent state root, receipts root, headers
        CLI->>CLI: Collect IPLD roots
    end
    
    CLI->>IPLDStream: Create stream with collected roots
    
    loop Poll stream for blocks
        IPLDStream->>DB: Load block by CID
        DB-->>IPLDStream: CarBlock data
        IPLDStream->>IPLDStream: Extract child CIDs (DAG_CBOR)
        IPLDStream-->>CLI: Yield CarBlock
        CLI->>CarEncoder: Add block to CAR
    end
    
    CLI->>CarEncoder: Compress frames
    CarEncoder->>FS: Write to temporary file
    CarEncoder->>FS: Persist to output path
    FS-->>CLI: Success
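
The polling loop in the diagram (load a block, extract its child CIDs, yield the block) can be approximated synchronously over an in-memory map. This is a sketch under stated assumptions, not the PR's `IpldStream`: `Cid` is a `u64` stand-in, `Block::links` replaces DAG-CBOR link extraction, traversal is LIFO, and missing blocks are skipped. The actual traversal order, snapshot filtering and missing-block behavior are not shown in this conversation.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for a CID; the real code uses a proper CID type.
type Cid = u64;

/// Stand-in block: raw bytes plus the links a DAG-CBOR decoder would extract.
struct Block {
    data: Vec<u8>,
    links: Vec<Cid>,
}

/// Walks the DAG from `roots` and returns each reachable block exactly once.
/// Assumption: blocks missing from `db` are silently skipped.
fn walk(db: &HashMap<Cid, Block>, roots: Vec<Cid>) -> Vec<(Cid, Vec<u8>)> {
    let mut stack = roots;
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    while let Some(cid) = stack.pop() {
        // `insert` returns false when the CID was already visited.
        if !seen.insert(cid) {
            continue;
        }
        let Some(block) = db.get(&cid) else {
            continue;
        };
        // Queue the children, then emit the block itself.
        stack.extend(block.links.iter().copied());
        out.push((cid, block.data.clone()));
    }
    out
}
```

The `seen` set is what keeps shared subtrees, which are common between consecutive state trees, from being emitted more than once.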

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • feat: forest-tool state compute #6167: Similar CLI subcommand additions that resolve tipsets, interact with the chain database (ManyCar/AnyCar), and export forest CAR snapshots for tipset validation purposes.

Suggested reviewers

  • akaladarshi
  • LesnyRumcajs
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding a new forest-dev subcommand for exporting state trees. |


hanabi1224 force-pushed the hm/export-state-tree-cmd branch 4 times, most recently from 332b428 to 6a19b7c (April 10, 2026 02:58)
hanabi1224 force-pushed the hm/export-state-tree-cmd branch from 6a19b7c to 30263b6 (April 10, 2026 06:09)
hanabi1224 marked this pull request as ready for review (April 14, 2026 06:31)
hanabi1224 requested a review from a team as a code owner (April 14, 2026 06:31)
hanabi1224 requested review from LesnyRumcajs and sudo-shashank and removed request for a team (April 14, 2026 06:31)
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/dev/subcommands/export_state_tree_cmd.rs (1)

48-49: Add rustdoc for run.

ExportStateTreeCommand::run is public and newly introduced, so it should have a brief doc comment like the struct does.

As per coding guidelines, "Document public functions and structs with doc comments"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dev/subcommands/export_state_tree_cmd.rs` around lines 48 - 49, Add a
brief rustdoc comment for the public async method ExportStateTreeCommand::run
describing its purpose and behavior (e.g., what running the command does and any
important side effects or return behavior). Place the doc comment immediately
above the fn signature using ///, mirroring the style used for the
ExportStateTreeCommand struct and keeping it concise and informative for public
API consumers.
src/ipld/util.rs (1)

425-440: Document IpldStream and IpldStream::new.

Both are new public APIs, but neither has rustdoc yet. Please add short docs covering traversal order and missing-block behavior.

As per coding guidelines, "Document public functions and structs with doc comments"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ipld/util.rs` around lines 425 - 440, Add rustdoc comments for the public
struct IpldStream and its constructor IpldStream::new: document that IpldStream
traverses IPLD nodes in the order provided by the cid_vec/roots (FIFO or
DFS/BFS—state actual traversal used by the implementation), explain how seen:
CidHashSet prevents revisiting nodes, and describe missing-block behavior (e.g.,
whether missing CIDs cause the stream to yield an error, skip, or terminate).
Place the docs directly above the pub struct IpldStream<DB> and above pub fn
new(db: DB, roots: Vec<Cid>) so users know traversal order, dedup semantics, and
how the stream reacts to unavailable blocks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5b8d4c5b-9ef9-43b6-b9c4-c7ede1b9cac7

📥 Commits

Reviewing files that changed from the base of the PR and between be0d0db and dd259a0.

📒 Files selected for processing (4)
  • docs/docs/users/reference/cli.sh
  • src/dev/subcommands/export_state_tree_cmd.rs
  • src/dev/subcommands/mod.rs
  • src/ipld/util.rs

Comment on lines +30 to +45
pub struct ExportStateTreeCommand {
    /// Filecoin network chain (e.g., calibnet, mainnet)
    #[arg(long, required = true)]
    chain: NetworkChain,
    /// Optional path to the database folder
    #[arg(long)]
    db: Option<PathBuf>,
    /// The maximum tipset epoch to export state tree from (Exclusive)
    #[arg(long)]
    from: ChainEpoch,
    /// The minimum tipset epoch to export state tree from (Inclusive)
    #[arg(long)]
    to: ChainEpoch,
    /// The path to the output `ForestCAR` file
    #[arg(short, long)]
    output: Option<PathBuf>,

⚠️ Potential issue | 🟠 Major

Make receipts and events opt-in instead of unconditional.

This command always pulls message receipts and event roots into the export, but there is no flag to keep them out. That makes the default output far heavier than Forest’s other export paths and turns export-state-tree into a much broader snapshot than the name implies.

Based on learnings, "enable message_receipts and events (message_receipts: true, events: true) only for GC snapshots as defined in src/db/gc/snapshot.rs, since these are internal snapshots created during garbage collection. For user-facing export commands such as src/tool/subcommands/archive_cmd.rs, disable receipts and events by default (message_receipts: false, events: false) to keep user-facing snapshots smaller, unless explicitly requested."

Also applies to: 98-109
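
The opt-in behavior requested here can be illustrated without the CLI layer. A minimal sketch, assuming two boolean options that default to `false`; `ExportOptions`, `TipsetRoots` and `roots_to_export` are hypothetical names with `u64` stand-ins for CIDs, and are not types from the PR:

```rust
/// Hypothetical options; the field names follow the review suggestion.
/// `Default` yields `false` for both, so receipts and events are opt-in.
#[derive(Default, Clone, Copy)]
struct ExportOptions {
    message_receipts: bool,
    events: bool,
}

/// Stand-in for the per-tipset roots the command collects (CIDs as `u64`).
struct TipsetRoots {
    parent_state: u64,
    receipts: u64,
    events: Vec<u64>,
}

/// Selects which roots seed the export, honoring the opt-in flags.
fn roots_to_export(t: &TipsetRoots, opts: ExportOptions) -> Vec<u64> {
    // The parent state root is always exported.
    let mut roots = vec![t.parent_state];
    if opts.message_receipts {
        roots.push(t.receipts);
    }
    if opts.events {
        roots.extend(t.events.iter().copied());
    }
    roots
}
```

With `ExportOptions::default()` only the parent state root is selected; setting both fields to `true` gives the behavior the command currently has unconditionally.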

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dev/subcommands/export_state_tree_cmd.rs` around lines 30 - 45, Add two
boolean flags to ExportStateTreeCommand named message_receipts and events (both
#[arg(long)] with default false) so receipts and event roots are opt-in rather
than always included; update the export invocation code that reads
ExportStateTreeCommand to pass these flags into the exporter/export_state_tree
routine so it only includes message_receipts and events when those flags are
true; keep GC snapshot code that currently requires receipts/events unchanged
but explicitly set message_receipts = true and events = true where snapshots are
created for gc (the GC snapshot creator symbol), and ensure user-facing callers
(e.g., the archive/export command symbol) continue to use the default false
values unless the flags are passed.

codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 0% with 89 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.02%. Comparing base (be0d0db) to head (dd259a0).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/dev/subcommands/export_state_tree_cmd.rs | 0.00% | 64 Missing ⚠️ |
| src/ipld/util.rs | 0.00% | 24 Missing ⚠️ |
| src/dev/subcommands/mod.rs | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files

| Files with missing lines | Coverage Δ |
|---|---|
| src/dev/subcommands/mod.rs | 73.52% <0.00%> (-1.10%) ⬇️ |
| src/ipld/util.rs | 54.98% <0.00%> (-5.82%) ⬇️ |
| src/dev/subcommands/export_state_tree_cmd.rs | 0.00% <0.00%> (ø) |

... and 9 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

if cid.codec() == fvm_ipld_encoding::DAG_CBOR {
    let new_cids = extract_cids(&data)?;
    if !new_cids.is_empty() {
        this.cid_vec.reserve(new_cids.len());

LesnyRumcajs (Member) commented Apr 14, 2026

Suggested change
this.cid_vec.reserve(new_cids.len());

Is this useful in any way? I would assume extend does at most one allocation to accommodate new_cids, so there is no need to reserve manually.
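
The two forms under discussion can be compared side by side. A minimal sketch with `u64` stand-ins for CIDs and hypothetical function names; it only shows that both forms produce the same vector. The claim that `Vec::extend` uses the source iterator's size hint to grow reflects current standard library behavior as far as I know, not a documented guarantee.

```rust
/// Appends with an explicit reserve, as in the reviewed line.
fn push_with_reserve(stack: &mut Vec<u64>, new: Vec<u64>) {
    stack.reserve(new.len());
    stack.extend(new);
}

/// Appends with `extend` alone. `new` is a `Vec`, so its iterator reports an
/// exact length that `extend` can use to size the allocation.
fn push_plain(stack: &mut Vec<u64>, new: Vec<u64>) {
    stack.extend(new);
}
```

Both functions leave `stack` with identical contents, so dropping the `reserve` call should not change behavior.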
