Read the RGS sync timestamp from the network graph #881

Open
tnull wants to merge 2 commits into lightningdevkit:main from tnull:2026-04-fix-615

@tnull tnull commented Apr 20, 2026

Fixes #615.

Previously, after each successful Rapid Gossip Sync update the
background task wrote `latest_rgs_snapshot_timestamp` to the persisted
`NodeMetrics` immediately, while the network graph itself is only
flushed to disk later by LDK's background processor. A crash in that
window left the on-disk metric ahead of the on-disk graph — on restart
we'd resume RGS from the newer timestamp and permanently skip the
updates that were never persisted together with the graph.

Instead, seed the RGS start timestamp from
`NetworkGraph::get_last_rapid_gossip_sync_timestamp`, which is part of
the graph's own serialized state and therefore lands on disk atomically
with the channel updates it describes. The same source now backs the
RGS timestamp reported via `NodeStatus::latest_rgs_snapshot_timestamp`,
so the reported value always matches what's reflected in the graph.
Worst case after a crash is that we refetch the snapshots since the
last persisted graph — an idempotent operation — rather than silently
losing them.
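
To make the crash window concrete, here is a minimal sketch of the two resume strategies. `OnDisk` and the three functions are hypothetical stand-ins for illustration, not ldk-node types, and treating a missing timestamp as `0` (a full sync) is an assumption.

```rust
/// Hypothetical model of the state that survives a crash.
#[derive(Clone, Copy, Debug, PartialEq)]
struct OnDisk {
    /// Timestamp serialized as part of the network graph itself.
    graph_rgs_timestamp: Option<u32>,
    /// Legacy timestamp that was persisted separately in `NodeMetrics`.
    metrics_rgs_timestamp: Option<u32>,
}

/// Old behavior: resume RGS from the separately persisted metric.
fn resume_from_metrics(disk: &OnDisk) -> u32 {
    disk.metrics_rgs_timestamp.unwrap_or(0)
}

/// New behavior: resume RGS from the timestamp stored with the graph.
fn resume_from_graph(disk: &OnDisk) -> u32 {
    disk.graph_rgs_timestamp.unwrap_or(0)
}

/// Simulates a crash after the metric for `applied` was written but before
/// the graph containing those updates was flushed to disk.
fn crash_before_graph_flush(last_flushed: u32, applied: u32) -> OnDisk {
    OnDisk {
        graph_rgs_timestamp: Some(last_flushed),
        metrics_rgs_timestamp: Some(applied),
    }
}
```

With `crash_before_graph_flush(1_000, 2_000)`, the old strategy resumes at 2000 and never requests the updates between 1000 and 2000, while the new one resumes at 1000 and simply refetches them.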

The `latest_rgs_snapshot_timestamp` field is retired from `NodeMetrics`,
and TLV slot 6 is kept readable for backwards compatibility via LDK's
`legacy` TLV grammar. Old persisted records still deserialize; new
records no longer carry slot 6. The dead "reset RGS timestamp on
gossip-source switch" block in the P2P builder branch also goes away,
since the graph's timestamp remains the correct resume point across a
P2P→RGS switch.
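
The backwards-compatibility idea can be sketched with a toy TLV codec. This is not LDK's serialization macro or its wire format: the one-byte type and length, the `Metrics` struct, and the type-0 field are all assumptions made for illustration. Only the "still read slot 6, never write it" behavior mirrors the description above.

```rust
/// Toy record; the field and its TLV type are hypothetical.
#[derive(Debug, Default, PartialEq)]
struct Metrics {
    latest_sync: Option<u32>,
}

const TYPE_LATEST_SYNC: u8 = 0;
/// Retired slot: accepted on read, never emitted on write.
const TYPE_LEGACY_RGS_TIMESTAMP: u8 = 6;

/// Appends one record as: type (1 byte), length (1 byte), big-endian u32.
fn put(out: &mut Vec<u8>, ty: u8, value: u32) {
    out.push(ty);
    out.push(4);
    out.extend_from_slice(&value.to_be_bytes());
}

/// Decodes a 4-byte big-endian value, rejecting any other length.
fn be_u32(value: &[u8]) -> Option<u32> {
    if value.len() != 4 {
        return None;
    }
    Some(u32::from_be_bytes([value[0], value[1], value[2], value[3]]))
}

/// New writer: slot 6 is no longer part of the output.
fn write_metrics(m: &Metrics) -> Vec<u8> {
    let mut out = Vec::new();
    if let Some(v) = m.latest_sync {
        put(&mut out, TYPE_LATEST_SYNC, v);
    }
    out
}

/// Reader: validates and discards slot 6 so old records still parse.
fn read_metrics(mut bytes: &[u8]) -> Option<Metrics> {
    let mut m = Metrics::default();
    while !bytes.is_empty() {
        if bytes.len() < 2 {
            return None;
        }
        let (ty, len) = (bytes[0], bytes[1] as usize);
        let rest = &bytes[2..];
        if rest.len() < len {
            return None;
        }
        let (value, tail) = rest.split_at(len);
        match ty {
            TYPE_LATEST_SYNC => m.latest_sync = Some(be_u32(value)?),
            TYPE_LEGACY_RGS_TIMESTAMP => {
                be_u32(value)?; // parsed for validity, then dropped
            },
            _ => return None, // simplification: this toy rejects all unknown types
        }
        bytes = tail;
    }
    Some(m)
}
```

A record written by an older version (types 0 and 6) decodes to the same `Metrics` as one written by the new code, and re-encoding it produces bytes without slot 6.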

As a slightly related prefactor, we also move `NodeMetrics` persistence to a new `update_and_persist_node_metrics` helper.
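
For readers unfamiliar with the pattern, a rough sketch of what such a helper could look like follows. The signature, the `MemoryStore` type, the `"node_metrics"` key, the single-field `NodeMetrics`, and the toy `encode` function are all assumptions; the helper in this PR may differ in every one of them.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

/// Simplified stand-in for the real struct; the field is hypothetical.
#[derive(Clone, Debug, Default, PartialEq)]
struct NodeMetrics {
    latest_wallet_sync_timestamp: Option<u64>,
}

/// Hypothetical in-memory key-value store used in place of the real kv-store.
#[derive(Default)]
struct MemoryStore {
    entries: Mutex<HashMap<String, Vec<u8>>>,
}

impl MemoryStore {
    fn write(&self, key: &str, value: Vec<u8>) -> Result<(), String> {
        let mut entries = self.entries.lock().map_err(|_| "store lock poisoned".to_string())?;
        entries.insert(key.to_string(), value);
        Ok(())
    }
}

/// Toy encoding: the timestamp as 8 big-endian bytes, with 0 for `None`.
fn encode(metrics: &NodeMetrics) -> Vec<u8> {
    metrics.latest_wallet_sync_timestamp.unwrap_or(0).to_be_bytes().to_vec()
}

/// Takes the write lock, applies `update`, and persists the encoded result
/// while the lock is still held, so the mutation and the write stay paired.
fn update_and_persist_node_metrics(
    metrics: &RwLock<NodeMetrics>,
    store: &MemoryStore,
    update: impl FnOnce(&mut NodeMetrics),
) -> Result<(), String> {
    let mut locked = metrics.write().map_err(|_| "metrics lock poisoned".to_string())?;
    update(&mut *locked);
    store.write("node_metrics", encode(&*locked))
}
```

A call site then shrinks to a single expression such as `update_and_persist_node_metrics(&metrics, &store, |m| m.latest_wallet_sync_timestamp = Some(7))`, instead of repeating the lock, mutate, and write steps.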

tnull added 2 commits April 20, 2026 13:44
Extract the repeated "acquire write lock on `node_metrics`, mutate a
field or two, then write the encoded struct to the kv-store" idiom into
a single helper in `io::utils`. As a side effect, `write_node_metrics`
is inlined into the helper.

Co-Authored-By: HAL 9000
@tnull tnull requested a review from TheBlueMatt April 20, 2026 11:45
ldk-reviews-bot commented Apr 20, 2026

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@tnull tnull added this to the 0.8 milestone Apr 20, 2026

Development

Successfully merging this pull request may close these issues.

Ill-timed crash can cause a missed RGS update