feat: Add disk usage percentage and warn on high usage #57

Queued
lfrancke wants to merge 12 commits into main from feat/disk-usage-percent

lfrancke commented Mar 30, 2026

Summary

  • Add `usage_percent` field to disk collection output, calculated as `(total - available) / total * 100`
  • Log at WARN level when any disk exceeds 85% usage, making it easy to spot in Graylog/Vector
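
A minimal sketch of the calculation and threshold described above, using only the standard library. The names `usage_percent`, `level_for`, and `Level` are hypothetical stand-ins (the PR itself logs through its own logging setup), and returning `None` for a zero-sized disk is an assumption, not something the summary specifies.

```rust
/// Threshold from the PR summary: usage above this is reported at WARN.
const WARN_THRESHOLD_PERCENT: f64 = 85.0;

/// Hypothetical stand-in for a log level; not the tool's real logging API.
#[derive(Debug, PartialEq)]
enum Level {
    Info,
    Warn,
}

/// Computes (total - available) / total * 100.
/// Returns `None` when `total` is zero (assumed behaviour, avoids dividing by zero).
fn usage_percent(total: u64, available: u64) -> Option<f64> {
    if total == 0 {
        return None;
    }
    // saturating_sub guards against a report where available > total.
    let used = total.saturating_sub(available);
    Some(used as f64 / total as f64 * 100.0)
}

/// Picks WARN only when usage strictly exceeds the threshold.
fn level_for(usage: f64) -> Level {
    if usage > WARN_THRESHOLD_PERCENT {
        Level::Warn
    } else {
        Level::Info
    }
}

fn main() {
    // 100 units total, 10 available: roughly 90% used, so WARN.
    if let Some(pct) = usage_percent(100, 10) {
        println!("{pct:.1}% -> {:?}", level_for(pct));
    }
}
```

A disk at exactly 85% stays at INFO in this sketch, since the summary says "exceeds".
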

Motivated by a customer running out of space on an attached PVC over the weekend.

Test plan

  • `cargo test --all-features` passes
  • `cargo clippy` with `RUSTFLAGS="-D warnings"` passes

lfrancke and others added 11 commits March 30, 2026 10:45
The tracing statement for `user.gid` was reading from `user.uid`
instead of `user.gid`, causing the wrong value to be reported.

Replace `into_iter().next().is_none()` with `list().is_empty()`
for clarity, and use `list().iter()` for the actual collection.

This was likely a debugging leftover; the error source chain is
already captured via the `successors` iterator below.

JSON serialization and file write can fail at runtime (e.g. disk
full). Log the error and continue the loop instead of crashing,
since this tool may run continuously for hours.
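
The "log the error and continue" pattern from the last commit message above might look roughly like this. It is a sketch using only the standard library: `write_report` and `run_rounds` are hypothetical names, the report is a hand-built string rather than real JSON serialization, and `eprintln!` stands in for whatever logging the tool uses.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for "serialize the collected data and write it to a file".
fn write_report(path: &Path, contents: &str) -> io::Result<()> {
    fs::write(path, contents)
}

/// Runs `iterations` collection rounds and returns how many writes failed.
/// A failed write is logged and counted, but never aborts the loop.
fn run_rounds(path: &Path, iterations: u32) -> u32 {
    let mut failures = 0;
    for round in 0..iterations {
        let report = format!("{{\"round\": {round}}}");
        match write_report(path, &report) {
            Ok(()) => {}
            Err(error) => {
                // Log and move on to the next round instead of unwrapping.
                eprintln!("round {round}: failed to write report: {error}");
                failures += 1;
            }
        }
    }
    failures
}

fn main() {
    let path = std::env::temp_dir().join("containerdebug-sketch.json");
    let failures = run_rounds(&path, 3);
    println!("{failures} failed writes");
}
```
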
`std::thread::sleep` blocks the entire tokio worker thread.
Since main is already async, use the non-blocking alternative.

In a container debugging tool, broken DNS config (/etc/resolv.conf)
is a likely scenario to diagnose. Log the error and skip DNS lookups
instead of panicking.
…andling

The network collector silently swallowed interface listing errors by
returning empty data. Now it returns Result so the orchestrator wraps
it in ComponentResult, matching the pattern used by other fallible
collectors. Errors appear in JSON output instead of being silently
lost.
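
To illustrate the difference between swallowing the error and surfacing it, here is a hedged sketch. The `ComponentResult` shape, the `wrap` helper, and `list_interfaces` are all assumptions made for illustration; the real types in the repository may look different.

```rust
use std::fmt::Display;
use std::io;

/// Assumed shape of the wrapper the orchestrator puts around fallible collectors.
#[derive(Debug, PartialEq)]
enum ComponentResult<T> {
    Ok(T),
    Err { message: String },
}

/// Hypothetical orchestrator helper: turns any `Result` into a `ComponentResult`.
fn wrap<T, E: Display>(result: Result<T, E>) -> ComponentResult<T> {
    match result {
        Ok(value) => ComponentResult::Ok(value),
        Err(error) => ComponentResult::Err { message: error.to_string() },
    }
}

/// Hypothetical interface lister that can be made to fail for demonstration.
fn list_interfaces(fail: bool) -> io::Result<Vec<String>> {
    if fail {
        Err(io::Error::new(io::ErrorKind::PermissionDenied, "cannot list interfaces"))
    } else {
        Ok(vec!["lo".to_string(), "eth0".to_string()])
    }
}

/// Old behaviour as described: the error disappears and empty data is returned.
fn collect_swallowing(fail: bool) -> Vec<String> {
    list_interfaces(fail).unwrap_or_default()
}

/// New behaviour as described: the collector returns `Result` and lets the caller decide.
fn collect(fail: bool) -> io::Result<Vec<String>> {
    list_interfaces(fail)
}

fn main() {
    println!("{:?}", collect_swallowing(true)); // empty list, error lost
    println!("{:?}", wrap(collect(true))); // error is part of the output
}
```
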
HashMap produces non-deterministic JSON output, making it hard to
diff containerdebug output across runs. BTreeMap sorts keys
consistently.
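
The ordering guarantee this commit relies on can be seen in a small standard-library sketch. The `render` function is a hypothetical JSON-like formatter used only to make the key order visible; it is not the tool's serializer.

```rust
use std::collections::BTreeMap;

/// Renders a map as a JSON-like string. Because `BTreeMap` iterates in
/// sorted key order, the output does not depend on insertion order.
fn render(entries: &BTreeMap<String, String>) -> String {
    let fields: Vec<String> = entries
        .iter()
        .map(|(key, value)| format!("{key:?}: {value:?}"))
        .collect();
    format!("{{{}}}", fields.join(", "))
}

fn main() {
    let mut env = BTreeMap::new();
    env.insert("PATH".to_string(), "/usr/bin".to_string());
    env.insert("HOME".to_string(), "/root".to_string());
    // prints {"HOME": "/root", "PATH": "/usr/bin"}
    println!("{}", render(&env));
}
```

With a `HashMap`, the iteration order is unspecified and can differ between runs, which is what makes diffing the output unreliable.
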
Add `usage_percent` field to disk collection output. When a disk
exceeds 85% usage, log at WARN level instead of INFO so it stands
out in log aggregation systems.
lfrancke moved this to Development: Waiting for Review in Stackable Engineering Mar 30, 2026
lfrancke self-assigned this Mar 30, 2026
sbernauer self-requested a review March 31, 2026 06:58
sbernauer moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Mar 31, 2026
lfrancke requested a review from sbernauer April 9, 2026 10:28
lfrancke added this pull request to the merge queue Apr 9, 2026
Any commits made after this event will not be merged.
lfrancke moved this from Development: In Review to Development: Done in Stackable Engineering Apr 9, 2026