
Auto-clean task worktrees and monitor disk pressure#535

Open
bborn wants to merge 1 commit into main from task/2455-taskyou-worktrees-dont-auto-clean-filled

Conversation


@bborn bborn commented Apr 20, 2026

Summary

Today's incident (2026-04-20) filled the agent server's disk to 100% after ~13 Boston reel iterations, causing SQLite I/O errors and a stuck task in a retry loop. Root cause: each TaskYou worktree is ~500MB (node_modules + binary outputs), and the default cleanup grace period was 7 days with an hourly check.

This PR tightens cleanup and adds disk-pressure handling so the daemon can't silently fill disk again.

  • Grace period 7 days → 24 hours (DefaultWorktreeCleanupMaxAge). Satisfies the "max 24h" acceptance criterion.
  • Stale cleanup pass 1h → 10 min. Tasks past the grace period are reclaimed within minutes.
  • Cleanup trigger on close. Status change to done/archived kicks off cleanupStaleWorktrees immediately. Respects the grace period; valuable when users set it to 0 for immediate removal.
  • Disk monitoring via syscall.Statfs. New GetDiskStats, thresholds DefaultDiskWarnPercent=85, DefaultDiskRefusePercent=90, overridable via disk_warn_percent / disk_refuse_percent settings.
  • Worker samples disk every 5 min. Above the warn threshold it logs a warning; above the refuse threshold it force-cleans stale worktrees, ignoring the grace period.
  • Refuse new tasks when disk >= refuse threshold. Tasks move to blocked with a clear message (pointing to ty worktrees cleanup --max-age 0) instead of erroring silently.
  • New ty worktrees disk command to inspect usage and thresholds.
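
The `GetDiskStats` helper and the two threshold constants could look roughly like the sketch below. The function and constant names come from the PR; the `DiskStats` struct fields and the percent-used formula are assumptions for illustration. Note that `syscall.Statfs` is only available on Unix-like platforms.

```go
package main

import (
	"fmt"
	"syscall"
)

// Threshold defaults named in the PR description.
const (
	DefaultDiskWarnPercent   = 85
	DefaultDiskRefusePercent = 90
)

// DiskStats is a hypothetical shape for what GetDiskStats returns.
type DiskStats struct {
	TotalBytes  uint64
	FreeBytes   uint64
	UsedPercent float64
}

// GetDiskStats queries the filesystem containing path via syscall.Statfs.
func GetDiskStats(path string) (DiskStats, error) {
	var fs syscall.Statfs_t
	if err := syscall.Statfs(path, &fs); err != nil {
		return DiskStats{}, err
	}
	total := fs.Blocks * uint64(fs.Bsize)
	free := fs.Bavail * uint64(fs.Bsize) // space available to unprivileged users
	used := float64(total-free) / float64(total) * 100
	return DiskStats{TotalBytes: total, FreeBytes: free, UsedPercent: used}, nil
}

func main() {
	stats, err := GetDiskStats("/")
	if err != nil {
		panic(err)
	}
	fmt.Printf("used: %.1f%% (warn at %d%%, refuse at %d%%)\n",
		stats.UsedPercent, DefaultDiskWarnPercent, DefaultDiskRefusePercent)
}
```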

Test plan

  • `go build ./...` clean
  • `go vet ./...` clean
  • `go test ./...` all suites pass
  • New internal/executor/disk_test.go covers Statfs success, missing path, threshold refusal logic, and config fallback
  • Regression test guards against raising DefaultWorktreeCleanupMaxAge above 48h
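
The "threshold refusal logic" being tested reduces to a small decision: compare a used-percent sample against the warn and refuse thresholds and pick an action. A minimal sketch of that logic, with hypothetical names (`classify`, `diskAction`; only the thresholds and behaviors come from the PR):

```go
package main

import "fmt"

type diskAction int

const (
	actionNone       diskAction = iota
	actionWarn                  // log a warning
	actionForceClean            // force-clean stale worktrees, refuse new tasks
)

// classify maps a used-percent sample to the executor's response, per the PR:
// >= refuse → force-clean and block new tasks; >= warn → log; else nothing.
func classify(usedPercent, warnPercent, refusePercent float64) diskAction {
	switch {
	case usedPercent >= refusePercent:
		return actionForceClean
	case usedPercent >= warnPercent:
		return actionWarn
	default:
		return actionNone
	}
}

func main() {
	for _, p := range []float64{50, 85, 92} {
		fmt.Printf("%.0f%% -> action %d\n", p, classify(p, 85, 90))
	}
}
```

Keeping the comparison pure (no I/O) is what makes the refusal logic easy to cover in `disk_test.go` without touching a real filesystem.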

🤖 Generated with Claude Code

- Reduce DefaultWorktreeCleanupMaxAge from 7 days to 24 hours so closed
  task worktrees are reclaimed before a batch of reel-heavy tasks can
  fill the disk.
- Run the stale-worktree cleanup every 10 minutes (was 1 hour) so tasks
  that cross the grace period are reclaimed within minutes, not an hour.
- Trigger cleanup immediately when a task transitions to done/archived.
  No-op under the default grace period, but reclaims disk fast when a
  user sets worktree_cleanup_max_age to a short value or zero.
- Add GetDiskStats (syscall.Statfs) plus DefaultDiskWarnPercent (85%)
  and DefaultDiskRefusePercent (90%) constants. New settings
  disk_warn_percent and disk_refuse_percent allow overrides.
- The executor now samples disk usage every 5 minutes. Above the warn
  threshold it logs a warning; above refuse it force-cleans stale
  worktrees ignoring the grace period.
- Before starting new tasks, refuse and move them to blocked with a
  clear message when disk >= refuse threshold. Prevents the silent
  SQLite I/O errors that jammed the daemon on 2026-04-20.
- Expose `ty worktrees disk` to view current usage and thresholds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
