F18: fix AHCI CI-level completion drain by ryanbreen · Pull Request #307 · ryanbreen/breenix

ryanbreen · 2026-04-16T15:32:47Z

Summary

- Replace AHCI single-shot PORT_IS completion handling with a bounded CI-level drain loop.
- Defer slot-0 wake publication until the sampled PORT_IS is acknowledged and the port is stable, preventing the next command from being issued while the prior AHCI interrupt remains asserted.
- Document the F18 Linux audit, final 5/5 Parallels sweep, and cleanup recommendation.
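The drain loop summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Breenix driver code: the `PortRegs` trait, `DrainResult`, `drain_port`, and the `ScriptedPort` test double are all hypothetical names introduced here. Only the eight-pass bound, the `active & !PORT_CI` completion mask, and the ordering (clear active bits, acknowledge the sampled PORT_IS, then re-read) come from the commit message.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU32, Ordering};

/// Upper bound on drain passes per port (the PR uses eight).
const MAX_DRAIN_ITERATIONS: u32 = 8;

/// Hypothetical register access; not the Breenix driver's real interface.
trait PortRegs {
    fn read_is(&self) -> u32;
    fn read_ci(&self) -> u32;
    /// PORT_IS is write-1-to-clear: only the sampled bits are acknowledged.
    fn ack_is(&mut self, sampled: u32);
}

#[derive(Debug, PartialEq)]
struct DrainResult {
    completed: u32,  // slots retired across all passes
    iterations: u32, // passes taken
    stable: bool,    // true if the port went quiet before the bound
}

fn drain_port<P: PortRegs>(port: &mut P, active: &AtomicU32) -> DrainResult {
    let mut completed = 0u32;
    let mut iterations = 0u32;
    let mut stable = false;
    while iterations < MAX_DRAIN_ITERATIONS {
        iterations += 1;
        let sampled_is = port.read_is();
        // Level-derived completion: issued slots whose CI bit has cleared.
        let done = active.load(Ordering::Acquire) & !port.read_ci();
        // Clear atomically; `prev & done` counts only bits this pass retired.
        let prev = active.fetch_and(!done, Ordering::AcqRel);
        completed |= prev & done;
        port.ack_is(sampled_is);
        // Re-read: go around again if the port reasserted or another slot cleared.
        let more_done = active.load(Ordering::Acquire) & !port.read_ci();
        if port.read_is() == 0 && more_done == 0 {
            stable = true;
            break;
        }
    }
    DrainResult { completed, iterations, stable }
}

/// Test double: each acknowledgement releases the next scripted event,
/// given as (PORT_IS bits to raise, PORT_CI bits to clear).
struct ScriptedPort {
    is: u32,
    ci: u32,
    events: VecDeque<(u32, u32)>,
}

impl PortRegs for ScriptedPort {
    fn read_is(&self) -> u32 { self.is }
    fn read_ci(&self) -> u32 { self.ci }
    fn ack_is(&mut self, sampled: u32) {
        self.is &= !sampled;
        if let Some((raise, clear)) = self.events.pop_front() {
            self.is |= raise;
            self.ci &= !clear;
        }
    }
}
```

With slots 0 and 1 active, PORT_CI initially `0b10`, and one scripted event that completes slot 1 right after the first acknowledgement, `drain_port` retires both slots in two passes and reports the port stable. A single-shot handler would have retired only slot 0. In this sketch a caller would publish wakes from `completed` only after the function returns, which mirrors the deferred slot-0 wake. What to do when `stable` is false after eight passes is left to the caller here, since this description does not specify it.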

Validation

Clean AArch64 build: no warning/error lines in logs/breenix-parallels-cpu0/f18-ahci-ci-loop/build-final.log.
5x ./run.sh --parallels --test 60 serial criteria:
- run1-run5 reached [init] bsshd started (PID 2)
- ahci_timeouts=0
- corruption_markers=0

Note: run.sh exits 1 because the Parallels screenshot helper cannot find the generated VM window; serial logs are the validation source, consistent with previous F-series sweeps.

Audit: the F17 Breenix handler was edge-sensitive. It read PORT_IS, acknowledged PORT_IS/HBA_IS through ack_port_interrupt(), then read PORT_CI once and completed at most the slot implied by that single interrupt-status sample. A completion that cleared PORT_CI around that one-shot status sample could leave PORT_CI clear while no waiter was woken.

Linux v6.8 uses the level-sensitive model in drivers/ata/libahci.c: ahci_port_intr() acknowledges PORT_IRQ_STAT, ahci_handle_port_interrupt() delegates command completion to ahci_qc_complete(), and ahci_qc_complete() reads PORT_CMD_ISSUE/PORT_SCR_ACT into qc_active before calling ata_qc_complete_multiple(). That derives completion from hardware-active state rather than relying on a single interrupt edge; SERR/error handling remains separate before normal command completion.

Fix: loop each active AHCI port up to eight times, compute completed slots as PORT_ACTIVE_MASK & !PORT_CI, clear active bits atomically, acknowledge sampled PORT_IS, then re-read PORT_IS and PORT_CI and continue if the port reasserted or another active slot has cleared. Slot-0 wake publication is deferred until after the port is stable, preventing the woken waiter from issuing the next command while the prior AHCI interrupt line remains asserted. The existing single-active-slot interrupt fallback is preserved, and CI loop iterations are emitted as AHCI_RING site=CI_LOOP with token=<iteration>.

Co-authored-by: Ryan Breen <ryan@ryanbreen.com>
Co-authored-by: Claude Code <noreply@anthropic.com>

Five final ./run.sh --parallels --test 60 runs reached bsshd with zero AHCI timeouts and zero corruption markers by serial-log criteria. The run.sh process still exits 1 because the Parallels screenshot helper cannot find the generated VM window, matching prior F-series sweeps where serial output is the validation source.

Co-authored-by: Ryan Breen <ryan@ryanbreen.com>
Co-authored-by: Claude Code <noreply@anthropic.com>

Record the F18 audit, Linux AHCI reference, CI-level completion fix, final 5/5 Parallels sweep, and cleanup recommendation. The exit report is preserved with the run artifacts for validator handoff.

Co-authored-by: Ryan Breen <ryan@ryanbreen.com>
Co-authored-by: Claude Code <noreply@anthropic.com>

ryanbreen and others added 3 commits April 16, 2026 11:29

ryanbreen wants to merge 3 commits into diagnostic-fix/f17-local-wake from probe/f18-ahci-ci-loop
