Restart HeartbeatProcess on transient pipe errors #391

Merged
ianks merged 2 commits into main from ianks/heartbeat-process-restart-on-pipe-error on Apr 8, 2026

ianks commented Apr 7, 2026

Previously, if the HeartbeatProcess monitor subprocess exited unexpectedly -- due to a transient Redis connection error, the OOM killer, or an OS signal -- the next @pipe.write in send_message would raise IOError: closed stream or Errno::EPIPE. Since the heartbeat thread runs with abort_on_exception = true, this killed the worker mid-run, so tests were skipped and connections were not returned to the pool.
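For readers unfamiliar with the two exceptions, here is a minimal standalone sketch of how each one arises. It uses a plain `IO.pipe` in place of the real monitor subprocess, and `write_error` is a hypothetical helper written for this illustration, not part of the PR. Note that `Errno::EPIPE` is a `SystemCallError`, not an `IOError`, so a rescue has to name both.

```ruby
# Hypothetical helper: returns the exception raised by writing to `io`,
# or nil if the write succeeds.
def write_error(io)
  io.write("tick\n")
  io.flush
  nil
rescue IOError, Errno::EPIPE => e
  e
end

# Case 1: the write end itself has already been closed.
_reader, writer = IO.pipe
writer.close
puts write_error(writer).class # IOError ("closed stream")

# Case 2: the reading side (standing in for the dead subprocess) is gone.
reader, writer = IO.pipe
reader.close
puts write_error(writer).class # Errno::EPIPE
```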

HeartbeatProcess#tick! now rescues those errors, restarts the subprocess via a new restart! method, and retries. It allows up to MAX_RESTART_ATTEMPTS (3) consecutive failures before re-raising, and resets the counter on each successful tick so transient errors don't exhaust the budget permanently.
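The rescue, restart, and retry flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: `HeartbeatSketch`, `FakePipe`, the `spawn_pipe` block, and the `restarts` counter are all hypothetical names, and the real class spawns a monitor subprocess rather than taking a block. The exact boundary for "up to 3 consecutive failures" is also an assumption; here the fourth consecutive failure re-raises.

```ruby
# Hypothetical stand-in for the pipe to the monitor subprocess.
class FakePipe
  def initialize(broken:)
    @broken = broken
  end

  def write(data)
    raise Errno::EPIPE if @broken
    data.bytesize
  end
end

class HeartbeatSketch
  MAX_RESTART_ATTEMPTS = 3

  attr_reader :restarts

  # `spawn_pipe` is an assumed callable that returns a fresh writable pipe.
  def initialize(&spawn_pipe)
    @spawn_pipe = spawn_pipe
    @pipe = spawn_pipe.call
    @restart_attempts = 0
    @restarts = 0
  end

  def tick!
    @pipe.write("tick\n")
    @restart_attempts = 0 # a successful tick restores the full budget
  rescue IOError, Errno::EPIPE
    @restart_attempts += 1
    raise if @restart_attempts > MAX_RESTART_ATTEMPTS # budget spent: re-raise
    restart!
    retry # re-run the method body against the new pipe
  end

  private

  def restart!
    @restarts += 1
    @pipe = @spawn_pipe.call
  end
end

# The first pipe is broken and its replacement is healthy, so one restart suffices.
pipes = [FakePipe.new(broken: true), FakePipe.new(broken: false)]
heartbeat = HeartbeatSketch.new { pipes.shift }
heartbeat.tick!
puts heartbeat.restarts # 1
```

Resetting the counter only after a successful write means that isolated failures spread over a long run each get the full budget, while a subprocess that can never be brought back still surfaces the original error.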

Also adds mocha as a dev dependency for the new unit tests.

ianks added 2 commits April 7, 2026 16:55
When the monitor subprocess exits unexpectedly (e.g. due to a transient
network/Redis error), @pipe.write raises IOError or Errno::EPIPE. With
abort_on_exception=true on the heartbeat thread, this killed the entire
worker mid-run, causing tests to be skipped.

HeartbeatProcess#tick! now catches these errors, restarts the subprocess
via a new restart! method, and retries up to MAX_RESTART_ATTEMPTS (3)
consecutive failures before re-raising. The counter resets on each
successful tick so transient errors don't exhaust the budget permanently.

Also adds mocha as a dev dependency and a dedicated test file covering
retry, restart, max-attempts, and counter-reset behaviour.
ianks merged commit b2dd54a into main Apr 8, 2026
22 checks passed
ianks deleted the ianks/heartbeat-process-restart-on-pipe-error branch April 8, 2026 01:56