Add Queue.workerLifetimeJitter to stagger worker shutdowns#476

Open
dereuromark wants to merge 2 commits into master from feat-worker-lifetime-jitter
@dereuromark
Owner

Summary

Adds an optional Queue.workerLifetimeJitter config (seconds). Each worker picks a random offset in [0, workerLifetimeJitter] at startup and adds it to its effective workerLifetime / --max-runtime. When a fleet of workers is spawned at the same instant (ECS tasks, Kubernetes deployments, systemd units with many instances), this prevents every worker from terminating on the same tick and producing a thundering herd of simultaneous restarts.

Defaults to 0, so behavior is unchanged unless the option is set.

```php
// config/app.php
$config['Queue']['workerLifetime'] = 300;
$config['Queue']['workerLifetimeJitter'] = 30; // workers now exit between 300s and 330s
```
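
To make the effect of these two values concrete, here is a small standalone simulation. It is only a sketch and not code from this PR: `simulateExitTimes()` is a hypothetical helper, and the use of `random_int()` is an assumption about how the offset could be drawn.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the plugin): returns the exit time, in
 * seconds after a shared start, for each worker in a fleet.
 *
 * @return array<int, int>
 */
function simulateExitTimes(int $workers, int $lifetime, int $jitter): array
{
    $exitTimes = [];
    for ($i = 0; $i < $workers; $i++) {
        // Each worker draws its own offset once; no jitter means offset 0.
        $offset = $jitter > 0 ? random_int(0, $jitter) : 0;
        $exitTimes[] = $lifetime + $offset;
    }

    return $exitTimes;
}

$withoutJitter = simulateExitTimes(20, 300, 0);
$withJitter = simulateExitTimes(20, 300, 30);

printf("without jitter: %d distinct exit second(s)\n", count(array_unique($withoutJitter)));
printf("with jitter: exits between %ds and %ds\n", min($withJitter), max($withJitter));
```

Without jitter all 20 simulated workers share a single exit second (300s). With a jitter of 30 the exits fall somewhere in the 300s to 330s window, so restarts no longer land on one tick.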

Credit

Idea and original implementation by Rommel Penaflor (@xrompdev) in #475, where it was proposed against a legacy CakePHP 2.x Symphosize fork and therefore could not be merged directly. This PR is a fresh port to the modern Queue\Queue\Processor, keeping the operational intent intact.

Implementation notes

  • Jitter is computed once per worker (right after $startTime = time() in Processor::run()), not re-rolled each loop iteration, so the exit time is stable per worker.
  • Extracted into Processor::computeLifetimeJitterOffset() so the bounds/default behavior is unit-testable without spinning up the full run loop.
  • Only applied when $maxRuntime > 0 — unlimited workers stay unlimited.
  • Jitter values <= 0 are ignored (the method returns 0), so a misconfigured negative value is a no-op rather than an error.
  • If jitter was applied, the worker logs Applying worker lifetime jitter: +Ns seconds so operators can see the stagger in action.
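
The rules in the notes above can be sketched as two plain functions. This is an illustration under stated assumptions, not the code in this PR: the free function `computeLifetimeJitterOffset()` only mirrors the name of the extracted method, `effectiveMaxRuntime()` is a hypothetical helper, and `random_int()` is an assumed choice of random source.

```php
<?php
declare(strict_types=1);

/**
 * Sketch of the offset rule: a random whole number of seconds in
 * [0, $jitter], or 0 when the configured jitter is zero or negative.
 */
function computeLifetimeJitterOffset(int $jitter): int
{
    if ($jitter <= 0) {
        // Unset or misconfigured values are a no-op rather than an error.
        return 0;
    }

    return random_int(0, $jitter);
}

/**
 * Hypothetical helper: applies the offset only to limited workers.
 * Meant to be called once at worker startup, not on every loop iteration,
 * so the resulting exit time stays stable for the life of the worker.
 */
function effectiveMaxRuntime(int $maxRuntime, int $jitter): int
{
    if ($maxRuntime <= 0) {
        // Unlimited workers stay unlimited.
        return $maxRuntime;
    }

    return $maxRuntime + computeLifetimeJitterOffset($jitter);
}

$runtime = effectiveMaxRuntime(300, 30);
printf("effective max runtime: %ds\n", $runtime);
```

Keeping the offset calculation in its own function is what allows the bounds and the zero/negative default to be checked in a unit test without starting the run loop.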

Docs

Added a dedicated bullet in docs/sections/configuration.md directly after the workerLifetime section, explaining the ECS/K8s use case.

Each worker picks a random offset in [0, workerLifetimeJitter] at startup
and adds it to its effective lifetime. Prevents thundering-herd restarts
when a fleet of workers (e.g. ECS/Kubernetes tasks) is spawned at the
same instant and would otherwise all exit on the same tick.

Defaults to 0 (no jitter), preserving existing behavior.

Idea and original implementation (against a legacy CakePHP 2.x fork) by
Rommel Penaflor (@xrompdev) in #475; ported to the modern processor.
@codecov-commenter

codecov-commenter commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.22%. Comparing base (5410512) to head (781c3c3).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/Queue/Processor.php | 80.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff            @@
##             master     #476   +/-   ##
=========================================
  Coverage     77.22%   77.22%           
- Complexity      949      954    +5     
=========================================
  Files            45       45           
  Lines          3196     3219   +23     
=========================================
+ Hits           2468     2486   +18     
- Misses          728      733    +5     

@dereuromark dereuromark marked this pull request as ready for review April 22, 2026 15:50