Fix start_date not restored for rescheduled tasks when scheduler queu… #64816
Open
peachchen0716 wants to merge 6 commits into apache:main
peachchen0716 pushed a commit to peachchen0716/airflow that referenced this pull request (Apr 6, 2026)
peachchen0716 added 2 commits (April 6, 2026 20:56)
…es them

In _check_and_change_state_before_execution, the code that restores TaskInstance.start_date to the original first-poke time was gated on ti.state == UP_FOR_RESCHEDULE. In the normal scheduler flow the scheduler advances state to QUEUED before the worker picks up the task, so ti.refresh_from_db() returns QUEUED and the guard never fires. This causes start_date to be reset to utcnow() on every re-execution, inflating the dagrun.first_task_start_delay and dagrun.first_task_scheduling_delay metrics by the full reschedule wait time.

Replace the state guard with an unconditional TaskReschedule lookup scoped to the current try_number. The query returns None for non-rescheduled tasks, so behavior is unchanged in the normal case; for rescheduled tasks it correctly restores start_date from the first poke regardless of whether state is UP_FOR_RESCHEDULE or QUEUED at execution time.
dcc4316 to 881928c
The supervisor always sends start_date=utcnow() when calling ti_run to mark a task as RUNNING. For sensors in reschedule mode this overwrote the original start_date on every re-poke, inflating the dagrun.first_task_scheduling_delay metric by the full reschedule wait.

The fix mirrors the existing deferral guard (next_kwargs): if a TaskReschedule record exists for the TI, restore start_date from the first record instead of accepting the supervisor's utcnow() value.

Also fix the newsfragment, which referenced a non-existent metric name (dagrun.first_task_start_delay); the real metric is dagrun.first_task_scheduling_delay.

Verified in Breeze: start_date stayed fixed across all reschedule pokes, confirmed stable through to SUCCESS.
881928c to 868f047
Problem
When a sensor runs in `reschedule` mode, the supervisor sends `start_date=utcnow()` on every poke. The `ti_run` execution API endpoint applied this value unconditionally, resetting `start_date` on each re-poke. This inflated the `dagrun.first_task_scheduling_delay` metric by including reschedule wait time.

A guard already existed for deferred tasks (`if ti.next_kwargs: data.pop("start_date")`), but no equivalent existed for rescheduled tasks.

Fix

Added a reschedule guard in `ti_run` (`execution_api/routes/task_instances.py`): when `start_date` is present in the update payload and the task has prior `TaskReschedule` records, the original `start_date` from the first reschedule entry is restored instead of accepting the supervisor's `utcnow()` value.

Also fixed `_check_and_change_state_before_execution` (used in test utilities only) to preserve `start_date` for rescheduled tasks the same way.

Testing

- `test_ti_run_restores_start_date_for_rescheduled_task`: verifies that the production path (`ti_run`) restores `start_date` from `TaskReschedule` on a subsequent poke

Breeze verification

Triggered the `verify_reschedule_start_date` DAG (reschedule-mode `PythonSensor`, poke every 10 s) and observed `ti.start_date` across three pokes:

| Poke | Before fix | After fix |
|------|------------|-----------|
| 1 | 2025-07-14T10:00:00Z | 2025-07-14T10:00:00Z |
| 2 | 2025-07-14T10:00:21Z | 2025-07-14T10:00:00Z |
| 3 | 2025-07-14T10:00:42Z | 2025-07-14T10:00:00Z |

Before the fix, `start_date` drifted on every poke; after the fix it stays at the first-poke value.

Was generative AI tooling used to co-author this PR?
Generated-by: Claude Sonnet 4.6 following the guidelines
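
For reviewers who want to reason about the guard without reading the diff, the logic described under Fix can be approximated in plain Python. This is a minimal sketch, not the code in `execution_api/routes/task_instances.py`: `TaskInstanceStub`, `apply_run_payload`, `simulate`, and the `reschedule_start_dates` list are hypothetical names that stand in for the real task instance row and its `TaskReschedule` records, and the poke times are taken from the Breeze observations above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class TaskInstanceStub:
    """Hypothetical stand-in for a task instance row (not Airflow's model)."""

    start_date: datetime | None = None
    # Assumption: one entry per earlier poke that ended in a reschedule,
    # oldest first, mimicking the start_date column of TaskReschedule.
    reschedule_start_dates: list[datetime] = field(default_factory=list)


def apply_run_payload(ti: TaskInstanceStub, payload: dict) -> None:
    """Apply a 'mark as running' payload, keeping the first-poke start_date."""
    data = dict(payload)
    if "start_date" in data and ti.reschedule_start_dates:
        # Prior reschedule records exist: restore the original start_date
        # instead of accepting the supervisor's fresh utcnow() value.
        data["start_date"] = ti.reschedule_start_dates[0]
    if "start_date" in data:
        ti.start_date = data["start_date"]


def simulate(poke_times: list[datetime], guarded: bool) -> list[datetime]:
    """Return the start_date observed at each poke, with or without the guard."""
    ti = TaskInstanceStub()
    observed = []
    for now in poke_times:
        payload = {"start_date": now}  # the supervisor always sends "now"
        if guarded:
            apply_run_payload(ti, payload)
        else:
            ti.start_date = payload["start_date"]  # old unconditional update
        observed.append(ti.start_date)
        # The sensor was not ready: record this poke as a reschedule.
        ti.reschedule_start_dates.append(ti.start_date)
    return observed


T0 = datetime(2025, 7, 14, 10, 0, 0, tzinfo=timezone.utc)
POKES = [T0, T0 + timedelta(seconds=21), T0 + timedelta(seconds=42)]

if __name__ == "__main__":
    for label, guarded in (("before fix", False), ("after fix", True)):
        stamps = [d.isoformat() for d in simulate(POKES, guarded)]
        print(label, stamps)
```

With the guard disabled the simulated `start_date` follows each poke time; with it enabled every poke reports the first-poke value, which is the same pattern as the Breeze observations. A payload without `start_date` is left untouched, so only updates that carry the supervisor's timestamp are affected.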