fix: Launchers - Kubernetes - Jobs - Fixed LaunchedKubernetesJob.ended_at plus Set ExecutionNode.ended_at when SYSTEM_ERROR occurs in Orchestrator #200

Merged
yuechao-qin merged 1 commit into master from ycq/fix-only-multinode-issues
Apr 9, 2026

@yuechao-qin
Collaborator

@yuechao-qin yuechao-qin commented Apr 7, 2026

TL;DR

Closes https://github.com/Shopify/oasis-frontend/issues/542

Fixed Kubernetes job completion detection and added missing timestamp for failed container executions.

What changed?

  • Changed Kubernetes job condition type from "Succeeded" to "Complete" in the ended_at method to properly detect when jobs finish
  • Added ended_at timestamp assignment when marking container executions as SYSTEM_ERROR in the orchestrator
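The first change can be illustrated with a minimal sketch. It is not the code from this PR: the function name `job_ended_at`, the plain-dict shape of the Job status, and the choice to treat both "Complete" and "Failed" as terminal are assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional

# Assumption: both of these Job condition types mean the job has ended.
# "Succeeded" is deliberately absent, since it is not a Job condition type.
TERMINAL_CONDITION_TYPES = ("Complete", "Failed")


def job_ended_at(job_status: dict) -> Optional[datetime]:
    """Return when the Job reached a terminal condition, or None if it has not."""
    for condition in job_status.get("conditions") or []:
        is_terminal = condition.get("type") in TERMINAL_CONDITION_TYPES
        is_active = condition.get("status") == "True"
        if is_terminal and is_active:
            # Kubernetes serializes timestamps as RFC 3339 with a trailing "Z".
            raw = condition["lastTransitionTime"]
            return datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return None
```

With a check written against "Succeeded" instead, the loop would never match a finished job and the function would keep returning None.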

Why make this change?

The Kubernetes Job API reports successful completion with a condition of type "Complete"; there is no "Succeeded" Job condition type, so the previous check likely never matched and job completions were not detected correctly. Additionally, executions marked SYSTEM_ERROR were left without an ended_at timestamp, which is needed for monitoring and debugging execution lifecycles.
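The timestamp part of the change can be sketched as follows. The `Execution` class, the `Status` enum, and `mark_system_error` are hypothetical stand-ins, not the orchestrator's actual types; the point is only that the end time is recorded in the same step that sets the terminal status.

```python
import dataclasses
import enum
from datetime import datetime, timezone
from typing import Optional


class Status(enum.Enum):
    RUNNING = "RUNNING"
    SYSTEM_ERROR = "SYSTEM_ERROR"


@dataclasses.dataclass
class Execution:
    # Hypothetical stand-in for the orchestrator's execution record.
    status: Status = Status.RUNNING
    ended_at: Optional[datetime] = None


def mark_system_error(execution: Execution, now: Optional[datetime] = None) -> None:
    """Mark the execution as SYSTEM_ERROR and record when it ended."""
    execution.status = Status.SYSTEM_ERROR
    # Keep an end time that was already recorded; otherwise stamp the current UTC time.
    if execution.ended_at is None:
        execution.ended_at = now if now is not None else datetime.now(timezone.utc)
```

Passing `now` explicitly is a design choice made here to keep the sketch testable; a real orchestrator may simply read the clock.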

@yuechao-qin yuechao-qin marked this pull request as ready for review April 7, 2026 22:45
@yuechao-qin yuechao-qin requested a review from Ark-kun as a code owner April 7, 2026 22:45
@Ark-kun Ark-kun changed the title from "fix: Multinode ended_at and Complete status" to "fix: Launchers - Kubernetes - Jobs - Fixed detecting job completion and adding missing ended_at" Apr 9, 2026
@Ark-kun Ark-kun changed the title from "fix: Launchers - Kubernetes - Jobs - Fixed detecting job completion and adding missing ended_at" to "fix: Launchers - Kubernetes - Jobs - Fixed detecting job completion plus Setting ExecutionNode.ended_at when SYSTEM_ERROR occurs" Apr 9, 2026
@Ark-kun Ark-kun changed the title from "fix: Launchers - Kubernetes - Jobs - Fixed detecting job completion plus Setting ExecutionNode.ended_at when SYSTEM_ERROR occurs" to "fix: Launchers - Kubernetes - Jobs - Fixed detecting job completion plus Setting ExecutionNode.ended_at when SYSTEM_ERROR occurs in Orchestrator" Apr 9, 2026
@Ark-kun Ark-kun changed the title from "fix: Launchers - Kubernetes - Jobs - Fixed detecting job completion plus Setting ExecutionNode.ended_at when SYSTEM_ERROR occurs in Orchestrator" to "fix: Launchers - Kubernetes - Jobs - Fixed LaunchedKubernetesJob.ended_at plus Set ExecutionNode.ended_at when SYSTEM_ERROR occurs in Orchestrator" Apr 9, 2026
@yuechao-qin yuechao-qin merged commit 75bda80 into master Apr 9, 2026
7 checks passed
