fix: filter correct disk image during boot disk selection #110

Open
aditsharma55 wants to merge 1 commit into main from BREV-8794/fix-bootdisk-image-selection

Summary

  • Fix boot disk image selection that was picking k8s worker-node images (e.g. worker-node-v-1-33-ubuntu24.04-cuda12.8) instead of instance images (e.g. ubuntu24.04-cuda13.0) due to Nebius API pagination and non-deterministic ordering
  • Replace the old first-match image selection with a score-based system that evaluates all images across all pages and picks the highest-scored one, ensuring ubuntu24+cuda13 is always preferred over worker-node images for default deployments
  • Remove iptables-persistent / netfilter-persistent from cloud-init since the correct instance image (ubuntu24.04-cuda13.0) does not ship with netfilter-persistent, and the previous cloud-init commands were causing failures (sudo: netfilter-persistent: command not found)

What changed

Image selection — pagination fix:

The old code used Image().List() which only returned the first page of results. With a small default page size, ubuntu24.04-cuda13.0 could be omitted entirely. Replaced with Image().Filter() which auto-paginates via the SDK iterator.
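To illustrate why a single-page read can miss the instance image, here is a minimal sketch that does not use the Nebius SDK at all. The `page` type, the `listFn` signature, and the `fakeList` data source are hypothetical stand-ins; the real SDK's request and iterator types will differ.

```go
package main

import "fmt"

// page is a hypothetical stand-in for one paginated API response.
type page struct {
	Names     []string
	NextToken string // empty when there are no more pages
}

// listFn is a hypothetical single-page fetch keyed by a page token.
type listFn func(token string) (page, error)

// firstPageOnly mimics the old behaviour: one call, later pages ignored.
func firstPageOnly(list listFn) ([]string, error) {
	p, err := list("")
	if err != nil {
		return nil, err
	}
	return p.Names, nil
}

// allPages follows NextToken until the listing is exhausted.
func allPages(list listFn) ([]string, error) {
	var names []string
	token := ""
	for {
		p, err := list(token)
		if err != nil {
			return nil, err
		}
		names = append(names, p.Names...)
		if p.NextToken == "" {
			return names, nil
		}
		token = p.NextToken
	}
}

// fakeList serves two fixed pages; the instance image sits on the second.
func fakeList(token string) (page, error) {
	switch token {
	case "":
		return page{
			Names:     []string{"worker-node-v-1-33-ubuntu24.04-cuda12.8"},
			NextToken: "p2",
		}, nil
	case "p2":
		return page{Names: []string{"ubuntu24.04-cuda13.0"}}, nil
	}
	return page{}, fmt.Errorf("unknown page token %q", token)
}

func main() {
	first, _ := firstPageOnly(fakeList)
	all, _ := allPages(fakeList)
	fmt.Println("first page only:", first)
	fmt.Println("all pages:", all)
}
```

With this fake data, `firstPageOnly` never sees `ubuntu24.04-cuda13.0`, while `allPages` does. An auto-paginating iterator hides the same token loop from the caller.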

Image selection — score-based ranking:

The old if/else matching was order-dependent and first-match-wins. The new approach scores every non-ARM64 image using a tiered system.
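The general shape of score-based selection can be sketched as follows. The tier definitions are not reproduced in this description, so the weights, the substring checks, and the tie-break rule below are illustrative assumptions, not the values used in this PR; `score` and `pickBest` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// score returns an illustrative rank for an image name; higher is better.
// ok is false for images that are never eligible (ARM64 here).
// All weights are assumptions made for this sketch.
func score(name string) (s int, ok bool) {
	n := strings.ToLower(name)
	if strings.Contains(n, "arm64") {
		return 0, false
	}
	s = 1
	if !strings.HasPrefix(n, "worker-node") {
		s += 100 // instance images outrank k8s worker-node images
	}
	if strings.Contains(n, "ubuntu24") {
		s += 10
	}
	if strings.Contains(n, "cuda13") {
		s += 5
	}
	return s, true
}

// pickBest evaluates every name and returns the highest-scored one.
// Ties are broken by name so the result does not depend on input order.
func pickBest(names []string) (string, bool) {
	best, bestScore, found := "", 0, false
	for _, n := range names {
		s, ok := score(n)
		if !ok {
			continue
		}
		if !found || s > bestScore || (s == bestScore && n < best) {
			best, bestScore, found = n, s, true
		}
	}
	return best, found
}

func main() {
	names := []string{
		"worker-node-v-1-33-ubuntu24.04-cuda12.8",
		"ubuntu24.04-cuda13.0",
	}
	best, ok := pickBest(names)
	fmt.Println(best, ok)
}
```

Because every candidate is scored before one is chosen, the result is the same regardless of the order in which the API returns images, which is the property the first-match approach lacked.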

Cloud-init cleanup:

Removed iptables-persistent package, the netfilter-persistent.service systemd ordering drop-in, and the netfilter-persistent save command. These were added for a previous image that shipped with netfilter-persistent pre-installed; the current image does not have it, and these commands caused cloud-init to fail.

Relevant Linear Ticket:

https://linear.app/nvidia/issue/BRE2-901/issue-when-deploying-nebius-h100

aditsharma55 requested a review from a team as a code owner on April 17, 2026, 23:03