feat: add DNS pre-flight check to verify Docker connectivity before s…#398

Merged
phoenix-server merged 3 commits into cameri:main from YashIIT0909:fix/dns-preflight-check
Apr 9, 2026
Conversation

@YashIIT0909
Contributor

Add DNS connectivity pre-flight check to relay startup

Description

This PR adds a defensive pre-flight check to ./scripts/start that verifies Docker containers can resolve external domains before the build process starts.

The implementation:

  1. Attempts to connect to dl-cdn.alpinelinux.org from a lightweight temporary Alpine container.
  2. Includes retry logic (3 attempts with a 2s delay) to avoid false positives from flaky connections.
  3. On total failure, it prints a clear, yellow-formatted warning explaining the likely cause and providing immediate steps to fix it.
  4. Non-blocking: The script does not exit on failure; it only warns the user, as the build might still succeed if layers are cached.
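
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function names `probe_dns` and `preflight_dns` and the `PROBE` and `RETRY_DELAY` override variables are hypothetical, added so the retry logic can be exercised without Docker.

```shell
#!/bin/sh

# Probe used by default: fetch headers from the Alpine CDN inside a
# throwaway container (same command as in this PR's diff).
probe_dns() {
  docker run --rm alpine wget --spider --timeout=5 \
    https://dl-cdn.alpinelinux.org >/dev/null 2>&1
}

# Try the probe up to 3 times with a fixed delay between attempts.
# PROBE and RETRY_DELAY are illustrative overrides, not part of the PR.
preflight_dns() {
  probe="${PROBE:-probe_dns}"
  delay="${RETRY_DELAY:-2}"
  for attempt in 1 2 3; do
    if "$probe"; then
      return 0            # connectivity is healthy: stay silent
    fi
    if [ "$attempt" -lt 3 ]; then
      sleep "$delay"
    fi
  done
  # Every attempt failed: warn in bold yellow, but do not abort.
  printf '\033[1;33mWARNING: Docker DNS resolution failed after 3 attempts.\033[0m\n' >&2
  return 0                # non-blocking: cached layers may still build
}
```

A start script would call `preflight_dns` once before invoking the build. Returning 0 on failure is what keeps the check advisory rather than fatal.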

Related Issue

#395

Motivation and Context

Linux users (especially on Arch and Ubuntu) frequently encounter DNS bridge conflicts between Docker and systemd-resolved. This usually results in a cryptic EAI_AGAIN error deep in the docker build logs (during apk add or npm install).

By moving this check to the very beginning of the startup script, we provide users with immediate, actionable feedback, saving them time spent debugging low-level Docker networking issues.

How Has This Been Tested?

  • Failure Path: Manually disabled host networking to confirm the warning triggers with the correct "Suggested fixes" text after 3 retries.
  • Success Path: Confirmed the script proceeds silently (without warning) when connectivity is healthy.
  • Tooling: Switched from nslookup to wget --spider to avoid known BusyBox parsing bugs in Alpine that cause false negatives.
  • Cleanup: Verified that the --rm flag prevents any orphaned containers from being left behind.

Screenshots (if appropriate):

N/A

Types of changes

  • Non-functional change (docs, style, minor refactor)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my code changes.
  • All new and existing tests passed.

scripts/start Outdated
Comment on lines +25 to +31
for i in 1 2 3; do
  if docker run --rm alpine wget --spider --timeout=5 https://dl-cdn.alpinelinux.org > /dev/null 2>&1; then
    DNS_OK=true
    break
  fi
  [ "$i" -lt 3 ] && sleep 2
done
Owner

Could you please add some logging here to tell the user what is happening?

Can we also use an exponential backoff for the sleep, so it waits 2s the first time, 4s the second time, and then 8s? Any thoughts?

Contributor Author

I agree that exponential backoff is a more robust way to handle the bridge stabilization period. I'll implement the 2s/4s/8s sequence.
Regarding the logs, I'll add logging to indicate the status of each attempt.
I'll push the updated code shortly.
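
For reference, a doubling backoff with per-attempt logging might look like the sketch below. It is an assumption about the shape of the change, not the code that was pushed: `retry_with_backoff` and `BACKOFF_BASE` are hypothetical names, and the log wording is invented. With three attempts only the 2s and 4s delays fire; reaching the 8s delay needs a fourth attempt, so the attempt count is left as a parameter here.

```shell
#!/bin/sh

# Usage: retry_with_backoff MAX_ATTEMPTS command [args...]
# Runs the command until it succeeds or MAX_ATTEMPTS is reached, doubling
# the delay after each failure (2s, 4s, 8s, ... by default).
# BACKOFF_BASE is an illustrative override for the initial delay.
retry_with_backoff() {
  max_attempts="$1"
  shift
  delay="${BACKOFF_BASE:-2}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    echo "Checking Docker DNS (attempt $attempt/$max_attempts)..." >&2
    if "$@"; then
      echo "Docker DNS check passed." >&2
      return 0
    fi
    if [ "$attempt" -lt "$max_attempts" ]; then
      echo "Attempt failed; retrying in ${delay}s..." >&2
      sleep "$delay"
      delay=$((delay * 2))   # exponential backoff
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

A caller could wrap the probe from the diff, for example `retry_with_backoff 3 docker run --rm alpine wget --spider --timeout=5 https://dl-cdn.alpinelinux.org`, and set `DNS_OK` from the return status.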

Comment on lines +33 to +46
if [ "$DNS_OK" = false ]; then
  echo ""
  echo -e "\033[1;33m WARNING: Docker DNS resolution failed after 3 attempts.\033[0m"
  echo -e "\033[0;33m Containers cannot resolve external domains (e.g. dl-cdn.alpinelinux.org)."
  echo " This is commonly caused by a DNS bridge conflict with systemd-resolved."
  echo ""
  echo " Suggested fixes:"
  echo " 1. Add DNS to /etc/docker/daemon.json:"
  echo ' { "dns": ["8.8.8.8", "8.8.4.4"] }'
  echo " 2. Then run sudo systemctl restart docker"
  echo ""
  echo -e " The build will continue, but may fail during package installation.\033[0m"
  echo ""
fi
Owner

Should this message go to stderr?
Should we use a heredoc for the multiline message?

Contributor Author

Great points. Sending the warning to stderr is definitely better practice for diagnostic messages, and a heredoc would be a cleaner approach. I'll refactor the block to use cat with a heredoc redirected to stderr (>&2) to address both. Pushing the update shortly!
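
A heredoc-based version of the warning could look something like this. It is a sketch under assumptions rather than the merged refactor: the function name `print_dns_warning` and the colour variables are hypothetical, and the message text is copied from the echo block under review.

```shell
#!/bin/sh

# Print the multi-line DNS warning to stderr with a single heredoc.
# The escape sequences are built with printf because a heredoc does not
# interpret "\033" the way `echo -e` does.
print_dns_warning() {
  bold_yellow=$(printf '\033[1;33m')
  yellow=$(printf '\033[0;33m')
  reset=$(printf '\033[0m')
  cat >&2 <<EOF

${bold_yellow} WARNING: Docker DNS resolution failed after 3 attempts.${reset}
${yellow} Containers cannot resolve external domains (e.g. dl-cdn.alpinelinux.org).
 This is commonly caused by a DNS bridge conflict with systemd-resolved.

 Suggested fixes:
 1. Add DNS to /etc/docker/daemon.json:
      { "dns": ["8.8.8.8", "8.8.4.4"] }
 2. Then run sudo systemctl restart docker

 The build will continue, but may fail during package installation.${reset}

EOF
}
```

Writing to stderr keeps the warning out of any stdout that a caller pipes or captures, and the unquoted `EOF` delimiter is what allows the colour variables to expand inside the message.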

@YashIIT0909 YashIIT0909 requested a review from cameri April 9, 2026 10:09
@phoenix-server phoenix-server merged commit e8d4aae into cameri:main Apr 9, 2026
2 checks passed
3 participants