Parallelize repository cloning during dev environment setup#72
Parallelize repository cloning during dev environment setup#72
Conversation
Agent-Logs-Url: https://github.com/wp-cli/wp-cli-dev/sessions/6509f703-3861-4c87-83a6-f0247ed1904b Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>
Agent-Logs-Url: https://github.com/wp-cli/wp-cli-dev/sessions/28119b52-0b7a-445b-b43e-f3189c004e28 Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR speeds up initial wp-cli-dev environment setup by replacing the previous PHP-based cloning workflow with Bash scripts that clone and refresh repositories in parallel, and wires the new script into Composer’s pre-install/update hooks.
Changes:
- Replace
.maintenance/clone-all-repositories.php/.maintenance/clone-repository.phpwith Bash equivalents. - Parallelize cloning and refreshing using
xargs -Pwith detected CPU core count. - Update
composer.jsonhooks to run the new Bash entrypoint.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| composer.json | Switch Composer pre-install/update to call the new Bash orchestrator. |
| .maintenance/clone-all-repositories.sh | New orchestration script: fetch org repos, filter/map, then parallel clone + refresh. |
| .maintenance/clone-repository.sh | New helper script to clone a single repo given destination + URL. |
| .maintenance/clone-all-repositories.php | Removed old PHP orchestrator. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Fetch repository list from the GitHub API. | ||
| CURL_OPTS=(-s) | ||
| if [[ -n "${GITHUB_TOKEN:-}" ]]; then | ||
| CURL_OPTS+=(--header "Authorization: Bearer ${GITHUB_TOKEN}") | ||
| fi | ||
|
|
||
| RESPONSE=$(curl "${CURL_OPTS[@]}" 'https://api.github.com/orgs/wp-cli/repos?per_page=100') | ||
|
|
||
| # Detect API errors such as rate limiting. | ||
| if echo "${RESPONSE}" | jq -e '.message' &>/dev/null; then | ||
| MESSAGE=$(echo "${RESPONSE}" | jq -r '.message') | ||
| echo "GitHub responded with: ${MESSAGE}" | ||
| echo "If you are running into a rate limiting issue during large events please set GITHUB_TOKEN environment variable." | ||
| echo "See https://github.com/settings/tokens" |
There was a problem hiding this comment.
RESPONSE=$(curl -s ...) doesn't fail on HTTP errors and failures inside command substitution can be easy to miss under set -e. If curl fails or returns a non-JSON body, the later jq in process substitution can fail in a way that leaves the while read loop empty and the script may still exit 0 (so composer install proceeds without cloning/refreshing). Consider using curl -fS (and checking exit status) and explicitly validating that jq is available + that the response is a JSON array (or failing fast with a clear error) before continuing.
| # Fetch repository list from the GitHub API. | |
| CURL_OPTS=(-s) | |
| if [[ -n "${GITHUB_TOKEN:-}" ]]; then | |
| CURL_OPTS+=(--header "Authorization: Bearer ${GITHUB_TOKEN}") | |
| fi | |
| RESPONSE=$(curl "${CURL_OPTS[@]}" 'https://api.github.com/orgs/wp-cli/repos?per_page=100') | |
| # Detect API errors such as rate limiting. | |
| if echo "${RESPONSE}" | jq -e '.message' &>/dev/null; then | |
| MESSAGE=$(echo "${RESPONSE}" | jq -r '.message') | |
| echo "GitHub responded with: ${MESSAGE}" | |
| echo "If you are running into a rate limiting issue during large events please set GITHUB_TOKEN environment variable." | |
| echo "See https://github.com/settings/tokens" | |
| if ! command -v jq &>/dev/null; then | |
| echo "Required command 'jq' is not installed or not available in PATH." >&2 | |
| exit 1 | |
| fi | |
| # Fetch repository list from the GitHub API. | |
| CURL_OPTS=(-fsS) | |
| if [[ -n "${GITHUB_TOKEN:-}" ]]; then | |
| CURL_OPTS+=(--header "Authorization: Bearer ${GITHUB_TOKEN}") | |
| fi | |
| if ! RESPONSE=$(curl "${CURL_OPTS[@]}" 'https://api.github.com/orgs/wp-cli/repos?per_page=100'); then | |
| echo "Failed to fetch repository list from the GitHub API." >&2 | |
| exit 1 | |
| fi | |
| # Validate the response shape and detect API errors such as rate limiting. | |
| if ! jq -e 'type == "array"' >/dev/null <<< "${RESPONSE}"; then | |
| if jq -e '.message' >/dev/null <<< "${RESPONSE}"; then | |
| MESSAGE=$(jq -r '.message' <<< "${RESPONSE}") | |
| echo "GitHub responded with: ${MESSAGE}" >&2 | |
| echo "If you are running into a rate limiting issue during large events please set GITHUB_TOKEN environment variable." >&2 | |
| echo "See https://github.com/settings/tokens" >&2 | |
| else | |
| echo "GitHub API returned an unexpected response; expected a JSON array of repositories." >&2 | |
| fi |
| fi | ||
|
|
||
| if [[ ${#UPDATE_FOLDERS[@]} -gt 0 ]]; then | ||
| printf '%s\n' "${UPDATE_FOLDERS[@]}" | xargs -n1 -P"${CORES}" -I% php "${SCRIPT_DIR}/refresh-repository.php" % |
There was a problem hiding this comment.
xargs -n1 is redundant when using -I% (which already implies one argument per command) and can emit warnings or behave differently across GNU/BSD xargs. Consider dropping -n1 here for portability/clarity.
| printf '%s\n' "${UPDATE_FOLDERS[@]}" | xargs -n1 -P"${CORES}" -I% php "${SCRIPT_DIR}/refresh-repository.php" % | |
| printf '%s\n' "${UPDATE_FOLDERS[@]}" | xargs -P"${CORES}" -I% php "${SCRIPT_DIR}/refresh-repository.php" % |
| CORES=$(nproc) | ||
| elif command -v sysctl &>/dev/null; then | ||
| CORES=$(sysctl -n hw.logicalcpu 2>/dev/null || echo 4) | ||
| else | ||
| CORES=4 |
There was a problem hiding this comment.
Parallelism is set to the full logical CPU count. On high-core machines this can spawn a very large number of concurrent git/php processes, which can hit GitHub connection limits, saturate IO, or fail with resource limits. Consider capping CORES to a reasonable max (e.g., 8/16) and/or allowing an override like CLONE_JOBS/WPCLI_DEV_JOBS.
| CORES=$(nproc) | |
| elif command -v sysctl &>/dev/null; then | |
| CORES=$(sysctl -n hw.logicalcpu 2>/dev/null || echo 4) | |
| else | |
| CORES=4 | |
| DETECTED_CORES=$(nproc) | |
| elif command -v sysctl &>/dev/null; then | |
| DETECTED_CORES=$(sysctl -n hw.logicalcpu 2>/dev/null || echo 4) | |
| else | |
| DETECTED_CORES=4 | |
| fi | |
| MAX_CORES=8 | |
| CORES="${CLONE_JOBS:-${WPCLI_DEV_JOBS:-${DETECTED_CORES}}}" | |
| if ! [[ "${CORES}" =~ ^[1-9][0-9]*$ ]]; then | |
| CORES=4 | |
| elif [[ -z "${CLONE_JOBS:-}" && -z "${WPCLI_DEV_JOBS:-}" && "${CORES}" -gt "${MAX_CORES}" ]]; then | |
| CORES=${MAX_CORES} |
.maintenance/clone-repository.sh
Outdated
|
|
||
| set -euo pipefail | ||
|
|
||
| if [[ $# -lt 2 ]]; then |
There was a problem hiding this comment.
The argument check allows more than 2 args (-lt 2). If xargs ever passes unexpected extra fields (e.g., due to whitespace in a destination), the script will silently ignore them and git clone may get malformed args. Consider requiring exactly 2 args (-ne 2) and failing with usage when the input isn't well-formed.
| if [[ $# -lt 2 ]]; then | |
| if [[ $# -ne 2 ]]; then |
| "pre-install-cmd": "bash .maintenance/clone-all-repositories.sh", | ||
| "pre-update-cmd": "bash .maintenance/clone-all-repositories.sh", |
There was a problem hiding this comment.
Switching Composer hooks from PHP to bash introduces new runtime prerequisites during composer install/update (at least bash, plus curl/jq used by the script). If this repo is expected to work in environments without these tools (e.g., some Windows setups or minimal CI images), consider adding a clearer failure message in the script and/or documenting the dependencies (or keeping a PHP fallback).
| "pre-install-cmd": "bash .maintenance/clone-all-repositories.sh", | |
| "pre-update-cmd": "bash .maintenance/clone-all-repositories.sh", | |
| "pre-install-cmd": "php -r \"$required = ['bash', 'curl', 'jq']; $missing = []; foreach ($required as $command) { $finder = stripos(PHP_OS, 'WIN') === 0 ? 'where' : 'command -v'; @exec($finder . ' ' . escapeshellarg($command) . ' 2>' . (stripos(PHP_OS, 'WIN') === 0 ? 'NUL' : '/dev/null'), $output, $status); if ($status !== 0) { $missing[] = $command; } } if ($missing) { fwrite(STDERR, 'Missing required tools for Composer pre-install-cmd: ' . implode(', ', $missing) . PHP_EOL . 'The .maintenance/clone-all-repositories.sh hook requires bash, curl, and jq. Please install the missing dependencies and re-run composer install/update.' . PHP_EOL); exit(1); } passthru('bash .maintenance/clone-all-repositories.sh', $exitCode); exit($exitCode);\"", | |
| "pre-update-cmd": "php -r \"$required = ['bash', 'curl', 'jq']; $missing = []; foreach ($required as $command) { $finder = stripos(PHP_OS, 'WIN') === 0 ? 'where' : 'command -v'; @exec($finder . ' ' . escapeshellarg($command) . ' 2>' . (stripos(PHP_OS, 'WIN') === 0 ? 'NUL' : '/dev/null'), $output, $status); if ($status !== 0) { $missing[] = $command; } } if ($missing) { fwrite(STDERR, 'Missing required tools for Composer pre-update-cmd: ' . implode(', ', $missing) . PHP_EOL . 'The .maintenance/clone-all-repositories.sh hook requires bash, curl, and jq. Please install the missing dependencies and re-run composer install/update.' . PHP_EOL); exit(1); } passthru('bash .maintenance/clone-all-repositories.sh', $exitCode); exit($exitCode);\"", |
|
@copilot apply changes based on the comments in this thread |
…n, cap cores to 8 with env override, drop redundant -n1, exact arg count in clone helper Agent-Logs-Url: https://github.com/wp-cli/wp-cli-dev/sessions/aab74529-eccb-4e94-949e-b1062afb30b2 Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>
Done in a7630d5. Applied all suggestions from the review thread:
|
Sequential
git clonecalls duringcomposer install/composer updatemade initial dev environment setup unnecessarily slow. The refresh step already usedxargsfor parallelism; cloning did not.Changes
.maintenance/clone-all-repositories.sh: Bash script that orchestrates the full clone + refresh flow. Checks thatjqis available at startup with a clear error message. Detects CPU cores vianproc(Linux) orsysctl -n hw.logicalcpu(macOS), defaulting to 4 if neither is available, and caps parallelism at 8 to avoid saturating GitHub connections or IO. The cap can be overridden via theCLONE_JOBSorWPCLI_DEV_JOBSenvironment variables. Fetches the repository list from the GitHub API usingcurl -fsS(failing fast on HTTP errors), validates the response is a JSON array (with distinct messages for rate-limit errors vs. unexpected shapes), parses withjq, filters the skip list, applies the destination map, then clones missing repos and refreshes all repos in parallel usingxargs -P$CORES..maintenance/clone-repository.sh: Bash helper that clones a single repository, accepting exactly<destination>and<clone_url>as arguments (rejects anything other than exactly 2 args)..maintenance/clone-all-repositories.phpand.maintenance/clone-repository.php.composer.json:pre-install-cmdandpre-update-cmdnow callbash .maintenance/clone-all-repositories.sh.