An example of a worker-critic agentic workflow
- `prompts/generate-master-figure.md`: base prompt for figure generation.
- `prompts/figma-addendum.md`: additive instructions for the Figma-native baseline variant.
- `prompts/figma-external-review-addendum.md`: additive instructions for the Figma-native external-review variant.
- `prompts/critic-review-addendum.md`: additive instructions for the reviewed variant.
- `prompts/external-review-addendum.md`: additive instructions for the external-review variant.
- `prompts/generate-master-figure-with-figma.md`: generated prompt equal to the base prompt plus the Figma addendum.
- `prompts/generate-master-figure-with-figma-external-review.md`: generated prompt equal to the base prompt plus the Figma addendum plus the Figma external-review addendum.
- `prompts/generate-master-figure-with-critic.md`: generated prompt equal to the base prompt plus the critic-review addendum.
- `prompts/generate-master-figure-with-external-review.md`: generated prompt equal to the base prompt plus the external-review addendum.
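The generated prompts are simple compositions of the base prompt and one or more addenda. A minimal sketch of that composition, assuming plain concatenation with blank-line separators (the `build_prompt` helper and file names here are illustrative, not the real `scripts/build_prompts.py`):

```python
from pathlib import Path
import tempfile

def build_prompt(base: Path, addenda: list[Path], out: Path) -> None:
    # Derived prompt = base prompt text followed by each addendum,
    # separated by blank lines (an assumption; the real script may differ).
    parts = [p.read_text().strip() for p in (base, *addenda)]
    out.write_text("\n\n".join(parts) + "\n")

# Demo in a throwaway directory with stand-in prompt files.
tmp = Path(tempfile.mkdtemp())
(tmp / "base.md").write_text("Base prompt.\n")
(tmp / "figma.md").write_text("Figma addendum.\n")
build_prompt(tmp / "base.md", [tmp / "figma.md"], tmp / "combined.md")
print((tmp / "combined.md").read_text())
```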
Condition B is defined as a persistent two-session loop: one continuing worker session plus one continuing same-model critic session that reviews the current SVG and is reused across review rounds rather than respawned each time.
Condition C is defined as one continuing worker session plus repeated external gpt-5.4-pro review calls that receive the current SVG and prior review history.
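Both review conditions share the same control flow: revise the SVG, show it to a critic along with the review history, and stop on approval. A schematic sketch with stub worker and critic callables (names and the round budget are illustrative):

```python
def review_loop(revise, review, max_rounds=5):
    """Generic worker-critic loop: revise the SVG, ask the critic,
    stop when the critic approves or the round budget runs out."""
    svg, history = "<svg/>", []
    for round_no in range(1, max_rounds + 1):
        svg = revise(svg, history)
        verdict = review(svg, history)  # critic sees current SVG + prior reviews
        history.append(verdict)
        if verdict["approved"]:
            return svg, round_no
    return svg, max_rounds

# Stub worker and critic for illustration: the critic approves
# once at least one prior review exists.
svg, rounds = review_loop(
    revise=lambda svg, h: svg + "<!-- draft -->",
    review=lambda svg, h: {"approved": len(h) >= 1, "notes": "tighten layout"},
)
print(rounds)  # → 2
```

In Condition B both callables are backed by persistent same-model sessions; in Condition C `review` is a fresh external API call that is handed the accumulated history instead.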
Regenerate the derived prompts with:
```sh
uv run python scripts/build_prompts.py
```

Run the external gpt-5.4-pro reviewer with:

```sh
uv run python scripts/external_review.py \
  --proposal inputs/project_description.tex \
  --svg artifacts/master-figure/master-figure.svg \
  --history-dir runs/<run-id>/reviews \
  --output-md artifacts/master-figure-external-review/review.md \
  --output-json artifacts/master-figure-external-review/review.json
```

This script reads `OPENAI_API_KEY` from the environment, sends the proposal text plus the current SVG source to the Responses API, optionally includes prior markdown reviews from `--history-dir`, and writes both the raw review and a parsed JSON summary.
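The reviewer input is the proposal, the prior reviews in order, and the current SVG source. A sketch of how that payload might be assembled before the API call (the `build_review_input` helper and section headers are hypothetical; the real script's formatting may differ):

```python
def build_review_input(proposal: str, svg: str, history: list[str]) -> str:
    # Assemble one text payload: proposal first, then each prior review
    # in round order, then the current SVG source to be critiqued.
    sections = [f"## Proposal\n{proposal}"]
    for i, review in enumerate(history, 1):
        sections.append(f"## Prior review {i}\n{review}")
    sections.append(f"## Current SVG\n{svg}")
    return "\n\n".join(sections)

payload = build_review_input(
    proposal="Make a master figure.",
    svg="<svg/>",
    history=["Round 1: too dense."],
)
print(payload.splitlines()[0])  # → ## Proposal
```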
Launch an isolated background Codex run with:
```sh
uv run python scripts/launch_codex_exec.py af
uv run python scripts/launch_codex_exec.py cf --figma-file-url 'https://www.figma.com/design/...'
uv run python scripts/launch_codex_exec.py base
uv run python scripts/launch_codex_exec.py critic
uv run python scripts/launch_codex_exec.py external
```

Each launch creates an isolated temp workspace under `/tmp/worker-critic-example-runs/<run-id>/` by seeding a minimal snapshot of this repo, initializing a fresh git repo there, writing a run-local launch script, and then starting `codex exec` inside a named tmux session with `gpt-5.4`, `model_reasoning_effort="xhigh"`, and `--dangerously-bypass-approvals-and-sandbox`.
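The tmux half of that launch can be sketched as building one `tmux new-session` invocation. The exact codex flag spellings below are assumptions beyond the three settings quoted above, and the session-name scheme is taken from the attach commands later in this document:

```python
import shlex

def tmux_launch_cmd(run_id: str, workspace: str, prompt_path: str) -> list[str]:
    # Start codex exec detached, in a named tmux session, with the
    # temp workspace as the working directory.
    session = f"worker-critic-{run_id}"
    codex = (
        "codex exec --dangerously-bypass-approvals-and-sandbox "
        '-c model=gpt-5.4 -c model_reasoning_effort="xhigh" '
        f'"$(cat {shlex.quote(prompt_path)})"'
    )
    return ["tmux", "new-session", "-d", "-s", session, "-c", workspace, codex]

cmd = tmux_launch_cmd("demo", "/tmp/ws", "/tmp/ws/prompt.md")
print(cmd[4])  # → worker-critic-demo
```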
af is the Figma-native baseline. It uses the shared base prompt plus prompts/figma-addendum.md, targets the configured Figma file through the MCP server, and writes local artifacts under artifacts/master-figure-figma/ inside the temp workspace.
cf is the Figma-native external-review condition. It uses the shared base prompt plus the Figma addendum and the Figma external-review addendum, keeps the Figma frame as the editable source of truth, exports a local SVG from that frame for review, and loops with the external gpt-5.4-pro reviewer until approval.
Before af or cf launches, the harness runs scripts/check_figma_mcp.py as a one-call Figma preflight against the requested file key. If use_figma is blocked by plan limits, rate limits, or permissions, the launcher aborts immediately with the Figma error instead of starting a long run and silently degrading into a non-Figma fallback.
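The fail-fast behavior is a thin wrapper around one probe call: run the check, and if it exits nonzero, surface the tool's own error instead of launching. A sketch of that pattern, simulating a blocked preflight with a child process that exits with an error (the real harness invokes `scripts/check_figma_mcp.py` instead):

```python
import subprocess
import sys

def preflight(cmd: list[str]) -> None:
    # One probe call before the long run; abort with the tool's error
    # rather than degrading into a non-Figma fallback mid-run.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(f"preflight failed: {proc.stderr.strip()}")

# Simulate a blocked Figma file: sys.exit(str) prints to stderr, exits 1.
try:
    preflight([sys.executable, "-c", "import sys; sys.exit('rate limited')"])
except RuntimeError as err:
    print(err)  # → preflight failed: rate limited
```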
For Figma-native conditions, use --figma-file-url to point the run at a specific Figma file. The launcher injects that exact URL and file key into the run-local AGENTS.md, the prompt snapshot, and launch.json.
Each run saves:
- the exact prompt sent to Codex;
- a generated run-local `AGENTS.md` derived from `run-AGENTS.md`;
- a JSON launch record, tmux session name, and pane PID;
- a tmux wrapper log plus Codex exit code when the run finishes;
- Codex JSONL event output;
- Codex stderr;
- the last assistant message;
- all intermediate artifacts requested by the run-specific bookkeeping addendum.
To inspect or attach to a live run:
```sh
tmux list-sessions
tmux attach -t worker-critic-<run-id>
tmux capture-pane -pt worker-critic-<run-id>
```

The shared template at `run-AGENTS.md` is the single file used for Conditions A, B, and C. The launcher fills in the condition-specific objective and `runs/<run-id>/` paths, then writes the rendered file to the temp workspace as `AGENTS.md`.
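The fill-in step is ordinary template substitution: the launcher replaces condition-specific placeholders and writes the result as `AGENTS.md`. A minimal sketch with `string.Template`; the placeholder names and template text here are hypothetical stand-ins for whatever `run-AGENTS.md` actually contains:

```python
from string import Template

# Hypothetical template; the real run-AGENTS.md has different content.
TEMPLATE = Template(
    "Objective: $objective\n"
    "Run root: runs/$run_id/\n"
)

def render_agents_md(objective: str, run_id: str) -> str:
    # Substitute the condition-specific fields into the shared template.
    return TEMPLATE.substitute(objective=objective, run_id=run_id)

text = render_agents_md("Condition B: same-model critic loop", "20260406-192417")
print(text)
```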
These temp workspaces are intentionally independent of the source repo:
- each run gets its own `.git` directory;
- no git worktree is attached to the source repo;
- no parent `notes/` files are exposed to the run;
- `uv` operates inside the temp workspace and can create its own local `.venv`.
Use a different temp parent if needed:
```sh
uv run python scripts/launch_codex_exec.py base --workspace-root /tmp/my-run-root
```

The repo also has a Claude-based harness that keeps session continuity explicitly across review rounds.
Launch an isolated background Claude run with:
```sh
uv run python scripts/launch_claude_exec.py base
uv run python scripts/launch_claude_exec.py critic
uv run python scripts/launch_claude_exec.py external
```

This launcher:
- creates an isolated temp repo under `/tmp/worker-critic-example-runs/<run-id>/`;
- renders a run-local `CLAUDE.md` from `run-CLAUDE.md`;
- starts a tmux-backed runner that uses persistent Claude sessions;
- keeps one worker session across the whole run;
- for Condition B, keeps one persistent Claude critic session across review rounds;
- for Condition C, keeps one persistent worker session and uses `scripts/anthropic_review.py` for the external Foundry reviewer.
The Claude runner is implemented in scripts/run_claude_condition.py.
The Azure Foundry reviewer is implemented in scripts/anthropic_review.py.
The shared review prompt is prompts/review-master-figure.md.
Current Foundry note:
- local smoke tests on April 6, 2026 confirmed `claude-opus-4-6` on this endpoint;
- the explicit Sonnet model names I tried (`claude-sonnet-4-6`, `claude-sonnet-4-5`, and `claude-sonnet-4`) were rejected by the current deployment;
- so the Claude launcher defaults to `claude-opus-4-6` for the worker and critic unless you override `--worker-model` or `--critic-model`.
Example override:
```sh
uv run python scripts/launch_claude_exec.py critic \
  --worker-model claude-opus-4-6 \
  --critic-model claude-opus-4-6
```

The external reviewer can be called directly with:
```sh
uv run python scripts/anthropic_review.py \
  --proposal inputs/project_description.tex \
  --svg artifacts/master-figure/master-figure.svg \
  --history-dir runs/<run-id>/reviews \
  --output-md artifacts/master-figure-external-review/review.md \
  --output-json artifacts/master-figure-external-review/review.json \
  --model claude-opus-4-6
```

After the three runs finish, collect the final figures and build the comparison media with:
```sh
uv run python scripts/build_comparison_artifacts.py \
  --run-prefix 20260406-192417 \
  --output-dir artifacts/20260406-192417-comparison
```

This copies the final PNGs and notes from the `/tmp` run workspaces into a repo-local directory and generates:
- `final-comparison.png`: labeled side-by-side final figures;
- `gifs/base-progress.gif`: base-condition draft progression;
- `gifs/critic-progress.gif`: same-model-critic draft progression;
- `gifs/external-progress.gif`: external-review draft progression;
- `summary.md`: source run roots, frame counts, and copied artifact paths.
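The `summary.md` portion of that step is straightforward to sketch: scan each run root for PNG draft frames and record the counts and paths. The `write_summary` helper and its output layout are illustrative, not the real `scripts/build_comparison_artifacts.py`:

```python
from pathlib import Path
import tempfile

def write_summary(out_dir: Path, run_roots: dict[str, Path]) -> Path:
    # Record where each condition's artifacts came from and how many
    # PNG draft frames were found, per condition label.
    lines = ["# Comparison summary", ""]
    for label, root in sorted(run_roots.items()):
        frames = sorted(root.glob("*.png"))
        lines.append(f"- {label}: root={root}, frames={len(frames)}")
    out = out_dir / "summary.md"
    out.write_text("\n".join(lines) + "\n")
    return out

# Demo with one fake run root containing a single draft frame.
tmp = Path(tempfile.mkdtemp())
(tmp / "draft-01.png").write_bytes(b"")
summary = write_summary(tmp, {"base": tmp})
print(summary.read_text())
```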