worker-critic-example

An example of a worker-critic agentic workflow

Prompt files

prompts/generate-master-figure.md: base prompt for figure generation.
prompts/figma-addendum.md: additive instructions for the Figma-native baseline variant.
prompts/figma-external-review-addendum.md: additive instructions for the Figma-native external-review variant.
prompts/critic-review-addendum.md: additive instructions for the same-model critic-review variant.
prompts/external-review-addendum.md: additive instructions for the external-review variant.
prompts/generate-master-figure-with-figma.md: generated prompt equal to the base prompt plus the Figma addendum.
prompts/generate-master-figure-with-figma-external-review.md: generated prompt equal to the base prompt plus the Figma addendum plus the Figma external-review addendum.
prompts/generate-master-figure-with-critic.md: generated prompt equal to the base prompt plus the critic-review addendum.
prompts/generate-master-figure-with-external-review.md: generated prompt equal to the base prompt plus the external-review addendum.

Condition B is a persistent two-session loop: one continuing worker session plus one continuing same-model critic session. The critic reviews the current SVG and is reused across review rounds rather than respawned each time. Condition C is one continuing worker session plus repeated external gpt-5.4-pro review calls, each of which receives the current SVG and the prior review history.
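
As a rough illustration of the Condition B loop, the sketch below alternates a persistent worker and a persistent critic until the critic approves. It is not the repo's implementation: the Session class, the respond callable, the message wording, and the "APPROVE" convention are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Session:
    """A continuing session: every call sees the full prior transcript."""

    # Hypothetical model call: (prior transcript, new message) -> reply.
    respond: Callable[[List[str], str], str]
    transcript: List[str] = field(default_factory=list)

    def send(self, message: str) -> str:
        reply = self.respond(self.transcript, message)
        self.transcript += [message, reply]
        return reply


def run_loop(worker: Session, critic: Session, max_rounds: int = 5) -> Tuple[str, int]:
    """Alternate worker drafts and critic reviews until approval or max_rounds."""
    svg = worker.send("Generate the master figure as SVG.")
    for round_no in range(1, max_rounds + 1):
        review = critic.send(f"Review this SVG:\n{svg}")
        # Assumed convention: an approving review starts with "APPROVE".
        if review.strip().upper().startswith("APPROVE"):
            return svg, round_no
        svg = worker.send(f"Revise the SVG to address this review:\n{review}")
    return svg, max_rounds
```

Because the same critic object is reused, its transcript grows across rounds, which is the property Condition B depends on. A Condition C variant would presumably replace critic.send with a stateless call that is handed the SVG and the prior reviews each time.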

Regenerate the derived prompts with:

uv run python scripts/build_prompts.py
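
The contents of scripts/build_prompts.py are not shown here. A minimal sketch of the composition described above, assuming each derived prompt is simply its parts joined by a blank line, might look like this. The DERIVED mapping and the build_prompts function are illustrative names only.

```python
from pathlib import Path
from typing import Dict, List

# Derived prompt -> ordered source files, following the list above.
DERIVED: Dict[str, List[str]] = {
    "generate-master-figure-with-figma.md": [
        "generate-master-figure.md",
        "figma-addendum.md",
    ],
    "generate-master-figure-with-figma-external-review.md": [
        "generate-master-figure.md",
        "figma-addendum.md",
        "figma-external-review-addendum.md",
    ],
    "generate-master-figure-with-critic.md": [
        "generate-master-figure.md",
        "critic-review-addendum.md",
    ],
    "generate-master-figure-with-external-review.md": [
        "generate-master-figure.md",
        "external-review-addendum.md",
    ],
}


def build_prompts(prompt_dir: Path) -> List[Path]:
    """Write each derived prompt as its parts joined by a blank line (assumed format)."""
    written = []
    for target, parts in DERIVED.items():
        text = "\n\n".join(
            (prompt_dir / name).read_text(encoding="utf-8").strip() for name in parts
        )
        out = prompt_dir / target
        out.write_text(text + "\n", encoding="utf-8")
        written.append(out)
    return written
```

Keeping the mapping in one table makes it easy to check that every derived file starts from the same base prompt.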

External review

Run the external gpt-5.4-pro reviewer with:

uv run python scripts/external_review.py \
  --proposal inputs/project_description.tex \
  --svg artifacts/master-figure/master-figure.svg \
  --history-dir runs/<run-id>/reviews \
  --output-md artifacts/master-figure-external-review/review.md \
  --output-json artifacts/master-figure-external-review/review.json

This script reads OPENAI_API_KEY from the environment, sends the proposal text plus the current SVG source to the Responses API, optionally includes prior markdown reviews from --history-dir, and writes both the raw review and a parsed JSON summary.
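
To make the optional history step concrete, here is a hedged sketch of how prior reviews could be gathered and combined with the proposal and SVG into one text payload. It deliberately omits the API call. The function names, the section headings, and the filename-sorted ordering are assumptions, not the behavior of scripts/external_review.py.

```python
from pathlib import Path
from typing import List, Optional


def load_history(history_dir: Optional[Path]) -> List[str]:
    """Return prior markdown reviews, ordered by filename (assumed oldest first)."""
    if history_dir is None or not history_dir.is_dir():
        return []  # history is optional, so a missing directory is not an error
    return [p.read_text(encoding="utf-8") for p in sorted(history_dir.glob("*.md"))]


def build_review_input(proposal: str, svg: str, history: List[str]) -> str:
    """Assemble one payload: proposal, prior reviews, then the current SVG source."""
    sections = [f"## Proposal\n{proposal.strip()}"]
    for i, review in enumerate(history, start=1):
        sections.append(f"## Prior review {i}\n{review.strip()}")
    sections.append(f"## Current SVG\n{svg.strip()}")
    return "\n\n".join(sections)
```

Placing the SVG last keeps the item under review closest to the end of the payload; that ordering is a choice made for this sketch.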

Detached runs

Launch an isolated background Codex run with:

uv run python scripts/launch_codex_exec.py af
uv run python scripts/launch_codex_exec.py cf --figma-file-url 'https://www.figma.com/design/...'
uv run python scripts/launch_codex_exec.py base
uv run python scripts/launch_codex_exec.py critic
uv run python scripts/launch_codex_exec.py external

Each launch creates an isolated temp workspace under /tmp/worker-critic-example-runs/<run-id>/. The launcher seeds a minimal snapshot of this repo there, initializes a fresh git repo, writes a run-local launch script, and then starts codex exec inside a named tmux session with gpt-5.4, model_reasoning_effort="xhigh", and --dangerously-bypass-approvals-and-sandbox.
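
The final step, starting the run-local launch script in a detached, named tmux session, can be sketched as follows. The helper name and the use of bash to run the script are assumptions; the session name follows the worker-critic-<run-id> pattern used by the tmux commands in this README.

```python
import shlex
from pathlib import Path
from typing import List


def tmux_launch_argv(run_id: str, launch_script: Path) -> List[str]:
    """Build the argv that starts a detached, named tmux session for one run."""
    session = f"worker-critic-{run_id}"
    # -d: start detached; -s: session name; the last item is the shell command tmux runs.
    command = f"bash {shlex.quote(str(launch_script))}"
    return ["tmux", "new-session", "-d", "-s", session, command]
```

The resulting list can be passed to subprocess.run(argv, check=True). Building the argv separately from executing it keeps the launch record easy to log and test.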

af is the Figma-native baseline. It uses the shared base prompt plus prompts/figma-addendum.md, targets the configured Figma file through the MCP server, and writes local artifacts under artifacts/master-figure-figma/ inside the temp workspace.

cf is the Figma-native external-review condition. It uses the shared base prompt plus the Figma addendum and the Figma external-review addendum, keeps the Figma frame as the editable source of truth, exports a local SVG from that frame for review, and loops with the external gpt-5.4-pro reviewer until approval.

Before af or cf launches, the harness runs scripts/check_figma_mcp.py as a one-call Figma preflight against the requested file key. If use_figma is blocked by plan limits, rate limits, or permissions, the launcher aborts immediately with the Figma error instead of starting a long run and silently degrading into a non-Figma fallback.

For Figma-native conditions, use --figma-file-url to point the run at a specific Figma file. The launcher injects that exact URL and file key into the run-local AGENTS.md, the prompt snapshot, and launch.json.
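
As an illustration of deriving the file key from --figma-file-url, the sketch below assumes the common Figma URL shape /design/<file-key>/<title> (or the older /file/ form). The function name and the example key in the usage note are hypothetical, and the launcher may parse the URL differently.

```python
from urllib.parse import urlparse


def figma_file_key(url: str) -> str:
    """Extract the file key from a Figma file URL (assumed path shape)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected path: /design/<file-key>/<optional-title> or /file/<file-key>/...
    if len(parts) < 2 or parts[0] not in {"design", "file"}:
        raise ValueError(f"not a Figma file URL: {url!r}")
    return parts[1]
```

For example, figma_file_key("https://www.figma.com/design/AbC123/My-File") would return "AbC123", where AbC123 is a made-up key. Failing loudly on an unexpected URL fits the preflight behavior described above, where a bad configuration aborts the launch rather than degrading silently.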

Each run saves:

the exact prompt sent to Codex;
a generated run-local AGENTS.md derived from run-AGENTS.md;
a JSON launch record, tmux session name, and pane PID;
a tmux wrapper log plus Codex exit code when the run finishes;
Codex JSONL event output;
Codex stderr;
the last assistant message;
all intermediate artifacts requested by the run-specific bookkeeping addendum.

To inspect or attach to a live run:

tmux list-sessions
tmux attach -t worker-critic-<run-id>
tmux capture-pane -pt worker-critic-<run-id>

The shared template at run-AGENTS.md is the single file used for Conditions A, B, and C. The launcher fills in the condition-specific objective and the runs/<run-id>/ paths, then writes the rendered file to the temp workspace as AGENTS.md.
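
A minimal sketch of that rendering step, assuming the template uses $-style placeholders, is shown below. The placeholder names objective and run_dir are hypothetical; the real run-AGENTS.md may use a different syntax.

```python
from string import Template


def render_agents(template_text: str, objective: str, run_id: str) -> str:
    """Fill a shared template with condition-specific values (placeholder names assumed)."""
    values = {"objective": objective, "run_dir": f"runs/{run_id}/"}
    # substitute() raises KeyError if the template names a placeholder we did not supply,
    # so a template/launcher mismatch fails before the run starts.
    return Template(template_text).substitute(values)
```

Using strict substitution rather than safe_substitute means an unfilled placeholder can never reach the agent as literal text.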

These temp workspaces are intentionally independent of the source repo:

each run gets its own .git directory;
no git worktree is attached to the source repo;
no parent notes/ files are exposed to the run;
uv operates inside the temp workspace and can create its own local .venv.
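
One way to get this isolation is to seed the workspace from an explicit allowlist, so that .git and notes/ are never copied in the first place. The sketch below is an assumption about the approach, not the launcher's code; seed_workspace and its include parameter are illustrative names.

```python
import shutil
from pathlib import Path
from typing import Iterable


def seed_workspace(repo: Path, workspace: Path, include: Iterable[str]) -> None:
    """Copy only allowlisted top-level paths from repo into a fresh workspace."""
    # exist_ok=False: refuse to reuse a directory left over from an earlier run.
    workspace.mkdir(parents=True, exist_ok=False)
    for name in include:
        src = repo / name
        dst = workspace / name
        if src.is_dir():
            shutil.copytree(src, dst)
        elif src.is_file():
            shutil.copy2(src, dst)
        # Names missing from the repo are skipped silently in this sketch.
```

Running git init inside the seeded directory afterwards would give the run its own .git, with no link back to the source repo.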

Use a different temp parent if needed:

uv run python scripts/launch_codex_exec.py base --workspace-root /tmp/my-run-root

Claude Code runs

The repo also has a Claude-based harness that keeps session continuity explicitly across review rounds.

Launch an isolated background Claude run with:

uv run python scripts/launch_claude_exec.py base
uv run python scripts/launch_claude_exec.py critic
uv run python scripts/launch_claude_exec.py external

This launcher:

creates an isolated temp repo under /tmp/worker-critic-example-runs/<run-id>/;
renders a run-local CLAUDE.md from run-CLAUDE.md;
starts a tmux-backed runner that uses persistent Claude sessions;
keeps one worker session across the whole run;
for Condition B, keeps one persistent Claude critic session across review rounds;
for Condition C, keeps one persistent worker session and uses scripts/anthropic_review.py for the external Foundry reviewer.

The Claude runner is implemented in scripts/run_claude_condition.py. The Azure Foundry reviewer is implemented in scripts/anthropic_review.py. The shared review prompt is prompts/review-master-figure.md.

Current Foundry note:

local smoke tests on April 6, 2026 confirmed claude-opus-4-6 on this endpoint;
the explicit Sonnet model names I tried (claude-sonnet-4-6, claude-sonnet-4-5, and claude-sonnet-4) were rejected by the current deployment;
so the Claude launcher defaults to claude-opus-4-6 for the worker and critic unless you override --worker-model or --critic-model.

Example override:

uv run python scripts/launch_claude_exec.py critic \
  --worker-model claude-opus-4-6 \
  --critic-model claude-opus-4-6
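
The default-plus-override behavior can be sketched with argparse as below. This is an illustration of the documented flags only; the real launcher may define more options or structure its parser differently.

```python
import argparse

# Default taken from the Foundry note above.
DEFAULT_MODEL = "claude-opus-4-6"


def build_parser() -> argparse.ArgumentParser:
    """Parser sketch covering the condition argument and the two model overrides."""
    parser = argparse.ArgumentParser(prog="launch_claude_exec.py")
    parser.add_argument("condition", choices=["base", "critic", "external"])
    parser.add_argument("--worker-model", default=DEFAULT_MODEL)
    parser.add_argument("--critic-model", default=DEFAULT_MODEL)
    return parser
```

With this shape, build_parser().parse_args(["critic"]) leaves both models at the default, and either flag can be overridden independently.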

The external reviewer can be called directly with:

uv run python scripts/anthropic_review.py \
  --proposal inputs/project_description.tex \
  --svg artifacts/master-figure/master-figure.svg \
  --history-dir runs/<run-id>/reviews \
  --output-md artifacts/master-figure-external-review/review.md \
  --output-json artifacts/master-figure-external-review/review.json \
  --model claude-opus-4-6

Comparison artifacts

After the three runs finish, collect the final figures and build the comparison media with:

uv run python scripts/build_comparison_artifacts.py \
  --run-prefix 20260406-192417 \
  --output-dir artifacts/20260406-192417-comparison

This copies the final PNGs and notes from the /tmp run workspaces into a repo-local directory and generates:

final-comparison.png: labeled side-by-side final figures;
gifs/base-progress.gif: base-condition draft progression;
gifs/critic-progress.gif: same-model-critic draft progression;
gifs/external-progress.gif: external-review draft progression;
summary.md: source run roots, frame counts, and copied artifact paths.
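
How --run-prefix selects the three run workspaces is not spelled out here. The sketch below shows one plausible approach, assuming run directory names start with the prefix and end with the condition name. That naming scheme and the find_run_roots helper are assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict

CONDITIONS = ("base", "critic", "external")


def find_run_roots(runs_parent: Path, run_prefix: str) -> Dict[str, Path]:
    """Map each condition to the run directory matching the prefix (assumed naming)."""
    roots: Dict[str, Path] = {}
    for path in sorted(runs_parent.glob(f"{run_prefix}*")):
        if not path.is_dir():
            continue
        for condition in CONDITIONS:
            # Assumed layout: <run-prefix>-<condition>
            if path.name.endswith(condition):
                roots[condition] = path
    return roots
```

A caller could compare the returned keys against CONDITIONS to confirm that all three runs were found before building the comparison media.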
