docs: Documentation audit — outdated, incorrect, and missing content #187

@vredchenko

Summary

A comprehensive audit of docs/ compared against the current state of the codebase (primarily smartem-decisions, smartem-frontend, smartem-devtools). The docs are structurally sound with good organisation and useful ADRs, but they largely reflect the state of the codebase from mid-to-late 2025. Significant additions since then are undocumented, and several claims are now factually wrong.

Findings are grouped by severity.


Incorrect (will mislead readers)

These will cause confusion or errors if followed.

| Doc | Claim | Reality |
| --- | --- | --- |
| backend/api-server.md | python -m smartem_backend.simulate_msg and ./tools/simulate-messages.sh exist | Neither exists. The tool is tools/external_message_simulator.py |
| backend/api-server.md | ./scripts/k8s/dev-k8s.sh up (implies it's in smartem-decisions) | Script lives in smartem-devtools scripts/k8s/, not in smartem-decisions |
| backend/database.md | Lists 3 migrations (baseline, indexes, prediction models) | There are 7 migrations. Missing: quality metrics, schema drift fixes, suggested acquisition index, agent log table |
| backend/database.md | Baseline migration ID 6e6302f1ccb6 | Actual baseline file is 2025_09_18_1042-001_create_core_smartem_schema_baseline.py, so the ID likely differs |
| operations/kubernetes.md | ./scripts/k8s/dev-k8s.sh (implies in smartem-decisions) | Lives in smartem-devtools scripts/k8s/ |
| development/generate-docs.md | tox -e docs to generate documentation | No tox.ini exists in smartem-decisions. Docs generation has moved elsewhere |
| development/e2e-simulation.md | ./tests/e2e/run-e2e-test.sh (implies in smartem-decisions) | Lives in smartem-devtools tests/e2e/ |
| glossary.md | ARIA = "Automated Real-time Image Analysis" | ARIA is a central metadata repository for structural biology data from multiple facilities (INSTRUCT-ERIC), not real-time image analysis |
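Several of the rows above come down to a doc referencing a script path that is not present in the repository it implies. A check along the following lines could catch that class of error before it ships. This is a minimal sketch, not an existing smartem tool: the function name `find_missing_script_refs` and the regular expression are assumptions made for illustration, and it only recognises `./`-prefixed `.sh` and `.py` references.

```python
import re
from pathlib import Path

# Matches relative script references such as ./scripts/k8s/dev-k8s.sh
# (hypothetical pattern; extend it if the docs reference other file types).
SCRIPT_REF = re.compile(r"\./[\w./-]+\.(?:sh|py)")


def find_missing_script_refs(docs_dir: Path, repo_root: Path) -> list[tuple[str, str]]:
    """Return (doc file, referenced path) pairs whose target is absent from repo_root."""
    missing = []
    for doc in sorted(docs_dir.rglob("*.md")):
        text = doc.read_text(encoding="utf-8")
        for ref in sorted(set(SCRIPT_REF.findall(text))):
            # The reference is resolved against the repo the doc claims to describe.
            if not (repo_root / ref).exists():
                missing.append((doc.relative_to(docs_dir).as_posix(), ref))
    return missing
```

Run against a checkout of smartem-decisions, it would be expected to flag references such as ./scripts/k8s/dev-k8s.sh, since per the table that script lives in smartem-devtools.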

Severely Incomplete (missing major functionality)

These documents exist but cover only a fraction of current functionality.

| Doc | What's documented | What actually exists |
| --- | --- | --- |
| backend/api-documentation.md | ~8 API endpoints | 60+ endpoints, including atlas tiles, quality predictions, agent sessions/logs, debug endpoints, frontend SSE stream, ordered foilholes, latent representations |
| backend/database.md | Implies ~5 core tables | 22 tables, including quality prediction models/weights/parameters, quality metrics/statistics, agent log/connection/session/instruction/acknowledgement, atlas tiles, overall quality predictions |
| agent/cli-reference.md | Documents parse/validate/watch | Missing the installed smartem-agent CLI entry point (via pyproject.toml [project.scripts]) and 5 other CLI entry points (smartem.agent-cleanup, smartem.register-prediction-model, smartem.init-model-weight, smartem.random-model-predictions, smartem.random-prior-updates) |
| operations/environment-variables.md | Lists DB and RabbitMQ vars | Missing CORS_ALLOWED_ORIGINS, SMARTEM_BACKEND_CONFIG, and appconfig.yml YAML-based configuration (database pool settings, particle_select_batch_size, etc.) |
| development/tools.md | Lists 4 tools | Missing db_table_totals.py, check_schema_drift.py, generate_api_docs.py, makeiso.sh |

Stale / Outdated (not wrong per se, but no longer reflects current state)

| Doc | Issue |
| --- | --- |
| smartem-decisions README.md | Still calls itself "proof of concept". The system is production-grade with 60+ endpoints, 22 DB tables, CI/CD, K8s deployment, ML pipeline |
| backend/api-server.md | Only covers basic API/consumer startup. Doesn't mention appconfig.yml, frontend_stream.py (frontend SSE), agent log submission, or the ML prediction pipeline |
| operations/containerization.md | Documents multi-stage developer/build/runtime stages. This matches Dockerfile.dev, but the production Dockerfile is simpler (installs from PyPI). Docs don't distinguish between the two |
| operations/containerization.md | Image name ghcr.io/diamondlightsource/smartem-decisions. CI actually uses ghcr.io/${{ github.repository }} (case-sensitive) |
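On the image-name row: container image repository names must be lowercase, while a GitHub repository slug keeps the organisation's original casing (DiamondLightSource). The following sketch shows the normalisation the docs could describe. It is illustrative only; whether and where the CI workflow lowercases the name is not stated in this issue, and `ghcr_image_ref` is a hypothetical helper.

```python
def ghcr_image_ref(repository: str, tag: str = "latest") -> str:
    """Build a GHCR image reference from an owner/name repository slug.

    The slug is lowercased because image repository names may not contain
    uppercase characters. The tag is left untouched.
    """
    return f"ghcr.io/{repository.lower()}:{tag}"


print(ghcr_image_ref("DiamondLightSource/smartem-decisions"))
```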

Entirely Missing Documentation

No docs exist for these areas:

  • ML prediction pipeline — quality prediction models, model weights, training tables, metrics aggregation
  • Frontend SSE stream: GET /frontend/events/stream for real-time UI updates
  • Agent logging: POST /agent/{agent_id}/session/{session_id}/logs and the agentlog table (added March 2026)
  • Debug endpoints — agent session management, test instruction creation
  • Image serving — atlas and gridsquare image endpoints (GET /grids/{grid_uuid}/atlas_image, GET /gridsquares/{gridsquare_uuid}/gridsquare_image)
  • appconfig.yml — YAML-based application configuration (DB pool tuning, batch sizes, log file path)
  • smartem-frontend — no documentation at all. Frontend is now a monorepo with npm workspaces (apps/legacy, apps/smartem, packages/api, packages/ui), React 19, MUI 7, TanStack Router, Tailwind CSS 4, Orval API client generation
  • smartem-devtools webui — the developer dashboard (React 19, Vite, MDX) has no documentation
  • smartem-workspace upgrade path: uvx caches tools, so uvx smartem-workspace keeps serving the previously-installed version after a new PyPI release. Users must run uvx --refresh smartem-workspace ..., uvx smartem-workspace@latest ..., or uv cache clean smartem-workspace to pick up a new version. Release notes and the smartem-workspace README should call this out whenever a new version ships.
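Documentation for the frontend SSE stream would benefit from a short client-side example. The sketch below parses the standard text/event-stream framing (comment lines, event and data fields, blank-line dispatch). It is an assumption-laden illustration: the event names and payload shape that GET /frontend/events/stream actually emits are not stated in this issue, so the sample lines are hypothetical.

```python
from collections.abc import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per server-sent event from an iterable of text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event; skip it if no data arrived.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream content, for illustration only.
sample_lines = [": keep-alive\n", "event: update\n", 'data: {"x": 1}\n', "\n"]
print(list(parse_sse(sample_lines)))
```

In practice the lines would come from a streaming HTTP response to the endpoint rather than from a list.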

Suggested Priority

  1. Fix incorrect claims (wrong script paths, phantom modules, wrong ARIA definition) — these actively mislead
  2. Update database.md — migration list and table inventory are significantly behind
  3. Update API documentation — endpoint coverage is ~13% of reality
  4. Add frontend docs — entire subsystem undocumented
  5. Fill in missing topics (ML pipeline, appconfig, image serving, agent logging)
  6. Refresh stale content (README "proof of concept", containerization docs)
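The ~13% figure in item 3 follows from roughly 8 documented endpoints out of 60+. To keep that number honest as the docs are updated, coverage could be recomputed from the API schema. The sketch below assumes the backend publishes an OpenAPI document (plausible given the frontend's Orval client generation, but not confirmed in this issue); both function names are hypothetical.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def operations_from_openapi(spec: dict) -> set[str]:
    """Collect 'METHOD /path' strings from the paths section of an OpenAPI document."""
    return {
        f"{method.upper()} {path}"
        for path, item in spec.get("paths", {}).items()
        for method in item
        if method in HTTP_METHODS  # skip non-operation keys such as 'parameters'
    }


def endpoint_coverage(documented: set[str], implemented: set[str]) -> tuple[float, list[str]]:
    """Return (fraction of implemented operations that are documented, missing operations)."""
    if not implemented:
        raise ValueError("no implemented operations to compare against")
    covered = documented & implemented
    return len(covered) / len(implemented), sorted(implemented - documented)


# The arithmetic behind the figure quoted above: 8 of 60 endpoints.
print(round(100 * 8 / 60))  # 13
```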

Recycle e2e test notes (from #152)

PR #152 landed a comprehensive tests/e2e/README.md in smartem-devtools (+346 lines covering prerequisites, env config, recordings, automated and manual execution, replay speeds, multi-microscope setup). That content is now the freshest source of truth on how to run the e2e suite and should be recycled into the wider docs/skills surface so it does not drift:

  • Update repos/DiamondLightSource/smartem-decisions/docs/development/e2e-simulation.md — script path is wrong (script lives in smartem-devtools tests/e2e/, not smartem-decisions); align procedures with the new README instead of duplicating them, and link out to it as the authoritative source
  • Update the Claude skill(s) related to e2e testing — fold in the latest prerequisites (k8s NodePorts, .env.local-test-run, recordings table, uv sync --all-extras) and the single-/multi-microscope invocation forms so Claude stops referencing stale paths/procedures
  • Cross-link from smartem-devtools top-level README / docs index so the e2e README is discoverable without knowing to look under tests/e2e/
  • Decide on single source of truth — either keep the long-form guide in tests/e2e/README.md and have docs/development/e2e-simulation.md be a thin pointer, or move the content into docs/ and keep tests/e2e/README.md as the thin pointer. Avoid maintaining both in parallel.
