Skip to content

Follow-ups from code review + FastMCP alignment (Phase 2)#15

Open
tony wants to merge 97 commits intomainfrom
2026-04-follow-ups
Open

Follow-ups from code review + FastMCP alignment (Phase 2)#15
tony wants to merge 97 commits intomainfrom
2026-04-follow-ups

Conversation

@tony
Copy link
Copy Markdown
Member

@tony tony commented Apr 13, 2026

Summary

29 commits on 2026-04-follow-ups, addressed in two waves:

  • Phase 1 (commits 1–12) — original code-review follow-ups: caller identity, observability, pane-tools split, doctest scoping, safety docs, plus four small follow-up cleanups.
  • Phase 2 (commits 13–29) — FastMCP-alignment work derived from a multi-model brainstorm-and-refine analysis. Each commit is one coherent change; quality gate (ruff, mypy, pytest --reruns 0, just build-docs) ran green before each commit landed. Test count grew from 186 → 276 (+90 regression-guard tests).

Quality gate

$ uv run ruff check . --fix --show-fixes
$ uv run ruff format .
$ uv run mypy
$ uv run py.test --reruns 0 -vvv
$ just build-docs

--reruns 0 (overriding the project default of 2) was the standard for the entire 29-commit series, so flakes are surfaced not masked. The one timing-sensitive test that surfaced (test_wait_for_content_change_timeout) was stabilized in commit 13 with an active settle loop before any new feature work landed.

Phase 1 — original review follow-ups

# Commit Change
1 fix(caller-identity) Parse TMUX env var; self-kill guards now compare the caller's socket path to the target server
2 refactor(imports) Hoisted 16 inline from fastmcp.exceptions import ToolError to module level
3 feat(observability) AuditMiddleware — one structured record per tool call, with redacted digests for keys / text / value
4 refactor(pane) Split 1315-line pane_tools.py into a pane_tools/ package (io, wait, search, copy_mode, layout, lifecycle, pipe, meta). Backwards-compatible re-exports
5 docs(agents) Scoped doctest policy to pure helpers; tmux-touching tools exempt
6 test(doctest) Removed -p no:doctest, added --doctest-modules, seeded doctests on _redact_digest and _summarize_args
7 docs(safety) Footguns section for pipe_pane / set_environment / send_keys; documented the audit log; docstring warnings surface via sphinx_autodoc_fastmcp
8 test(middleware) Use real MiddlewareContext + CallToolRequestParams
9 style Apply ruff format and drop dead mypy override
10 refactor(pane_tools[io]) Hoist subprocess and tempfile imports
11 refactor(pane_tools[wait]) Hoist time and retry_until imports
12 docs(_utils) Align _caller_is_on_server docstring with code

Phase 2 — FastMCP alignment

Driven by a multi-model brainstorm session (Claude × 3 / Gemini × 3 / GPT × 3, three refinement passes) that audited libtmux-mcp against FastMCP, libtmux, and tmux. The session produced a 17-commit plan with explicit priority tiers (P1P3) and four "deliberately not recommended" items.

Quality gate precursor

  1. test(pane_tools[wait]): stabilize test_wait_for_content_change_timeout under --reruns=0
    Active 3-streak settle loop replaces a fixed sleep so zsh async hooks (vcs_info, git prompt) don't trip the no-change assertion. Lets the rest of the series run under the strict gate without flake masking.

PR group A — Preconditions (correctness fixes)

  1. mcp(fix[_utils,annotations]): ANNOTATIONS_SHELL for open-world shell tools
    send_keys, paste_text, pipe_pane were sharing ANNOTATIONS_CREATE which set openWorldHint=False. Lying to MCP clients. Fix is a new preset, not a global flip — same shared dict is also used by create_session/create_window/split_window/swap_pane/enter_copy_mode, none of which are open-world.

  2. mcp(docs[server,instructions]): surface display_message, snapshot_pane, wait_for_* in _BASE_INSTRUCTIONS
    The most token-efficient read tools were never named in the agent-facing instructions, so agents defaulted to capture_pane retry loops.

  3. mcp(fix[pane_tools/io]): bound capture_pane with max_lines and in-band tail-preserving truncation
    Default max_lines=500. Prefix "[... truncated K lines ...]\n" when trimmed. max_lines=None opts out for callers that genuinely need full scrollback. Tail-preserving because the active prompt lives at the bottom.

  4. mcp(fix[pane_tools/meta,models]): bound snapshot_pane with max_lines and typed content_truncated fields
    Same semantics as docs: restore prompt admonition styling and rebuild copy buttons after gp-sphinx SPA swap #16 but surfaced via two new PaneSnapshot fields (content_truncated: bool, content_truncated_lines: int) rather than an in-band header — PaneSnapshot is already a Pydantic model so the typed channel is the cleaner shape.

  5. mcp(fix[pane_tools/search,models]): paginate search_panes via SearchPanesResult wrapper
    Breaking change: list[PaneContentMatch]SearchPanesResult with matches, truncated, truncated_panes, total_panes_matched, offset, limit. Per-pane cap on matched_lines plus pane-level pagination so truncation doesn't mean permanent blindness to later matches.

PR group B — Context + wait-for

  1. mcp(refactor[_utils]): add handle_tool_errors_async for Context-using tools
    Prerequisite for #20. Sync handle_tool_errors is unchanged; async variant funnels through the same exception-to-ToolError mapping table.

  2. mcp(feat[pane_tools/wait]): Context.report_progress in wait_for_text and wait_for_content_change
    Both wait tools converted to async def with a ctx: Context | None parameter. Replaces retry_until with an explicit asyncio loop that emits progress on every tick. No elicit — FastMCP 3.1 has no first-class elicitation capability check (capabilities.sampling is the wrong predicate; it would hang headless CI agents).

  3. mcp(feat[wait_for_tools]): tmux wait-for channel tools with bounded timeout
    New wait_for_channel(channel, timeout=30.0) and signal_channel(channel). subprocess.run(timeout=…) is mandatory — tmux wait-for blocks indefinitely at the OS level. Tool docstring teaches the safe status=$?; tmux wait-for -S done; exit $status composition pattern so a crashed shell command doesn't pay the full timeout penalty.

PR group C — Structured outputs + resources

  1. mcp(test[tools]): regression-guard read-heavy tools against str-return drift
    Audit found get_pane_info / get_server_info / list_* / snapshot_pane already return Pydantic models. New regression tests pin that so a future refactor can't silently drop the MCP outputSchema by flattening one back to str.

  2. mcp(refactor[resources/hierarchy]): declare mime_type on all hierarchy resources
    Five JSON resources get mime_type="application/json"; get_pane_content gets mime_type="text/plain". (FastMCP requires resources to return str | bytes | list[ResourceContent], so the json.dumps payload stays — but clients now see the correct MIME tag.)

PR group D — New tool families

  1. mcp(feat[buffer_tools]): agent-namespaced paste buffers with UUID nonce
    load_buffer / paste_buffer / show_buffer / delete_buffer. Every call allocates libtmux_mcp_<uuid4hex>_<logical> so concurrent agents (or parallel tool calls) cannot collide on the server-global buffer namespace. list_buffers is intentionally NOT exposed — OS clipboard sync into tmux buffers makes a generic enumeration a privacy footgun. _SENSITIVE_ARG_NAMES extended for content.

  2. mcp(feat[hook_tools]): read-only show_hook and show_hooks
    Wraps libtmux's HooksMixin.show_hook/show_hooks with a HookListResult model that flattens scalar / dict / SparseArray shapes. Write-hooks (set_hook/unset_hook) are deferred indefinitely — FastMCP lifespan teardown does not run on kill -9/OOM/C-extension fault, so any soft cleanup registry would leak agent-installed shell hooks into the user's persistent tmux server. The module docstring documents this and names three future paths.

  3. mcp(feat[prompts]): narrow prompts module with PromptsAsTools fallback
    Four prompts (run_and_wait, diagnose_failing_pane, build_dev_workspace, interrupt_gracefully) — the canonical recipes from docs/topics/prompting.md, encoded as protocol-level prompts. LIBTMUX_MCP_PROMPTS_AS_TOOLS=1 env-gates the PromptsAsTools transform for tool-only clients.

PR group E — Middleware + lifespan

  1. mcp(feat[middleware]): TailPreservingResponseLimitingMiddleware subclass
    FastMCP's stock ResponseLimitingMiddleware truncates the tail of oversized responses, which destroys the active shell prompt for terminal scrollback. Subclassed to drop the head and prefix the same [... truncated K bytes ...]\n header the per-tool caps use. max_size=50_000 (strictly above per-tool caps so it's a backstop, not a primary cap), scoped to capture_pane / search_panes / snapshot_pane.

  2. mcp(feat[middleware,server]): add TimingMiddleware and ErrorHandlingMiddleware
    Stack order pinned by test: Timing → TailPreservingResponseLimiting → ErrorHandling(transform_errors=True) → Safety → Audit. ErrorHandlingMiddleware(transform_errors=True) maps ResourceError from resources/hierarchy.py to MCP code -32002.

  3. mcp(feat[server,_utils]): FastMCP lifespan with tmux presence check and cache cleanup
    shutil.which("tmux") probe at startup converts a missing-tmux failure from a confusing post-handshake TmuxCommandNotFound into a clean cold-start RuntimeError. Clears _server_cache on graceful shutdown (SIGTERM/SIGINT only — by design, not relied on for any invariant that must survive hard crashes; see commit 25's docstring).

Deliberately NOT in this PR

Each was considered in the brainstorm and explicitly rejected with reasoning documented in the relevant module's docstring:

  • Write-hooks (set_hook/unset_hook). Three future paths documented in hook_tools.py module docstring; none in scope here.
  • Control mode (tmux -C event bridge). Per-byte %output events across all panes would firehose Python's asyncio loop and starve MCP protocol handlers. If pursued later, must start from refresh-client -B per-subscription with bounded queue + per-pane rate limit, not blanket attach.
  • ctx.elicit for destructive-tool confirmation. No reliable client-capability check exists in FastMCP 3.1; capabilities.sampling is the wrong predicate (it describes server→client LLM requests, not human UI). Tier gate + self-kill guards remain the hard boundary.
  • Generic list_buffers. OS clipboard sync makes it a privacy footgun.
  • ResponseCachingMiddleware, RateLimitingMiddleware, ctx.sample, HTTP transport, MCPB bundling. Architectural mis-fits; rejected in the brainstorm.

Diff size

40 files changed, 4870 insertions(+), 1438 deletions(-)
276 tests pass under --reruns 0

Test plan

  • uv run ruff check . — clean
  • uv run ruff format --check . — clean
  • uv run mypy — clean
  • uv run py.test --reruns 0 -vvv — 276/276 pass
  • just build-docs — succeeded
  • Each Phase-2 commit independently passes the same 5-step gate (verified by enforcing it before every commit landed)
  • Manual: start the server with tmux temporarily renamed on PATH; confirm clean cold-start RuntimeError from the lifespan probe
  • Manual: mcp__libtmux__capture_pane against a >50 KB scrollback pane with max_lines=None; confirm TailPreservingResponseLimitingMiddleware truncates the head, preserves the tail, prefixes the header
  • Manual: mcp__libtmux__search_panes with offset/limit pagination across multiple matching panes
  • Manual: wait_for_channel + signal_channel end-to-end in a real tmux session
  • Manual: LIBTMUX_MCP_PROMPTS_AS_TOOLS=1 and confirm prompts appear in the tool list
  • Manual: trigger a resource error (invalid session name on get_session); confirm MCP error code -32002 from ErrorHandlingMiddleware

Notes for reviewers

  • Phase 1 review remains unchanged; the original Summary section above (now folded into the table) covers commits 1–7 and the four follow-up cleanups (8–12).
  • Phase 2 was driven by a written plan at /home/d/.claude/plans/memoized-pondering-sparkle.md (local artifact, not committed). Each commit body cites the brainstorm finding it acts on.
  • Natural cut points if the PR needs to be split for review: A (commits 13–18), B (19–21), C (22–23), D (24–26), E (27–29). Every commit independently passes the quality gate, so reverts inside any group are clean.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 88.77551% with 99 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.18%. Comparing base (65e9ad4) to head (8c76b87).

Files with missing lines Patch % Lines
src/libtmux_mcp/_utils.py 84.34% 11 Missing and 7 partials ⚠️
src/libtmux_mcp/tools/server_tools.py 75.38% 15 Missing and 1 partial ⚠️
src/libtmux_mcp/tools/hook_tools.py 81.35% 8 Missing and 3 partials ⚠️
src/libtmux_mcp/tools/wait_for_tools.py 72.50% 11 Missing ⚠️
src/libtmux_mcp/tools/buffer_tools.py 89.01% 8 Missing and 2 partials ⚠️
src/libtmux_mcp/tools/pane_tools/io.py 86.20% 7 Missing and 1 partial ⚠️
src/libtmux_mcp/tools/pane_tools/layout.py 85.71% 4 Missing and 4 partials ⚠️
src/libtmux_mcp/middleware.py 90.47% 3 Missing and 3 partials ⚠️
src/libtmux_mcp/tools/pane_tools/lifecycle.py 86.36% 2 Missing and 1 partial ⚠️
src/libtmux_mcp/server.py 92.59% 2 Missing ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #15      +/-   ##
==========================================
+ Coverage   82.90%   86.18%   +3.27%     
==========================================
  Files          17       30      +13     
  Lines         983     1513     +530     
  Branches      110      182      +72     
==========================================
+ Hits          815     1304     +489     
- Misses        122      155      +33     
- Partials       46       54       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tony tony force-pushed the 2026-04-follow-ups branch from 76bec04 to f9d5388 Compare April 14, 2026 00:00
@tony tony marked this pull request as ready for review April 14, 2026 00:20
@tony
Copy link
Copy Markdown
Member Author

tony commented Apr 14, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@tony tony changed the title Follow-ups from code review: caller identity, observability, pane split, docs Follow-ups from code review + FastMCP alignment (Phase 2) Apr 14, 2026
tony added a commit that referenced this pull request Apr 15, 2026
…thread around capture_pane

why: In commit 67c3fc8 the wait tools became `async def` so that
     ctx.report_progress could be awaited during long polls. FastMCP
     direct-awaits async tools on the main event loop
     (`fastmcp/tools/function_tool.py:277-278`) — it only uses a
     threadpool for *sync* tools via `call_sync_fn_in_threadpool`.
     That means every blocking `pane.capture_pane()` subprocess.run
     inside the poll loop starves every other concurrent MCP request
     for the duration of the tmux roundtrip (~10s of ms per tick), for
     the full configured timeout.

     A 60-second `wait_for_text` call would pin the entire server for
     60 seconds, delaying list_sessions / capture_pane / anything else.
     Loom code review surfaced this with 100% confidence (Gemini
     Skeptic pass on PR #15); verified against FastMCP source.

what:
- Wrap the three blocking `pane.capture_pane(...)` calls in
  src/libtmux_mcp/tools/pane_tools/wait.py with `asyncio.to_thread`:
  * `wait_for_text` polling call.
  * `wait_for_content_change` initial snapshot.
  * `wait_for_content_change` polling call.
- Inline comment at the first site explains why (future maintainers
  will otherwise see "sync call inside async fn" as a bug and try to
  "fix" it by removing `async def`, reintroducing the blocking path).
- No docstring changes — the `async def` signature already promises
  non-blocking semantics; this commit actually delivers on it.
- Add `test_wait_tools_do_not_block_event_loop` in
  tests/test_pane_tools.py that runs `wait_for_text` against a
  never-matching pattern inside an asyncio.gather with a ticker
  coroutine. Asserts the ticker counter >= 5 over the 300 ms wait
  window — a blocking capture would leave it at 0 or 1. Deliberately
  a generous lower bound to stay robust on slow CI.
tony added a commit that referenced this pull request Apr 15, 2026
…_wait

why: The ``run_and_wait`` prompt hardcoded ``mcp_done`` as the tmux
     wait-for channel name. tmux channels are server-global, so two
     concurrent agents (or parallel prompt renderings from one agent)
     would race: one agent's ``tmux wait-for -S mcp_done`` would
     unblock another's pending ``wait_for_channel("mcp_done")``,
     producing a false positive completion signal.

     Flagged as Critical by GPT Builder in the Loom review of PR #15.

what:
- Import ``uuid`` in src/libtmux_mcp/prompts/recipes.py.
- Generate a fresh channel name at each call: ``libtmux_mcp_wait_<uuid4hex[:8]>``.
  * Prefix aligns with the buffer-tools namespace (libtmux_mcp_) so a
    future ``list_buffers(prefix="libtmux_mcp_")``-style operator sees
    every MCP-owned tmux artifact uniformly.
  * 8 hex characters (~32 bits) is safe against collision inside a
    single tmux server's concurrent-agent population.
- Interpolate the same channel name into both the send_keys payload
  and the wait_for_channel call so one prompt rendering remains
  internally consistent.
- Update the prompt docstring to note the per-invocation scope.
- tests/test_prompts.py:
  * Update ``test_run_and_wait_returns_string_template`` to pin the
    new prefix.
  * Add ``test_run_and_wait_channel_is_uuid_scoped`` asserting
    (a) channel matches ``libtmux_mcp_wait_[0-9a-f]{8}``,
    (b) send_keys and wait_for_channel use the same name within one
        rendering,
    (c) two separate renderings emit different channel names.
tony added a commit that referenced this pull request Apr 15, 2026
why: ``show_hook`` swallowed three tmux error substrings into an empty
     result: ``"too many arguments"`` (correctly — that's how tmux
     reports an unset hook), but also ``"unknown hook"`` and
     ``"invalid option"``. The latter two fire for *typos* and
     *wrong-scope* mistakes, not "hook is unset". Agents sending
     ``show_hook("pane-esited")`` (typo) or
     ``show_hook("pane-exited", scope="server")`` (wrong scope) got
     back an empty list indistinguishable from a correctly-named but
     unset hook — masking the real input error.

     Flagged as Important by both Gemini (Skeptic) and GPT (Builder)
     in the Loom review of PR #15. This commit also lands the upstream
     TODO comment for Suggestion #15 (libtmux's scope-kwarg argv bug).

what:
- Narrow the ``except libtmux_exc.OptionError`` branch in
  src/libtmux_mcp/tools/hook_tools.py::show_hook to only match
  ``"too many arguments"`` — the single substring all tmux builds
  supported by this project use for an unset hook. Any other
  OptionError (``"unknown hook"``, ``"invalid option"``, new future
  substrings) re-raises so ``handle_tool_errors`` surfaces it.
- Replace the block's comment with the reasoning so a future
  maintainer knows not to re-broaden the catch.
- Add a TODO(libtmux upstream) comment in ``_resolve_hook_target``
  explaining why the scope-nulling workaround exists. The fix
  belongs in libtmux's argv-assembly path, not here.
- tests/test_hook_tools.py:
  * Remove ``test_show_hook_missing_returns_empty`` (its use of
    ``"after-nonexistent-hook-cxyz"`` was actually exercising the
    now-removed broad catch).
  * Add ``test_show_hook_unset_known_hook_returns_empty`` that sets
    and then unsets a real hook (``pane-exited``) before calling
    ``show_hook`` — exercises the narrow "too many arguments" path.
  * Add ``test_show_hook_unknown_name_raises`` asserting that a
    bogus hook name now surfaces as ``ToolError`` matching
    ``r"invalid option|unknown hook"`` (regression guard against
    re-broadening the swallow).
tony added a commit that referenced this pull request Apr 15, 2026
…tmcp

why: ``fastmcp_model_classes`` in docs/conf.py is the allow-list that
     ``sphinx-autodoc-fastmcp`` uses to decide which Pydantic models
     get rendered in the generated schema docs. After PR #15 added
     seven new models, the tuple still listed only the original ten —
     so the new schemas (``SearchPanesResult``, ``PaneSnapshot``,
     ``ContentChangeResult``, ``HookEntry``, ``HookListResult``,
     ``BufferRef``, ``BufferContent``) were invisible in the built
     API docs, even though they're part of every tool surface that
     was documented.

     Flagged as Important by Gemini (Skeptic) in the Loom review.

what:
- Add the seven new model names to ``conf["fastmcp_model_classes"]``
  in docs/conf.py. Keeps the existing order for the old entries;
  new entries are inserted at semantically adjacent positions
  (e.g. ``SearchPanesResult`` next to ``PaneContentMatch``,
  ``ContentChangeResult`` next to ``WaitForTextResult``,
  ``HookEntry`` / ``HookListResult`` grouped, ``BufferRef`` /
  ``BufferContent`` grouped).
- Verified post-build: all seven names appear in the generated
  reference/api/models/index.html.
tony added a commit that referenced this pull request Apr 15, 2026
…ult shape

why: PR #15 wrapped ``search_panes`` output in a ``SearchPanesResult``
     model (matches + pagination fields) but the docs/tools/panes.md
     example still showed the pre-wrapper bare-array response shape.
     Three reviewers converged on this (Claude + Gemini + GPT in the
     Loom review, 3-way consensus).

what:
- Rewrite the response block in docs/tools/panes.md from:
      [ {pane_id: ..., matched_lines: [...]} ]
  to the actual wrapper:
      { matches: [...], truncated, truncated_panes,
        total_panes_matched, offset, limit }
- Add a one-paragraph pagination hint above the example explaining
  how to iterate larger result sets via ``offset += len(matches)``
  until ``truncated == false`` and ``truncated_panes == []``.
tony added a commit that referenced this pull request Apr 15, 2026
why: AGENTS.md §255-287 requires doctests on pure helpers. Most of
     the pure helpers introduced by PR #15 have them
     (``_validate_channel_name``, ``_validate_logical_name``,
     ``_validate_buffer_name``, ``_truncate_lines_tail``,
     ``_tmux_argv``, ``_pane_id_sort_key``). The critic pass surfaced
     ``_allocate_buffer_name`` as the lone remaining pure helper
     without a doctest — an oversight when commit 272396d introduced
     the module.

     Loom review Suggestion #16 (finalized).

what:
- Expand the docstring on
  src/libtmux_mcp/tools/buffer_tools.py::_allocate_buffer_name with
  a NumPy-style ``Examples`` block covering:
  * prefix contract (``"libtmux_mcp_"``),
  * logical-name suffix preservation,
  * 32-hex-character uuid nonce length,
  * empty-string and ``None`` both collapsing to the ``"buf"``
    fallback.
- The docstring also expands the rationale for why the name has the
  exact shape it does (privacy prefix + collision-resistant nonce
  + logical suffix), so future maintainers don't reduce the
  structure and accidentally re-introduce either the OS-clipboard
  read risk or the cross-agent collision risk that commit 272396d
  fixed.
@tony tony force-pushed the 2026-04-follow-ups branch 2 times, most recently from 65971d8 to c714044 Compare April 16, 2026 15:34
tony added a commit that referenced this pull request Apr 16, 2026
…thread around capture_pane

why: In commit 67c3fc8 the wait tools became `async def` so that
     ctx.report_progress could be awaited during long polls. FastMCP
     direct-awaits async tools on the main event loop
     (`fastmcp/tools/function_tool.py:277-278`) — it only uses a
     threadpool for *sync* tools via `call_sync_fn_in_threadpool`.
     That means every blocking `pane.capture_pane()` subprocess.run
     inside the poll loop starves every other concurrent MCP request
     for the duration of the tmux roundtrip (~10s of ms per tick), for
     the full configured timeout.

     A 60-second `wait_for_text` call would pin the entire server for
     60 seconds, delaying list_sessions / capture_pane / anything else.
     Loom code review surfaced this with 100% confidence (Gemini
     Skeptic pass on PR #15); verified against FastMCP source.

what:
- Wrap the three blocking `pane.capture_pane(...)` calls in
  src/libtmux_mcp/tools/pane_tools/wait.py with `asyncio.to_thread`:
  * `wait_for_text` polling call.
  * `wait_for_content_change` initial snapshot.
  * `wait_for_content_change` polling call.
- Inline comment at the first site explains why (future maintainers
  will otherwise see "sync call inside async fn" as a bug and try to
  "fix" it by removing `async def`, reintroducing the blocking path).
- No docstring changes — the `async def` signature already promises
  non-blocking semantics; this commit actually delivers on it.
- Add `test_wait_tools_do_not_block_event_loop` in
  tests/test_pane_tools.py that runs `wait_for_text` against a
  never-matching pattern inside an asyncio.gather with a ticker
  coroutine. Asserts the ticker counter >= 5 over the 300 ms wait
  window — a blocking capture would leave it at 0 or 1. Deliberately
  a generous lower bound to stay robust on slow CI.
tony added a commit that referenced this pull request Apr 16, 2026
…_wait

why: The ``run_and_wait`` prompt hardcoded ``mcp_done`` as the tmux
     wait-for channel name. tmux channels are server-global, so two
     concurrent agents (or parallel prompt renderings from one agent)
     would race: one agent's ``tmux wait-for -S mcp_done`` would
     unblock another's pending ``wait_for_channel("mcp_done")``,
     producing a false positive completion signal.

     Flagged as Critical by GPT Builder in the Loom review of PR #15.

what:
- Import ``uuid`` in src/libtmux_mcp/prompts/recipes.py.
- Generate a fresh channel name at each call: ``libtmux_mcp_wait_<uuid4hex[:8]>``.
  * Prefix aligns with the buffer-tools namespace (libtmux_mcp_) so a
    future ``list_buffers(prefix="libtmux_mcp_")``-style operator sees
    every MCP-owned tmux artifact uniformly.
  * 8 hex characters (~32 bits) is safe against collision inside a
    single tmux server's concurrent-agent population.
- Interpolate the same channel name into both the send_keys payload
  and the wait_for_channel call so one prompt rendering remains
  internally consistent.
- Update the prompt docstring to note the per-invocation scope.
- tests/test_prompts.py:
  * Update ``test_run_and_wait_returns_string_template`` to pin the
    new prefix.
  * Add ``test_run_and_wait_channel_is_uuid_scoped`` asserting
    (a) channel matches ``libtmux_mcp_wait_[0-9a-f]{8}``,
    (b) send_keys and wait_for_channel use the same name within one
        rendering,
    (c) two separate renderings emit different channel names.
@tony tony force-pushed the 2026-04-follow-ups branch from 94c4044 to bc4c2ed Compare April 16, 2026 22:01
tony added a commit that referenced this pull request Apr 16, 2026
why: ``show_hook`` swallowed three tmux error substrings into an empty
     result: ``"too many arguments"`` (correctly — that's how tmux
     reports an unset hook), but also ``"unknown hook"`` and
     ``"invalid option"``. The latter two fire for *typos* and
     *wrong-scope* mistakes, not "hook is unset". Agents sending
     ``show_hook("pane-esited")`` (typo) or
     ``show_hook("pane-exited", scope="server")`` (wrong scope) got
     back an empty list indistinguishable from a correctly-named but
     unset hook — masking the real input error.

     Flagged as Important by both Gemini (Skeptic) and GPT (Builder)
     in the Loom review of PR #15. This commit also lands the upstream
     TODO comment for Suggestion #15 (libtmux's scope-kwarg argv bug).

what:
- Narrow the ``except libtmux_exc.OptionError`` branch in
  src/libtmux_mcp/tools/hook_tools.py::show_hook to only match
  ``"too many arguments"`` — the single substring all tmux builds
  supported by this project use for an unset hook. Any other
  OptionError (``"unknown hook"``, ``"invalid option"``, new future
  substrings) re-raises so ``handle_tool_errors`` surfaces it.
- Replace the block's comment with the reasoning so a future
  maintainer knows not to re-broaden the catch.
- Add a TODO(libtmux upstream) comment in ``_resolve_hook_target``
  explaining why the scope-nulling workaround exists. The fix
  belongs in libtmux's argv-assembly path, not here.
- tests/test_hook_tools.py:
  * Remove ``test_show_hook_missing_returns_empty`` (its use of
    ``"after-nonexistent-hook-cxyz"`` was actually exercising the
    now-removed broad catch).
  * Add ``test_show_hook_unset_known_hook_returns_empty`` that sets
    and then unsets a real hook (``pane-exited``) before calling
    ``show_hook`` — exercises the narrow "too many arguments" path.
  * Add ``test_show_hook_unknown_name_raises`` asserting that a
    bogus hook name now surfaces as ``ToolError`` matching
    ``r"invalid option|unknown hook"`` (regression guard against
    re-broadening the swallow).
tony added a commit that referenced this pull request Apr 16, 2026
…tmcp

why: ``fastmcp_model_classes`` in docs/conf.py is the allow-list that
     ``sphinx-autodoc-fastmcp`` uses to decide which Pydantic models
     get rendered in the generated schema docs. After PR #15 added
     seven new models, the tuple still listed only the original ten —
     so the new schemas (``SearchPanesResult``, ``PaneSnapshot``,
     ``ContentChangeResult``, ``HookEntry``, ``HookListResult``,
     ``BufferRef``, ``BufferContent``) were invisible in the built
     API docs, even though they're part of every tool surface that
     was documented.

     Flagged as Important by Gemini (Skeptic) in the Loom review.

what:
- Add the seven new model names to ``conf["fastmcp_model_classes"]``
  in docs/conf.py. Keeps the existing order for the old entries;
  new entries are inserted at semantically adjacent positions
  (e.g. ``SearchPanesResult`` next to ``PaneContentMatch``,
  ``ContentChangeResult`` next to ``WaitForTextResult``,
  ``HookEntry`` / ``HookListResult`` grouped, ``BufferRef`` /
  ``BufferContent`` grouped).
- Verified post-build: all seven names appear in the generated
  reference/api/models/index.html.
tony added a commit that referenced this pull request Apr 16, 2026
…ult shape

why: PR #15 wrapped ``search_panes`` output in a ``SearchPanesResult``
     model (matches + pagination fields) but the docs/tools/panes.md
     example still showed the pre-wrapper bare-array response shape.
     Three reviewers converged on this (Claude + Gemini + GPT in the
     Loom review, 3-way consensus).

what:
- Rewrite the response block in docs/tools/panes.md from:
      [ {pane_id: ..., matched_lines: [...]} ]
  to the actual wrapper:
      { matches: [...], truncated, truncated_panes,
        total_panes_matched, offset, limit }
- Add a one-paragraph pagination hint above the example explaining
  how to iterate larger result sets via ``offset += len(matches)``
  until ``truncated == false`` and ``truncated_panes == []``.
tony added a commit that referenced this pull request Apr 16, 2026
why: AGENTS.md §255-287 requires doctests on pure helpers. Most of
     the pure helpers introduced by PR #15 have them
     (``_validate_channel_name``, ``_validate_logical_name``,
     ``_validate_buffer_name``, ``_truncate_lines_tail``,
     ``_tmux_argv``, ``_pane_id_sort_key``). The critic pass surfaced
     ``_allocate_buffer_name`` as the lone remaining pure helper
     without a doctest — an oversight when commit 272396d introduced
     the module.

     Loom review Suggestion #16 (finalized).

what:
- Expand the docstring on
  src/libtmux_mcp/tools/buffer_tools.py::_allocate_buffer_name with
  a NumPy-style ``Examples`` block covering:
  * prefix contract (``"libtmux_mcp_"``),
  * logical-name suffix preservation,
  * 32-hex-character uuid nonce length,
  * empty-string and ``None`` both collapsing to the ``"buf"``
    fallback.
- The docstring also expands the rationale for why the name has the
  exact shape it does (privacy prefix + collision-resistant nonce
  + logical suffix), so future maintainers don't reduce the
  structure and accidentally re-introduce either the OS-clipboard
  read risk or the cross-agent collision risk that commit 272396d
  fixed.
tony added 10 commits April 16, 2026 19:07
Previously _get_caller_pane_id read only TMUX_PANE. Pane IDs like %1
are only unique within a single tmux server, so the kill_server /
kill_session / kill_window / kill_pane guards could either falsely
block operations across unrelated sockets or (worse) falsely permit a
self-kill when a colliding pane id existed on the target socket.

Parse TMUX (format: socket_path,server_pid,session_id per tmux
environ.c:281) alongside TMUX_PANE to produce a CallerIdentity, and
compare socket paths via os.path.realpath against the target Server's
effective socket path. When socket info is ambiguous (e.g. TMUX_PANE
set but TMUX missing), err on the side of blocking — the error
message already directs users to run tmux manually if the guard is a
false positive.

- _utils.py: add CallerIdentity, _get_caller_identity,
  _effective_socket_path, _caller_is_on_server. Keep
  _get_caller_pane_id as a thin wrapper so _serialize_pane's is_caller
  annotation is unchanged.
- server_tools.kill_server: guard now fires only when caller's socket
  matches the target server (was: any TMUX_PANE blocked).
- session_tools.kill_session / window_tools.kill_window /
  pane_tools.kill_pane: self-kill checks gated on socket match.
- tests/conftest.py: autouse fixture clears TMUX/TMUX_PANE so host
  terminal doesn't leak into tests.
- tests/test_server_tools.py: self-kill guard test now sets TMUX to
  the target socket; new test covers cross-socket path.
- tests/test_utils.py: new coverage for parser, realpath match,
  cross-socket rejection, conservative fallback, and absent caller.
16 inline `from fastmcp.exceptions import ToolError` statements lived
inside function bodies across _utils.py, server_tools.py,
session_tools.py, window_tools.py, pane_tools.py, and
option_tools.py. This was a legacy habit that hurt grep-ability and
left the pane_tools module cluttered with identical import lines at 7
different indentation levels.

Hoisted to a single module-level import per file. No behavior change.
Prior to this commit the MCP server had no per-invocation visibility.
Operators investigating a misbehaving agent's tool loop had to rely on
indirect tmux state or the fastmcp framework's own logs, and
payload-bearing tools like send_keys / paste_text / set_environment
left no audit trail at all.

Adds AuditMiddleware, a fastmcp Middleware that emits one structured
INFO record per tool call to the libtmux_mcp.audit logger. Each
record carries tool name, outcome (ok/error), error_type on failure,
duration_ms, the fastmcp client_id and request_id, and a summary of
the arguments. Arguments whose names are in the sensitive-name set
({keys, text, value}) are replaced with {len, sha256_prefix} so raw
send_keys payloads, pasted text, and environment values never reach
log files. Non-sensitive strings over 200 chars get truncated.

Registered alongside SafetyMiddleware in server.py so it runs for
every invocation whether or not the safety gate fires. The logger is
namespaced separately so deployments can route audit records to a
dedicated JSON sink without touching the module logger.
pane_tools.py had grown to 1315 lines — a kitchen-sink module
covering I/O, capture, wait/poll, search, copy-mode, paste, resize,
kill, select, swap, pipe, display, and snapshot all in one file.
Large modules in this project are especially costly because each tool
is AI-surface: reviewers need to audit behavior and docstrings
together, and a flat 1.3k-line file makes the diff graph noisy when
any one tool changes.

Convert the module into a ``pane_tools/`` package with one file per
operation domain:

  io.py         send_keys, capture_pane, clear_pane, paste_text
  wait.py       wait_for_text, wait_for_content_change
  search.py     search_panes
  copy_mode.py  enter_copy_mode, exit_copy_mode
  layout.py     resize_pane, select_pane, swap_pane
  lifecycle.py  kill_pane, set_pane_title, get_pane_info
  pipe.py       pipe_pane
  meta.py       display_message, snapshot_pane

``__init__.py`` re-exports every tool by name so
``from libtmux_mcp.tools.pane_tools import send_keys`` still works,
preserving existing test imports and any downstream code that pinned
to the flat namespace. The ``register()`` function is unchanged in
behavior — all tool names, titles, annotations, and safety tags are
identical so MCP clients see no surface change.

No tool bodies were modified; this is a pure move. All 188 existing
tests pass unchanged.
The prior mandate — "All functions and methods MUST have working
doctests" — doesn't match this repo. Most functions here are MCP tool
implementations that require a live tmux server; a doctest for
``send_keys`` or ``kill_pane`` has no way to run offline. That
contradiction produced zero actual doctests in the codebase (the
policy was aspirational rather than enforced).

Rewrite the section to state plainly where doctests fit:

* Required: pure helpers (parsers, formatters, redaction logic, small
  utilities) that run with no external state.
* Exempt: anything touching tmux, subprocess, the filesystem, or env.
  Write a unit test with fixtures instead.

Also soften the ``# doctest: +SKIP`` rule — rather than banning it
outright, point readers at the right alternative (write a unit test)
so the guidance matches actual practice.

This is a policy-only change; pyproject.toml's doctest config will be
updated separately so the infra and the doc don't get bundled.
The prior ``-p no:doctest`` in pyproject.toml's addopts explicitly
disabled pytest's standard doctest plugin, even though AGENTS.md
required doctests. That contradiction is now resolved — the policy
in 7a scopes doctests to pure helpers, and this commit turns on
collection and adds the first two.

* pyproject.toml: swap ``-p no:doctest`` for ``--doctest-modules`` so
  `>>>` blocks in docstrings are actually executed. The pre-existing
  ``--doctest-docutils-modules`` flag (for sphinx-style ``.. doctest::``
  directives) stays in place.
* middleware.py: add doctests to ``_redact_digest`` and
  ``_summarize_args`` — both are pure, deterministic, and the natural
  place for an inline example of what the audit log will contain.

Verified: collector shows two new DoctestItem entries, both pass; the
broader suite goes from 188 to 190 tests with no existing failures.
The safety-tier page existed but didn't call out the two mutating-tier
tools with broader-than-expected reach: pipe_pane can write arbitrary
files, and set_environment mutates state that persists into every
shell tmux later spawns. Both share a safety tier with tools like
rename_window even though their blast radius is much larger.

Additions to docs/topics/safety.md:

* A new "Footguns inside the `mutating` tier" section covering
  pipe_pane (arbitrary filesystem writes), set_environment (scope-wide
  env mutation), and send_keys/paste_text (arbitrary shell execution).
  Each entry states the risk plainly and suggests mitigations
  (readonly tier, scoped home, `env VAR=value` via send_keys).
* An "Audit log" section documenting the libtmux_mcp.audit logger
  added in the AuditMiddleware commit — what fields are recorded, how
  sensitive payloads are redacted to a {len, sha256_prefix} digest,
  and that operators should route the logger to a dedicated sink for
  a durable trail.
* Corrected the self-kill-protection paragraph: it previously claimed
  the guard used only TMUX_PANE. After the socket-aware fix it also
  reads and parses TMUX, and this is now described so the doc matches
  the code.

Docstring changes feed sphinx_autodoc_fastmcp so warnings also surface
in the auto-generated tool reference:

* src/libtmux_mcp/tools/pane_tools/pipe.py::pipe_pane — warns about
  arbitrary filesystem writes and cross-links to /topics/safety.
* src/libtmux_mcp/tools/env_tools.py::set_environment — warns about
  scope-wide env propagation and points at `env VAR=value command` via
  send_keys as the narrower alternative.

Per the design decision: no safety-tier reclassification. pipe_pane
and set_environment remain in the `mutating` tier to avoid breaking
deployments that rely on the default; the warnings make the extra
reach visible so operators can pick `LIBTMUX_SAFETY=readonly` when
appropriate.
The prior patch landed ``t.cast("MiddlewareContext[t.Any]", ...)`` in
the audit-middleware tests to paper over four mypy errors. That's a
lie — ``SimpleNamespace`` isn't a ``MiddlewareContext``, and ``Any``
hides whatever breaks when fastmcp's internal shape shifts.

Investigation of the actual types:

* ``fastmcp.server.middleware.MiddlewareContext`` is a
  ``@dataclass(kw_only=True, frozen=True)`` generic over the message
  type. Only ``message`` is required; everything else has a default.
* ``mcp.types.CallToolRequestParams`` is a pydantic BaseModel with
  required ``name: str`` and optional ``arguments``. Direct
  construction works.
* ``fastmcp.Context`` cannot be built without a live FastMCP server +
  MCP request context. There is no lightweight fake — ``client_id``
  and ``request_id`` are properties that read from the live request
  context.

So: construct the real ``MiddlewareContext(message=CallToolRequestParams(...))``
and leave ``fastmcp_context=None``. The audit middleware already
handles that path (``if context.fastmcp_context is not None:``), so
logs carry ``client_id=None`` / ``request_id=None`` — which is the
exact production behavior when no MCP context is attached. Update the
``test_audit_middleware_logs_success`` assertion accordingly.

The only remaining ``t.Any`` is on ``args: dict[str, t.Any]`` in
``test_summarize_args_redacts_sensitive_keys`` and in the
``_fake_context`` ``arguments`` parameter. Both mirror the upstream
pydantic type ``CallToolRequestParams.arguments: dict[str, Any] | None``
— inherited from an external API, not ours to tighten.

Mypy across the full tree is clean. 190 tests pass.
CI reported 11 files failing ``ruff format --check`` — mostly blank
lines inside multi-line import blocks left over from the earlier
hoist / pane split. Running ``ruff format .`` normalizes them.

Also drop the ``sphinx_autodoc_fastmcp`` entry from the mypy override
block in pyproject.toml. Nothing in src/, tests/, or docs/ imports it
(it appears only as a string in ``conf.py::extensions``), so the
override was dead config. Mypy was flagging it on every run. The
docutils entry stays since mypy still lists it as used when checking
the full tree.
AGENTS.md prescribes top-level namespace imports for standard library
modules. ``paste_text`` was importing ``subprocess`` and ``tempfile``
inside the function body — carried over from the old flat
``pane_tools.py`` through the recent split. An earlier commit
(``c459378``) fixed an identical pattern for ``pathlib`` and cited
the same rule; this closes the remaining two cases in io.py.

No call-site changes: the body already uses namespace form
(``subprocess.run``, ``tempfile.NamedTemporaryFile``,
``subprocess.CalledProcessError``).
tony added 29 commits April 16, 2026 19:07
…ffers

Agents are supposed to call ``delete_buffer`` after use, but an
interrupted call chain can leave ``libtmux_mcp_<uuid>_*`` entries
lingering on the tmux server. The server-global paste-buffer
namespace is shared across clients, so accumulated leaks are
observable by the human user via ``tmux list-buffers``.

Adds a ``_gc_mcp_buffers`` helper that runs on lifespan teardown:
iterates every cached Server, lists buffer names, and deletes
anything matching ``_MCP_BUFFER_PREFIX``. Best-effort and fully
silent — tmux may be unreachable, buffers may vanish mid-scan,
and none of that should block lifespan shutdown. Logs at debug
level so operators can still surface leaks via verbose logging.

Also adds "when to use" cross-references to the ``paste_text``
and ``load_buffer`` docstrings so agents can pick the right tool
from the schema alone instead of having to read the narrative
docs. paste_text is fire-and-forget; load_buffer is
stage-then-fire with a handle for multi-paste or inspection.

Note: FastMCP lifespan teardown runs on SIGTERM / SIGINT only;
``kill -9`` and OOM bypass it, so this GC is an ergonomics
improvement, not a security boundary.
Two guards for the lifespan GC:

* Seeds a real MCP-owned buffer via ``load_buffer`` alongside a
  manually set ``human_buffer`` and asserts only the prefixed one
  is deleted. The human-owned buffer MUST survive — that is where
  the OS clipboard sync lives, and deleting it would be a privacy
  regression.
* A ``_BrokenServer`` stub whose ``cmd`` always raises proves the
  GC swallows exceptions rather than crashing lifespan shutdown.
Every tool already accepts ``socket_name`` for multi-server work
but there was no way to discover which servers actually exist —
agents had to guess names or rely on the caller to supply them.

Scans ``${TMUX_TMPDIR:-/tmp}/tmux-<uid>/`` (the canonical socket
location per tmux.c:209) and returns a ``list[ServerInfo]`` of
live servers. Stale socket inodes — the common case on
long-running machines where ``$TMUX_TMPDIR`` accumulates
thousands of orphans — are filtered via a cheap UNIX ``connect()``
liveness probe before the more expensive ``get_server_info``
call. Local testing: 17k stale inodes plus one live server, 50s
probe-every-socket ➜ 0.4s with connect-first filter.
Two guards:

* The pytest-libtmux fixture socket appears in the discovery
  result with ``is_alive=True`` — proves the live-probe path.
* A nonexistent ``$TMUX_TMPDIR`` returns an empty list without
  raising — proves graceful degradation on a freshly provisioned
  box where the user has never run tmux.
…apes

Mirror of the read-heavy tool shape guards in
``test_server_tools.py`` and ``test_pane_tools.py`` — prompts
and hierarchy resources wire to MCP clients as string bodies
(prompt templates, JSON text, plain text), so a future refactor
that accidentally returns a Pydantic model, dict, or None would
silently break the MCP surface. Parametrized so every recipe and
every hierarchy resource is checked without per-case boilerplate.
…ter screen-grabbing launches

An agent following ``build_dev_workspace`` literally would deadlock:
step 4 launched vim and a long-running log command in separate panes,
then said "wait for the prompt via ``wait_for_text`` before moving on
between each step." vim and ``watch`` / ``tail -f`` take over the
full terminal — they never draw a shell prompt, so the wait blocks
until timeout.

Restructures the recipe around when a prompt IS a meaningful signal
vs. when it is not:

* New step 4: wait for the shell prompt in each pane BEFORE sending
  the launch command. This is a valid readiness check — the new
  panes start in the user's login shell.
* New step 5: send the launch commands.
* New step 6 (optional): use ``wait_for_content_change`` after
  launch to confirm the program drew its UI. Explicitly flagged as
  "strictly different from waiting for a shell prompt" so future
  editors don't revert to the deadlock wording.

The prompt-alignment test from the previous round did not catch this
because it validates parameter NAMES, not runtime semantics.
…mpt waits

Regression test asserts the rendered recipe:

* does NOT contain the stale "wait for the prompt … between each
  step" guidance that would deadlock agents.
* DOES reference ``wait_for_content_change`` for post-launch UI
  confirmation.
* still references ``wait_for_text`` (for the pre-launch shell-
  readiness check, where prompt matching is meaningful).

Complements the existing prompt-alignment test, which only covers
parameter names, by validating runtime guidance semantics.
The docstring claimed ``list_servers`` discovers "every live tmux
server on this machine", but the implementation only scans
``${TMUX_TMPDIR}/tmux-<uid>/``. Custom ``tmux -S /arbitrary/path``
daemons (systemd user units, CI runners, per-project ``.envrc``
paths) live outside that directory and were invisible — the
docstring overclaimed.

Two changes:

* Correct the docstring to say "under the current user's
  ``$TMUX_TMPDIR``" and spell out the custom-path caveat so agents
  do not assume canonical-only sockets exist.
* Add an optional ``extra_socket_paths: list[str] | None`` parameter
  so callers who know about arbitrary paths can supplement the scan.
  Each path runs through the same ``_is_tmux_socket_live`` probe and
  socket-metadata query as canonical entries. Nonexistent or
  non-socket paths are silently skipped so stale user-supplied lists
  don't crash the call.

Factors the server-info construction into a private
``_probe_server_by_path`` helper that mirrors ``get_server_info`` but
keys the Server by socket_path (-S) instead of socket_name (-L), so
the extras path can use an arbitrary filesystem location.
…afety

Two regression guards for the new ``extra_socket_paths`` surface:

* Supplying the fixture socket as an extra path makes it appear
  in the result even when the canonical ``$TMUX_TMPDIR`` scan is
  pointed at an empty directory — proves the extras code path is
  exercised in isolation from the canonical scan.
* A list containing a nonexistent path and a regular file (not a
  socket) returns an empty result without raising — proves stale
  user-supplied entries degrade gracefully.
The L1 fix narrowed the macOS ``$TMUX_TMPDIR`` gap by preferring
``tmux display-message -p '#{socket_path}'`` over env-reconstruction,
but ``server.cmd`` invokes tmux with the MCP process's own
environment. When the MCP process's ``$TMUX_TMPDIR`` diverges from
the real tmux server's (macOS launchd case), the query fails:

* MCP process env: ``TMUX_TMPDIR=/wrong/path``
* Real tmux lives at ``/correct/path/tmux-1000/mysocket``
* Caller's ``$TMUX=/correct/path/tmux-1000/mysocket,…``
* Query invokes ``tmux -L mysocket display-message``; tmux uses
  ``/wrong/path`` and can't find the server; query raises.
* Reconstruction fills in ``/wrong/path/tmux-1000/mysocket``.
* ``_caller_is_on_server`` realpath compares bogus vs. real path:
  different. Guard degrades to no-op.

Adds a conservative basename fallback after the realpath check: if
the basename of the caller's socket path equals the target's declared
``socket_name`` (or ``"default"``), the guard still fires. Both names
are authoritative on their respective sides and neither is subject to
``$TMUX_TMPDIR`` divergence, so this is a safety-preserving
last-chance match.

Trade-off acknowledged in the docstring: exotic false positive when
two daemons share a socket_name under different tmpdirs. Prefer that
over silently permitting self-kill under env mismatch.
…gence

Regression guard for the macOS launchd scenario: MCP process has a
wrong ``$TMUX_TMPDIR``, display-message query fails, reconstruction
produces a path that mismatches the caller's ``$TMUX`` realpath, but
the basename (socket name) still matches. The guard must still fire.

The test forces:

* ``$TMUX_TMPDIR`` on the MCP process points at a nonexistent dir.
* Monkeypatches ``server.cmd`` so the display-message query raises.
* Sets ``$TMUX`` to a path in a "real" tmpdir whose basename is the
  target's socket_name.

Asserts ``_caller_is_on_server`` returns True — proving the
basename fallback closes the env-mismatch gap without requiring the
realpath comparison to succeed.
…r from build_dev_workspace

Two residual ergonomic problems in the recipe that F1's rewrite did
not catch, both confirmed by reading the underlying source:

**Fancy-shell hang.** Step 4 emitted a literal
``wait_for_text(pattern=r"\$ |\# |\% ", regex=True, timeout=5.0)``.
The regex only matches default bash/zsh prompts; zsh+oh-my-zsh
(``➜``), starship (``❯``), pure / p10k (Unicode glyphs), and fish
(``>``) all miss. Agents on these setups burn 5s × 3 panes = 15s
of ceremony every time the recipe runs.

Reading tmux source confirms the wait was unnecessary defensive
ceremony: ``cmd-split-window.c:151`` → ``spawn.c:382,494`` allocate
the PTY and wire the bufferevent before split-window returns.
``input-keys.c:418`` writes send-keys bytes straight into the
kernel PTY buffer via ``bufferevent_write`` with no readiness
check. The kernel holds bytes until the slave reads. A
``send_keys("vim")`` right after ``split_window`` is safe whether
or not the shell has finished drawing its prompt — the shell will
pick up the queued bytes on its first stdin read.

**Stray Enter in pane B.** Step 5 emitted
``send_keys(pane_id="%B", keys="")`` with a comment
"(leave the shell idle)". libtmux's ``Pane.send_keys``
(``pane.py:423-475``) runs ``send-keys ""`` *then* ``self.enter()``
which is ``send-keys Enter``. libtmux's own test
(``tests/test_pane.py:128``) confirms an Enter is sent. The shell
was already idle after the split; the line did nothing useful
while actively misleading anyone reading it.

Net change: ~15 LOC removed, ~3 LOC added. Recipe body is shorter,
shell-agnostic, and honest about what it does. The step-6
``wait_for_content_change`` block is kept (renumbered to step 5) —
it is still the right primitive for post-launch UI confirmation.

Also updates the existing
``test_build_dev_workspace_does_not_deadlock_on_screen_grabbers``
test to drop the now-obsolete ``wait_for_text`` assertion; a
targeted regression guard for the two removed lines follows in a
separate test commit.
…regex or stray Enter

Targeted regression guard for the two residual ergonomic issues
fixed in the preceding commit:

* ``\$ |\# |\% `` regex must be absent — hangs starship / oh-my-zsh
  / pure / p10k / fish users for 5s × 3 panes = 15s of ceremony
  with no benefit.
* ``send_keys(pane_id="%B", keys="")`` must be absent — libtmux
  emits an Enter keystroke when ``enter=True`` (the default), so
  the line lied about being a no-op.

Positive pin on ``wait_for_content_change`` so the post-launch UI
confirmation guidance stays in the recipe; it's a shell-agnostic
"did the screen change since we captured it?" check that works
across every shell, editor, and tailing program combination.

Follows the established ``in``/``not in`` assertion style from
the neighbouring
``test_build_dev_workspace_does_not_deadlock_on_screen_grabbers``
test.
``get_server_info`` swallowed any ``display-message #{version}``
failure with a bare ``except Exception: pass`` — pre-existing from
the initial release, not from this branch. The sibling helper
``_probe_server_by_path`` in the same file now logs the analogous
failure at debug level, so the two were drifting in style.

Adds a single ``logger.debug`` so operators debugging "why does my
tmux server report version=None?" get a uniform signal across both
code paths. Semantics are unchanged (version still falls back to
None on failure, callers never see the exception).
…_session

The ``del mcp_session`` pattern (12 sites across four test files)
signalled "I need this fixture's side effect but not its value."
That is the textbook use case for ``@pytest.mark.usefixtures``,
which pytest's own test suite uses 16+ times and recommends as the
canonical form — cross-checked in ~/study/python/pytest/testing/
and in upstream pytest docs. libtmux and fastmcp also follow the
decorator form; libtmux-mcp was an outlier.

Each converted test now:

* Declares the required fixture via a decorator (explicit intent
  at function definition).
* Drops the unused ``mcp_session: Session`` parameter.
* Drops the ``del mcp_session`` ceremony from the body.

Three files also drop the now-unused ``Session`` import from their
``TYPE_CHECKING`` block (``test_buffer_tools.py``,
``test_server.py``, ``test_wait_for_tools.py``).
``test_server_tools.py`` keeps the import — ``test_list_sessions``
genuinely uses ``mcp_session``.

Behaviour unchanged; this is pure readability + idiom cleanup.
…t loops

Regression guard for MCP cancellation semantics. The two wait tools
share a ``while True:`` poll-and-sleep pattern wrapped by
``handle_tool_errors_async`` (``_utils.py:827-850``), which catches
``Exception`` (not ``BaseException``). ``asyncio.CancelledError``
inherits from ``BaseException`` (Python 3.8+), so it propagates
through the decorator today.

This test locks that in: if a future change broadens the decorator
to ``BaseException`` it would silently swallow MCP cancellation
requests, agents would burn tokens waiting for cancelled tools to
return, and operators would have no signal that anything went wrong.

Uses ``task.cancel()`` rather than ``asyncio.wait_for`` so the
raised exception is the inner ``CancelledError`` directly, not
``wait_for``'s ``TimeoutError`` wrapper. ``wait_for_text`` and
``wait_for_content_change`` get sibling tests since both go through
the same decorator path.
…elled

MCP cancellation works correctly today — ``CancelledError`` is a
``BaseException`` (Python 3.8+) and ``handle_tool_errors_async``
catches ``Exception``, so cancellation propagates untouched. But
operators have no signal that a wait was cancelled vs. timed out —
the two outcomes look identical at the log layer.

Wraps each ``while True:`` poll loop in ``wait_for_text`` and
``wait_for_content_change`` with an explicit
``except asyncio.CancelledError: logger.debug(...); raise``. Three
benefits:

* Documents intent — readers see that cancellation is handled
  intentionally, not by accident of inheritance.
* Records the cancel-time elapsed seconds and pane id at debug
  level so post-mortem log analysis can distinguish "cancelled at
  150 ms" from "timed out at 8 s".
* Re-raise preserves MCP semantics — never returns a partial
  ``WaitForTextResult`` / ``ContentChangeResult`` that would mask
  the cancellation as a timed-out wait.

The per-pair regression test added in the preceding commit pins
``CancelledError`` propagation; this commit only adds diagnostic
visibility.
…t timeouts

Adopts FastMCP's structured client logging
(``ctx.info/warning/error`` at ``fastmcp/server/context.py:716-778``)
so MCP clients receive ``notifications/message`` events for the two
diagnostic moments operators currently can't see:

* Invalid regex pattern compiled in ``wait_for_text`` — previously
  the agent saw only a generic ``ToolError``; now the client also
  receives a structured warning, which surfaces in client log
  panels independent of the tool result.
* Wait loop reaches its timeout in either ``wait_for_text`` or
  ``wait_for_content_change`` — previously identical to a "found"
  result at the log layer (only the boolean ``timed_out`` field
  distinguished them); now the client gets an explicit warning
  with pane id and timeout duration.

Adds a ``_maybe_log`` helper sibling to ``_maybe_report_progress``
sharing the same ``_TRANSPORT_CLOSED_EXCEPTIONS`` suppression
contract so a hung-up client doesn't take down the tool, while
programming errors stay loud per the regression guard
``test_wait_for_text_propagates_unexpected_progress_error``.

Updates the existing ``_StubContext`` and ``_BrokenContext`` test
stubs in ``tests/test_pane_tools.py`` to declare ``warning``
methods — the stubs need to mock the full Context surface the wait
loop now touches. ``_BrokenContext.warning`` raises the same
``anyio.BrokenResourceError`` as its ``report_progress`` so the
existing transport-closed regression guard also covers the new log
path.

Per-pair regression test in the next commit asserts the warnings
fire at both sites with the right level.
… and timeout

Three regression guards for the new ``_maybe_log`` call sites added
in the preceding commit. Each builds a tiny ``_RecordingContext``
that captures ``(level, message)`` tuples — the same recording-stub
style used by the existing progress tests, kept inline per test so
each assertion stays self-contained:

* ``test_wait_for_text_warns_on_invalid_regex`` — ``regex=True`` with
  an unclosed bracket raises ``ToolError`` AND records a warning
  containing "Invalid regex".
* ``test_wait_for_text_warns_on_timeout`` — never-matching pattern
  with a 200 ms timeout records a warning containing "timeout".
* ``test_wait_for_content_change_warns_on_timeout`` — settled pane
  with a 500 ms timeout records the equivalent warning. Reuses the
  active settle-loop pattern from
  ``test_wait_for_content_change_timeout`` so the test stays
  deterministic on slow CI rather than relying on a fixed sleep.

The tests assert on substring presence rather than exact message
text so the warning copy can evolve without forcing a test update.
…AG_READONLY

Adopts FastMCP's ``RetryMiddleware``
(``fastmcp/server/middleware/error_handling.py:136``) but bounded
by the existing safety-tier annotation. The ``_server_cache``
already evicts dead Server instances on ``is_alive() == False``,
but the *first* call after a tmux daemon crash still fails before
the cache learns about it. A single retry on the freshly-cached
Server closes that window without changing tool semantics.

Critical safety constraint: only readonly tools are retried.
Mutating tools like ``send_keys``, ``create_session``,
``split_window`` are NOT idempotent — re-running them on a
transient socket error would silently double side effects, which
is unacceptable. The wrapper ``ReadonlyRetryMiddleware`` reads
``tool.tags`` via ``context.fastmcp_context.fastmcp.get_tool(name)``
(same pattern ``SafetyMiddleware`` uses) and only delegates to
the upstream retry when ``TAG_READONLY`` is present.

Defaults are deliberately small: ``max_retries=1`` keeps audit
log noise minimal (one extra attempt, not three); ``base_delay
0.1s`` / ``max_delay 1.0s`` matches the duration window of a
transient libtmux socket hiccup. ``retry_exceptions=
(libtmux.exc.LibTmuxException,)`` is narrow on purpose — the
upstream default ``(ConnectionError, TimeoutError)`` does NOT
match libtmux's wrapped subprocess errors and would be a silent
no-op.

Wired into the middleware stack between ``AuditMiddleware`` and
``SafetyMiddleware`` so retries are audited once each and
tier-denied tools never reach retry. Updates the existing
``test_server_middleware_stack_order`` to lock in the new
position.

Per-pair regression test in the next commit asserts: readonly
tool retries on ``LibTmuxException``, mutating tool does not
retry, non-libtmux exception does not retry.
…y tools only

Three regression guards for the new safety-scoped retry wrapper:

* ``test_readonly_retry_recovers_from_libtmux_exception`` — readonly
  tool raises ``LibTmuxException`` once then succeeds; assert the
  middleware retried (call count 2) and returned the success.
  Models the production scenario: a transient socket error on the
  first call after a tmux daemon restart, recovered by the retry
  on the freshly-cached Server.
* ``test_readonly_retry_skips_mutating_tool`` — mutating tool with
  the same exception; assert NO retry (call count 1) and the
  exception propagates. This is the critical safety property —
  re-running ``send_keys`` or ``create_session`` on a transient
  error would silently double the side effect.
* ``test_readonly_retry_skips_non_libtmux_exception`` — readonly
  tool raises ``ValueError``; assert NO retry. Pins the narrow
  ``retry_exceptions=(LibTmuxException,)`` default so a future
  refactor that broadens the trigger set would fail loudly.

Tests use minimal ``_StubTool`` / ``_StubFastMCP`` /
``_StubFastMCPContext`` stand-ins to satisfy the
``context.fastmcp_context.fastmcp.get_tool(name)`` lookup the
middleware uses for tag inspection, plus a ``_FlakyCallNext``
helper that raises N times before succeeding.
… fix CI race

CI failure on `8dc7c81`:
``tests/test_pane_tools.py::test_wait_for_content_change_propagates_cancellation``
``Failed: DID NOT RAISE <class 'asyncio.exceptions.CancelledError'>``
across all six tmux versions (3.2a, 3.3a, 3.4, 3.5, 3.6, master).

Root cause: ``wait_for_content_change`` exits the poll loop when
``current != initial_content``. On the GitHub Actions Ubuntu runner
the shell prompt redraws (cursor blink, zsh async hooks like
vcs_info) within the test's 100 ms ``asyncio.sleep`` window, so
``changed=True`` fires and the tool returns a normal
``ContentChangeResult`` before the ``task.cancel()`` arrives. Local
runs on an idle pane never tripped this — classic CI-only race.

Fix: monkeypatch ``Pane.capture_pane`` to always return the same
line list. The diff branch can never trigger, so the loop stays in
``await asyncio.sleep(interval)`` indefinitely and the cancel
unambiguously interrupts it. The test still exercises the production
code path that matters — that ``CancelledError`` propagates through
``handle_tool_errors_async`` without being mapped to ``ToolError``.
The shipped ``ReadonlyRetryMiddleware`` (commit ``bd37f29``) is a
production no-op under fastmcp 3.2.3:

- ``RetryMiddleware._should_retry`` only checks ``isinstance(error,
  retry_exceptions)`` — does NOT walk ``__cause__``.
- ``handle_tool_errors_async`` (``_utils.py:842-850``) wraps every
  ``LibTmuxException`` as ``ToolError(...) from LibTmuxException``.
- At the middleware layer the exception type is ``ToolError``, not
  ``LibTmuxException``, so ``_should_retry`` returns ``False`` and
  the retry never fires in production.

The shipped tests in ``tests/test_middleware.py:475+`` only pass
because ``_FlakyCallNext`` raises ``LibTmuxException`` directly,
bypassing the production decorator wrap path.

fastmcp v3.2.4 commit ``031c7e03`` (PR #3858 "Fix RetryMiddleware
not retrying tool errors") adds the ``__cause__`` walk to
``_should_retry``. Reproduced the production-flow bug under 3.2.3
with a 25-line harness (calls=1) and verified the fix on 3.2.4
(calls=2 with max_retries=1).

Free win that ships in the same bump: PR #3872 (commit
``f3c00ba1`` "Extract parameter descriptions from docstrings")
adds an automatic NumPy/Google/Sphinx docstring parser that
populates per-parameter ``inputSchema.description`` fields. The
project's tools already use NumPy ``Parameters\n----`` style
throughout (verified ``wait.py``, ``_utils.py``, ``middleware.py``),
so per-parameter descriptions become visible to LLM clients with
zero code changes.

Next commit adds an integration-style regression test that calls a
real ``@handle_tool_errors``-decorated tool through the middleware
stack — catches the production no-op AND any future regression
where someone changes the decorator's exception wrapping.
… decorator

Integration-style regression guard for the production code path the
existing unit tests miss. The three sibling tests above use
``_FlakyCallNext`` to raise ``LibTmuxException`` directly; in
production every tool is wrapped by ``handle_tool_errors`` /
``handle_tool_errors_async`` which converts ``LibTmuxException`` to
``ToolError(...) from LibTmuxException``. At the middleware layer
the exception type is ``ToolError``, not ``LibTmuxException`` — so
the retry decision must walk ``__cause__`` to see the real failure
type.

The new test invokes the real ``list_sessions`` tool through
``ReadonlyRetryMiddleware.on_call_tool``, exercising the decorator
wrap path that broke under fastmcp 3.2.3. Monkeypatches
``Server.sessions`` to raise ``LibTmuxException`` once then return
``[]`` so the test runs without a live tmux. Asserts ``calls == 2``
with a friendly error message naming the likely cause (fastmcp
older than 3.2.4) so an accidental dep downgrade fails loudly.

Catches both: (a) the production no-op the previous fastmcp pin
allowed, and (b) any future regression where someone changes the
decorator's exception wrapping in a way that defeats ``__cause__``
walking.
…tream consistency

All three round-2 weave:review reviewers flagged that
``ReadonlyRetryMiddleware``'s internal logger defaulted to
``fastmcp.retry`` (the upstream stock at ``error_handling.py:181``)
instead of the project's ``libtmux_mcp.*`` namespace. Operators
routing the audit stream by namespace prefix would silently miss
retry events.

Set the default explicitly: when ``logger_`` is ``None``, construct
``logging.getLogger("libtmux_mcp.retry")`` before delegating to
fastmcp's ``RetryMiddleware``. The ``if logger_ is None:`` pattern
avoids evaluating ``getLogger`` in the default expression.

Pinned in tests via ``test_readonly_retry_logger_uses_project_namespace``
asserting ``middleware._retry.logger.name == "libtmux_mcp.retry"``,
so a future refactor that drops the default fails loudly.
The new ``list_servers`` tool (added earlier in the branch) wasn't
documented anywhere under ``docs/`` — agents and human readers
discovering the tool from a client's tool list had no docs page to
land on. The conf.py area-map already routes ``server_tools`` →
``sessions``, so the natural home is ``docs/tools/sessions.md``
between ``get_server_info`` and ``create_session``.

Adds the standard four-part section: ``{fastmcp-tool}`` directive,
Use-when / Avoid-when / Side-effects narrative, an example JSON
request + response (including a second example with
``extra_socket_paths`` for the custom-path case), and the
``{fastmcp-tool-input}`` schema directive. Calls out the scope
caveat (custom ``tmux -S /path/...`` daemons are not in the
canonical scan) up front so readers don't reach for the tool in
the wrong situation.

Also adds a grid-item-card to ``docs/tools/index.md`` so the
discovery overview links to the new section.
The four prompts shipped this branch (``run_and_wait``,
``diagnose_failing_pane``, ``build_dev_workspace``,
``interrupt_gracefully``) had zero docs coverage — no
``{fastmcp-prompt}`` autodoc directive (the
sphinx-autodoc-fastmcp extension only ships ``{fastmcp-tool}``,
``{fastmcp-tool-input}``, ``{fastmcp-tool-summary}``; verified
by reading the installed extension's ``__init__.py``), and
``docs/topics/prompting.md`` covers general agent-prompting
heuristics rather than the project's MCP-registered prompts.

Adds ``docs/tools/prompts.md`` — a hand-written page (since
autodoc isn't available for prompts) with one section per recipe
covering: when to use it, why it exists vs. the obvious
alternative tool calls, parameters, and a sample render at one
representative argument set. The samples use stable
representative output (``<uuid>`` placeholder for the wait-for
channel) so the page doesn't churn on every regeneration.

Wires the new page into the docs toctree under "Use it" between
``tools/index`` and ``recipes`` (the latter is the longer
narrative agent-reasoning page, conceptually distinct from these
short MCP-protocol prompts). Adds a one-paragraph cross-reference
from ``docs/tools/index.md``'s decision tree so readers
discovering the tool surface find the prompts naturally.
…he new pane id

The ``create_session`` response gained an ``active_pane_id`` field
earlier in this branch (commit ``3f092bb``), but the docs still
showed the pre-change response shape — readers landing on
``sessions.md`` had no signal that this field exists or what
they'd do with it.

Updates the example response to include the field, then adds a
brief tip explaining the workflow win: callers can target
subsequent ``send_keys`` / ``split_window`` / ``capture_pane``
calls against ``active_pane_id`` directly without a follow-up
``list_panes`` round-trip. Saves one MCP call in the most common
"create a session, then act on it" workflow that prompts like
``build_dev_workspace`` already exploit.

The full field surface is auto-rendered in
``docs/reference/api/models.md`` via the ``fastmcp_model_classes``
registration — this commit just adds the narrative discoverability
that explains why the field exists.
@tony tony force-pushed the 2026-04-follow-ups branch from bc4c2ed to 8c76b87 Compare April 17, 2026 00:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants