Follow-ups from code review + FastMCP alignment (Phase 2) by tony · Pull Request #15 · tmux-python/libtmux-mcp

tony · 2026-04-13T23:37:23Z

Summary

29 commits on 2026-04-follow-ups, addressed in two waves:

Phase 1 (commits 1–12) — original code-review follow-ups: caller identity, observability, pane-tools split, doctest scoping, safety docs, plus four small follow-up cleanups.
Phase 2 (commits 13–29) — FastMCP-alignment work derived from a multi-model brainstorm-and-refine analysis. Each commit is one coherent change; quality gate (ruff, mypy, pytest --reruns 0, just build-docs) ran green before each commit landed. Test count grew from 186 → 276 (+90 regression-guard tests).

Quality gate

$ uv run ruff check . --fix --show-fixes
$ uv run ruff format .
$ uv run mypy
$ uv run py.test --reruns 0 -vvv
$ just build-docs

--reruns 0 (overriding the project default of 2) was the standard for the entire 29-commit series, so flakes are surfaced not masked. The one timing-sensitive test that surfaced (test_wait_for_content_change_timeout) was stabilized in commit 13 with an active settle loop before any new feature work landed.

Phase 1 — original review follow-ups

#	Commit	Change
1	`fix(caller-identity)`	Parse `TMUX` env var; self-kill guards now compare the caller's socket path to the target server
2	`refactor(imports)`	Hoisted 16 inline `from fastmcp.exceptions import ToolError` to module level
3	`feat(observability)`	`AuditMiddleware` — one structured record per tool call, with redacted digests for `keys` / `text` / `value`
4	`refactor(pane)`	Split 1315-line `pane_tools.py` into a `pane_tools/` package (io, wait, search, copy_mode, layout, lifecycle, pipe, meta). Backwards-compatible re-exports
5	`docs(agents)`	Scoped doctest policy to pure helpers; tmux-touching tools exempt
6	`test(doctest)`	Removed `-p no:doctest`, added `--doctest-modules`, seeded doctests on `_redact_digest` and `_summarize_args`
7	`docs(safety)`	Footguns section for `pipe_pane` / `set_environment` / `send_keys`; documented the audit log; docstring warnings surface via `sphinx_autodoc_fastmcp`
8	`test(middleware)`	Use real `MiddlewareContext` + `CallToolRequestParams`
9	`style`	Apply ruff format and drop dead mypy override
10	`refactor(pane_tools[io])`	Hoist subprocess and tempfile imports
11	`refactor(pane_tools[wait])`	Hoist time and retry_until imports
12	`docs(_utils)`	Align `_caller_is_on_server` docstring with code

Phase 2 — FastMCP alignment

Driven by a multi-model brainstorm session (Claude × 3 / Gemini × 3 / GPT × 3, three refinement passes) that audited libtmux-mcp against FastMCP, libtmux, and tmux. The session produced a 17-commit plan with explicit priority tiers (P1–P3) and four "deliberately not recommended" items.

Quality gate precursor

test(pane_tools[wait]): stabilize test_wait_for_content_change_timeout under --reruns=0
Active 3-streak settle loop replaces a fixed sleep so zsh async hooks (vcs_info, git prompt) don't trip the no-change assertion. Lets the rest of the series run under the strict gate without flake masking.

PR group A — Preconditions (correctness fixes)

mcp(fix[_utils,annotations]): ANNOTATIONS_SHELL for open-world shell tools
send_keys, paste_text, pipe_pane were sharing ANNOTATIONS_CREATE which set openWorldHint=False. Lying to MCP clients. Fix is a new preset, not a global flip — same shared dict is also used by create_session/create_window/split_window/swap_pane/enter_copy_mode, none of which are open-world.
mcp(docs[server,instructions]): surface display_message, snapshot_pane, wait_for_* in _BASE_INSTRUCTIONS
The most token-efficient read tools were never named in the agent-facing instructions, so agents defaulted to capture_pane retry loops.
mcp(fix[pane_tools/io]): bound capture_pane with max_lines and in-band tail-preserving truncation
Default max_lines=500. Prefix "[... truncated K lines ...]\n" when trimmed. max_lines=None opts out for callers that genuinely need full scrollback. Tail-preserving because the active prompt lives at the bottom.
mcp(fix[pane_tools/meta,models]): bound snapshot_pane with max_lines and typed content_truncated fields
Same semantics as docs: restore prompt admonition styling and rebuild copy buttons after gp-sphinx SPA swap #16 but surfaced via two new PaneSnapshot fields (content_truncated: bool, content_truncated_lines: int) rather than an in-band header — PaneSnapshot is already a Pydantic model so the typed channel is the cleaner shape.
mcp(fix[pane_tools/search,models]): paginate search_panes via SearchPanesResult wrapper
Breaking change: list[PaneContentMatch] → SearchPanesResult with matches, truncated, truncated_panes, total_panes_matched, offset, limit. Per-pane cap on matched_lines plus pane-level pagination so truncation doesn't mean permanent blindness to later matches.

PR group B — Context + wait-for

mcp(refactor[_utils]): add handle_tool_errors_async for Context-using tools
Prerequisite for #20. Sync handle_tool_errors is unchanged; async variant funnels through the same exception-to-ToolError mapping table.
mcp(feat[pane_tools/wait]): Context.report_progress in wait_for_text and wait_for_content_change
Both wait tools converted to async def with a ctx: Context | None parameter. Replaces retry_until with an explicit asyncio loop that emits progress on every tick. No elicit — FastMCP 3.1 has no first-class elicitation capability check (capabilities.sampling is the wrong predicate; it would hang headless CI agents).
mcp(feat[wait_for_tools]): tmux wait-for channel tools with bounded timeout
New wait_for_channel(channel, timeout=30.0) and signal_channel(channel). subprocess.run(timeout=…) is mandatory — tmux wait-for blocks indefinitely at the OS level. Tool docstring teaches the safe status=$?; tmux wait-for -S done; exit $status composition pattern so a crashed shell command doesn't pay the full timeout penalty.

PR group C — Structured outputs + resources

mcp(test[tools]): regression-guard read-heavy tools against str-return drift
Audit found get_pane_info / get_server_info / list_* / snapshot_pane already return Pydantic models. New regression tests pin that so a future refactor can't silently drop the MCP outputSchema by flattening one back to str.
mcp(refactor[resources/hierarchy]): declare mime_type on all hierarchy resources
Five JSON resources get mime_type="application/json"; get_pane_content gets mime_type="text/plain". (FastMCP requires resources to return str | bytes | list[ResourceContent], so the json.dumps payload stays — but clients now see the correct MIME tag.)

PR group D — New tool families

mcp(feat[buffer_tools]): agent-namespaced paste buffers with UUID nonce
load_buffer / paste_buffer / show_buffer / delete_buffer. Every call allocates libtmux_mcp_<uuid4hex>_<logical> so concurrent agents (or parallel tool calls) cannot collide on the server-global buffer namespace. list_buffers is intentionally NOT exposed — OS clipboard sync into tmux buffers makes a generic enumeration a privacy footgun. _SENSITIVE_ARG_NAMES extended for content.
mcp(feat[hook_tools]): read-only show_hook and show_hooks
Wraps libtmux's HooksMixin.show_hook/show_hooks with a HookListResult model that flattens scalar / dict / SparseArray shapes. Write-hooks (set_hook/unset_hook) are deferred indefinitely — FastMCP lifespan teardown does not run on kill -9/OOM/C-extension fault, so any soft cleanup registry would leak agent-installed shell hooks into the user's persistent tmux server. The module docstring documents this and names three future paths.
mcp(feat[prompts]): narrow prompts module with PromptsAsTools fallback
Four prompts (run_and_wait, diagnose_failing_pane, build_dev_workspace, interrupt_gracefully) — the canonical recipes from docs/topics/prompting.md, encoded as protocol-level prompts. LIBTMUX_MCP_PROMPTS_AS_TOOLS=1 env-gates the PromptsAsTools transform for tool-only clients.

PR group E — Middleware + lifespan

mcp(feat[middleware]): TailPreservingResponseLimitingMiddleware subclass
FastMCP's stock ResponseLimitingMiddleware truncates the tail of oversized responses, which destroys the active shell prompt for terminal scrollback. Subclassed to drop the head and prefix the same [... truncated K bytes ...]\n header the per-tool caps use. max_size=50_000 (strictly above per-tool caps so it's a backstop, not a primary cap), scoped to capture_pane / search_panes / snapshot_pane.
mcp(feat[middleware,server]): add TimingMiddleware and ErrorHandlingMiddleware
Stack order pinned by test: Timing → TailPreservingResponseLimiting → ErrorHandling(transform_errors=True) → Safety → Audit. ErrorHandlingMiddleware(transform_errors=True) maps ResourceError from resources/hierarchy.py to MCP code -32002.
mcp(feat[server,_utils]): FastMCP lifespan with tmux presence check and cache cleanup
shutil.which("tmux") probe at startup converts a missing-tmux failure from a confusing post-handshake TmuxCommandNotFound into a clean cold-start RuntimeError. Clears _server_cache on graceful shutdown (SIGTERM/SIGINT only — by design, not relied on for any invariant that must survive hard crashes; see commit 25's docstring).

Deliberately NOT in this PR

Each was considered in the brainstorm and explicitly rejected with reasoning documented in the relevant module's docstring:

Write-hooks (set_hook/unset_hook). Three future paths documented in hook_tools.py module docstring; none in scope here.
Control mode (tmux -C event bridge). Per-byte %output events across all panes would firehose Python's asyncio loop and starve MCP protocol handlers. If pursued later, must start from refresh-client -B per-subscription with bounded queue + per-pane rate limit, not blanket attach.
ctx.elicit for destructive-tool confirmation. No reliable client-capability check exists in FastMCP 3.1; capabilities.sampling is the wrong predicate (it describes server→client LLM requests, not human UI). Tier gate + self-kill guards remain the hard boundary.
Generic list_buffers. OS clipboard sync makes it a privacy footgun.
ResponseCachingMiddleware, RateLimitingMiddleware, ctx.sample, HTTP transport, MCPB bundling. Architectural mis-fits; rejected in the brainstorm.

Diff size

40 files changed, 4870 insertions(+), 1438 deletions(-)
276 tests pass under --reruns 0

Test plan

Notes for reviewers

Phase 1 review remains unchanged; the original Summary section above (now folded into the table) covers commits 1–7 and the four follow-up cleanups (8–12).
Phase 2 was driven by a written plan at /home/d/.claude/plans/memoized-pondering-sparkle.md (local artifact, not committed). Each commit body cites the brainstorm finding it acts on.
Natural cut points if the PR needs to be split for review: A (commits 13–18), B (19–21), C (22–23), D (24–26), E (27–29). Every commit independently passes the quality gate, so reverts inside any group are clean.

codecov-commenter · 2026-04-13T23:54:24Z

Codecov Report

❌ Patch coverage is 88.77551% with 99 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.18%. Comparing base (65e9ad4) to head (8c76b87).

Files with missing lines	Patch %	Lines
src/libtmux_mcp/_utils.py	84.34%	11 Missing and 7 partials ⚠️
src/libtmux_mcp/tools/server_tools.py	75.38%	15 Missing and 1 partial ⚠️
src/libtmux_mcp/tools/hook_tools.py	81.35%	8 Missing and 3 partials ⚠️
src/libtmux_mcp/tools/wait_for_tools.py	72.50%	11 Missing ⚠️
src/libtmux_mcp/tools/buffer_tools.py	89.01%	8 Missing and 2 partials ⚠️
src/libtmux_mcp/tools/pane_tools/io.py	86.20%	7 Missing and 1 partial ⚠️
src/libtmux_mcp/tools/pane_tools/layout.py	85.71%	4 Missing and 4 partials ⚠️
src/libtmux_mcp/middleware.py	90.47%	3 Missing and 3 partials ⚠️
src/libtmux_mcp/tools/pane_tools/lifecycle.py	86.36%	2 Missing and 1 partial ⚠️
src/libtmux_mcp/server.py	92.59%	2 Missing ⚠️
... and 3 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #15      +/-   ##
==========================================
+ Coverage   82.90%   86.18%   +3.27%     
==========================================
  Files          17       30      +13     
  Lines         983     1513     +530     
  Branches      110      182      +72     
==========================================
+ Hits          815     1304     +489     
- Misses        122      155      +33     
- Partials       46       54       +8

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

tony · 2026-04-14T00:31:03Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

_{- If this code review was useful, please react with 👍. Otherwise, react with 👎.}

…thread around capture_pane why: In commit 67c3fc8 the wait tools became `async def` so that ctx.report_progress could be awaited during long polls. FastMCP direct-awaits async tools on the main event loop (`fastmcp/tools/function_tool.py:277-278`) — it only uses a threadpool for *sync* tools via `call_sync_fn_in_threadpool`. That means every blocking `pane.capture_pane()` subprocess.run inside the poll loop starves every other concurrent MCP request for the duration of the tmux roundtrip (~10s of ms per tick), for the full configured timeout. A 60-second `wait_for_text` call would pin the entire server for 60 seconds, delaying list_sessions / capture_pane / anything else. Loom code review surfaced this with 100% confidence (Gemini Skeptic pass on PR #15); verified against FastMCP source. what: - Wrap the three blocking `pane.capture_pane(...)` calls in src/libtmux_mcp/tools/pane_tools/wait.py with `asyncio.to_thread`: * `wait_for_text` polling call. * `wait_for_content_change` initial snapshot. * `wait_for_content_change` polling call. - Inline comment at the first site explains why (future maintainers will otherwise see "sync call inside async fn" as a bug and try to "fix" it by removing `async def`, reintroducing the blocking path). - No docstring changes — the `async def` signature already promises non-blocking semantics; this commit actually delivers on it. - Add `test_wait_tools_do_not_block_event_loop` in tests/test_pane_tools.py that runs `wait_for_text` against a never-matching pattern inside an asyncio.gather with a ticker coroutine. Asserts the ticker counter >= 5 over the 300 ms wait window — a blocking capture would leave it at 0 or 1. Deliberately a generous lower bound to stay robust on slow CI.

…_wait why: The ``run_and_wait`` prompt hardcoded ``mcp_done`` as the tmux wait-for channel name. tmux channels are server-global, so two concurrent agents (or parallel prompt renderings from one agent) would race: one agent's ``tmux wait-for -S mcp_done`` would unblock another's pending ``wait_for_channel("mcp_done")``, producing a false positive completion signal. Flagged as Critical by GPT Builder in the Loom review of PR #15. what: - Import ``uuid`` in src/libtmux_mcp/prompts/recipes.py. - Generate a fresh channel name at each call: ``libtmux_mcp_wait_<uuid4hex[:8]>``. * Prefix aligns with the buffer-tools namespace (libtmux_mcp_) so a future ``list_buffers(prefix="libtmux_mcp_")``-style operator sees every MCP-owned tmux artifact uniformly. * 8 hex characters (~32 bits) is safe against collision inside a single tmux server's concurrent-agent population. - Interpolate the same channel name into both the send_keys payload and the wait_for_channel call so one prompt rendering remains internally consistent. - Update the prompt docstring to note the per-invocation scope. - tests/test_prompts.py: * Update ``test_run_and_wait_returns_string_template`` to pin the new prefix. * Add ``test_run_and_wait_channel_is_uuid_scoped`` asserting (a) channel matches ``libtmux_mcp_wait_[0-9a-f]{8}``, (b) send_keys and wait_for_channel use the same name within one rendering, (c) two separate renderings emit different channel names.

why: ``show_hook`` swallowed three tmux error substrings into an empty result: ``"too many arguments"`` (correctly — that's how tmux reports an unset hook), but also ``"unknown hook"`` and ``"invalid option"``. The latter two fire for *typos* and *wrong-scope* mistakes, not "hook is unset". Agents sending ``show_hook("pane-esited")`` (typo) or ``show_hook("pane-exited", scope="server")`` (wrong scope) got back an empty list indistinguishable from a correctly-named but unset hook — masking the real input error. Flagged as Important by both Gemini (Skeptic) and GPT (Builder) in the Loom review of PR #15. This commit also lands the upstream TODO comment for Suggestion #15 (libtmux's scope-kwarg argv bug). what: - Narrow the ``except libtmux_exc.OptionError`` branch in src/libtmux_mcp/tools/hook_tools.py::show_hook to only match ``"too many arguments"`` — the single substring all tmux builds supported by this project use for an unset hook. Any other OptionError (``"unknown hook"``, ``"invalid option"``, new future substrings) re-raises so ``handle_tool_errors`` surfaces it. - Replace the block's comment with the reasoning so a future maintainer knows not to re-broaden the catch. - Add a TODO(libtmux upstream) comment in ``_resolve_hook_target`` explaining why the scope-nulling workaround exists. The fix belongs in libtmux's argv-assembly path, not here. - tests/test_hook_tools.py: * Remove ``test_show_hook_missing_returns_empty`` (its use of ``"after-nonexistent-hook-cxyz"`` was actually exercising the now-removed broad catch). * Add ``test_show_hook_unset_known_hook_returns_empty`` that sets and then unsets a real hook (``pane-exited``) before calling ``show_hook`` — exercises the narrow "too many arguments" path. * Add ``test_show_hook_unknown_name_raises`` asserting that a bogus hook name now surfaces as ``ToolError`` matching ``r"invalid option|unknown hook"`` (regression guard against re-broadening the swallow).

…tmcp why: ``fastmcp_model_classes`` in docs/conf.py is the allow-list that ``sphinx-autodoc-fastmcp`` uses to decide which Pydantic models get rendered in the generated schema docs. After PR #15 added seven new models, the tuple still listed only the original ten — so the new schemas (``SearchPanesResult``, ``PaneSnapshot``, ``ContentChangeResult``, ``HookEntry``, ``HookListResult``, ``BufferRef``, ``BufferContent``) were invisible in the built API docs, even though they're part of every tool surface that was documented. Flagged as Important by Gemini (Skeptic) in the Loom review. what: - Add the seven new model names to ``conf["fastmcp_model_classes"]`` in docs/conf.py. Keeps the existing order for the old entries; new entries are inserted at semantically adjacent positions (e.g. ``SearchPanesResult`` next to ``PaneContentMatch``, ``ContentChangeResult`` next to ``WaitForTextResult``, ``HookEntry`` / ``HookListResult`` grouped, ``BufferRef`` / ``BufferContent`` grouped). - Verified post-build: all seven names appear in the generated reference/api/models/index.html.

…ult shape why: PR #15 wrapped ``search_panes`` output in a ``SearchPanesResult`` model (matches + pagination fields) but the docs/tools/panes.md example still showed the pre-wrapper bare-array response shape. Three reviewers converged on this (Claude + Gemini + GPT in the Loom review, 3-way consensus). what: - Rewrite the response block in docs/tools/panes.md from: [ {pane_id: ..., matched_lines: [...]} ] to the actual wrapper: { matches: [...], truncated, truncated_panes, total_panes_matched, offset, limit } - Add a one-paragraph pagination hint above the example explaining how to iterate larger result sets via ``offset += len(matches)`` until ``truncated == false`` and ``truncated_panes == []``.

why: AGENTS.md §255-287 requires doctests on pure helpers. Most of the pure helpers introduced by PR #15 have them (``_validate_channel_name``, ``_validate_logical_name``, ``_validate_buffer_name``, ``_truncate_lines_tail``, ``_tmux_argv``, ``_pane_id_sort_key``). The critic pass surfaced ``_allocate_buffer_name`` as the lone remaining pure helper without a doctest — an oversight when commit 272396d introduced the module. Loom review Suggestion #16 (finalized). what: - Expand the docstring on src/libtmux_mcp/tools/buffer_tools.py::_allocate_buffer_name with a NumPy-style ``Examples`` block covering: * prefix contract (``"libtmux_mcp_"``), * logical-name suffix preservation, * 32-hex-character uuid nonce length, * empty-string and ``None`` both collapsing to the ``"buf"`` fallback. - The docstring also expands the rationale for why the name has the exact shape it does (privacy prefix + collision-resistant nonce + logical suffix), so future maintainers don't reduce the structure and accidentally re-introduce either the OS-clipboard read risk or the cross-agent collision risk that commit 272396d fixed.

…thread around capture_pane why: In commit 67c3fc8 the wait tools became `async def` so that ctx.report_progress could be awaited during long polls. FastMCP direct-awaits async tools on the main event loop (`fastmcp/tools/function_tool.py:277-278`) — it only uses a threadpool for *sync* tools via `call_sync_fn_in_threadpool`. That means every blocking `pane.capture_pane()` subprocess.run inside the poll loop starves every other concurrent MCP request for the duration of the tmux roundtrip (~10s of ms per tick), for the full configured timeout. A 60-second `wait_for_text` call would pin the entire server for 60 seconds, delaying list_sessions / capture_pane / anything else. Loom code review surfaced this with 100% confidence (Gemini Skeptic pass on PR #15); verified against FastMCP source. what: - Wrap the three blocking `pane.capture_pane(...)` calls in src/libtmux_mcp/tools/pane_tools/wait.py with `asyncio.to_thread`: * `wait_for_text` polling call. * `wait_for_content_change` initial snapshot. * `wait_for_content_change` polling call. - Inline comment at the first site explains why (future maintainers will otherwise see "sync call inside async fn" as a bug and try to "fix" it by removing `async def`, reintroducing the blocking path). - No docstring changes — the `async def` signature already promises non-blocking semantics; this commit actually delivers on it. - Add `test_wait_tools_do_not_block_event_loop` in tests/test_pane_tools.py that runs `wait_for_text` against a never-matching pattern inside an asyncio.gather with a ticker coroutine. Asserts the ticker counter >= 5 over the 300 ms wait window — a blocking capture would leave it at 0 or 1. Deliberately a generous lower bound to stay robust on slow CI.

…_wait why: The ``run_and_wait`` prompt hardcoded ``mcp_done`` as the tmux wait-for channel name. tmux channels are server-global, so two concurrent agents (or parallel prompt renderings from one agent) would race: one agent's ``tmux wait-for -S mcp_done`` would unblock another's pending ``wait_for_channel("mcp_done")``, producing a false positive completion signal. Flagged as Critical by GPT Builder in the Loom review of PR #15. what: - Import ``uuid`` in src/libtmux_mcp/prompts/recipes.py. - Generate a fresh channel name at each call: ``libtmux_mcp_wait_<uuid4hex[:8]>``. * Prefix aligns with the buffer-tools namespace (libtmux_mcp_) so a future ``list_buffers(prefix="libtmux_mcp_")``-style operator sees every MCP-owned tmux artifact uniformly. * 8 hex characters (~32 bits) is safe against collision inside a single tmux server's concurrent-agent population. - Interpolate the same channel name into both the send_keys payload and the wait_for_channel call so one prompt rendering remains internally consistent. - Update the prompt docstring to note the per-invocation scope. - tests/test_prompts.py: * Update ``test_run_and_wait_returns_string_template`` to pin the new prefix. * Add ``test_run_and_wait_channel_is_uuid_scoped`` asserting (a) channel matches ``libtmux_mcp_wait_[0-9a-f]{8}``, (b) send_keys and wait_for_channel use the same name within one rendering, (c) two separate renderings emit different channel names.

why: ``show_hook`` swallowed three tmux error substrings into an empty result: ``"too many arguments"`` (correctly — that's how tmux reports an unset hook), but also ``"unknown hook"`` and ``"invalid option"``. The latter two fire for *typos* and *wrong-scope* mistakes, not "hook is unset". Agents sending ``show_hook("pane-esited")`` (typo) or ``show_hook("pane-exited", scope="server")`` (wrong scope) got back an empty list indistinguishable from a correctly-named but unset hook — masking the real input error. Flagged as Important by both Gemini (Skeptic) and GPT (Builder) in the Loom review of PR #15. This commit also lands the upstream TODO comment for Suggestion #15 (libtmux's scope-kwarg argv bug). what: - Narrow the ``except libtmux_exc.OptionError`` branch in src/libtmux_mcp/tools/hook_tools.py::show_hook to only match ``"too many arguments"`` — the single substring all tmux builds supported by this project use for an unset hook. Any other OptionError (``"unknown hook"``, ``"invalid option"``, new future substrings) re-raises so ``handle_tool_errors`` surfaces it. - Replace the block's comment with the reasoning so a future maintainer knows not to re-broaden the catch. - Add a TODO(libtmux upstream) comment in ``_resolve_hook_target`` explaining why the scope-nulling workaround exists. The fix belongs in libtmux's argv-assembly path, not here. - tests/test_hook_tools.py: * Remove ``test_show_hook_missing_returns_empty`` (its use of ``"after-nonexistent-hook-cxyz"`` was actually exercising the now-removed broad catch). * Add ``test_show_hook_unset_known_hook_returns_empty`` that sets and then unsets a real hook (``pane-exited``) before calling ``show_hook`` — exercises the narrow "too many arguments" path. * Add ``test_show_hook_unknown_name_raises`` asserting that a bogus hook name now surfaces as ``ToolError`` matching ``r"invalid option|unknown hook"`` (regression guard against re-broadening the swallow).

…tmcp why: ``fastmcp_model_classes`` in docs/conf.py is the allow-list that ``sphinx-autodoc-fastmcp`` uses to decide which Pydantic models get rendered in the generated schema docs. After PR #15 added seven new models, the tuple still listed only the original ten — so the new schemas (``SearchPanesResult``, ``PaneSnapshot``, ``ContentChangeResult``, ``HookEntry``, ``HookListResult``, ``BufferRef``, ``BufferContent``) were invisible in the built API docs, even though they're part of every tool surface that was documented. Flagged as Important by Gemini (Skeptic) in the Loom review. what: - Add the seven new model names to ``conf["fastmcp_model_classes"]`` in docs/conf.py. Keeps the existing order for the old entries; new entries are inserted at semantically adjacent positions (e.g. ``SearchPanesResult`` next to ``PaneContentMatch``, ``ContentChangeResult`` next to ``WaitForTextResult``, ``HookEntry`` / ``HookListResult`` grouped, ``BufferRef`` / ``BufferContent`` grouped). - Verified post-build: all seven names appear in the generated reference/api/models/index.html.

…ult shape why: PR #15 wrapped ``search_panes`` output in a ``SearchPanesResult`` model (matches + pagination fields) but the docs/tools/panes.md example still showed the pre-wrapper bare-array response shape. Three reviewers converged on this (Claude + Gemini + GPT in the Loom review, 3-way consensus). what: - Rewrite the response block in docs/tools/panes.md from: [ {pane_id: ..., matched_lines: [...]} ] to the actual wrapper: { matches: [...], truncated, truncated_panes, total_panes_matched, offset, limit } - Add a one-paragraph pagination hint above the example explaining how to iterate larger result sets via ``offset += len(matches)`` until ``truncated == false`` and ``truncated_panes == []``.

why: AGENTS.md §255-287 requires doctests on pure helpers. Most of the pure helpers introduced by PR #15 have them (``_validate_channel_name``, ``_validate_logical_name``, ``_validate_buffer_name``, ``_truncate_lines_tail``, ``_tmux_argv``, ``_pane_id_sort_key``). The critic pass surfaced ``_allocate_buffer_name`` as the lone remaining pure helper without a doctest — an oversight when commit 272396d introduced the module. Loom review Suggestion #16 (finalized). what: - Expand the docstring on src/libtmux_mcp/tools/buffer_tools.py::_allocate_buffer_name with a NumPy-style ``Examples`` block covering: * prefix contract (``"libtmux_mcp_"``), * logical-name suffix preservation, * 32-hex-character uuid nonce length, * empty-string and ``None`` both collapsing to the ``"buf"`` fallback. - The docstring also expands the rationale for why the name has the exact shape it does (privacy prefix + collision-resistant nonce + logical suffix), so future maintainers don't reduce the structure and accidentally re-introduce either the OS-clipboard read risk or the cross-agent collision risk that commit 272396d fixed.

Previously _get_caller_pane_id read only TMUX_PANE. Pane IDs like %1 are only unique within a single tmux server, so the kill_server / kill_session / kill_window / kill_pane guards could either falsely block operations across unrelated sockets or (worse) falsely permit a self-kill when a colliding pane id existed on the target socket. Parse TMUX (format: socket_path,server_pid,session_id per tmux environ.c:281) alongside TMUX_PANE to produce a CallerIdentity, and compare socket paths via os.path.realpath against the target Server's effective socket path. When socket info is ambiguous (e.g. TMUX_PANE set but TMUX missing), err on the side of blocking — the error message already directs users to run tmux manually if the guard is a false positive. - _utils.py: add CallerIdentity, _get_caller_identity, _effective_socket_path, _caller_is_on_server. Keep _get_caller_pane_id as a thin wrapper so _serialize_pane's is_caller annotation is unchanged. - server_tools.kill_server: guard now fires only when caller's socket matches the target server (was: any TMUX_PANE blocked). - session_tools.kill_session / window_tools.kill_window / pane_tools.kill_pane: self-kill checks gated on socket match. - tests/conftest.py: autouse fixture clears TMUX/TMUX_PANE so host terminal doesn't leak into tests. - tests/test_server_tools.py: self-kill guard test now sets TMUX to the target socket; new test covers cross-socket path. - tests/test_utils.py: new coverage for parser, realpath match, cross-socket rejection, conservative fallback, and absent caller.

16 inline `from fastmcp.exceptions import ToolError` statements lived inside function bodies across _utils.py, server_tools.py, session_tools.py, window_tools.py, pane_tools.py, and option_tools.py. This was a legacy habit that hurt grep-ability and left the pane_tools module cluttered with identical import lines at 7 different indentation levels. Hoisted to a single module-level import per file. No behavior change.

Prior to this commit the MCP server had no per-invocation visibility. Operators investigating a misbehaving agent's tool loop had to rely on indirect tmux state or the fastmcp framework's own logs, and payload-bearing tools like send_keys / paste_text / set_environment left no audit trail at all. Adds AuditMiddleware, a fastmcp Middleware that emits one structured INFO record per tool call to the libtmux_mcp.audit logger. Each record carries tool name, outcome (ok/error), error_type on failure, duration_ms, the fastmcp client_id and request_id, and a summary of the arguments. Arguments whose names are in the sensitive-name set ({keys, text, value}) are replaced with {len, sha256_prefix} so raw send_keys payloads, pasted text, and environment values never reach log files. Non-sensitive strings over 200 chars get truncated. Registered alongside SafetyMiddleware in server.py so it runs for every invocation whether or not the safety gate fires. The logger is namespaced separately so deployments can route audit records to a dedicated JSON sink without touching the module logger.

pane_tools.py had grown to 1315 lines — a kitchen-sink module covering I/O, capture, wait/poll, search, copy-mode, paste, resize, kill, select, swap, pipe, display, and snapshot all in one file. Large modules in this project are especially costly because each tool is AI-surface: reviewers need to audit behavior and docstrings together, and a flat 1.3k-line file makes the diff graph noisy when any one tool changes. Convert the module into a ``pane_tools/`` package with one file per operation domain: io.py send_keys, capture_pane, clear_pane, paste_text wait.py wait_for_text, wait_for_content_change search.py search_panes copy_mode.py enter_copy_mode, exit_copy_mode layout.py resize_pane, select_pane, swap_pane lifecycle.py kill_pane, set_pane_title, get_pane_info pipe.py pipe_pane meta.py display_message, snapshot_pane ``__init__.py`` re-exports every tool by name so ``from libtmux_mcp.tools.pane_tools import send_keys`` still works, preserving existing test imports and any downstream code that pinned to the flat namespace. The ``register()`` function is unchanged in behavior — all tool names, titles, annotations, and safety tags are identical so MCP clients see no surface change. No tool bodies were modified; this is a pure move. All 188 existing tests pass unchanged.

The prior mandate — "All functions and methods MUST have working doctests" — doesn't match this repo. Most functions here are MCP tool implementations that require a live tmux server; a doctest for ``send_keys`` or ``kill_pane`` has no way to run offline. That contradiction produced zero actual doctests in the codebase (the policy was aspirational rather than enforced). Rewrite the section to state plainly where doctests fit: * Required: pure helpers (parsers, formatters, redaction logic, small utilities) that run with no external state. * Exempt: anything touching tmux, subprocess, the filesystem, or env. Write a unit test with fixtures instead. Also soften the ``# doctest: +SKIP`` rule — rather than banning it outright, point readers at the right alternative (write a unit test) so the guidance matches actual practice. This is a policy-only change; pyproject.toml's doctest config will be updated separately so the infra and the doc don't get bundled.

The prior ``-p no:doctest`` in pyproject.toml's addopts explicitly disabled pytest's standard doctest plugin, even though AGENTS.md required doctests. That contradiction is now resolved — the policy in 7a scopes doctests to pure helpers, and this commit turns on collection and adds the first two. * pyproject.toml: swap ``-p no:doctest`` for ``--doctest-modules`` so `>>>` blocks in docstrings are actually executed. The pre-existing ``--doctest-docutils-modules`` flag (for sphinx-style ``.. doctest::`` directives) stays in place. * middleware.py: add doctests to ``_redact_digest`` and ``_summarize_args`` — both are pure, deterministic, and the natural place for an inline example of what the audit log will contain. Verified: collector shows two new DoctestItem entries, both pass; the broader suite goes from 188 to 190 tests with no existing failures.

The safety-tier page existed but didn't call out the two mutating-tier tools with broader-than-expected reach: pipe_pane can write arbitrary files, and set_environment mutates state that persists into every shell tmux later spawns. Both share a safety tier with tools like rename_window even though their blast radius is much larger. Additions to docs/topics/safety.md: * A new "Footguns inside the `mutating` tier" section covering pipe_pane (arbitrary filesystem writes), set_environment (scope-wide env mutation), and send_keys/paste_text (arbitrary shell execution). Each entry states the risk plainly and suggests mitigations (readonly tier, scoped home, `env VAR=value` via send_keys). * An "Audit log" section documenting the libtmux_mcp.audit logger added in the AuditMiddleware commit — what fields are recorded, how sensitive payloads are redacted to a {len, sha256_prefix} digest, and that operators should route the logger to a dedicated sink for a durable trail. * Corrected the self-kill-protection paragraph: it previously claimed the guard used only TMUX_PANE. After the socket-aware fix it also reads and parses TMUX, and this is now described so the doc matches the code. Docstring changes feed sphinx_autodoc_fastmcp so warnings also surface in the auto-generated tool reference: * src/libtmux_mcp/tools/pane_tools/pipe.py::pipe_pane — warns about arbitrary filesystem writes and cross-links to /topics/safety. * src/libtmux_mcp/tools/env_tools.py::set_environment — warns about scope-wide env propagation and points at `env VAR=value command` via send_keys as the narrower alternative. Per the design decision: no safety-tier reclassification. pipe_pane and set_environment remain in the `mutating` tier to avoid breaking deployments that rely on the default; the warnings make the extra reach visible so operators can pick `LIBTMUX_SAFETY=readonly` when appropriate.

The prior patch landed ``t.cast("MiddlewareContext[t.Any]", ...)`` in the audit-middleware tests to paper over four mypy errors. That's a lie — ``SimpleNamespace`` isn't a ``MiddlewareContext``, and ``Any`` hides whatever breaks when fastmcp's internal shape shifts. Investigation of the actual types: * ``fastmcp.server.middleware.MiddlewareContext`` is a ``@dataclass(kw_only=True, frozen=True)`` generic over the message type. Only ``message`` is required; everything else has a default. * ``mcp.types.CallToolRequestParams`` is a pydantic BaseModel with required ``name: str`` and optional ``arguments``. Direct construction works. * ``fastmcp.Context`` cannot be built without a live FastMCP server + MCP request context. There is no lightweight fake — ``client_id`` and ``request_id`` are properties that read from the live request context. So: construct the real ``MiddlewareContext(message=CallToolRequestParams(...))`` and leave ``fastmcp_context=None``. The audit middleware already handles that path (``if context.fastmcp_context is not None:``), so logs carry ``client_id=None`` / ``request_id=None`` — which is the exact production behavior when no MCP context is attached. Update the ``test_audit_middleware_logs_success`` assertion accordingly. The only remaining ``t.Any`` is on ``args: dict[str, t.Any]`` in ``test_summarize_args_redacts_sensitive_keys`` and in the ``_fake_context`` ``arguments`` parameter. Both mirror the upstream pydantic type ``CallToolRequestParams.arguments: dict[str, Any] | None`` — inherited from an external API, not ours to tighten. Mypy across the full tree is clean. 190 tests pass.

CI reported 11 files failing ``ruff format --check`` — mostly blank lines inside multi-line import blocks left over from the earlier hoist / pane split. Running ``ruff format .`` normalizes them. Also drop the ``sphinx_autodoc_fastmcp`` entry from the mypy override block in pyproject.toml. Nothing in src/, tests/, or docs/ imports it (it appears only as a string in ``conf.py::extensions``), so the override was dead config. Mypy was flagging it on every run. The docutils entry stays since mypy still lists it as used when checking the full tree.

AGENTS.md prescribes top-level namespace imports for standard library modules. ``paste_text`` was importing ``subprocess`` and ``tempfile`` inside the function body — carried over from the old flat ``pane_tools.py`` through the recent split. An earlier commit (``c459378``) fixed an identical pattern for ``pathlib`` and cited the same rule; this closes the remaining two cases in io.py. No call-site changes: the body already uses namespace form (``subprocess.run``, ``tempfile.NamedTemporaryFile``, ``subprocess.CalledProcessError``).

…ffers Agents are supposed to call ``delete_buffer`` after use, but an interrupted call chain can leave ``libtmux_mcp_<uuid>_*`` entries lingering on the tmux server. The server-global paste-buffer namespace is shared across clients, so accumulated leaks are observable by the human user via ``tmux list-buffers``. Adds a ``_gc_mcp_buffers`` helper that runs on lifespan teardown: iterates every cached Server, lists buffer names, and deletes anything matching ``_MCP_BUFFER_PREFIX``. Best-effort and fully silent — tmux may be unreachable, buffers may vanish mid-scan, and none of that should block lifespan shutdown. Logs at debug level so operators can still surface leaks via verbose logging. Also adds "when to use" cross-references to the ``paste_text`` and ``load_buffer`` docstrings so agents can pick the right tool from the schema alone instead of having to read the narrative docs. paste_text is fire-and-forget; load_buffer is stage-then-fire with a handle for multi-paste or inspection. Note: FastMCP lifespan teardown runs on SIGTERM / SIGINT only; ``kill -9`` and OOM bypass it, so this GC is an ergonomics improvement, not a security boundary.

Two guards for the lifespan GC: * Seeds a real MCP-owned buffer via ``load_buffer`` alongside a manually set ``human_buffer`` and asserts only the prefixed one is deleted. The human-owned buffer MUST survive — that is where the OS clipboard sync lives, and deleting it would be a privacy regression. * A ``_BrokenServer`` stub whose ``cmd`` always raises proves the GC swallows exceptions rather than crashing lifespan shutdown.

Every tool already accepts ``socket_name`` for multi-server work but there was no way to discover which servers actually exist — agents had to guess names or rely on the caller to supply them. Scans ``${TMUX_TMPDIR:-/tmp}/tmux-<uid>/`` (the canonical socket location per tmux.c:209) and returns a ``list[ServerInfo]`` of live servers. Stale socket inodes — the common case on long-running machines where ``$TMUX_TMPDIR`` accumulates thousands of orphans — are filtered via a cheap UNIX ``connect()`` liveness probe before the more expensive ``get_server_info`` call. Local testing: 17k stale inodes plus one live server, 50s probe-every-socket ➜ 0.4s with connect-first filter.

Two guards: * The pytest-libtmux fixture socket appears in the discovery result with ``is_alive=True`` — proves the live-probe path. * A nonexistent ``$TMUX_TMPDIR`` returns an empty list without raising — proves graceful degradation on a freshly provisioned box where the user has never run tmux.

…apes Mirror of the read-heavy tool shape guards in ``test_server_tools.py`` and ``test_pane_tools.py`` — prompts and hierarchy resources wire to MCP clients as string bodies (prompt templates, JSON text, plain text), so a future refactor that accidentally returns a Pydantic model, dict, or None would silently break the MCP surface. Parametrized so every recipe and every hierarchy resource is checked without per-case boilerplate.

…ter screen-grabbing launches An agent following ``build_dev_workspace`` literally would deadlock: step 4 launched vim and a long-running log command in separate panes, then said "wait for the prompt via ``wait_for_text`` before moving on between each step." vim and ``watch`` / ``tail -f`` take over the full terminal — they never draw a shell prompt, so the wait blocks until timeout. Restructures the recipe around when a prompt IS a meaningful signal vs. when it is not: * New step 4: wait for the shell prompt in each pane BEFORE sending the launch command. This is a valid readiness check — the new panes start in the user's login shell. * New step 5: send the launch commands. * New step 6 (optional): use ``wait_for_content_change`` after launch to confirm the program drew its UI. Explicitly flagged as "strictly different from waiting for a shell prompt" so future editors don't revert to the deadlock wording. The prompt-alignment test from the previous round did not catch this because it validates parameter NAMES, not runtime semantics.

…mpt waits Regression test asserts the rendered recipe: * does NOT contain the stale "wait for the prompt … between each step" guidance that would deadlock agents. * DOES reference ``wait_for_content_change`` for post-launch UI confirmation. * still references ``wait_for_text`` (for the pre-launch shell- readiness check, where prompt matching is meaningful). Complements the existing prompt-alignment test, which only covers parameter names, by validating runtime guidance semantics.

The docstring claimed ``list_servers`` discovers "every live tmux server on this machine", but the implementation only scans ``${TMUX_TMPDIR}/tmux-<uid>/``. Custom ``tmux -S /arbitrary/path`` daemons (systemd user units, CI runners, per-project ``.envrc`` paths) live outside that directory and were invisible — the docstring overclaimed. Two changes: * Correct the docstring to say "under the current user's ``$TMUX_TMPDIR``" and spell out the custom-path caveat so agents do not assume canonical-only sockets exist. * Add an optional ``extra_socket_paths: list[str] | None`` parameter so callers who know about arbitrary paths can supplement the scan. Each path runs through the same ``_is_tmux_socket_live`` probe and socket-metadata query as canonical entries. Nonexistent or non-socket paths are silently skipped so stale user-supplied lists don't crash the call. Factors the server-info construction into a private ``_probe_server_by_path`` helper that mirrors ``get_server_info`` but keys the Server by socket_path (-S) instead of socket_name (-L), so the extras path can use an arbitrary filesystem location.

…afety Two regression guards for the new ``extra_socket_paths`` surface: * Supplying the fixture socket as an extra path makes it appear in the result even when the canonical ``$TMUX_TMPDIR`` scan is pointed at an empty directory — proves the extras code path is exercised in isolation from the canonical scan. * A list containing a nonexistent path and a regular file (not a socket) returns an empty result without raising — proves stale user-supplied entries degrade gracefully.

The L1 fix narrowed the macOS ``$TMUX_TMPDIR`` gap by preferring ``tmux display-message -p '#{socket_path}'`` over env-reconstruction, but ``server.cmd`` invokes tmux with the MCP process's own environment. When the MCP process's ``$TMUX_TMPDIR`` diverges from the real tmux server's (macOS launchd case), the query fails: * MCP process env: ``TMUX_TMPDIR=/wrong/path`` * Real tmux lives at ``/correct/path/tmux-1000/mysocket`` * Caller's ``$TMUX=/correct/path/tmux-1000/mysocket,…`` * Query invokes ``tmux -L mysocket display-message``; tmux uses ``/wrong/path`` and can't find the server; query raises. * Reconstruction fills in ``/wrong/path/tmux-1000/mysocket``. * ``_caller_is_on_server`` realpath compares bogus vs. real path: different. Guard degrades to no-op. Adds a conservative basename fallback after the realpath check: if the basename of the caller's socket path equals the target's declared ``socket_name`` (or ``"default"``), the guard still fires. Both names are authoritative on their respective sides and neither is subject to ``$TMUX_TMPDIR`` divergence, so this is a safety-preserving last-chance match. Trade-off acknowledged in the docstring: exotic false positive when two daemons share a socket_name under different tmpdirs. Prefer that over silently permitting self-kill under env mismatch.

…gence Regression guard for the macOS launchd scenario: MCP process has a wrong ``$TMUX_TMPDIR``, display-message query fails, reconstruction produces a path that mismatches the caller's ``$TMUX`` realpath, but the basename (socket name) still matches. The guard must still fire. The test forces: * ``$TMUX_TMPDIR`` on the MCP process points at a nonexistent dir. * Monkeypatches ``server.cmd`` so the display-message query raises. * Sets ``$TMUX`` to a path in a "real" tmpdir whose basename is the target's socket_name. Asserts ``_caller_is_on_server`` returns True — proving the basename fallback closes the env-mismatch gap without requiring the realpath comparison to succeed.

…r from build_dev_workspace Two residual ergonomic problems in the recipe that F1's rewrite did not catch, both confirmed by reading the underlying source: **Fancy-shell hang.** Step 4 emitted a literal ``wait_for_text(pattern=r"\$ |\# |\% ", regex=True, timeout=5.0)``. The regex only matches default bash/zsh prompts; zsh+oh-my-zsh (``➜``), starship (``❯``), pure / p10k (Unicode glyphs), and fish (``>``) all miss. Agents on these setups burn 5s × 3 panes = 15s of ceremony every time the recipe runs. Reading tmux source confirms the wait was unnecessary defensive ceremony: ``cmd-split-window.c:151`` → ``spawn.c:382,494`` allocate the PTY and wire the bufferevent before split-window returns. ``input-keys.c:418`` writes send-keys bytes straight into the kernel PTY buffer via ``bufferevent_write`` with no readiness check. The kernel holds bytes until the slave reads. A ``send_keys("vim")`` right after ``split_window`` is safe whether or not the shell has finished drawing its prompt — the shell will pick up the queued bytes on its first stdin read. **Stray Enter in pane B.** Step 5 emitted ``send_keys(pane_id="%B", keys="")`` with a comment "(leave the shell idle)". libtmux's ``Pane.send_keys`` (``pane.py:423-475``) runs ``send-keys ""`` *then* ``self.enter()`` which is ``send-keys Enter``. libtmux's own test (``tests/test_pane.py:128``) confirms an Enter is sent. The shell was already idle after the split; the line did nothing useful while actively misleading anyone reading it. Net change: ~15 LOC removed, ~3 LOC added. Recipe body is shorter, shell-agnostic, and honest about what it does. The step-6 ``wait_for_content_change`` block is kept (renumbered to step 5) — it is still the right primitive for post-launch UI confirmation. Also updates the existing ``test_build_dev_workspace_does_not_deadlock_on_screen_grabbers`` test to drop the now-obsolete ``wait_for_text`` assertion; a targeted regression guard for the two removed lines follows in a separate test commit.

…regex or stray Enter Targeted regression guard for the two residual ergonomic issues fixed in the preceding commit: * ``\$ |\# |\% `` regex must be absent — hangs starship / oh-my-zsh / pure / p10k / fish users for 5s × 3 panes = 15s of ceremony with no benefit. * ``send_keys(pane_id="%B", keys="")`` must be absent — libtmux emits an Enter keystroke when ``enter=True`` (the default), so the line lied about being a no-op. Positive pin on ``wait_for_content_change`` so the post-launch UI confirmation guidance stays in the recipe; it's a shell-agnostic "did the screen change since we captured it?" check that works across every shell, editor, and tailing program combination. Follows the established ``in``/``not in`` assertion style from the neighbouring ``test_build_dev_workspace_does_not_deadlock_on_screen_grabbers`` test.

``get_server_info`` swallowed any ``display-message #{version}`` failure with a bare ``except Exception: pass`` — pre-existing from the initial release, not from this branch. The sibling helper ``_probe_server_by_path`` in the same file now logs the analogous failure at debug level, so the two were drifting in style. Adds a single ``logger.debug`` so operators debugging "why does my tmux server report version=None?" get a uniform signal across both code paths. Semantics are unchanged (version still falls back to None on failure, callers never see the exception).

…_session The ``del mcp_session`` pattern (12 sites across four test files) signalled "I need this fixture's side effect but not its value." That is the textbook use case for ``@pytest.mark.usefixtures``, which pytest's own test suite uses 16+ times and recommends as the canonical form — cross-checked in ~/study/python/pytest/testing/ and in upstream pytest docs. libtmux and fastmcp also follow the decorator form; libtmux-mcp was an outlier. Each converted test now: * Declares the required fixture via a decorator (explicit intent at function definition). * Drops the unused ``mcp_session: Session`` parameter. * Drops the ``del mcp_session`` ceremony from the body. Three files also drop the now-unused ``Session`` import from their ``TYPE_CHECKING`` block (``test_buffer_tools.py``, ``test_server.py``, ``test_wait_for_tools.py``). ``test_server_tools.py`` keeps the import — ``test_list_sessions`` genuinely uses ``mcp_session``. Behaviour unchanged; this is pure readability + idiom cleanup.

…t loops Regression guard for MCP cancellation semantics. The two wait tools share a ``while True:`` poll-and-sleep pattern wrapped by ``handle_tool_errors_async`` (``_utils.py:827-850``), which catches ``Exception`` (not ``BaseException``). ``asyncio.CancelledError`` inherits from ``BaseException`` (Python 3.8+), so it propagates through the decorator today. This test locks that in: if a future change broadens the decorator to ``BaseException`` it would silently swallow MCP cancellation requests, agents would burn tokens waiting for cancelled tools to return, and operators would have no signal that anything went wrong. Uses ``task.cancel()`` rather than ``asyncio.wait_for`` so the raised exception is the inner ``CancelledError`` directly, not ``wait_for``'s ``TimeoutError`` wrapper. ``wait_for_text`` and ``wait_for_content_change`` get sibling tests since both go through the same decorator path.

…elled MCP cancellation works correctly today — ``CancelledError`` is a ``BaseException`` (Python 3.8+) and ``handle_tool_errors_async`` catches ``Exception``, so cancellation propagates untouched. But operators have no signal that a wait was cancelled vs. timed out — the two outcomes look identical at the log layer. Wraps each ``while True:`` poll loop in ``wait_for_text`` and ``wait_for_content_change`` with an explicit ``except asyncio.CancelledError: logger.debug(...); raise``. Three benefits: * Documents intent — readers see that cancellation is handled intentionally, not by accident of inheritance. * Records the cancel-time elapsed seconds and pane id at debug level so post-mortem log analysis can distinguish "cancelled at 150 ms" from "timed out at 8 s". * Re-raise preserves MCP semantics — never returns a partial ``WaitForTextResult`` / ``ContentChangeResult`` that would mask the cancellation as a timed-out wait. The per-pair regression test added in the preceding commit pins ``CancelledError`` propagation; this commit only adds diagnostic visibility.

…t timeouts Adopts FastMCP's structured client logging (``ctx.info/warning/error`` at ``fastmcp/server/context.py:716-778``) so MCP clients receive ``notifications/message`` events for the two diagnostic moments operators currently can't see: * Invalid regex pattern compiled in ``wait_for_text`` — previously the agent saw only a generic ``ToolError``; now the client also receives a structured warning, which surfaces in client log panels independent of the tool result. * Wait loop reaches its timeout in either ``wait_for_text`` or ``wait_for_content_change`` — previously identical to a "found" result at the log layer (only the boolean ``timed_out`` field distinguished them); now the client gets an explicit warning with pane id and timeout duration. Adds a ``_maybe_log`` helper sibling to ``_maybe_report_progress`` sharing the same ``_TRANSPORT_CLOSED_EXCEPTIONS`` suppression contract so a hung-up client doesn't take down the tool, while programming errors stay loud per the regression guard ``test_wait_for_text_propagates_unexpected_progress_error``. Updates the existing ``_StubContext`` and ``_BrokenContext`` test stubs in ``tests/test_pane_tools.py`` to declare ``warning`` methods — the stubs need to mock the full Context surface the wait loop now touches. ``_BrokenContext.warning`` raises the same ``anyio.BrokenResourceError`` as its ``report_progress`` so the existing transport-closed regression guard also covers the new log path. Per-pair regression test in the next commit asserts the warnings fire at both sites with the right level.

… and timeout Three regression guards for the new ``_maybe_log`` call sites added in the preceding commit. Each builds a tiny ``_RecordingContext`` that captures ``(level, message)`` tuples — the same recording-stub style used by the existing progress tests, kept inline per test so each assertion stays self-contained: * ``test_wait_for_text_warns_on_invalid_regex`` — ``regex=True`` with an unclosed bracket raises ``ToolError`` AND records a warning containing "Invalid regex". * ``test_wait_for_text_warns_on_timeout`` — never-matching pattern with a 200 ms timeout records a warning containing "timeout". * ``test_wait_for_content_change_warns_on_timeout`` — settled pane with a 500 ms timeout records the equivalent warning. Reuses the active settle-loop pattern from ``test_wait_for_content_change_timeout`` so the test stays deterministic on slow CI rather than relying on a fixed sleep. The tests assert on substring presence rather than exact message text so the warning copy can evolve without forcing a test update.

…AG_READONLY Adopts FastMCP's ``RetryMiddleware`` (``fastmcp/server/middleware/error_handling.py:136``) but bounded by the existing safety-tier annotation. The ``_server_cache`` already evicts dead Server instances on ``is_alive() == False``, but the *first* call after a tmux daemon crash still fails before the cache learns about it. A single retry on the freshly-cached Server closes that window without changing tool semantics. Critical safety constraint: only readonly tools are retried. Mutating tools like ``send_keys``, ``create_session``, ``split_window`` are NOT idempotent — re-running them on a transient socket error would silently double side effects, which is unacceptable. The wrapper ``ReadonlyRetryMiddleware`` reads ``tool.tags`` via ``context.fastmcp_context.fastmcp.get_tool(name)`` (same pattern ``SafetyMiddleware`` uses) and only delegates to the upstream retry when ``TAG_READONLY`` is present. Defaults are deliberately small: ``max_retries=1`` keeps audit log noise minimal (one extra attempt, not three); ``base_delay 0.1s`` / ``max_delay 1.0s`` matches the duration window of a transient libtmux socket hiccup. ``retry_exceptions= (libtmux.exc.LibTmuxException,)`` is narrow on purpose — the upstream default ``(ConnectionError, TimeoutError)`` does NOT match libtmux's wrapped subprocess errors and would be a silent no-op. Wired into the middleware stack between ``AuditMiddleware`` and ``SafetyMiddleware`` so retries are audited once each and tier-denied tools never reach retry. Updates the existing ``test_server_middleware_stack_order`` to lock in the new position. Per-pair regression test in the next commit asserts: readonly tool retries on ``LibTmuxException``, mutating tool does not retry, non-libtmux exception does not retry.

…y tools only Three regression guards for the new safety-scoped retry wrapper: * ``test_readonly_retry_recovers_from_libtmux_exception`` — readonly tool raises ``LibTmuxException`` once then succeeds; assert the middleware retried (call count 2) and returned the success. Models the production scenario: a transient socket error on the first call after a tmux daemon restart, recovered by the retry on the freshly-cached Server. * ``test_readonly_retry_skips_mutating_tool`` — mutating tool with the same exception; assert NO retry (call count 1) and the exception propagates. This is the critical safety property — re-running ``send_keys`` or ``create_session`` on a transient error would silently double the side effect. * ``test_readonly_retry_skips_non_libtmux_exception`` — readonly tool raises ``ValueError``; assert NO retry. Pins the narrow ``retry_exceptions=(LibTmuxException,)`` default so a future refactor that broadens the trigger set would fail loudly. Tests use minimal ``_StubTool`` / ``_StubFastMCP`` / ``_StubFastMCPContext`` stand-ins to satisfy the ``context.fastmcp_context.fastmcp.get_tool(name)`` lookup the middleware uses for tag inspection, plus a ``_FlakyCallNext`` helper that raises N times before succeeding.

… fix CI race CI failure on `8dc7c81`: ``tests/test_pane_tools.py::test_wait_for_content_change_propagates_cancellation`` ``Failed: DID NOT RAISE <class 'asyncio.exceptions.CancelledError'>`` across all six tmux versions (3.2a, 3.3a, 3.4, 3.5, 3.6, master). Root cause: ``wait_for_content_change`` exits the poll loop when ``current != initial_content``. On the GitHub Actions Ubuntu runner the shell prompt redraws (cursor blink, zsh async hooks like vcs_info) within the test's 100 ms ``asyncio.sleep`` window, so ``changed=True`` fires and the tool returns a normal ``ContentChangeResult`` before the ``task.cancel()`` arrives. Local runs on an idle pane never tripped this — classic CI-only race. Fix: monkeypatch ``Pane.capture_pane`` to always return the same line list. The diff branch can never trigger, so the loop stays in ``await asyncio.sleep(interval)`` indefinitely and the cancel unambiguously interrupts it. The test still exercises the production code path that matters — that ``CancelledError`` propagates through ``handle_tool_errors_async`` without being mapped to ``ToolError``.

The shipped ``ReadonlyRetryMiddleware`` (commit ``bd37f29``) is a production no-op under fastmcp 3.2.3: - ``RetryMiddleware._should_retry`` only checks ``isinstance(error, retry_exceptions)`` — does NOT walk ``__cause__``. - ``handle_tool_errors_async`` (``_utils.py:842-850``) wraps every ``LibTmuxException`` as ``ToolError(...) from LibTmuxException``. - At the middleware layer the exception type is ``ToolError``, not ``LibTmuxException``, so ``_should_retry`` returns ``False`` and the retry never fires in production. The shipped tests in ``tests/test_middleware.py:475+`` only pass because ``_FlakyCallNext`` raises ``LibTmuxException`` directly, bypassing the production decorator wrap path. fastmcp v3.2.4 commit ``031c7e03`` (PR #3858 "Fix RetryMiddleware not retrying tool errors") adds the ``__cause__`` walk to ``_should_retry``. Reproduced the production-flow bug under 3.2.3 with a 25-line harness (calls=1) and verified the fix on 3.2.4 (calls=2 with max_retries=1). Free win that ships in the same bump: PR #3872 (commit ``f3c00ba1`` "Extract parameter descriptions from docstrings") adds an automatic NumPy/Google/Sphinx docstring parser that populates per-parameter ``inputSchema.description`` fields. The project's tools already use NumPy ``Parameters\n----`` style throughout (verified ``wait.py``, ``_utils.py``, ``middleware.py``), so per-parameter descriptions become visible to LLM clients with zero code changes. Next commit adds an integration-style regression test that calls a real ``@handle_tool_errors``-decorated tool through the middleware stack — catches the production no-op AND any future regression where someone changes the decorator's exception wrapping.

… decorator Integration-style regression guard for the production code path the existing unit tests miss. The three sibling tests above use ``_FlakyCallNext`` to raise ``LibTmuxException`` directly; in production every tool is wrapped by ``handle_tool_errors`` / ``handle_tool_errors_async`` which converts ``LibTmuxException`` to ``ToolError(...) from LibTmuxException``. At the middleware layer the exception type is ``ToolError``, not ``LibTmuxException`` — so the retry decision must walk ``__cause__`` to see the real failure type. The new test invokes the real ``list_sessions`` tool through ``ReadonlyRetryMiddleware.on_call_tool``, exercising the decorator wrap path that broke under fastmcp 3.2.3. Monkeypatches ``Server.sessions`` to raise ``LibTmuxException`` once then return ``[]`` so the test runs without a live tmux. Asserts ``calls == 2`` with a friendly error message naming the likely cause (fastmcp older than 3.2.4) so an accidental dep downgrade fails loudly. Catches both: (a) the production no-op the previous fastmcp pin allowed, and (b) any future regression where someone changes the decorator's exception wrapping in a way that defeats ``__cause__`` walking.

…tream consistency All three round-2 weave:review reviewers flagged that ``ReadonlyRetryMiddleware``'s internal logger defaulted to ``fastmcp.retry`` (the upstream stock at ``error_handling.py:181``) instead of the project's ``libtmux_mcp.*`` namespace. Operators routing the audit stream by namespace prefix would silently miss retry events. Set the default explicitly: when ``logger_`` is ``None``, construct ``logging.getLogger("libtmux_mcp.retry")`` before delegating to fastmcp's ``RetryMiddleware``. The ``if logger_ is None:`` pattern avoids evaluating ``getLogger`` in the default expression. Pinned in tests via ``test_readonly_retry_logger_uses_project_namespace`` asserting ``middleware._retry.logger.name == "libtmux_mcp.retry"``, so a future refactor that drops the default fails loudly.

The new ``list_servers`` tool (added earlier in the branch) wasn't documented anywhere under ``docs/`` — agents and human readers discovering the tool from a client's tool list had no docs page to land on. The conf.py area-map already routes ``server_tools`` → ``sessions``, so the natural home is ``docs/tools/sessions.md`` between ``get_server_info`` and ``create_session``. Adds the standard four-part section: ``{fastmcp-tool}`` directive, Use-when / Avoid-when / Side-effects narrative, an example JSON request + response (including a second example with ``extra_socket_paths`` for the custom-path case), and the ``{fastmcp-tool-input}`` schema directive. Calls out the scope caveat (custom ``tmux -S /path/...`` daemons are not in the canonical scan) up front so readers don't reach for the tool in the wrong situation. Also adds a grid-item-card to ``docs/tools/index.md`` so the discovery overview links to the new section.

The four prompts shipped this branch (``run_and_wait``, ``diagnose_failing_pane``, ``build_dev_workspace``, ``interrupt_gracefully``) had zero docs coverage — no ``{fastmcp-prompt}`` autodoc directive (the sphinx-autodoc-fastmcp extension only ships ``{fastmcp-tool}``, ``{fastmcp-tool-input}``, ``{fastmcp-tool-summary}``; verified by reading the installed extension's ``__init__.py``), and ``docs/topics/prompting.md`` covers general agent-prompting heuristics rather than the project's MCP-registered prompts. Adds ``docs/tools/prompts.md`` — a hand-written page (since autodoc isn't available for prompts) with one section per recipe covering: when to use it, why it exists vs. the obvious alternative tool calls, parameters, and a sample render at one representative argument set. The samples use stable representative output (``<uuid>`` placeholder for the wait-for channel) so the page doesn't churn on every regeneration. Wires the new page into the docs toctree under "Use it" between ``tools/index`` and ``recipes`` (the latter is the longer narrative agent-reasoning page, conceptually distinct from these short MCP-protocol prompts). Adds a one-paragraph cross-reference from ``docs/tools/index.md``'s decision tree so readers discovering the tool surface find the prompts naturally.

…he new pane id The ``create_session`` response gained an ``active_pane_id`` field earlier in this branch (commit ``3f092bb``), but the docs still showed the pre-change response shape — readers landing on ``sessions.md`` had no signal that this field exists or what they'd do with it. Updates the example response to include the field, then adds a brief tip explaining the workflow win: callers can target subsequent ``send_keys`` / ``split_window`` / ``capture_pane`` calls against ``active_pane_id`` directly without a follow-up ``list_panes`` round-trip. Saves one MCP call in the most common "create a session, then act on it" workflow that prompts like ``build_dev_workspace`` already exploit. The full field surface is auto-rendered in ``docs/reference/api/models.md`` via the ``fastmcp_model_classes`` registration — this commit just adds the narrative discoverability that explains why the field exists.

tony force-pushed the 2026-04-follow-ups branch from 76bec04 to f9d5388 Compare April 14, 2026 00:00

tony marked this pull request as ready for review April 14, 2026 00:20

tony changed the title ~~Follow-ups from code review: caller identity, observability, pane split, docs~~ Follow-ups from code review + FastMCP alignment (Phase 2) Apr 14, 2026

tony force-pushed the 2026-04-follow-ups branch 2 times, most recently from 65971d8 to c714044 Compare April 16, 2026 15:34

tony force-pushed the 2026-04-follow-ups branch from 94c4044 to bc4c2ed Compare April 16, 2026 22:01

tony added 10 commits April 16, 2026 19:07

tony added 29 commits April 16, 2026 19:07

docs(CHANGES) FastMCP alignment, new tool surface, and middleware stack

6190391

tony force-pushed the 2026-04-follow-ups branch from bc4c2ed to 8c76b87 Compare April 17, 2026 00:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Follow-ups from code review + FastMCP alignment (Phase 2)#15

Follow-ups from code review + FastMCP alignment (Phase 2)#15
tony wants to merge 97 commits intomainfrom
2026-04-follow-ups

tony commented Apr 13, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented Apr 13, 2026 •

edited

Loading

Uh oh!

tony commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tony commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Quality gate

Phase 1 — original review follow-ups

Phase 2 — FastMCP alignment

Quality gate precursor

PR group A — Preconditions (correctness fixes)

PR group B — Context + wait-for

PR group C — Structured outputs + resources

PR group D — New tool families

PR group E — Middleware + lifespan

Deliberately NOT in this PR

Diff size

Test plan

Notes for reviewers

Uh oh!

codecov-commenter commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

tony commented Apr 14, 2026

Code review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tony commented Apr 13, 2026 •

edited

Loading

codecov-commenter commented Apr 13, 2026 •

edited

Loading