Follow-ups from code review + FastMCP alignment (Phase 2)#15
Open
Follow-ups from code review + FastMCP alignment (Phase 2)#15
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #15 +/- ##
==========================================
+ Coverage 82.90% 86.18% +3.27%
==========================================
Files 17 30 +13
Lines 983 1513 +530
Branches 110 182 +72
==========================================
+ Hits 815 1304 +489
- Misses 122 155 +33
- Partials 46 54 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
76bec04 to
f9d5388
Compare
Member
Author
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
tony
added a commit
that referenced
this pull request
Apr 15, 2026
…thread around capture_pane why: In commit 67c3fc8 the wait tools became `async def` so that ctx.report_progress could be awaited during long polls. FastMCP direct-awaits async tools on the main event loop (`fastmcp/tools/function_tool.py:277-278`) — it only uses a threadpool for *sync* tools via `call_sync_fn_in_threadpool`. That means every blocking `pane.capture_pane()` subprocess.run inside the poll loop starves every other concurrent MCP request for the duration of the tmux roundtrip (~10s of ms per tick), for the full configured timeout. A 60-second `wait_for_text` call would pin the entire server for 60 seconds, delaying list_sessions / capture_pane / anything else. Loom code review surfaced this with 100% confidence (Gemini Skeptic pass on PR #15); verified against FastMCP source. what: - Wrap the three blocking `pane.capture_pane(...)` calls in src/libtmux_mcp/tools/pane_tools/wait.py with `asyncio.to_thread`: * `wait_for_text` polling call. * `wait_for_content_change` initial snapshot. * `wait_for_content_change` polling call. - Inline comment at the first site explains why (future maintainers will otherwise see "sync call inside async fn" as a bug and try to "fix" it by removing `async def`, reintroducing the blocking path). - No docstring changes — the `async def` signature already promises non-blocking semantics; this commit actually delivers on it. - Add `test_wait_tools_do_not_block_event_loop` in tests/test_pane_tools.py that runs `wait_for_text` against a never-matching pattern inside an asyncio.gather with a ticker coroutine. Asserts the ticker counter >= 5 over the 300 ms wait window — a blocking capture would leave it at 0 or 1. Deliberately a generous lower bound to stay robust on slow CI.
tony
added a commit
that referenced
this pull request
Apr 15, 2026
…_wait
why: The ``run_and_wait`` prompt hardcoded ``mcp_done`` as the tmux
wait-for channel name. tmux channels are server-global, so two
concurrent agents (or parallel prompt renderings from one agent)
would race: one agent's ``tmux wait-for -S mcp_done`` would
unblock another's pending ``wait_for_channel("mcp_done")``,
producing a false positive completion signal.
Flagged as Critical by GPT Builder in the Loom review of PR #15.
what:
- Import ``uuid`` in src/libtmux_mcp/prompts/recipes.py.
- Generate a fresh channel name at each call: ``libtmux_mcp_wait_<uuid4hex[:8]>``.
* Prefix aligns with the buffer-tools namespace (libtmux_mcp_) so a
future ``list_buffers(prefix="libtmux_mcp_")``-style operator sees
every MCP-owned tmux artifact uniformly.
* 8 hex characters (~32 bits) is safe against collision inside a
single tmux server's concurrent-agent population.
- Interpolate the same channel name into both the send_keys payload
and the wait_for_channel call so one prompt rendering remains
internally consistent.
- Update the prompt docstring to note the per-invocation scope.
- tests/test_prompts.py:
* Update ``test_run_and_wait_returns_string_template`` to pin the
new prefix.
* Add ``test_run_and_wait_channel_is_uuid_scoped`` asserting
(a) channel matches ``libtmux_mcp_wait_[0-9a-f]{8}``,
(b) send_keys and wait_for_channel use the same name within one
rendering,
(c) two separate renderings emit different channel names.
tony
added a commit
that referenced
this pull request
Apr 15, 2026
why: ``show_hook`` swallowed three tmux error substrings into an empty
result: ``"too many arguments"`` (correctly — that's how tmux
reports an unset hook), but also ``"unknown hook"`` and
``"invalid option"``. The latter two fire for *typos* and
*wrong-scope* mistakes, not "hook is unset". Agents sending
``show_hook("pane-esited")`` (typo) or
``show_hook("pane-exited", scope="server")`` (wrong scope) got
back an empty list indistinguishable from a correctly-named but
unset hook — masking the real input error.
Flagged as Important by both Gemini (Skeptic) and GPT (Builder)
in the Loom review of PR #15. This commit also lands the upstream
TODO comment for Suggestion #15 (libtmux's scope-kwarg argv bug).
what:
- Narrow the ``except libtmux_exc.OptionError`` branch in
src/libtmux_mcp/tools/hook_tools.py::show_hook to only match
``"too many arguments"`` — the single substring all tmux builds
supported by this project use for an unset hook. Any other
OptionError (``"unknown hook"``, ``"invalid option"``, new future
substrings) re-raises so ``handle_tool_errors`` surfaces it.
- Replace the block's comment with the reasoning so a future
maintainer knows not to re-broaden the catch.
- Add a TODO(libtmux upstream) comment in ``_resolve_hook_target``
explaining why the scope-nulling workaround exists. The fix
belongs in libtmux's argv-assembly path, not here.
- tests/test_hook_tools.py:
* Remove ``test_show_hook_missing_returns_empty`` (its use of
``"after-nonexistent-hook-cxyz"`` was actually exercising the
now-removed broad catch).
* Add ``test_show_hook_unset_known_hook_returns_empty`` that sets
and then unsets a real hook (``pane-exited``) before calling
``show_hook`` — exercises the narrow "too many arguments" path.
* Add ``test_show_hook_unknown_name_raises`` asserting that a
bogus hook name now surfaces as ``ToolError`` matching
``r"invalid option|unknown hook"`` (regression guard against
re-broadening the swallow).
tony
added a commit
that referenced
this pull request
Apr 15, 2026
…tmcp
why: ``fastmcp_model_classes`` in docs/conf.py is the allow-list that
``sphinx-autodoc-fastmcp`` uses to decide which Pydantic models
get rendered in the generated schema docs. After PR #15 added
seven new models, the tuple still listed only the original ten —
so the new schemas (``SearchPanesResult``, ``PaneSnapshot``,
``ContentChangeResult``, ``HookEntry``, ``HookListResult``,
``BufferRef``, ``BufferContent``) were invisible in the built
API docs, even though they're part of every tool surface that
was documented.
Flagged as Important by Gemini (Skeptic) in the Loom review.
what:
- Add the seven new model names to ``conf["fastmcp_model_classes"]``
in docs/conf.py. Keeps the existing order for the old entries;
new entries are inserted at semantically adjacent positions
(e.g. ``SearchPanesResult`` next to ``PaneContentMatch``,
``ContentChangeResult`` next to ``WaitForTextResult``,
``HookEntry`` / ``HookListResult`` grouped, ``BufferRef`` /
``BufferContent`` grouped).
- Verified post-build: all seven names appear in the generated
reference/api/models/index.html.
tony
added a commit
that referenced
this pull request
Apr 15, 2026
…ult shape why: PR #15 wrapped ``search_panes`` output in a ``SearchPanesResult`` model (matches + pagination fields) but the docs/tools/panes.md example still showed the pre-wrapper bare-array response shape. Three reviewers converged on this (Claude + Gemini + GPT in the Loom review, 3-way consensus). what: - Rewrite the response block in docs/tools/panes.md from: [ {pane_id: ..., matched_lines: [...]} ] to the actual wrapper: { matches: [...], truncated, truncated_panes, total_panes_matched, offset, limit } - Add a one-paragraph pagination hint above the example explaining how to iterate larger result sets via ``offset += len(matches)`` until ``truncated == false`` and ``truncated_panes == []``.
tony
added a commit
that referenced
this pull request
Apr 15, 2026
why: AGENTS.md §255-287 requires doctests on pure helpers. Most of
the pure helpers introduced by PR #15 have them
(``_validate_channel_name``, ``_validate_logical_name``,
``_validate_buffer_name``, ``_truncate_lines_tail``,
``_tmux_argv``, ``_pane_id_sort_key``). The critic pass surfaced
``_allocate_buffer_name`` as the lone remaining pure helper
without a doctest — an oversight when commit 272396d introduced
the module.
Loom review Suggestion #16 (finalized).
what:
- Expand the docstring on
src/libtmux_mcp/tools/buffer_tools.py::_allocate_buffer_name with
a NumPy-style ``Examples`` block covering:
* prefix contract (``"libtmux_mcp_"``),
* logical-name suffix preservation,
* 32-hex-character uuid nonce length,
* empty-string and ``None`` both collapsing to the ``"buf"``
fallback.
- The docstring also expands the rationale for why the name has the
exact shape it does (privacy prefix + collision-resistant nonce
+ logical suffix), so future maintainers don't reduce the
structure and accidentally re-introduce either the OS-clipboard
read risk or the cross-agent collision risk that commit 272396d
fixed.
65971d8 to
c714044
Compare
tony
added a commit
that referenced
this pull request
Apr 16, 2026
…thread around capture_pane why: In commit 67c3fc8 the wait tools became `async def` so that ctx.report_progress could be awaited during long polls. FastMCP direct-awaits async tools on the main event loop (`fastmcp/tools/function_tool.py:277-278`) — it only uses a threadpool for *sync* tools via `call_sync_fn_in_threadpool`. That means every blocking `pane.capture_pane()` subprocess.run inside the poll loop starves every other concurrent MCP request for the duration of the tmux roundtrip (~10s of ms per tick), for the full configured timeout. A 60-second `wait_for_text` call would pin the entire server for 60 seconds, delaying list_sessions / capture_pane / anything else. Loom code review surfaced this with 100% confidence (Gemini Skeptic pass on PR #15); verified against FastMCP source. what: - Wrap the three blocking `pane.capture_pane(...)` calls in src/libtmux_mcp/tools/pane_tools/wait.py with `asyncio.to_thread`: * `wait_for_text` polling call. * `wait_for_content_change` initial snapshot. * `wait_for_content_change` polling call. - Inline comment at the first site explains why (future maintainers will otherwise see "sync call inside async fn" as a bug and try to "fix" it by removing `async def`, reintroducing the blocking path). - No docstring changes — the `async def` signature already promises non-blocking semantics; this commit actually delivers on it. - Add `test_wait_tools_do_not_block_event_loop` in tests/test_pane_tools.py that runs `wait_for_text` against a never-matching pattern inside an asyncio.gather with a ticker coroutine. Asserts the ticker counter >= 5 over the 300 ms wait window — a blocking capture would leave it at 0 or 1. Deliberately a generous lower bound to stay robust on slow CI.
tony
added a commit
that referenced
this pull request
Apr 16, 2026
…_wait
why: The ``run_and_wait`` prompt hardcoded ``mcp_done`` as the tmux
wait-for channel name. tmux channels are server-global, so two
concurrent agents (or parallel prompt renderings from one agent)
would race: one agent's ``tmux wait-for -S mcp_done`` would
unblock another's pending ``wait_for_channel("mcp_done")``,
producing a false positive completion signal.
Flagged as Critical by GPT Builder in the Loom review of PR #15.
what:
- Import ``uuid`` in src/libtmux_mcp/prompts/recipes.py.
- Generate a fresh channel name at each call: ``libtmux_mcp_wait_<uuid4hex[:8]>``.
* Prefix aligns with the buffer-tools namespace (libtmux_mcp_) so a
future ``list_buffers(prefix="libtmux_mcp_")``-style operator sees
every MCP-owned tmux artifact uniformly.
* 8 hex characters (~32 bits) is safe against collision inside a
single tmux server's concurrent-agent population.
- Interpolate the same channel name into both the send_keys payload
and the wait_for_channel call so one prompt rendering remains
internally consistent.
- Update the prompt docstring to note the per-invocation scope.
- tests/test_prompts.py:
* Update ``test_run_and_wait_returns_string_template`` to pin the
new prefix.
* Add ``test_run_and_wait_channel_is_uuid_scoped`` asserting
(a) channel matches ``libtmux_mcp_wait_[0-9a-f]{8}``,
(b) send_keys and wait_for_channel use the same name within one
rendering,
(c) two separate renderings emit different channel names.
94c4044 to
bc4c2ed
Compare
tony
added a commit
that referenced
this pull request
Apr 16, 2026
why: ``show_hook`` swallowed three tmux error substrings into an empty
result: ``"too many arguments"`` (correctly — that's how tmux
reports an unset hook), but also ``"unknown hook"`` and
``"invalid option"``. The latter two fire for *typos* and
*wrong-scope* mistakes, not "hook is unset". Agents sending
``show_hook("pane-esited")`` (typo) or
``show_hook("pane-exited", scope="server")`` (wrong scope) got
back an empty list indistinguishable from a correctly-named but
unset hook — masking the real input error.
Flagged as Important by both Gemini (Skeptic) and GPT (Builder)
in the Loom review of PR #15. This commit also lands the upstream
TODO comment for Suggestion #15 (libtmux's scope-kwarg argv bug).
what:
- Narrow the ``except libtmux_exc.OptionError`` branch in
src/libtmux_mcp/tools/hook_tools.py::show_hook to only match
``"too many arguments"`` — the single substring all tmux builds
supported by this project use for an unset hook. Any other
OptionError (``"unknown hook"``, ``"invalid option"``, new future
substrings) re-raises so ``handle_tool_errors`` surfaces it.
- Replace the block's comment with the reasoning so a future
maintainer knows not to re-broaden the catch.
- Add a TODO(libtmux upstream) comment in ``_resolve_hook_target``
explaining why the scope-nulling workaround exists. The fix
belongs in libtmux's argv-assembly path, not here.
- tests/test_hook_tools.py:
* Remove ``test_show_hook_missing_returns_empty`` (its use of
``"after-nonexistent-hook-cxyz"`` was actually exercising the
now-removed broad catch).
* Add ``test_show_hook_unset_known_hook_returns_empty`` that sets
and then unsets a real hook (``pane-exited``) before calling
``show_hook`` — exercises the narrow "too many arguments" path.
* Add ``test_show_hook_unknown_name_raises`` asserting that a
bogus hook name now surfaces as ``ToolError`` matching
``r"invalid option|unknown hook"`` (regression guard against
re-broadening the swallow).
tony
added a commit
that referenced
this pull request
Apr 16, 2026
…tmcp
why: ``fastmcp_model_classes`` in docs/conf.py is the allow-list that
``sphinx-autodoc-fastmcp`` uses to decide which Pydantic models
get rendered in the generated schema docs. After PR #15 added
seven new models, the tuple still listed only the original ten —
so the new schemas (``SearchPanesResult``, ``PaneSnapshot``,
``ContentChangeResult``, ``HookEntry``, ``HookListResult``,
``BufferRef``, ``BufferContent``) were invisible in the built
API docs, even though they're part of every tool surface that
was documented.
Flagged as Important by Gemini (Skeptic) in the Loom review.
what:
- Add the seven new model names to ``conf["fastmcp_model_classes"]``
in docs/conf.py. Keeps the existing order for the old entries;
new entries are inserted at semantically adjacent positions
(e.g. ``SearchPanesResult`` next to ``PaneContentMatch``,
``ContentChangeResult`` next to ``WaitForTextResult``,
``HookEntry`` / ``HookListResult`` grouped, ``BufferRef`` /
``BufferContent`` grouped).
- Verified post-build: all seven names appear in the generated
reference/api/models/index.html.
tony
added a commit
that referenced
this pull request
Apr 16, 2026
…ult shape why: PR #15 wrapped ``search_panes`` output in a ``SearchPanesResult`` model (matches + pagination fields) but the docs/tools/panes.md example still showed the pre-wrapper bare-array response shape. Three reviewers converged on this (Claude + Gemini + GPT in the Loom review, 3-way consensus). what: - Rewrite the response block in docs/tools/panes.md from: [ {pane_id: ..., matched_lines: [...]} ] to the actual wrapper: { matches: [...], truncated, truncated_panes, total_panes_matched, offset, limit } - Add a one-paragraph pagination hint above the example explaining how to iterate larger result sets via ``offset += len(matches)`` until ``truncated == false`` and ``truncated_panes == []``.
tony
added a commit
that referenced
this pull request
Apr 16, 2026
why: AGENTS.md §255-287 requires doctests on pure helpers. Most of
the pure helpers introduced by PR #15 have them
(``_validate_channel_name``, ``_validate_logical_name``,
``_validate_buffer_name``, ``_truncate_lines_tail``,
``_tmux_argv``, ``_pane_id_sort_key``). The critic pass surfaced
``_allocate_buffer_name`` as the lone remaining pure helper
without a doctest — an oversight when commit 272396d introduced
the module.
Loom review Suggestion #16 (finalized).
what:
- Expand the docstring on
src/libtmux_mcp/tools/buffer_tools.py::_allocate_buffer_name with
a NumPy-style ``Examples`` block covering:
* prefix contract (``"libtmux_mcp_"``),
* logical-name suffix preservation,
* 32-hex-character uuid nonce length,
* empty-string and ``None`` both collapsing to the ``"buf"``
fallback.
- The docstring also expands the rationale for why the name has the
exact shape it does (privacy prefix + collision-resistant nonce
+ logical suffix), so future maintainers don't reduce the
structure and accidentally re-introduce either the OS-clipboard
read risk or the cross-agent collision risk that commit 272396d
fixed.
Previously _get_caller_pane_id read only TMUX_PANE. Pane IDs like %1 are only unique within a single tmux server, so the kill_server / kill_session / kill_window / kill_pane guards could either falsely block operations across unrelated sockets or (worse) falsely permit a self-kill when a colliding pane id existed on the target socket. Parse TMUX (format: socket_path,server_pid,session_id per tmux environ.c:281) alongside TMUX_PANE to produce a CallerIdentity, and compare socket paths via os.path.realpath against the target Server's effective socket path. When socket info is ambiguous (e.g. TMUX_PANE set but TMUX missing), err on the side of blocking — the error message already directs users to run tmux manually if the guard is a false positive. - _utils.py: add CallerIdentity, _get_caller_identity, _effective_socket_path, _caller_is_on_server. Keep _get_caller_pane_id as a thin wrapper so _serialize_pane's is_caller annotation is unchanged. - server_tools.kill_server: guard now fires only when caller's socket matches the target server (was: any TMUX_PANE blocked). - session_tools.kill_session / window_tools.kill_window / pane_tools.kill_pane: self-kill checks gated on socket match. - tests/conftest.py: autouse fixture clears TMUX/TMUX_PANE so host terminal doesn't leak into tests. - tests/test_server_tools.py: self-kill guard test now sets TMUX to the target socket; new test covers cross-socket path. - tests/test_utils.py: new coverage for parser, realpath match, cross-socket rejection, conservative fallback, and absent caller.
16 inline `from fastmcp.exceptions import ToolError` statements lived inside function bodies across _utils.py, server_tools.py, session_tools.py, window_tools.py, pane_tools.py, and option_tools.py. This was a legacy habit that hurt grep-ability and left the pane_tools module cluttered with identical import lines at 7 different indentation levels. Hoisted to a single module-level import per file. No behavior change.
Prior to this commit the MCP server had no per-invocation visibility.
Operators investigating a misbehaving agent's tool loop had to rely on
indirect tmux state or the fastmcp framework's own logs, and
payload-bearing tools like send_keys / paste_text / set_environment
left no audit trail at all.
Adds AuditMiddleware, a fastmcp Middleware that emits one structured
INFO record per tool call to the libtmux_mcp.audit logger. Each
record carries tool name, outcome (ok/error), error_type on failure,
duration_ms, the fastmcp client_id and request_id, and a summary of
the arguments. Arguments whose names are in the sensitive-name set
({keys, text, value}) are replaced with {len, sha256_prefix} so raw
send_keys payloads, pasted text, and environment values never reach
log files. Non-sensitive strings over 200 chars get truncated.
Registered alongside SafetyMiddleware in server.py so it runs for
every invocation whether or not the safety gate fires. The logger is
namespaced separately so deployments can route audit records to a
dedicated JSON sink without touching the module logger.
pane_tools.py had grown to 1315 lines — a kitchen-sink module covering I/O, capture, wait/poll, search, copy-mode, paste, resize, kill, select, swap, pipe, display, and snapshot all in one file. Large modules in this project are especially costly because each tool is AI-surface: reviewers need to audit behavior and docstrings together, and a flat 1.3k-line file makes the diff graph noisy when any one tool changes. Convert the module into a ``pane_tools/`` package with one file per operation domain: io.py send_keys, capture_pane, clear_pane, paste_text wait.py wait_for_text, wait_for_content_change search.py search_panes copy_mode.py enter_copy_mode, exit_copy_mode layout.py resize_pane, select_pane, swap_pane lifecycle.py kill_pane, set_pane_title, get_pane_info pipe.py pipe_pane meta.py display_message, snapshot_pane ``__init__.py`` re-exports every tool by name so ``from libtmux_mcp.tools.pane_tools import send_keys`` still works, preserving existing test imports and any downstream code that pinned to the flat namespace. The ``register()`` function is unchanged in behavior — all tool names, titles, annotations, and safety tags are identical so MCP clients see no surface change. No tool bodies were modified; this is a pure move. All 188 existing tests pass unchanged.
The prior mandate — "All functions and methods MUST have working doctests" — doesn't match this repo. Most functions here are MCP tool implementations that require a live tmux server; a doctest for ``send_keys`` or ``kill_pane`` has no way to run offline. That contradiction produced zero actual doctests in the codebase (the policy was aspirational rather than enforced). Rewrite the section to state plainly where doctests fit: * Required: pure helpers (parsers, formatters, redaction logic, small utilities) that run with no external state. * Exempt: anything touching tmux, subprocess, the filesystem, or env. Write a unit test with fixtures instead. Also soften the ``# doctest: +SKIP`` rule — rather than banning it outright, point readers at the right alternative (write a unit test) so the guidance matches actual practice. This is a policy-only change; pyproject.toml's doctest config will be updated separately so the infra and the doc don't get bundled.
The prior ``-p no:doctest`` in pyproject.toml's addopts explicitly disabled pytest's standard doctest plugin, even though AGENTS.md required doctests. That contradiction is now resolved — the policy in 7a scopes doctests to pure helpers, and this commit turns on collection and adds the first two. * pyproject.toml: swap ``-p no:doctest`` for ``--doctest-modules`` so `>>>` blocks in docstrings are actually executed. The pre-existing ``--doctest-docutils-modules`` flag (for sphinx-style ``.. doctest::`` directives) stays in place. * middleware.py: add doctests to ``_redact_digest`` and ``_summarize_args`` — both are pure, deterministic, and the natural place for an inline example of what the audit log will contain. Verified: collector shows two new DoctestItem entries, both pass; the broader suite goes from 188 to 190 tests with no existing failures.
The safety-tier page existed but didn't call out the two mutating-tier
tools with broader-than-expected reach: pipe_pane can write arbitrary
files, and set_environment mutates state that persists into every
shell tmux later spawns. Both share a safety tier with tools like
rename_window even though their blast radius is much larger.
Additions to docs/topics/safety.md:
* A new "Footguns inside the `mutating` tier" section covering
pipe_pane (arbitrary filesystem writes), set_environment (scope-wide
env mutation), and send_keys/paste_text (arbitrary shell execution).
Each entry states the risk plainly and suggests mitigations
(readonly tier, scoped home, `env VAR=value` via send_keys).
* An "Audit log" section documenting the libtmux_mcp.audit logger
added in the AuditMiddleware commit — what fields are recorded, how
sensitive payloads are redacted to a {len, sha256_prefix} digest,
and that operators should route the logger to a dedicated sink for
a durable trail.
* Corrected the self-kill-protection paragraph: it previously claimed
the guard used only TMUX_PANE. After the socket-aware fix it also
reads and parses TMUX, and this is now described so the doc matches
the code.
Docstring changes feed sphinx_autodoc_fastmcp so warnings also surface
in the auto-generated tool reference:
* src/libtmux_mcp/tools/pane_tools/pipe.py::pipe_pane — warns about
arbitrary filesystem writes and cross-links to /topics/safety.
* src/libtmux_mcp/tools/env_tools.py::set_environment — warns about
scope-wide env propagation and points at `env VAR=value command` via
send_keys as the narrower alternative.
Per the design decision: no safety-tier reclassification. pipe_pane
and set_environment remain in the `mutating` tier to avoid breaking
deployments that rely on the default; the warnings make the extra
reach visible so operators can pick `LIBTMUX_SAFETY=readonly` when
appropriate.
The prior patch landed ``t.cast("MiddlewareContext[t.Any]", ...)`` in
the audit-middleware tests to paper over four mypy errors. That's a
lie — ``SimpleNamespace`` isn't a ``MiddlewareContext``, and ``Any``
hides whatever breaks when fastmcp's internal shape shifts.
Investigation of the actual types:
* ``fastmcp.server.middleware.MiddlewareContext`` is a
``@dataclass(kw_only=True, frozen=True)`` generic over the message
type. Only ``message`` is required; everything else has a default.
* ``mcp.types.CallToolRequestParams`` is a pydantic BaseModel with
required ``name: str`` and optional ``arguments``. Direct
construction works.
* ``fastmcp.Context`` cannot be built without a live FastMCP server +
MCP request context. There is no lightweight fake — ``client_id``
and ``request_id`` are properties that read from the live request
context.
So: construct the real ``MiddlewareContext(message=CallToolRequestParams(...))``
and leave ``fastmcp_context=None``. The audit middleware already
handles that path (``if context.fastmcp_context is not None:``), so
logs carry ``client_id=None`` / ``request_id=None`` — which is the
exact production behavior when no MCP context is attached. Update the
``test_audit_middleware_logs_success`` assertion accordingly.
The only remaining ``t.Any`` is on ``args: dict[str, t.Any]`` in
``test_summarize_args_redacts_sensitive_keys`` and in the
``_fake_context`` ``arguments`` parameter. Both mirror the upstream
pydantic type ``CallToolRequestParams.arguments: dict[str, Any] | None``
— inherited from an external API, not ours to tighten.
Mypy across the full tree is clean. 190 tests pass.
CI reported 11 files failing ``ruff format --check`` — mostly blank lines inside multi-line import blocks left over from the earlier hoist / pane split. Running ``ruff format .`` normalizes them. Also drop the ``sphinx_autodoc_fastmcp`` entry from the mypy override block in pyproject.toml. Nothing in src/, tests/, or docs/ imports it (it appears only as a string in ``conf.py::extensions``), so the override was dead config. Mypy was flagging it on every run. The docutils entry stays since mypy still lists it as used when checking the full tree.
AGENTS.md prescribes top-level namespace imports for standard library modules. ``paste_text`` was importing ``subprocess`` and ``tempfile`` inside the function body — carried over from the old flat ``pane_tools.py`` through the recent split. An earlier commit (``c459378``) fixed an identical pattern for ``pathlib`` and cited the same rule; this closes the remaining two cases in io.py. No call-site changes: the body already uses namespace form (``subprocess.run``, ``tempfile.NamedTemporaryFile``, ``subprocess.CalledProcessError``).
…ffers Agents are supposed to call ``delete_buffer`` after use, but an interrupted call chain can leave ``libtmux_mcp_<uuid>_*`` entries lingering on the tmux server. The server-global paste-buffer namespace is shared across clients, so accumulated leaks are observable by the human user via ``tmux list-buffers``. Adds a ``_gc_mcp_buffers`` helper that runs on lifespan teardown: iterates every cached Server, lists buffer names, and deletes anything matching ``_MCP_BUFFER_PREFIX``. Best-effort and fully silent — tmux may be unreachable, buffers may vanish mid-scan, and none of that should block lifespan shutdown. Logs at debug level so operators can still surface leaks via verbose logging. Also adds "when to use" cross-references to the ``paste_text`` and ``load_buffer`` docstrings so agents can pick the right tool from the schema alone instead of having to read the narrative docs. paste_text is fire-and-forget; load_buffer is stage-then-fire with a handle for multi-paste or inspection. Note: FastMCP lifespan teardown runs on SIGTERM / SIGINT only; ``kill -9`` and OOM bypass it, so this GC is an ergonomics improvement, not a security boundary.
Two guards for the lifespan GC: * Seeds a real MCP-owned buffer via ``load_buffer`` alongside a manually set ``human_buffer`` and asserts only the prefixed one is deleted. The human-owned buffer MUST survive — that is where the OS clipboard sync lives, and deleting it would be a privacy regression. * A ``_BrokenServer`` stub whose ``cmd`` always raises proves the GC swallows exceptions rather than crashing lifespan shutdown.
Every tool already accepts ``socket_name`` for multi-server work
but there was no way to discover which servers actually exist —
agents had to guess names or rely on the caller to supply them.
Scans ``${TMUX_TMPDIR:-/tmp}/tmux-<uid>/`` (the canonical socket
location per tmux.c:209) and returns a ``list[ServerInfo]`` of
live servers. Stale socket inodes — the common case on
long-running machines where ``$TMUX_TMPDIR`` accumulates
thousands of orphans — are filtered via a cheap UNIX ``connect()``
liveness probe before the more expensive ``get_server_info``
call. Local testing: 17k stale inodes plus one live server, 50s
probe-every-socket ➜ 0.4s with connect-first filter.
Two guards: * The pytest-libtmux fixture socket appears in the discovery result with ``is_alive=True`` — proves the live-probe path. * A nonexistent ``$TMUX_TMPDIR`` returns an empty list without raising — proves graceful degradation on a freshly provisioned box where the user has never run tmux.
…apes Mirror of the read-heavy tool shape guards in ``test_server_tools.py`` and ``test_pane_tools.py`` — prompts and hierarchy resources wire to MCP clients as string bodies (prompt templates, JSON text, plain text), so a future refactor that accidentally returns a Pydantic model, dict, or None would silently break the MCP surface. Parametrized so every recipe and every hierarchy resource is checked without per-case boilerplate.
…ter screen-grabbing launches An agent following ``build_dev_workspace`` literally would deadlock: step 4 launched vim and a long-running log command in separate panes, then said "wait for the prompt via ``wait_for_text`` before moving on between each step." vim and ``watch`` / ``tail -f`` take over the full terminal — they never draw a shell prompt, so the wait blocks until timeout. Restructures the recipe around when a prompt IS a meaningful signal vs. when it is not: * New step 4: wait for the shell prompt in each pane BEFORE sending the launch command. This is a valid readiness check — the new panes start in the user's login shell. * New step 5: send the launch commands. * New step 6 (optional): use ``wait_for_content_change`` after launch to confirm the program drew its UI. Explicitly flagged as "strictly different from waiting for a shell prompt" so future editors don't revert to the deadlock wording. The prompt-alignment test from the previous round did not catch this because it validates parameter NAMES, not runtime semantics.
…mpt waits Regression test asserts the rendered recipe: * does NOT contain the stale "wait for the prompt … between each step" guidance that would deadlock agents. * DOES reference ``wait_for_content_change`` for post-launch UI confirmation. * still references ``wait_for_text`` (for the pre-launch shell- readiness check, where prompt matching is meaningful). Complements the existing prompt-alignment test, which only covers parameter names, by validating runtime guidance semantics.
The docstring claimed ``list_servers`` discovers "every live tmux
server on this machine", but the implementation only scans
``${TMUX_TMPDIR}/tmux-<uid>/``. Custom ``tmux -S /arbitrary/path``
daemons (systemd user units, CI runners, per-project ``.envrc``
paths) live outside that directory and were invisible — the
docstring overclaimed.
Two changes:
* Correct the docstring to say "under the current user's
``$TMUX_TMPDIR``" and spell out the custom-path caveat so agents
do not assume canonical-only sockets exist.
* Add an optional ``extra_socket_paths: list[str] | None`` parameter
so callers who know about arbitrary paths can supplement the scan.
Each path runs through the same ``_is_tmux_socket_live`` probe and
socket-metadata query as canonical entries. Nonexistent or
non-socket paths are silently skipped so stale user-supplied lists
don't crash the call.
Factors the server-info construction into a private
``_probe_server_by_path`` helper that mirrors ``get_server_info`` but
keys the Server by socket_path (-S) instead of socket_name (-L), so
the extras path can use an arbitrary filesystem location.
…afety Two regression guards for the new ``extra_socket_paths`` surface: * Supplying the fixture socket as an extra path makes it appear in the result even when the canonical ``$TMUX_TMPDIR`` scan is pointed at an empty directory — proves the extras code path is exercised in isolation from the canonical scan. * A list containing a nonexistent path and a regular file (not a socket) returns an empty result without raising — proves stale user-supplied entries degrade gracefully.
The L1 fix narrowed the macOS ``$TMUX_TMPDIR`` gap by preferring
``tmux display-message -p '#{socket_path}'`` over env-reconstruction,
but ``server.cmd`` invokes tmux with the MCP process's own
environment. When the MCP process's ``$TMUX_TMPDIR`` diverges from
the real tmux server's (macOS launchd case), the query fails:
* MCP process env: ``TMUX_TMPDIR=/wrong/path``
* Real tmux lives at ``/correct/path/tmux-1000/mysocket``
* Caller's ``$TMUX=/correct/path/tmux-1000/mysocket,…``
* Query invokes ``tmux -L mysocket display-message``; tmux uses
``/wrong/path`` and can't find the server; query raises.
* Reconstruction fills in ``/wrong/path/tmux-1000/mysocket``.
* ``_caller_is_on_server`` realpath compares bogus vs. real path:
different. Guard degrades to no-op.
Adds a conservative basename fallback after the realpath check: if
the basename of the caller's socket path equals the target's declared
``socket_name`` (or ``"default"``), the guard still fires. Both names
are authoritative on their respective sides and neither is subject to
``$TMUX_TMPDIR`` divergence, so this is a safety-preserving
last-chance match.
Trade-off acknowledged in the docstring: exotic false positive when
two daemons share a socket_name under different tmpdirs. Prefer that
over silently permitting self-kill under env mismatch.
…gence Regression guard for the macOS launchd scenario: MCP process has a wrong ``$TMUX_TMPDIR``, display-message query fails, reconstruction produces a path that mismatches the caller's ``$TMUX`` realpath, but the basename (socket name) still matches. The guard must still fire. The test forces: * ``$TMUX_TMPDIR`` on the MCP process points at a nonexistent dir. * Monkeypatches ``server.cmd`` so the display-message query raises. * Sets ``$TMUX`` to a path in a "real" tmpdir whose basename is the target's socket_name. Asserts ``_caller_is_on_server`` returns True — proving the basename fallback closes the env-mismatch gap without requiring the realpath comparison to succeed.
…r from build_dev_workspace
Two residual ergonomic problems in the recipe that F1's rewrite did
not catch, both confirmed by reading the underlying source:
**Fancy-shell hang.** Step 4 emitted a literal
``wait_for_text(pattern=r"\$ |\# |\% ", regex=True, timeout=5.0)``.
The regex only matches default bash/zsh prompts; zsh+oh-my-zsh
(``➜``), starship (``❯``), pure / p10k (Unicode glyphs), and fish
(``>``) all miss. Agents on these setups burn 5s × 3 panes = 15s
of ceremony every time the recipe runs.
Reading tmux source confirms the wait was unnecessary defensive
ceremony: ``cmd-split-window.c:151`` → ``spawn.c:382,494`` allocate
the PTY and wire the bufferevent before split-window returns.
``input-keys.c:418`` writes send-keys bytes straight into the
kernel PTY buffer via ``bufferevent_write`` with no readiness
check. The kernel holds bytes until the slave reads. A
``send_keys("vim")`` right after ``split_window`` is safe whether
or not the shell has finished drawing its prompt — the shell will
pick up the queued bytes on its first stdin read.
**Stray Enter in pane B.** Step 5 emitted
``send_keys(pane_id="%B", keys="")`` with a comment
"(leave the shell idle)". libtmux's ``Pane.send_keys``
(``pane.py:423-475``) runs ``send-keys ""`` *then* ``self.enter()``
which is ``send-keys Enter``. libtmux's own test
(``tests/test_pane.py:128``) confirms an Enter is sent. The shell
was already idle after the split; the line did nothing useful
while actively misleading anyone reading it.
Net change: ~15 LOC removed, ~3 LOC added. Recipe body is shorter,
shell-agnostic, and honest about what it does. The step-6
``wait_for_content_change`` block is kept (renumbered to step 5) —
it is still the right primitive for post-launch UI confirmation.
Also updates the existing
``test_build_dev_workspace_does_not_deadlock_on_screen_grabbers``
test to drop the now-obsolete ``wait_for_text`` assertion; a
targeted regression guard for the two removed lines follows in a
separate test commit.
…regex or stray Enter Targeted regression guard for the two residual ergonomic issues fixed in the preceding commit: * ``\$ |\# |\% `` regex must be absent — hangs starship / oh-my-zsh / pure / p10k / fish users for 5s × 3 panes = 15s of ceremony with no benefit. * ``send_keys(pane_id="%B", keys="")`` must be absent — libtmux emits an Enter keystroke when ``enter=True`` (the default), so the line lied about being a no-op. Positive pin on ``wait_for_content_change`` so the post-launch UI confirmation guidance stays in the recipe; it's a shell-agnostic "did the screen change since we captured it?" check that works across every shell, editor, and tailing program combination. Follows the established ``in``/``not in`` assertion style from the neighbouring ``test_build_dev_workspace_does_not_deadlock_on_screen_grabbers`` test.
``get_server_info`` swallowed any ``display-message #{version}``
failure with a bare ``except Exception: pass`` — pre-existing from
the initial release, not from this branch. The sibling helper
``_probe_server_by_path`` in the same file now logs the analogous
failure at debug level, so the two were drifting in style.
Adds a single ``logger.debug`` so operators debugging "why does my
tmux server report version=None?" get a uniform signal across both
code paths. Semantics are unchanged (version still falls back to
None on failure, callers never see the exception).
…_session The ``del mcp_session`` pattern (12 sites across four test files) signalled "I need this fixture's side effect but not its value." That is the textbook use case for ``@pytest.mark.usefixtures``, which pytest's own test suite uses 16+ times and recommends as the canonical form — cross-checked in ~/study/python/pytest/testing/ and in upstream pytest docs. libtmux and fastmcp also follow the decorator form; libtmux-mcp was an outlier. Each converted test now: * Declares the required fixture via a decorator (explicit intent at function definition). * Drops the unused ``mcp_session: Session`` parameter. * Drops the ``del mcp_session`` ceremony from the body. Three files also drop the now-unused ``Session`` import from their ``TYPE_CHECKING`` block (``test_buffer_tools.py``, ``test_server.py``, ``test_wait_for_tools.py``). ``test_server_tools.py`` keeps the import — ``test_list_sessions`` genuinely uses ``mcp_session``. Behaviour unchanged; this is pure readability + idiom cleanup.
…t loops Regression guard for MCP cancellation semantics. The two wait tools share a ``while True:`` poll-and-sleep pattern wrapped by ``handle_tool_errors_async`` (``_utils.py:827-850``), which catches ``Exception`` (not ``BaseException``). ``asyncio.CancelledError`` inherits from ``BaseException`` (Python 3.8+), so it propagates through the decorator today. This test locks that in: if a future change broadens the decorator to ``BaseException`` it would silently swallow MCP cancellation requests, agents would burn tokens waiting for cancelled tools to return, and operators would have no signal that anything went wrong. Uses ``task.cancel()`` rather than ``asyncio.wait_for`` so the raised exception is the inner ``CancelledError`` directly, not ``wait_for``'s ``TimeoutError`` wrapper. ``wait_for_text`` and ``wait_for_content_change`` get sibling tests since both go through the same decorator path.
…elled MCP cancellation works correctly today — ``CancelledError`` is a ``BaseException`` (Python 3.8+) and ``handle_tool_errors_async`` catches ``Exception``, so cancellation propagates untouched. But operators have no signal that a wait was cancelled vs. timed out — the two outcomes look identical at the log layer. Wraps each ``while True:`` poll loop in ``wait_for_text`` and ``wait_for_content_change`` with an explicit ``except asyncio.CancelledError: logger.debug(...); raise``. Three benefits: * Documents intent — readers see that cancellation is handled intentionally, not by accident of inheritance. * Records the cancel-time elapsed seconds and pane id at debug level so post-mortem log analysis can distinguish "cancelled at 150 ms" from "timed out at 8 s". * Re-raise preserves MCP semantics — never returns a partial ``WaitForTextResult`` / ``ContentChangeResult`` that would mask the cancellation as a timed-out wait. The per-pair regression test added in the preceding commit pins ``CancelledError`` propagation; this commit only adds diagnostic visibility.
…t timeouts Adopts FastMCP's structured client logging (``ctx.info/warning/error`` at ``fastmcp/server/context.py:716-778``) so MCP clients receive ``notifications/message`` events for the two diagnostic moments operators currently can't see: * Invalid regex pattern compiled in ``wait_for_text`` — previously the agent saw only a generic ``ToolError``; now the client also receives a structured warning, which surfaces in client log panels independent of the tool result. * Wait loop reaches its timeout in either ``wait_for_text`` or ``wait_for_content_change`` — previously identical to a "found" result at the log layer (only the boolean ``timed_out`` field distinguished them); now the client gets an explicit warning with pane id and timeout duration. Adds a ``_maybe_log`` helper sibling to ``_maybe_report_progress`` sharing the same ``_TRANSPORT_CLOSED_EXCEPTIONS`` suppression contract so a hung-up client doesn't take down the tool, while programming errors stay loud per the regression guard ``test_wait_for_text_propagates_unexpected_progress_error``. Updates the existing ``_StubContext`` and ``_BrokenContext`` test stubs in ``tests/test_pane_tools.py`` to declare ``warning`` methods — the stubs need to mock the full Context surface the wait loop now touches. ``_BrokenContext.warning`` raises the same ``anyio.BrokenResourceError`` as its ``report_progress`` so the existing transport-closed regression guard also covers the new log path. Per-pair regression test in the next commit asserts the warnings fire at both sites with the right level.
… and timeout Three regression guards for the new ``_maybe_log`` call sites added in the preceding commit. Each builds a tiny ``_RecordingContext`` that captures ``(level, message)`` tuples — the same recording-stub style used by the existing progress tests, kept inline per test so each assertion stays self-contained: * ``test_wait_for_text_warns_on_invalid_regex`` — ``regex=True`` with an unclosed bracket raises ``ToolError`` AND records a warning containing "Invalid regex". * ``test_wait_for_text_warns_on_timeout`` — never-matching pattern with a 200 ms timeout records a warning containing "timeout". * ``test_wait_for_content_change_warns_on_timeout`` — settled pane with a 500 ms timeout records the equivalent warning. Reuses the active settle-loop pattern from ``test_wait_for_content_change_timeout`` so the test stays deterministic on slow CI rather than relying on a fixed sleep. The tests assert on substring presence rather than exact message text so the warning copy can evolve without forcing a test update.
…AG_READONLY Adopts FastMCP's ``RetryMiddleware`` (``fastmcp/server/middleware/error_handling.py:136``) but bounded by the existing safety-tier annotation. The ``_server_cache`` already evicts dead Server instances on ``is_alive() == False``, but the *first* call after a tmux daemon crash still fails before the cache learns about it. A single retry on the freshly-cached Server closes that window without changing tool semantics. Critical safety constraint: only readonly tools are retried. Mutating tools like ``send_keys``, ``create_session``, ``split_window`` are NOT idempotent — re-running them on a transient socket error would silently double side effects, which is unacceptable. The wrapper ``ReadonlyRetryMiddleware`` reads ``tool.tags`` via ``context.fastmcp_context.fastmcp.get_tool(name)`` (same pattern ``SafetyMiddleware`` uses) and only delegates to the upstream retry when ``TAG_READONLY`` is present. Defaults are deliberately small: ``max_retries=1`` keeps audit log noise minimal (one extra attempt, not three); ``base_delay 0.1s`` / ``max_delay 1.0s`` matches the duration window of a transient libtmux socket hiccup. ``retry_exceptions= (libtmux.exc.LibTmuxException,)`` is narrow on purpose — the upstream default ``(ConnectionError, TimeoutError)`` does NOT match libtmux's wrapped subprocess errors and would be a silent no-op. Wired into the middleware stack between ``AuditMiddleware`` and ``SafetyMiddleware`` so retries are audited once each and tier-denied tools never reach retry. Updates the existing ``test_server_middleware_stack_order`` to lock in the new position. Per-pair regression test in the next commit asserts: readonly tool retries on ``LibTmuxException``, mutating tool does not retry, non-libtmux exception does not retry.
…y tools only Three regression guards for the new safety-scoped retry wrapper: * ``test_readonly_retry_recovers_from_libtmux_exception`` — readonly tool raises ``LibTmuxException`` once then succeeds; assert the middleware retried (call count 2) and returned the success. Models the production scenario: a transient socket error on the first call after a tmux daemon restart, recovered by the retry on the freshly-cached Server. * ``test_readonly_retry_skips_mutating_tool`` — mutating tool with the same exception; assert NO retry (call count 1) and the exception propagates. This is the critical safety property — re-running ``send_keys`` or ``create_session`` on a transient error would silently double the side effect. * ``test_readonly_retry_skips_non_libtmux_exception`` — readonly tool raises ``ValueError``; assert NO retry. Pins the narrow ``retry_exceptions=(LibTmuxException,)`` default so a future refactor that broadens the trigger set would fail loudly. Tests use minimal ``_StubTool`` / ``_StubFastMCP`` / ``_StubFastMCPContext`` stand-ins to satisfy the ``context.fastmcp_context.fastmcp.get_tool(name)`` lookup the middleware uses for tag inspection, plus a ``_FlakyCallNext`` helper that raises N times before succeeding.
… fix CI race CI failure on `8dc7c81`: ``tests/test_pane_tools.py::test_wait_for_content_change_propagates_cancellation`` ``Failed: DID NOT RAISE <class 'asyncio.exceptions.CancelledError'>`` across all six tmux versions (3.2a, 3.3a, 3.4, 3.5, 3.6, master). Root cause: ``wait_for_content_change`` exits the poll loop when ``current != initial_content``. On the GitHub Actions Ubuntu runner the shell prompt redraws (cursor blink, zsh async hooks like vcs_info) within the test's 100 ms ``asyncio.sleep`` window, so ``changed=True`` fires and the tool returns a normal ``ContentChangeResult`` before the ``task.cancel()`` arrives. Local runs on an idle pane never tripped this — classic CI-only race. Fix: monkeypatch ``Pane.capture_pane`` to always return the same line list. The diff branch can never trigger, so the loop stays in ``await asyncio.sleep(interval)`` indefinitely and the cancel unambiguously interrupts it. The test still exercises the production code path that matters — that ``CancelledError`` propagates through ``handle_tool_errors_async`` without being mapped to ``ToolError``.
The shipped ``ReadonlyRetryMiddleware`` (commit ``bd37f29``) is a production no-op under fastmcp 3.2.3: - ``RetryMiddleware._should_retry`` only checks ``isinstance(error, retry_exceptions)`` — does NOT walk ``__cause__``. - ``handle_tool_errors_async`` (``_utils.py:842-850``) wraps every ``LibTmuxException`` as ``ToolError(...) from LibTmuxException``. - At the middleware layer the exception type is ``ToolError``, not ``LibTmuxException``, so ``_should_retry`` returns ``False`` and the retry never fires in production. The shipped tests in ``tests/test_middleware.py:475+`` only pass because ``_FlakyCallNext`` raises ``LibTmuxException`` directly, bypassing the production decorator wrap path. fastmcp v3.2.4 commit ``031c7e03`` (PR #3858 "Fix RetryMiddleware not retrying tool errors") adds the ``__cause__`` walk to ``_should_retry``. Reproduced the production-flow bug under 3.2.3 with a 25-line harness (calls=1) and verified the fix on 3.2.4 (calls=2 with max_retries=1). Free win that ships in the same bump: PR #3872 (commit ``f3c00ba1`` "Extract parameter descriptions from docstrings") adds an automatic NumPy/Google/Sphinx docstring parser that populates per-parameter ``inputSchema.description`` fields. The project's tools already use NumPy ``Parameters\n----`` style throughout (verified ``wait.py``, ``_utils.py``, ``middleware.py``), so per-parameter descriptions become visible to LLM clients with zero code changes. Next commit adds an integration-style regression test that calls a real ``@handle_tool_errors``-decorated tool through the middleware stack — catches the production no-op AND any future regression where someone changes the decorator's exception wrapping.
… decorator Integration-style regression guard for the production code path the existing unit tests miss. The three sibling tests above use ``_FlakyCallNext`` to raise ``LibTmuxException`` directly; in production every tool is wrapped by ``handle_tool_errors`` / ``handle_tool_errors_async`` which converts ``LibTmuxException`` to ``ToolError(...) from LibTmuxException``. At the middleware layer the exception type is ``ToolError``, not ``LibTmuxException`` — so the retry decision must walk ``__cause__`` to see the real failure type. The new test invokes the real ``list_sessions`` tool through ``ReadonlyRetryMiddleware.on_call_tool``, exercising the decorator wrap path that broke under fastmcp 3.2.3. Monkeypatches ``Server.sessions`` to raise ``LibTmuxException`` once then return ``[]`` so the test runs without a live tmux. Asserts ``calls == 2`` with a friendly error message naming the likely cause (fastmcp older than 3.2.4) so an accidental dep downgrade fails loudly. Catches both: (a) the production no-op the previous fastmcp pin allowed, and (b) any future regression where someone changes the decorator's exception wrapping in a way that defeats ``__cause__`` walking.
…tream consistency
All three round-2 weave:review reviewers flagged that
``ReadonlyRetryMiddleware``'s internal logger defaulted to
``fastmcp.retry`` (the upstream stock at ``error_handling.py:181``)
instead of the project's ``libtmux_mcp.*`` namespace. Operators
routing the audit stream by namespace prefix would silently miss
retry events.
Set the default explicitly: when ``logger_`` is ``None``, construct
``logging.getLogger("libtmux_mcp.retry")`` before delegating to
fastmcp's ``RetryMiddleware``. The ``if logger_ is None:`` pattern
avoids evaluating ``getLogger`` in the default expression.
Pinned in tests via ``test_readonly_retry_logger_uses_project_namespace``
asserting ``middleware._retry.logger.name == "libtmux_mcp.retry"``,
so a future refactor that drops the default fails loudly.
The new ``list_servers`` tool (added earlier in the branch) wasn't
documented anywhere under ``docs/`` — agents and human readers
discovering the tool from a client's tool list had no docs page to
land on. The conf.py area-map already routes ``server_tools`` →
``sessions``, so the natural home is ``docs/tools/sessions.md``
between ``get_server_info`` and ``create_session``.
Adds the standard four-part section: ``{fastmcp-tool}`` directive,
Use-when / Avoid-when / Side-effects narrative, an example JSON
request + response (including a second example with
``extra_socket_paths`` for the custom-path case), and the
``{fastmcp-tool-input}`` schema directive. Calls out the scope
caveat (custom ``tmux -S /path/...`` daemons are not in the
canonical scan) up front so readers don't reach for the tool in
the wrong situation.
Also adds a grid-item-card to ``docs/tools/index.md`` so the
discovery overview links to the new section.
The four prompts shipped this branch (``run_and_wait``,
``diagnose_failing_pane``, ``build_dev_workspace``,
``interrupt_gracefully``) had zero docs coverage — no
``{fastmcp-prompt}`` autodoc directive (the
sphinx-autodoc-fastmcp extension only ships ``{fastmcp-tool}``,
``{fastmcp-tool-input}``, ``{fastmcp-tool-summary}``; verified
by reading the installed extension's ``__init__.py``), and
``docs/topics/prompting.md`` covers general agent-prompting
heuristics rather than the project's MCP-registered prompts.
Adds ``docs/tools/prompts.md`` — a hand-written page (since
autodoc isn't available for prompts) with one section per recipe
covering: when to use it, why it exists vs. the obvious
alternative tool calls, parameters, and a sample render at one
representative argument set. The samples use stable
representative output (``<uuid>`` placeholder for the wait-for
channel) so the page doesn't churn on every regeneration.
Wires the new page into the docs toctree under "Use it" between
``tools/index`` and ``recipes`` (the latter is the longer
narrative agent-reasoning page, conceptually distinct from these
short MCP-protocol prompts). Adds a one-paragraph cross-reference
from ``docs/tools/index.md``'s decision tree so readers
discovering the tool surface find the prompts naturally.
…he new pane id The ``create_session`` response gained an ``active_pane_id`` field earlier in this branch (commit ``3f092bb``), but the docs still showed the pre-change response shape — readers landing on ``sessions.md`` had no signal that this field exists or what they'd do with it. Updates the example response to include the field, then adds a brief tip explaining the workflow win: callers can target subsequent ``send_keys`` / ``split_window`` / ``capture_pane`` calls against ``active_pane_id`` directly without a follow-up ``list_panes`` round-trip. Saves one MCP call in the most common "create a session, then act on it" workflow that prompts like ``build_dev_workspace`` already exploit. The full field surface is auto-rendered in ``docs/reference/api/models.md`` via the ``fastmcp_model_classes`` registration — this commit just adds the narrative discoverability that explains why the field exists.
bc4c2ed to
8c76b87
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
29 commits on
2026-04-follow-ups, addressed in two waves:ruff,mypy,pytest --reruns 0,just build-docs) ran green before each commit landed. Test count grew from 186 → 276 (+90 regression-guard tests).Quality gate
--reruns 0(overriding the project default of 2) was the standard for the entire 29-commit series, so flakes are surfaced not masked. The one timing-sensitive test that surfaced (test_wait_for_content_change_timeout) was stabilized in commit 13 with an active settle loop before any new feature work landed.Phase 1 — original review follow-ups
fix(caller-identity)TMUXenv var; self-kill guards now compare the caller's socket path to the target serverrefactor(imports)from fastmcp.exceptions import ToolErrorto module levelfeat(observability)AuditMiddleware— one structured record per tool call, with redacted digests forkeys/text/valuerefactor(pane)pane_tools.pyinto apane_tools/package (io, wait, search, copy_mode, layout, lifecycle, pipe, meta). Backwards-compatible re-exportsdocs(agents)test(doctest)-p no:doctest, added--doctest-modules, seeded doctests on_redact_digestand_summarize_argsdocs(safety)pipe_pane/set_environment/send_keys; documented the audit log; docstring warnings surface viasphinx_autodoc_fastmcptest(middleware)MiddlewareContext+CallToolRequestParamsstylerefactor(pane_tools[io])refactor(pane_tools[wait])docs(_utils)_caller_is_on_serverdocstring with codePhase 2 — FastMCP alignment
Driven by a multi-model brainstorm session (Claude × 3 / Gemini × 3 / GPT × 3, three refinement passes) that audited libtmux-mcp against FastMCP, libtmux, and tmux. The session produced a 17-commit plan with explicit priority tiers (
P1–P3) and four "deliberately not recommended" items.Quality gate precursor
test(pane_tools[wait]): stabilizetest_wait_for_content_change_timeoutunder--reruns=0Active 3-streak settle loop replaces a fixed sleep so zsh async hooks (vcs_info, git prompt) don't trip the no-change assertion. Lets the rest of the series run under the strict gate without flake masking.
PR group A — Preconditions (correctness fixes)
mcp(fix[_utils,annotations]):ANNOTATIONS_SHELLfor open-world shell toolssend_keys,paste_text,pipe_panewere sharingANNOTATIONS_CREATEwhich setopenWorldHint=False. Lying to MCP clients. Fix is a new preset, not a global flip — same shared dict is also used bycreate_session/create_window/split_window/swap_pane/enter_copy_mode, none of which are open-world.mcp(docs[server,instructions]): surfacedisplay_message,snapshot_pane,wait_for_*in_BASE_INSTRUCTIONSThe most token-efficient read tools were never named in the agent-facing instructions, so agents defaulted to
capture_paneretry loops.mcp(fix[pane_tools/io]): boundcapture_panewithmax_linesand in-band tail-preserving truncationDefault
max_lines=500. Prefix"[... truncated K lines ...]\n"when trimmed.max_lines=Noneopts out for callers that genuinely need full scrollback. Tail-preserving because the active prompt lives at the bottom.mcp(fix[pane_tools/meta,models]): boundsnapshot_panewithmax_linesand typedcontent_truncatedfieldsSame semantics as docs: restore prompt admonition styling and rebuild copy buttons after gp-sphinx SPA swap #16 but surfaced via two new
PaneSnapshotfields (content_truncated: bool,content_truncated_lines: int) rather than an in-band header —PaneSnapshotis already a Pydantic model so the typed channel is the cleaner shape.mcp(fix[pane_tools/search,models]): paginatesearch_panesviaSearchPanesResultwrapperBreaking change:
list[PaneContentMatch]→SearchPanesResultwithmatches,truncated,truncated_panes,total_panes_matched,offset,limit. Per-pane cap onmatched_linesplus pane-level pagination so truncation doesn't mean permanent blindness to later matches.PR group B — Context + wait-for
mcp(refactor[_utils]): addhandle_tool_errors_asyncfor Context-using toolsPrerequisite for #20. Sync
handle_tool_errorsis unchanged; async variant funnels through the same exception-to-ToolErrormapping table.mcp(feat[pane_tools/wait]):Context.report_progressinwait_for_textandwait_for_content_changeBoth wait tools converted to
async defwith actx: Context | Noneparameter. Replacesretry_untilwith an explicit asyncio loop that emits progress on every tick. Noelicit— FastMCP 3.1 has no first-class elicitation capability check (capabilities.samplingis the wrong predicate; it would hang headless CI agents).mcp(feat[wait_for_tools]): tmuxwait-forchannel tools with bounded timeoutNew
wait_for_channel(channel, timeout=30.0)andsignal_channel(channel).subprocess.run(timeout=…)is mandatory —tmux wait-forblocks indefinitely at the OS level. Tool docstring teaches the safestatus=$?; tmux wait-for -S done; exit $statuscomposition pattern so a crashed shell command doesn't pay the full timeout penalty.PR group C — Structured outputs + resources
mcp(test[tools]): regression-guard read-heavy tools againststr-return driftAudit found
get_pane_info/get_server_info/list_*/snapshot_panealready return Pydantic models. New regression tests pin that so a future refactor can't silently drop the MCPoutputSchemaby flattening one back tostr.mcp(refactor[resources/hierarchy]): declaremime_typeon all hierarchy resourcesFive JSON resources get
mime_type="application/json";get_pane_contentgetsmime_type="text/plain". (FastMCP requires resources to returnstr | bytes | list[ResourceContent], so thejson.dumpspayload stays — but clients now see the correct MIME tag.)PR group D — New tool families
mcp(feat[buffer_tools]): agent-namespaced paste buffers with UUID nonceload_buffer/paste_buffer/show_buffer/delete_buffer. Every call allocateslibtmux_mcp_<uuid4hex>_<logical>so concurrent agents (or parallel tool calls) cannot collide on the server-global buffer namespace.list_buffersis intentionally NOT exposed — OS clipboard sync into tmux buffers makes a generic enumeration a privacy footgun._SENSITIVE_ARG_NAMESextended forcontent.mcp(feat[hook_tools]): read-onlyshow_hookandshow_hooksWraps libtmux's
HooksMixin.show_hook/show_hookswith aHookListResultmodel that flattens scalar / dict /SparseArrayshapes. Write-hooks (set_hook/unset_hook) are deferred indefinitely — FastMCPlifespanteardown does not run onkill -9/OOM/C-extension fault, so any soft cleanup registry would leak agent-installed shell hooks into the user's persistent tmux server. The module docstring documents this and names three future paths.mcp(feat[prompts]): narrow prompts module withPromptsAsToolsfallbackFour prompts (
run_and_wait,diagnose_failing_pane,build_dev_workspace,interrupt_gracefully) — the canonical recipes fromdocs/topics/prompting.md, encoded as protocol-level prompts.LIBTMUX_MCP_PROMPTS_AS_TOOLS=1env-gates thePromptsAsToolstransform for tool-only clients.PR group E — Middleware + lifespan
mcp(feat[middleware]):TailPreservingResponseLimitingMiddlewaresubclassFastMCP's stock
ResponseLimitingMiddlewaretruncates the tail of oversized responses, which destroys the active shell prompt for terminal scrollback. Subclassed to drop the head and prefix the same[... truncated K bytes ...]\nheader the per-tool caps use.max_size=50_000(strictly above per-tool caps so it's a backstop, not a primary cap), scoped tocapture_pane/search_panes/snapshot_pane.mcp(feat[middleware,server]): addTimingMiddlewareandErrorHandlingMiddlewareStack order pinned by test:
Timing → TailPreservingResponseLimiting → ErrorHandling(transform_errors=True) → Safety → Audit.ErrorHandlingMiddleware(transform_errors=True)mapsResourceErrorfromresources/hierarchy.pyto MCP code-32002.mcp(feat[server,_utils]): FastMCPlifespanwith tmux presence check and cache cleanupshutil.which("tmux")probe at startup converts a missing-tmux failure from a confusing post-handshakeTmuxCommandNotFoundinto a clean cold-startRuntimeError. Clears_server_cacheon graceful shutdown (SIGTERM/SIGINT only — by design, not relied on for any invariant that must survive hard crashes; see commit 25's docstring).Deliberately NOT in this PR
Each was considered in the brainstorm and explicitly rejected with reasoning documented in the relevant module's docstring:
set_hook/unset_hook). Three future paths documented inhook_tools.pymodule docstring; none in scope here.tmux -Cevent bridge). Per-byte%outputevents across all panes would firehose Python's asyncio loop and starve MCP protocol handlers. If pursued later, must start fromrefresh-client -Bper-subscription with bounded queue + per-pane rate limit, not blanketattach.ctx.elicitfor destructive-tool confirmation. No reliable client-capability check exists in FastMCP 3.1;capabilities.samplingis the wrong predicate (it describes server→client LLM requests, not human UI). Tier gate + self-kill guards remain the hard boundary.list_buffers. OS clipboard sync makes it a privacy footgun.ResponseCachingMiddleware,RateLimitingMiddleware,ctx.sample, HTTP transport, MCPB bundling. Architectural mis-fits; rejected in the brainstorm.Diff size
Test plan
uv run ruff check .— cleanuv run ruff format --check .— cleanuv run mypy— cleanuv run py.test --reruns 0 -vvv— 276/276 passjust build-docs— succeededtmuxtemporarily renamed onPATH; confirm clean cold-startRuntimeErrorfrom the lifespan probemcp__libtmux__capture_paneagainst a >50 KB scrollback pane withmax_lines=None; confirmTailPreservingResponseLimitingMiddlewaretruncates the head, preserves the tail, prefixes the headermcp__libtmux__search_paneswithoffset/limitpagination across multiple matching paneswait_for_channel+signal_channelend-to-end in a real tmux sessionLIBTMUX_MCP_PROMPTS_AS_TOOLS=1and confirm prompts appear in the tool listget_session); confirm MCP error code-32002fromErrorHandlingMiddlewareNotes for reviewers
/home/d/.claude/plans/memoized-pondering-sparkle.md(local artifact, not committed). Each commit body cites the brainstorm finding it acts on.