
Expand AI agent detection in user-agent #768

Open

simonfaltum wants to merge 5 commits into main from simonfaltum/agent-detection-expand

@simonfaltum
Member

Why

We report an agent/<name> segment in the SDK user-agent when we can identify an AI coding agent driving the SDK. The current list covers 8 agents. This PR fills obvious gaps (Goose, Amp, Augment, Kiro, Windsurf), adds best-effort detection for VS Code Copilot (distinct from the already-detected Copilot CLI), and honors the emerging agents.md AGENT=<name> standard, falling back to "unknown" when the value is not recognized.

Identical changes are going out in parallel PRs for the Go and Python SDKs.

Changes

Before: each entry was a single (envVar, product) pair. Any non-empty value in the env var fired the match. If more than one entry matched, detection returned an empty string.

Now: each agent record holds a product name and a list of matchers. A matcher is either presence-only or an exact value match. An agent fires if any of its matchers fires. Ambiguity is judged by unique product (not raw matcher hits), so the same agent exposing both a bespoke env var and AGENT=<name> is not ambiguous with itself. When no known agent matches and AGENT is set to a non-empty value, detection returns "unknown".
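A minimal Java sketch of this matcher model, as described at this point in the PR (before the later precedence and "multiple" changes). All names here (AgentDetectionSketch, EnvMatcher, Agent, detect) are hypothetical rather than the SDK's actual identifiers, the agent table is abbreviated to three entries, the environment is passed in as a Map so the logic can be exercised without touching the real process environment, and records require Java 16+.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: names and the abbreviated agent table are illustrative only.
final class AgentDetectionSketch {
  /** Fires on presence of envVar, or on an exact value match when expected is non-null. */
  record EnvMatcher(String envVar, String expected) {
    static EnvMatcher presence(String envVar) { return new EnvMatcher(envVar, null); }
    static EnvMatcher equalsValue(String envVar, String value) { return new EnvMatcher(envVar, value); }

    boolean fires(Map<String, String> env) {
      String value = env.get(envVar);
      if (value == null) {
        return false; // variable is unset
      }
      return expected == null || expected.equals(value);
    }
  }

  /** An agent fires if any of its matchers fires. */
  record Agent(String product, List<EnvMatcher> matchers) {
    boolean fires(Map<String, String> env) {
      return matchers.stream().anyMatch(m -> m.fires(env));
    }
  }

  static final List<Agent> AGENTS = List.of(
      new Agent("amp", List.of(
          EnvMatcher.presence("AMP_CURRENT_THREAD_ID"), EnvMatcher.equalsValue("AGENT", "amp"))),
      new Agent("claude-code", List.of(EnvMatcher.presence("CLAUDECODE"))),
      new Agent("goose", List.of(
          EnvMatcher.presence("GOOSE_TERMINAL"), EnvMatcher.equalsValue("AGENT", "goose"))));

  static String detect(Map<String, String> env) {
    // Collect unique products, so one agent firing on two matchers counts once.
    Set<String> products = new LinkedHashSet<>();
    for (Agent agent : AGENTS) {
      if (agent.fires(env)) {
        products.add(agent.product());
      }
    }
    if (products.size() == 1) {
      return products.iterator().next();
    }
    if (products.size() > 1) {
      return ""; // ambiguous: two distinct products matched
    }
    // No known agent matched: a non-empty AGENT value still signals some agent.
    String agentVar = env.get("AGENT");
    return (agentVar != null && !agentVar.isEmpty()) ? "unknown" : "";
  }
}
```

With this model, GOOSE_TERMINAL=1 plus AGENT=goose yields one product and resolves to goose, while AGENT=goose plus CLAUDECODE=1 yields two distinct products and resolves to an empty string.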

New detections: amp, augment, copilot-vscode, goose, kiro, windsurf. Goose and Amp also match on AGENT=goose and AGENT=amp respectively. Presence-only matchers now treat an empty env value as set (matching the Go SDK's os.LookupEnv semantics), so CLAUDECODE="" counts as Claude Code.
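In Java the distinction matters because System.getenv(name) returns null for an unset variable and an empty string for one that is set but empty. A small sketch contrasting the old and new presence checks; the class and method names are hypothetical, and the env Map stands in for System.getenv().

```java
import java.util.Map;

// Hypothetical helpers contrasting the old and new presence-only semantics.
final class PresenceSemantics {
  /** Old rule: the variable must be set to a non-empty value. */
  static boolean isSetNonEmpty(Map<String, String> env, String name) {
    String value = env.get(name);
    return value != null && !value.isEmpty();
  }

  /** New rule: the variable only has to exist; an empty value still counts. */
  static boolean isPresent(Map<String, String> env, String name) {
    return env.containsKey(name);
  }
}
```

Given a map containing CLAUDECODE mapped to "", isSetNonEmpty returns false and isPresent returns true, which mirrors the ok result of Go's os.LookupEnv.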

Test plan

  • Unit tests cover every new agent (goose, amp, augment, copilot-vscode, kiro, windsurf)
  • AGENT=goose alone detects goose
  • GOOSE_TERMINAL=1 + AGENT=goose detects goose (not ambiguous, same product)
  • AMP_CURRENT_THREAD_ID + AGENT=amp detects amp (not ambiguous)
  • AGENT=someweirdthing falls back to unknown
  • AGENT="" does not trigger the unknown fallback
  • AGENT=goose + CLAUDECODE=1 returns empty (ambiguity between two distinct products)
  • mvn -pl databricks-sdk-java test -Dtest=UserAgentTest passes (34 tests)
  • mvn -pl databricks-sdk-java spotless:check clean

Add detection for Goose, Amp, Augment, VS Code Copilot, Kiro, and
Windsurf. Also honor the agents.md standard AGENT env var with an
"unknown" fallback when set to a value we don't recognize.

Switches the detection data model from (envVar, product) pairs to
agent records with a list of matchers. Each agent fires if any of
its matchers fires (presence-only or exact value). Ambiguity is
judged by unique product, not raw matcher hits, so the same agent
setting both a bespoke var and AGENT=<name> is not ambiguous.

Co-authored-by: Isaac
Signed-off-by: simon <simon.faltum@databricks.com>
- Add NEXT_CHANGELOG.md entry covering the expanded agent list, the
  AGENT standard, and the empty-string semantics change.
- When the main matcher loop finds no match and AGENT is set to a known
  product name, return that product name instead of "unknown" (implicit
  known-product fallback). Known matchers still win over the fallback,
  so AGENT=cursor + CLAUDECODE=1 still yields claude-code.
- Restore alphabetical ordering: openclaw before opencode.
- Add provenance comments on new agent entries (goose, amp, augment,
  copilot-vscode, kiro, windsurf).
- New tests: testAgentProviderAgentEnvAmp, testAgentProviderAgentEnvCursor,
  testAgentProviderKnownMatcherWinsOverAgentFallback.
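The fallback order described in this commit can be sketched as follows, assuming the fallback runs only when the explicit matcher loop produced nothing. AgentEnvFallback, KNOWN_PRODUCTS, agentFallback, and resolve are hypothetical names, and the known-product set is abbreviated.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the AGENT=<name> fallback; consulted only when no explicit matcher fired.
final class AgentEnvFallback {
  // Abbreviated, illustrative set of product names already known to the detector.
  static final Set<String> KNOWN_PRODUCTS = Set.of("amp", "claude-code", "cursor", "goose");

  /** Returns the AGENT-derived product, "unknown" for unrecognized values, or "" when unset/empty. */
  static String agentFallback(Map<String, String> env) {
    String agent = env.get("AGENT");
    if (agent == null || agent.isEmpty()) {
      return "";
    }
    return KNOWN_PRODUCTS.contains(agent) ? agent : "unknown";
  }

  /** Explicit matchers win: the fallback applies only when the matcher loop returned "". */
  static String resolve(String matched, Map<String, String> env) {
    return matched.isEmpty() ? agentFallback(env) : matched;
  }
}
```

For example, resolve("claude-code", env) returns "claude-code" even when env contains AGENT=cursor, matching the AGENT=cursor + CLAUDECODE=1 case above.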

Co-authored-by: Isaac
Signed-off-by: simon <simon.faltum@databricks.com>
Previously, agents like amp and goose had dual matchers: their explicit
env var (AMP_CURRENT_THREAD_ID, GOOSE_TERMINAL) plus AGENT=<name>. This
caused asymmetric ambiguity: AGENT=goose + CLAUDECODE=1 resolved to ""
(both matchers fired on different products), while AGENT=cursor +
CLAUDECODE=1 resolved to "claude-code" (only claude-code matched,
cursor was handled by the AGENT fallback which does not trigger once
an explicit matcher has fired).

The rule is now uniform: explicit env var matchers always take
precedence over the generic AGENT=<name> signal. AGENT is treated
purely as a fallback for agents without an explicit matcher, or for
products we do not yet specifically recognize.

Changes:
- Remove per-agent AGENT=<name> matchers from amp and goose entries.
  Those products still set AGENT=<name>; the central fallback in
  lookupAgentProvider handles them.
- Update the lookupAgentProvider doc comment to reflect the new rule.
- Flip the existing AGENT=goose + CLAUDECODE=1 test to expect
  "claude-code" and rename accordingly.
- Add test for GOOSE_TERMINAL=1 + AGENT=cursor -> "goose".
- Add test for COPILOT_CLI=1 + COPILOT_MODEL=gpt-4 -> "" (documents
  the known, intentional ambiguity for Copilot CLI BYOK users).
- Update NEXT_CHANGELOG entry to mention precedence rule.

Signed-off-by: simon <simon.faltum@databricks.com>
Signed-off-by: simon <simon.faltum@databricks.com>
Nested agents (e.g. a Cursor CLI subagent spawned by Claude Code) set
multiple agent env vars on the same process. The previous ambiguity
guard silently dropped the signal in that case. Report "multiple"
instead so the stacked case is visible in telemetry.
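The final resolution step can be sketched as a function of the distinct products whose explicit matchers fired, plus the value the AGENT fallback would produce. MultipleResolution and resolve are hypothetical names. The Copilot CLI BYOK collapse is not modeled here, since it depends on which product the COPILOT_MODEL matcher maps to.

```java
import java.util.Set;

// Hypothetical sketch: maps the distinct matched products to a single user-agent value.
final class MultipleResolution {
  static String resolve(Set<String> matchedProducts, String agentFallback) {
    switch (matchedProducts.size()) {
      case 0:
        return agentFallback; // a known AGENT=<name>, "unknown", or ""
      case 1:
        return matchedProducts.iterator().next();
      default:
        return "multiple"; // stacked agents, e.g. a Cursor subagent spawned by Claude Code
    }
  }
}
```

Reporting "multiple" instead of an empty string keeps nested-agent sessions distinguishable from sessions with no agent at all.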

Also collapse the known BYOK false positive where Copilot CLI users
have COPILOT_MODEL set alongside COPILOT_CLI: that pair now reports
"copilot-cli" rather than "multiple".

Co-authored-by: Isaac
Signed-off-by: simon <simon.faltum@databricks.com>
@github-actions
Contributor

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-java

Inputs:

  • PR number: 768
  • Commit SHA: f6e161fa9d2449bfa5cd8712e7d5347d129f6309

Checks will be approved automatically on success.
