fix(kb): disable connectors after repeated sync failures#4046
waleedlatif1 merged 6 commits into staging
The generic "Failed to obtain access token" error hid the actual root cause. Now logs credentialId, userId, authMode, and provider to help diagnose token refresh failures in trigger.dev. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
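As a rough illustration of the commit above, a diagnostic message could carry the four fields it names. This is only a sketch: the `TokenFailureContext` type and the `formatTokenFailure` helper are hypothetical names, not code from this PR.

```typescript
// Hypothetical shape: the commit names these four fields but not their types.
interface TokenFailureContext {
  credentialId: string
  userId: string
  authMode: string
  provider: string
}

/** Builds an error message that keeps the underlying cause visible. */
function formatTokenFailure(ctx: TokenFailureContext, cause: unknown): string {
  // Preserve the original reason instead of collapsing it into a generic string.
  const reason = cause instanceof Error ? cause.message : String(cause)
  return (
    `Failed to obtain access token ` +
    `(provider=${ctx.provider}, authMode=${ctx.authMode}, ` +
    `credentialId=${ctx.credentialId}, userId=${ctx.userId}): ${reason}`
  )
}
```

Keeping the cause in the message means a refresh failure can be told apart from a missing credential without re-running the job.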
Connectors that fail 10 times in a row are set to 'disabled' status, stopping the cron from scheduling further syncs. The UI shows an alert triangle with a reconnect banner. Users can re-enable via the play button or by reconnecting their account, which resets failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… variant Sync button should be disabled when connector is in disabled state to guide users toward reconnecting first. Badge variant changed from red to amber to match the warning banner styling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR Summary
Medium Risk
Overview
Updates the sync engine to treat missing OAuth refresh tokens as a hard error with better logging, and to record a clearer error message when a connector is disabled. Extends the workspace connectors UI to surface disabled connectors (badge, alert icon, warning banner with failure count and reconnect CTA), disables manual “Sync now” for disabled connectors, and treats the play/resume action as re-enabling. Updates the connector …
Reviewed by Cursor Bugbot for commit c3ea8e1.
Greptile Summary
This PR implements a circuit-breaker for knowledge-base connectors: after 10 consecutive sync failures the connector is automatically set to …
Confidence Score: 5/5
Safe to merge: circuit-breaker logic is correct, prior review comments are addressed, and the only remaining finding is a P2 copy clarification.
All P0/P1 concerns from previous review threads are resolved. The only open finding is a P2 UX copy issue where the Reconnect banner text implies single-step re-enable while the actual flow requires a second Resume click. No data-integrity, security, or runtime correctness issues remain.
connectors-section.tsx: minor banner copy clarification needed

| Filename | Overview |
|---|---|
| apps/sim/lib/knowledge/connectors/sync-engine.ts | Adds circuit-breaker: increments consecutiveFailures on failure, disables connector and clears nextSyncAt after 10 failures, with exponential backoff up to 24 h. Success path resets counter. resolveAccessToken is now inside the try/catch so token failures count toward the circuit-breaker. Logic is correct. |
| apps/sim/app/api/knowledge/[id]/connectors/[connectorId]/route.ts | PATCH handler resets consecutiveFailures and lastSyncError when status is set to active, and correctly uses === undefined (not falsy check) to avoid overriding an explicit null from the syncIntervalMinutes: 0 path. |
| apps/sim/hooks/queries/kb/connectors.ts | Adds consecutiveFailures: number and 'disabled' to ConnectorData type. No logic changes to query/mutation hooks. |
| apps/sim/app/workspace/[workspaceId]/knowledge/[id]/components/connectors-section/connectors-section.tsx | Adds disabled-state UI: amber badge, AlertTriangle icon overlay, warning banner with failure count. Reconnect button is now correctly gated on serviceId && providerId. Minor UX gap: the copy implies OAuth reconnect alone resumes syncing, but the user still needs to press Resume afterward. |
Sequence Diagram
sequenceDiagram
participant Cron
participant SyncEngine
participant DB
participant UI
Cron->>DB: Query status IN ['active','error'] AND nextSyncAt <= now
DB-->>Cron: connectors due
loop For each connector
Cron->>SyncEngine: dispatchSync(connectorId)
SyncEngine->>DB: SET status='syncing' WHERE status != 'syncing'
alt Sync success
SyncEngine->>DB: SET status='active', consecutiveFailures=0, nextSyncAt=now+interval
else Sync failure (< 10)
SyncEngine->>DB: SET status='error', consecutiveFailures+=1, nextSyncAt=now+backoff
else Sync failure >= 10
SyncEngine->>DB: SET status='disabled', consecutiveFailures=10, nextSyncAt=NULL
DB-->>UI: Connector marked disabled
UI->>UI: Show amber banner + Reconnect (OAuth) or Resume prompt (API key)
end
end
UI->>UI: User clicks Resume (play button)
UI->>DB: PATCH status='active' resets consecutiveFailures=0, nextSyncAt=now
DB-->>Cron: Connector eligible again on next cron tick
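The transitions in the diagram can be sketched as pure functions over connector state. The field names `status`, `consecutiveFailures` and `nextSyncAt` and the threshold of 10 come from this page; the `ConnectorState` type and the function names are hypothetical, and the real engine writes to the database rather than returning objects.

```typescript
type ConnectorStatus = 'active' | 'syncing' | 'error' | 'disabled'

interface ConnectorState {
  status: ConnectorStatus
  consecutiveFailures: number
  nextSyncAt: Date | null
}

// Threshold stated in the PR description.
const MAX_CONSECUTIVE_FAILURES = 10

/** Mirrors the cron query: status IN ('active', 'error') AND nextSyncAt <= now. */
function isDueForSync(c: ConnectorState, now: Date): boolean {
  const eligible = c.status === 'active' || c.status === 'error'
  return eligible && c.nextSyncAt !== null && c.nextSyncAt.getTime() <= now.getTime()
}

/** Success resets the counter and schedules the next regular sync. */
function onSyncSuccess(now: Date, intervalMinutes: number): ConnectorState {
  return {
    status: 'active',
    consecutiveFailures: 0,
    nextSyncAt: new Date(now.getTime() + intervalMinutes * 60000),
  }
}

/** Failure increments the counter; at the threshold the connector is disabled. */
function onSyncFailure(prev: ConnectorState, now: Date, backoffMinutes: number): ConnectorState {
  const failures = prev.consecutiveFailures + 1
  if (failures >= MAX_CONSECUTIVE_FAILURES) {
    // nextSyncAt = null keeps the cron from ever picking this connector up.
    return { status: 'disabled', consecutiveFailures: failures, nextSyncAt: null }
  }
  return {
    status: 'error',
    consecutiveFailures: failures,
    nextSyncAt: new Date(now.getTime() + backoffMinutes * 60000),
  }
}
```

A disabled connector is excluded twice over: its status is not in the cron filter and its `nextSyncAt` is null, so only an explicit re-enable makes it eligible again.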
Reviews (4): Last reviewed commit: "fix(kb): remove dead interval branch whe..."
- Use `=== undefined` instead of falsy check for nextSyncAt to preserve explicit null (manual sync only) when syncIntervalMinutes is 0
- Gate Reconnect button on serviceId/providerId so it only renders for OAuth connectors; show appropriate copy for API key connectors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
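The first fix in this commit hinges on the difference between "not set" and "explicitly null". A minimal sketch, assuming a hypothetical `ConnectorUpdates` object and helper name (the real PATCH handler is not shown on this page):

```typescript
// Hypothetical update payload; only the field relevant to the fix is modelled.
interface ConnectorUpdates {
  // undefined = no earlier branch touched it; null = manual sync only.
  nextSyncAt?: Date | null
}

/** On re-enable, schedule an immediate sync unless nextSyncAt was already decided. */
function withReenableSchedule(updates: ConnectorUpdates, now: Date): ConnectorUpdates {
  // A falsy check (`!updates.nextSyncAt`) would also match null and
  // overwrite the explicit "manual sync only" choice.
  if (updates.nextSyncAt === undefined) {
    return { ...updates, nextSyncAt: now }
  }
  return updates
}
```

Here `withReenableSchedule({ nextSyncAt: null }, now)` keeps the null, while `withReenableSchedule({}, now)` schedules a sync at `now`.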
@greptile
@cursor review
… coverage Token resolution failures (e.g. revoked OAuth tokens) were thrown before the try/catch block, bypassing consecutiveFailures tracking entirely. Also removes dead `if (refreshed)` guards at mid-sync refresh sites since resolveAccessToken now always returns a string or throws. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
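The point of this commit is that token resolution must happen inside the guarded region so its failures are counted. A simplified sketch follows; `runSyncAttempt` and its callback parameters are hypothetical stand-ins, and only the name `resolveAccessToken` comes from the PR.

```typescript
/**
 * Runs one sync attempt. Token resolution sits inside the try block, so a
 * revoked or missing token is recorded like any other sync failure.
 */
async function runSyncAttempt(
  resolveAccessToken: () => Promise<string>,
  sync: (accessToken: string) => Promise<void>,
  recordFailure: (error: unknown) => void
): Promise<boolean> {
  try {
    // Returns a string or throws, so no `if (token)` guard is needed afterwards.
    const accessToken = await resolveAccessToken()
    await sync(accessToken)
    return true
  } catch (error) {
    // Reached for token failures as well as sync failures.
    recordFailure(error)
    return false
  }
}
```

If the `await resolveAccessToken()` call sat above the `try`, its rejection would propagate to the caller and `recordFailure` would never run, which is the gap the commit describes.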
@greptile
@cursor review
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit c3ea8e1.
When `updates.nextSyncAt === undefined`, syncIntervalMinutes was not in the request, so `parsed.data.syncIntervalMinutes` is always undefined. Simplify to just schedule an immediate sync — the sync engine sets the proper nextSyncAt based on the connector's DB interval after completion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@greptile
@cursor revew
Perfect! Let me create a comprehensive review summary. All the issues flagged by the automated reviewers have been addressed:
PR Review Summary
I've completed a comprehensive review of the PR. All previously flagged issues have been successfully resolved:
✅ Fixed Issues
✅ Core Feature Implementation
Circuit Breaker Logic:
UI Indicators:
Re-enabling Flow:
Code Quality
Remaining Note
The Greptile review mentioned a narrow TOCTOU race where …
Status: Ready to merge ✅
The PR successfully implements the circuit-breaker pattern for connector sync failures with proper UI feedback and re-enabling flow. All P0/P1 review comments have been addressed.



Summary
- … resolveAccessToken to surface the actual root cause of token failures instead of a generic message
Test plan