Skip to content

feat(auth): server-side Steward token refresh in edge proxy#461

Open
0xSolace wants to merge 4 commits intodevelopfrom
feat/auth-server-refresh
Open

feat(auth): server-side Steward token refresh in edge proxy#461
0xSolace wants to merge 4 commits intodevelopfrom
feat/auth-server-refresh

Conversation

@0xSolace
Copy link
Copy Markdown
Collaborator

Fixes the 15-30 min silent-logout bug.

The bug

Users get logged out after ~30 min of idle. Return to tab, refresh, bounced to /login. Client-side refresh (visibilitychange, eager refresh on mount, login-page recovery) was insufficient — race between middleware checking cookies and client JS waking up.

The fix

Move refresh out of client JS and into the edge proxy. Store the refresh token in a second httpOnly cookie (steward-refresh-token, 30d). When middleware sees expired access token + valid refresh cookie → call Steward's /auth/refresh server-side, rotate both cookies, forward the fresh bearer on the SAME request. User never sees a login page for the refresh-token-still-valid case.

Changes

  • proxy.ts: refresh flow with structured [steward-auth] logging, transient-failure tolerance (5xx/network log but don't redirect), 401 clears cookies. Forwards the new bearer on the same request via x-middleware-request-authorization header so downstream auth works on the same round trip (no hop to client).
  • app/api/auth/steward-session/route.ts: accepts { token, refreshToken }, sets both httpOnly cookies. DELETE clears both.
  • packages/lib/providers/StewardProvider.tsx: syncs refresh token alongside access token, drops aggressive visibilitychange handler (middleware handles return-from-idle now).
  • app/login/steward-login-section.tsx: minor body-shape update for new POST signature.
  • packages/tests/unit/steward-proxy-refresh.test.ts: unit coverage for success/401/5xx/network-error paths + same-request auth forwarding.

Env vars

  • STEWARD_API_URL (server-side, for proxy refresh) — falls back to NEXT_PUBLIC_STEWARD_API_URL, final fallback https://eliza.steward.fi
  • NEXT_PUBLIC_STEWARD_API_URL still used by client provider

Known edge cases (documented)

  • JWT with no decodable exp is treated as usable (refresh skipped) — rare but worth flagging
  • Proxy 401 clears steward-token + steward-refresh-token only, not steward-authed

Tests

Bun unit tests cover the four proxy paths. Run in CI.

Co-authored-by: wakesync shadow@shad0w.xyz

Moves the refresh race out of client JS and into the edge proxy. When a
protected request arrives with an expired access token and a valid
refresh token cookie, the proxy calls Steward's /auth/refresh server-side,
rotates both cookies, and forwards the request with the new bearer so
downstream auth works on the same round trip.

- proxy.ts: refresh flow with structured [steward-auth] logging,
  transient-failure tolerance (5xx/network errors log but don't redirect),
  401 clears both cookies
- app/api/auth/steward-session/route.ts: sets both steward-token and
  steward-refresh-token httpOnly cookies, DELETE clears both
- packages/lib/providers/StewardProvider.tsx: syncs refresh token to the
  bridge alongside access token, drops visibilitychange handler (middleware
  handles return-from-idle now)
- app/login/steward-login-section.tsx: minor body-shape update for the
  new session bridge POST signature
- packages/tests/unit/steward-proxy-refresh.test.ts: coverage for
  success/401/5xx/network-error paths + same-request auth forwarding

Fixes the 15-30 min silent-logout bug.

Co-authored-by: wakesync <shadow@shad0w.xyz>
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
eliza-cloud-v2 Ready Ready Preview, Comment Apr 20, 2026 7:37am

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 20, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1b1a3fde-2833-464d-99c7-a354743eafcf

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/auth-server-refresh

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude bot commented Apr 20, 2026

Code Review: feat(auth) — server-side Steward token refresh in edge proxy

This is a well-structured fix for a real UX problem. The architecture (refresh in middleware, same-request bearer forwarding, transient-failure tolerance) is sound. A few issues worth addressing before merge.


Bugs / Correctness

steward-authed not cleared on forced proxy logout

getStewardDeleteCookies() in proxy.ts returns ["steward-token", "steward-refresh-token"] but omits steward-authed. When the middleware forces a logout (401 from Steward), client JS still sees steward-authed=1 and may assume the session is live, causing UI flickers or incorrect state until the next page load resolves the mismatch. The DELETE route handler does clear it; the proxy path should too.

// proxy.ts
function getStewardDeleteCookies(): string[] {
  return ["steward-token", "steward-refresh-token", "steward-authed"];
}

Zero-TTL access token sets an immediately-expired cookie

serializeCookie in getStewardRefreshSetCookies passes tokenTtl directly as Max-Age. If tokenTtl is 0 (token just expired at the moment of refresh), the Set-Cookie header becomes Max-Age=0, which browsers treat as "delete this cookie" — the user is immediately logged out again on the next request. Add a floor:

maxAge: typeof tokenTtl === "number" && tokenTtl > 0 ? tokenTtl : undefined,

(The condition is already there in getStewardRefreshSetCookies — double-check that tokenTtl can't be exactly 0 in practice after a fresh refresh.)


Security

Soft-fail on invalid refresh payload is too permissive

In tryRefreshStewardSession, a well-formed HTTP 200 response whose body lacks token/refreshToken strings is returned as { kind: "5xx", status: 502 }, which the caller treats as a transient failure and lets the request through with the expired token. This is a misclassification — an unexpected 200 payload isn't a transient network blip. Consider returning { kind: "401" } for unrecognizable payloads, or at minimum log it as a distinct "bad-payload" outcome so it's distinguishable in logs.

Refresh token still lives in localStorage

StewardProvider.tsx continues writing steward_refresh_token to localStorage, and steward-login-section.tsx falls back to reading it from there:

refreshToken ?? (typeof window !== "undefined" ? localStorage.getItem("steward_refresh_token") : null)

The httpOnly cookie is the secure store; localStorage is XSS-accessible. Now that the httpOnly cookie is the authoritative path, it's worth tracking whether the localStorage copy can be dropped in a follow-up — it's currently necessary for the OAuth callback flow, but it's extra attack surface.


Observability

stewardAuthMetricCounter is not useful in serverless

stewardAuthMetricCounter is a module-level let incremented on each auth event. On Vercel, each cold start resets it to 0. The counter appears in every log line but its value is meaningless across requests and will mislead anyone reading the logs. If per-process ordering is the intent, it's fine; if cross-request trending is the goal, a proper metric backend is needed. Worth a comment clarifying the intent, or just removing the field.


Minor

requestHeaders loop duplicates keys when x-middleware-override-headers is always set

The refactored middlewareNext now unconditionally calls headers.set("x-middleware-override-headers", ...) whenever requestHeaders is present, even if the Headers object is empty (unlikely but possible). The old version had the headersList.length > 0 guard. Not a live bug, but worth the guard if empty-header sets are ever constructed.

Test: expiresIn not present in the "same-request forwarding" test

The last test (forwards refreshed authorization on the same protected API request) omits expiresIn from the mock response, so tokenTtl is derived from the JWT's exp. That's fine and intentional, but the test doesn't assert on the Set-Cookie headers — it's worth adding that assertion so the test also covers the tokenTtl-from-JWT code path end-to-end.


What's working well

  • Same-request bearer forwarding via x-middleware-request-authorization is the right approach — eliminates the extra round-trip.
  • Transient-failure tolerance (5xx/network → soft-fail, no redirect) is the correct behavior for reliability.
  • Unit test coverage for all four proxy paths is solid.
  • serializeCookie / serializeDeletedCookie helpers clean up the raw header construction nicely.
  • Removing the visibilitychange handler is correct cleanup — middleware handles the return-from-idle case now.

The core design is good. The steward-authed gap and the soft-fail misclassification are the two things I'd fix before merging.

Auto-fix from biome check --write.

Co-authored-by: wakesync <shadow@shad0w.xyz>
@claude
Copy link
Copy Markdown

claude bot commented Apr 20, 2026

Code Review: feat(auth): server-side Steward token refresh in edge proxy

Overall: This is a solid fix for the silent-logout bug. The architecture is correct — moving refresh into the edge proxy eliminates the client race condition. The code is well-structured with good test coverage for the main paths. A few issues worth addressing before merge.


Bugs

1. logStewardAuth("skipped", ...) returns { kind: "401" } — misleading log

In tryRefreshStewardSession, when there's no refresh token, the log says "skipped" but the caller treats the result as a 401 and redirects to login. A monitoring system keyed on this log would miss the forced-logout case.

// proxy.ts (tryRefreshStewardSession)
logStewardAuth("skipped", ttl, { reason: "missing-refresh-token" });
return { kind: "401" };

Suggest logging "401-no-refresh-token" or splitting the signal so the log outcome matches the user impact.

2. steward-authed cookie not cleared on 401

getStewardDeleteCookies() returns only ["steward-token", "steward-refresh-token"]. After a forced-logout 401 from Steward, steward-authed=1 persists in the browser. Any client-side code that gates UI on steward-authed will show the authenticated state while the user is actually logged out. The PR description acknowledges this as a known edge case, but it should be fixed or have an inline comment explaining why it's intentionally skipped.

3. Empty refresh token accepted from Steward API

if (typeof body.token !== "string" || typeof body.refreshToken !== "string") {

An empty string passes this check. serializeCookie("steward-refresh-token", "") would set a cookie with no value. Add || body.refreshToken.length === 0.


Security

4. Refresh token in localStorage

steward_refresh_token lives in localStorage (read in steward-login-section.tsx and StewardProvider.tsx) and is XSS-accessible. The httpOnly cookie path in the proxy is correctly protected, but any XSS can steal the refresh token and call Steward's /auth/refresh directly. This is an existing pattern in the codebase (the access token is already in localStorage), so it's a known tradeoff — worth documenting if it isn't already.


Code Quality

5. stewardAuthMetricCounter is a module-level global duplicated across two files

// proxy.ts  AND  route.ts — both have:
let stewardAuthMetricCounter = 0;

In serverless/edge environments, this counter resets per cold start and is per-instance. The metric will drift unpredictably and can't be aggregated. If this is intended for operational visibility, use a structured log field instead of an incrementing counter, or at minimum add a comment explaining the limitation. Also, the same logStewardAuth function definition is copy-pasted into both files — consider extracting to a shared helper if the project has one.

6. middlewareNext sets x-middleware-override-headers unconditionally if requestHeaders is passed

Old code guarded on headersList.length > 0. New code:

if (options?.requestHeaders) {
  options.requestHeaders.forEach(...);         // sets headers
  headers.set("x-middleware-override-headers", ...);  // always set
}

If an empty Headers object is passed, x-middleware-override-headers is set to an empty string "". Unlikely to cause issues in practice, but the guard was there for a reason.

7. withSetCookies is unused in this PR's normal flow

withSetCookies is only used inside handleTokenFailure for the API (/api/) path when deleting cookies. The function is correct but its usage is narrow — the name implies broader cookie-appending utility. Fine as-is, just noting it for discoverability.


Tests

Good coverage for the four proxy paths. A few gaps:

  • No test for a valid (non-expired) steward token — the fast path that returns early without calling /auth/refresh. Adding this ensures the bypass doesn't accidentally get broken.
  • No test for JWT with no exp claim — the PR description calls this out as a known edge case (treated as valid, refresh skipped). A test would lock in the behavior.
  • Concurrent refresh not tested — two simultaneous requests with an expired token will both call /auth/refresh. Steward's API will rotate the refresh token on the first call, making the second call a 401. The test suite could at least document this behavior even if the fix belongs upstream.

Minor

  • steward-authed maxAge is hardcoded to 60 * 60 * 24 * 7 (7 days) while steward-token now uses dynamic TTL. Inconsistency is fine since steward-authed is just a flag, but worth a comment.
  • STEWARD_REFRESH_COOKIE_MAX_AGE (30 days) is defined in both proxy.ts and route.ts with the same value. Either share the constant or accept the duplication — just don't let them drift.

Summary: Fix the "skipped" log/401 mismatch and the empty refresh token check before merge. The steward-authed cleanup gap should either be fixed or documented inline. Everything else is low priority. The core refresh logic and test coverage are solid.

Co-authored-by: wakesync <shadow@shad0w.xyz>
@claude
Copy link
Copy Markdown

claude bot commented Apr 20, 2026

Code Review — feat(auth): server-side Steward token refresh in edge proxy

Summary: This PR fixes a meaningful UX bug (silent logout after ~30 min idle) by moving token refresh from client JS into the edge proxy. The approach is sound — storing the refresh token in a second httpOnly cookie and performing the refresh server-side on the same request round-trip is the right model for an edge middleware architecture.


What works well

  • Architecture is correct. Moving refresh to the proxy eliminates the race between cookie expiry and client JS waking up in a backgrounded tab. The x-middleware-request-authorization injection pattern is the right way to forward a fresh token on the same request.
  • Transient failure handling is appropriate. 5xx/network errors soft-fail without clearing cookies or redirecting, so a momentary Steward outage won't log users out.
  • Cookie security posture is solid. httpOnly, SameSite=Lax, Secure in prod, 30-day maxAge on refresh token — all correct.
  • Test coverage is good for a new code path. Four paths (success, 401, 5xx, network error) with header/cookie assertions.
  • Removing the visibilitychange handler is the right cleanup — middleware handles that case now.

Issues

1. stewardAuthMetricCounter is unreliable in serverless/edge

// proxy.ts:81 and route.ts:9
let stewardAuthMetricCounter = 0;

Module-level mutable state in an edge/serverless function resets on every cold start and is not shared between concurrent instances. This counter will be inconsistent across workers and misleading in logs (e.g., two requests may both log metric: 1). If you want a distributed counter, use a Redis increment. If this is just for local debugging, document that clearly, or drop the metric field from the log shape.

2. steward-authed is not cleared on proxy-forced 401

getStewardDeleteCookies() returns only ["steward-token", "steward-refresh-token"]. The non-httpOnly steward-authed cookie persists after a forced re-login redirect. The PR notes this, but it's worth calling out: client JS reading steward-authed to drive UI state will show a stale authenticated indicator until the client does its own cleanup. At minimum, add a note in getStewardDeleteCookies() explaining why steward-authed is intentionally excluded (or clear it too, since the proxy redirect will send them to /login anyway).

3. Missing refresh token conflated with a Steward 401

// proxy.ts — tryRefreshStewardSession
const refreshToken = request.cookies.get("steward-refresh-token")?.value;
if (!refreshToken) {
  logStewardAuth("skipped", ttl, { reason: "missing-refresh-token" });
  return { kind: "401" };
}

Returning { kind: "401" } when the refresh cookie is simply absent is misleading — the caller can't distinguish "Steward rejected the refresh token" from "we didn't have a token to send." This matters for logging and for future callers. Consider a dedicated { kind: "no-refresh-token" } variant, or handle this upstream before calling tryRefreshStewardSession.

4. middlewareNext behavior change when requestHeaders is empty

Before:

if (headersList.length > 0) {
  headers.set("x-middleware-override-headers", ...);
  // set x-middleware-request-* headers
}

After:

if (options?.requestHeaders) {
  options.requestHeaders.forEach(...);
  headers.set("x-middleware-override-headers", ...); // always set
}

If requestHeaders is provided but empty (no entries), the refactored code sets x-middleware-override-headers to an empty string. This is harmless in current usage since requestHeaders is only passed with entries, but it's a subtle behavioral change worth noting.

5. Duplicate logStewardAuth implementations

proxy.ts and route.ts each define their own logStewardAuth function and independent stewardAuthMetricCounter. The counters are siloed — a refresh in the proxy increments one counter, a session set via the route increments another. Since edge and server runtimes can't easily share a module, this may be unavoidable, but the inconsistency should at minimum be documented, or the shared logic extracted into a utility that both import.


Suggestions

Missing test: refresh-only path (no access token, refresh token present)

The condition (stewardToken?.value || stewardRefreshToken?.value) allows a request with only a refresh cookie (no access token) to attempt a refresh. This path isn't tested. Add a case where steward-token is absent but steward-refresh-token is present and the refresh succeeds.

Consider clamping negative TTLs before cookie serialization

In getStewardRefreshSetCookies, tokenTtl could theoretically be negative if the freshly-issued token is already past its exp (clock skew, bad server response). serializeCookie would then emit Max-Age=-N, which browsers treat as "delete immediately." Consider Math.max(0, tokenTtl) or a fallback to a sane default (e.g., 15 minutes) before passing to serializeCookie.

withSetCookies is defined but used in only one place

The helper is clean, but if it's only used in handleTokenFailure for API paths, consider inlining it there to reduce surface area. Alternatively, use it consistently — middlewareNext and middlewareRedirect both also append Set-Cookie headers manually.


Minor

  • route.ts:9let stewardAuthMetricCounter = 0; is module-level state in a Next.js API route. Same serverless caveat as above.
  • Test cache-busting via ?test=${Date.now()} in the import URL is clever but fragile if tests run within the same millisecond. A UUID or sequential counter would be safer.
  • The expiresIn field from the refresh response is used as tokenTtl if present, falling back to JWT exp. If the API returns expiresIn: 0, that results in a session cookie (no maxAge). This is probably fine but worth confirming with the Steward API contract.

Overall: Approve with the counter/logging reliability issue and the missing "no refresh token" test case addressed. The core fix is solid.

TypeScript strict mode rejected Mock<() => Promise<Response>> as typeof
fetch without an intermediate unknown cast (preconnect is missing).
Fix test file to:
- cast all globalThis.fetch mocks via 'as unknown as typeof fetch'
- type Array.some/every/find callback params as string (previously implicit any)
- cast fetchMock.mock.calls[0] as unknown[] to avoid tuple length 0 error

Unit tests continue to pass (5/5).

Co-authored-by: wakesync <shadow@shad0w.xyz>
@claude
Copy link
Copy Markdown

claude bot commented Apr 20, 2026

Code Review: feat(auth) — server-side Steward token refresh in edge proxy

Good overall approach. Moving refresh out of client JS and into the edge middleware is the right call for solving the idle-logout race condition. The diff is clean and the PR description documents edge cases honestly. A few issues worth addressing before merge:


Critical / High

steward-authed not cleared on proxy 401 redirect

getStewardDeleteCookies() returns ["steward-token", "steward-refresh-token"] but omits steward-authed. When the proxy redirects to /login after a failed refresh, client-side code checking document.cookie for steward-authed still sees 1. The user lands on /login with the auth indicator still set, which can cause confusing flashes or premature redirects back out of login.

// proxy.ts — getStewardDeleteCookies
function getStewardDeleteCookies(): string[] {
  return ["steward-token", "steward-refresh-token", "steward-authed"]; // add this
}

Note: the route.ts DELETE handler does delete steward-authed, but the proxy's own redirect path does not.


tryRefreshStewardSession returns { kind: "401" } when the refresh cookie is simply absent

if (!refreshToken) {
  logStewardAuth("skipped", ttl, { reason: "missing-refresh-token" });
  return { kind: "401" }; // ← treated same as a real 401 from Steward
}

Existing users who logged in before this PR will have no steward-refresh-token cookie. When their 15-30 min access token expires, the proxy will treat the missing refresh cookie as a hard 401 and redirect them to /login with cookies cleared — same behavior as a completely invalid session. These users will be logged out immediately on first expiry post-deploy.

Recommend returning a distinct { kind: "no-refresh-token" } variant and soft-failing (like 5xx/network-error) so these users are passthrough'd rather than bounced. They'll get redirected on the next request after Steward's own verification rejects the expired token.


Medium

Module-level stewardAuthMetricCounter is shared across concurrent requests

// proxy.ts
let stewardAuthMetricCounter = 0;

// route.ts
let stewardAuthMetricCounter = 0;

Edge middleware isolates can handle concurrent requests in the same V8 isolate. The counter is mutation-only (no read–modify–write race that changes behavior), so it won't corrupt business logic, but the metric will increment non-deterministically under load. If this is meant as a request count for observability, use crypto.randomUUID() or a per-request ID instead. Also the counter is duplicated between proxy.ts and route.ts — they're independent counters with the same name, which will confuse log correlation.


getStewardRefreshSetCookies doesn't rotate steward-authed

The steward-authed cookie has a hardcoded 7-day max-age set at login. A user with a 30-day refresh token will have steward-authed expire ~23 days before their refresh token does. At that point the proxy still handles refresh correctly (it doesn't depend on steward-authed), but client JS that gating on steward-authed will see the user as unauthenticated. Consider either:

  • Adding steward-authed=1; Max-Age=2592000 to getStewardRefreshSetCookies, or
  • Documenting that steward-authed is unreliable after 7 days and that client code should not gate on it for sessions longer than that.

Low / Style

withSetCookies is only used once

The helper is defined but called in only one place (handleTokenFailure for API paths). The middlewareNext path uses options.setCookies directly. Either consolidate to one approach or inline withSetCookies at its call site — the abstraction isn't pulling its weight.

getStewardApiUrl() could be memoized

Called on every tryRefreshStewardSession invocation. Environment variables don't change at runtime, so reading them three times per request is harmless but noisy. A module-level const STEWARD_API_URL = getStewardApiUrl() would be cleaner (also makes the fallback chain visible at startup rather than buried in a helper).


Tests

The four test paths (success, 401, 5xx, network error) are solid. The same-request auth forwarding test directly validates the x-middleware-request-authorization header. A few gaps worth noting:

  • No test for the missing refresh token path (currently hides behind kind: "401")
  • No test for the valid non-expired steward token passthrough (line checking stewardTtl > 0)
  • steward-authed cookie clearing on 401 is not asserted in the 401 test (catches the issue above)

Summary

Area Status
Core logic (server-side refresh) ✅ Correct approach
Cookie security (httpOnly, SameSite)
steward-authed on redirect ❌ Not cleared
Missing refresh token handling ⚠️ Will log out existing users post-deploy
Module-level counter ⚠️ Minor observability concern
Test coverage ⚠️ Missing 2-3 cases

The cookie clear and missing-refresh-token issues are the most likely to cause visible user impact at deploy time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant