Skip to content

Wire request signing to RuntimeServices store primitives (PR 9)#609

Open
prk-Jr wants to merge 86 commits intomainfrom
feature/edgezero-pr9-wire-signing-to-store-primitives
Open

Wire request signing to RuntimeServices store primitives (PR 9)#609
prk-Jr wants to merge 86 commits intomainfrom
feature/edgezero-pr9-wire-signing-to-store-primitives

Conversation

@prk-Jr
Copy link
Copy Markdown
Collaborator

@prk-Jr prk-Jr commented Apr 2, 2026

Summary

  • Replaces direct FastlyConfigStore/FastlySecretStore construction in request_signing/ with RuntimeServices platform traits, making the signing subsystem platform-agnostic
  • Moves management API write operations (config item / secret CRUD) out of trusted-server-core into an adapter-only FastlyManagementApiClient, enforcing the EdgeZero layering rule that only the adapter may call Fastly-specific APIs
  • Threads RuntimeServices into AuctionContext so PrebidAuctionProvider (and future providers) can call RequestSigner::from_services without a deprecated FastlyConfigStore shim

Changes

File Change
crates/trusted-server-adapter-fastly/src/management_api.rs NewFastlyManagementApiClient with update_config_item, delete_config_item, create_secret, delete_secret via Fastly management API
crates/trusted-server-adapter-fastly/src/platform.rs FastlyPlatformConfigStore::put/delete and FastlyPlatformSecretStore::create/delete delegate to FastlyManagementApiClient
crates/trusted-server-adapter-fastly/src/main.rs Wire services into /verify-signature handler; add mod management_api
crates/trusted-server-core/src/request_signing/signing.rs Rewrite — RequestSigner::from_services(&RuntimeServices) replaces from_config(); verify_signature accepts &RuntimeServices; deprecated shim removed
crates/trusted-server-core/src/request_signing/rotation.rs Rewrite — KeyRotationManager::new() now infallible; all methods accept services: &RuntimeServices
crates/trusted-server-core/src/request_signing/endpoints.rs All handlers accept services: &RuntimeServices; tests use in-memory stub stores
crates/trusted-server-core/src/auction/types.rs AuctionContext gains pub services: &'a RuntimeServices
crates/trusted-server-core/src/auction/orchestrator.rs Pass services through derived AuctionContext construction sites
crates/trusted-server-core/src/auction/endpoints.rs Pass services when constructing AuctionContext
crates/trusted-server-core/src/integrations/prebid.rs Use RequestSigner::from_services(context.services) — no more #[allow(deprecated)]
crates/trusted-server-core/src/platform/test_support.rs Add build_services_with_config_and_secret test helper
crates/trusted-server-core/src/storage/api_client.rs DeletedFastlyApiClient moved to adapter-only management_api.rs
crates/trusted-server-core/src/storage/mod.rs Remove api_client module and FastlyApiClient re-export
docs/superpowers/plans/2026-03-31-pr9-wire-signing-to-store-primitives.md Implementation plan

Closes

Closes #490

Test plan

  • cargo test --workspace
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • JS tests: cd crates/js/lib && npx vitest run
  • JS format: cd crates/js/lib && npm run format
  • Docs format: cd docs && npm run format
  • WASM build: cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1
  • Manual testing via fastly compute serve

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect("should ...")
  • Uses tracing macros (not println!)
  • New code has tests
  • No secrets or credentials committed

prk-Jr and others added 30 commits March 18, 2026 16:54
Rename crates/common → crates/trusted-server-core and crates/fastly →
crates/trusted-server-adapter-fastly following the EdgeZero naming
convention. Add EdgeZero workspace dependencies pinned to rev 170b74b.
Update all references across docs, CI workflows, scripts, agent files,
and configuration.
Introduces trusted-server-core::platform with PlatformConfigStore,
PlatformSecretStore, PlatformKvStore, PlatformBackend, PlatformHttpClient,
and PlatformGeo traits alongside ClientInfo, PlatformError, and
RuntimeServices. Wires the Fastly adapter implementations and threads
RuntimeServices into route_request. Moves GeoInfo to platform/types as
platform-neutral data and adds geo_from_fastly for field mapping.
- Defer KV store opening: replace early error return with a local
  UnavailableKvStore fallback so routes that do not need synthetic ID
  access succeed when the KV store is missing or temporarily unavailable
- Use ConfigStore::try_open + try_get and SecretStore::try_get throughout
  FastlyPlatformConfigStore and FastlyPlatformSecretStore to honour the
  Result contract instead of panicking on open/lookup failure
- Encapsulate RuntimeServices service fields as pub(crate) with public
  getter methods (config_store, secret_store, backend, http_client, geo)
  and a pub new() constructor; adapter updated to use new()
- Reference #487 in FastlyPlatformHttpClient stub (PR 6 implements it)
- Remove unused KvPage re-export from platform/mod.rs
- Use super::KvHandle shorthand in RuntimeServices::kv_handle()
- Split fastly_storage.rs into storage/{config_store,secret_store,api_client,mod}.rs
- Add PlatformConfigStore read path via FastlyPlatformConfigStore::get using ConfigStore::try_open/try_get
- Add PlatformError::NotImplemented variant; stub write methods on FastlyPlatformConfigStore and FastlyPlatformSecretStore
- Add StoreName/StoreId newtypes with From<String>, From<&str>, AsRef<str>
- Add UnavailableKvStore to core platform module
- Add RuntimeServicesBuilder replacing 7-arg constructor
- Migrate get_active_jwks and handle_trusted_server_discovery to use &RuntimeServices
- Update call sites in signing.rs, rotation.rs, main.rs
- Add success-path test for handle_trusted_server_discovery using StubJwksConfigStore
- Fix test_parse_cookies_to_jar_empty typo (was emtpy)
- Make StoreName and StoreId inner fields private; From/AsRef provide all
  needed construction and access
- Add #[deprecated] to GeoInfo::from_request with #[allow(deprecated)] at
  the three legacy call sites to track migration progress
- Enumerate the six platform traits in the platform module doc comment
- Extract backend_config_from_spec helper to remove duplicate BackendConfig
  construction in predict_name and ensure
- Replace .into_iter().collect() with .to_vec() on secret plaintext bytes
- Remove unused bytes dependency from trusted-server-adapter-fastly
- Add comment on SecretStore::open clarifying it already returns Result
  (unlike ConfigStore::open which panics)
prk-Jr added 6 commits April 1, 2026 15:45
Thread RuntimeServices into AuctionContext so auction providers can
access platform stores directly. Update PrebidAuctionProvider to use
RequestSigner::from_services(context.services) instead of the now-
removed from_config() shim. All construction sites and test helpers
updated accordingly.

This satisfies the final acceptance criterion of #490: no
FastlyConfigStore/FastlySecretStore construction remains in the
request_signing/ modules.
@prk-Jr prk-Jr self-assigned this Apr 2, 2026
@prk-Jr prk-Jr changed the title (PR 9) Wire request signing to RuntimeServices store primitives Wire request signing to RuntimeServices store primitives (PR 9) Apr 2, 2026
@prk-Jr prk-Jr linked an issue Apr 2, 2026 that may be closed by this pull request
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Solid architectural move — management API writes are cleanly extracted from trusted-server-core into the Fastly adapter, and request signing is now fully platform-agnostic via RuntimeServices. Tests migrated well with spy/stub stores exercising real crypto.

Two issues with HTTP status code handling in FastlyManagementApiClient that could cause silent failures in production.

Blocking

🔧 wrench

  • create_secret rejects HTTP 201 Created: Fastly secret store API returns 201 on creation, but only 200 is accepted — new secrets will be treated as failures (management_api.rs:221)
  • update_config_item may reject HTTP 201: Config store PUT may return 201 for new items — same risk as above (management_api.rs:139)

❓ question

  • Box::leak in test helpers: Intentional memory leak in orchestrator.rs:683 and prebid.rs:1290 — would LazyLock<RuntimeServices> be preferred?

Non-blocking

♻️ refactor

  • Duplicate JWKS_STORE_NAME LazyLock: Identical definition in both signing.rs:17 and rotation.rs:20 — could live in request_signing/mod.rs

🤔 thinking

  • api_key stored as Vec<u8> but used as string: Could be String directly, or a zeroizing type for security (management_api.rs:41)

⛏ nitpick

  • Response body read but discarded: buf is read in every method but never included in error messages, unlike the deleted FastlyApiClient which used it for debugging

👍 praise

  • KeyRotationManager::new() becoming infallible is a clean simplification
  • Spy store test pattern with Mutex<Vec<...>> is excellent — real crypto with in-memory stores

CI Status

  • integration tests: PASS
  • browser integration tests: PASS
  • prepare integration artifacts: PASS

Comment thread crates/trusted-server-adapter-fastly/src/management_api.rs Outdated
Comment thread crates/trusted-server-adapter-fastly/src/management_api.rs Outdated
Comment thread crates/trusted-server-core/src/auction/orchestrator.rs Outdated
Comment thread crates/trusted-server-core/src/request_signing/signing.rs Outdated
Comment thread crates/trusted-server-adapter-fastly/src/management_api.rs Outdated
Comment thread crates/trusted-server-adapter-fastly/src/management_api.rs Outdated
Comment thread crates/trusted-server-core/src/request_signing/rotation.rs Outdated
Comment thread crates/trusted-server-core/src/request_signing/rotation.rs
@prk-Jr prk-Jr requested a review from aram356 April 6, 2026 09:58
Copy link
Copy Markdown
Collaborator

@ChristianPavilonis ChristianPavilonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR does an excellent job of decoupling request signing from Fastly-specific storage primitives, moving to platform-agnostic RuntimeServices throughout. The layering is clean and the test infrastructure is solid.

However, there are two P0 issues in the Fastly management API client that could cause failures on key re-rotation, and several opportunities to reduce test code duplication and improve documentation coverage.

Findings by severity

Severity Count
P0 (critical) 2
P1 (high) 4
P2 (medium) 9

Findings not attached inline

(These findings reference lines outside the diff or files not modified in this PR.)

🤔 P2 — parse_ed25519_signing_key heuristic is fragile (crates/trusted-server-core/src/request_signing/signing.rs, line 32)

Using len > 32 to distinguish raw vs Base64-encoded keys works by current convention but creates an implicit contract between the writer (rotation.rs) and reader (this function). A 32-byte raw key that happens to be valid UTF-8 could theoretically be misinterpreted. Consider adding a doc comment explaining the encoding contract explicitly.


🤔 P2 — RequestSigner struct + kid field lack doc comments (crates/trusted-server-core/src/request_signing/signing.rs, line 51)

Per project documentation standards (CLAUDE.md), public structs and their fields should have doc comments. RequestSigner and its kid field are undocumented.


🤔 P2 — Public request/response endpoint types undocumented (crates/trusted-server-core/src/request_signing/endpoints.rs, lines 56-246)

VerifySignatureRequest, VerifySignatureResponse, RotateKeyRequest, RotateKeyResponse, DeactivateKeyRequest, DeactivateKeyResponse all lack doc comments. These are the public API surface — adding even one-line descriptions helps consumers understand the expected shapes.


👏 Praise — Excellent StoreName/StoreId newtype separation (crates/trusted-server-core/src/platform/types.rs)

Having distinct types for edge-visible runtime names vs. management API identifiers prevents subtle bugs at the type level. The compiler catches misuse that would otherwise be a runtime surprise.

See inline comments for all other findings.

Comment thread crates/trusted-server-adapter-fastly/src/management_api.rs Outdated
Comment thread crates/trusted-server-adapter-fastly/src/management_api.rs Outdated
Comment thread crates/trusted-server-adapter-fastly/src/management_api.rs Outdated
Comment thread crates/trusted-server-adapter-fastly/src/platform.rs
Comment thread crates/trusted-server-adapter-fastly/src/management_api.rs Outdated
Comment thread crates/trusted-server-core/src/request_signing/rotation.rs Outdated
Comment thread crates/trusted-server-core/src/request_signing/rotation.rs
Comment thread crates/trusted-server-adapter-fastly/src/management_api.rs
Comment thread crates/trusted-server-core/src/request_signing/rotation.rs
Comment thread crates/trusted-server-core/src/request_signing/rotation.rs
Copy link
Copy Markdown
Collaborator

@ChristianPavilonis ChristianPavilonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is a well-architected PR — the core crate is now genuinely platform-agnostic for signing, and the management API isolation to the adapter is the right layering. A few correctness and structural items below, none blocking.


Note: One finding could not be placed as an inline comment because the relevant line is not part of the diff:

🤔 Module doc still references Fastly-specific store semantics (crates/trusted-server-core/src/request_signing/mod.rs, line 8)

The module doc still says "Fastly stores have two identifiers" and references ConfigStore::open / SecretStore::open. Since the whole point of this PR is platform-agnostic signing, this doc should use platform-neutral language (e.g. "Platform stores have two identifiers" and reference the trait methods).

Comment thread crates/trusted-server-adapter-fastly/src/management_api.rs Outdated
Comment thread crates/trusted-server-core/src/request_signing/endpoints.rs
Comment thread crates/trusted-server-core/src/request_signing/rotation.rs Outdated
Comment thread crates/trusted-server-core/src/request_signing/rotation.rs Outdated
Comment thread crates/trusted-server-core/src/auction/orchestrator.rs Outdated
Comment thread crates/trusted-server-core/src/request_signing/mod.rs Outdated
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR cleanly separates the Fastly management API client from trusted-server-core into the adapter layer, replacing direct FastlyConfigStore/FastlySecretStore usage with RuntimeServices platform traits. The architecture improvement is solid. Two items need clarification before merge.

Blocking

🔧 wrench

  • Secret storage format change: the new FastlyManagementApiClient::create_secret base64-encodes the secret field at the transport layer (and uses PUT), while the old FastlyApiClient::create_secret sent it as raw text (POST). Since rotation.rs::store_private_key already base64-encodes the raw key, the effective payload format has changed. Need confirmation on Fastly API contract and backward compatibility with existing keys. (management_api.rs:68)

❓ question

  • Broadened status check: old code checked for specific status codes (OK, NO_CONTENT), new code uses is_success() (any 2xx). Was this intentional? (management_api.rs:90)

Non-blocking

🤔 thinking

  • Per-call client construction: each write call constructs a new FastlyManagementApiClient (4x during rotate_key). SDK caching mitigates this, but worth considering caching at a higher level. (platform.rs:69)
  • parse_active_kids untested directly: edge cases like ",", "", " kid-a , , kid-b " are not covered by focused unit tests. (mod.rs:56)

♻️ refactor

  • signing_store_ids returns unnamed tuple: fine for two values, but a named struct would improve readability if more settings are added. (endpoints.rs:168)

⛏ nitpick

  • JWKS_STORE_NAME/SIGNING_STORE_NAME visibility: pub but only used within the crate — pub(crate) would be more precise. (mod.rs:49)
  • PR checklist says "tracing": actual code and CLAUDE.md use log macros.

CI Status

  • integration tests: PASS
  • browser integration tests: PASS
  • prepare integration artifacts: PASS

Comment thread crates/trusted-server-adapter-fastly/src/management_api.rs
Comment thread crates/trusted-server-adapter-fastly/src/management_api.rs
Comment thread crates/trusted-server-adapter-fastly/src/platform.rs
Comment thread crates/trusted-server-core/src/request_signing/mod.rs
Comment thread crates/trusted-server-core/src/request_signing/mod.rs Outdated
Comment thread crates/trusted-server-core/src/request_signing/endpoints.rs
@prk-Jr prk-Jr requested a review from aram356 April 10, 2026 08:47
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

The architectural move — extracting FastlyApiClient from trusted-server-core into trusted-server-adapter-fastly::management_api and threading RuntimeServices through signing and auction — is exactly the right direction for EdgeZero's platform-agnostic core. Local fmt/clippy/cargo test --workspace all pass, and integration checks on GitHub are green.

Four blocking items prevent merge as-is, all clustered around the rotate/deactivate lifecycle. Fixing (1) first will likely surface (2) and (3) via real tests without any further changes.

Blocking

🔧 wrench

1. Handler tests in request_signing/endpoints.rs assert nothing (crates/trusted-server-core/src/request_signing/endpoints.rs:542-658)

test_handle_rotate_key_with_empty_body, test_handle_rotate_key_with_custom_kid, test_handle_deactivate_key_request, test_handle_deactivate_key_with_delete, and test_handle_trusted_server_discovery all use this pattern:

match result {
    Ok(mut resp) => { /* log … no asserts */ }
    Err(e) => log::debug!("Expected error in test environment: {}", e),
}

They pass whether the handler succeeds or fails, so the new rotate/deactivate/discovery handler paths are effectively untested even though the PR checklist claims "New code has tests."

Fix: assert on response status and on the success/error fields of the parsed JSON body. Minimum:

let mut resp = handle_rotate_key(&settings, &noop_services(), req)
    .expect("should return a response even when stores are unavailable");
assert_eq!(resp.get_status(), StatusCode::INTERNAL_SERVER_ERROR);
let body: RotateKeyResponse = serde_json::from_str(&resp.take_body_str())
    .expect("should deserialize rotate response");
assert!(!body.success);
assert!(body.error.is_some());

2. SECRET_UPSERT_METHOD = "PUT" on the Fastly secret-store collection endpoint is not a documented API verb (crates/trusted-server-adapter-fastly/src/management_api.rs:31, 257-289)

The previous api_client.rs used POST (the documented create verb); upsert semantics in Fastly's secret-store API are expressed by posting with method: "create_or_recreate" in the JSON body, not by switching the verb.

The only guards here are secret_upsert_method_uses_put (asserts the constant against itself) and create_secret_uses_secret_store_error_for_transport_failures (uses a bogus backend). Neither touches the real API contract. As written, the second rotation of any kid — or a same-day re-rotation via generate_date_based_kid — will likely return 405/400 from the management API in production.

Fix:

const SECRET_UPSERT_METHOD: &str = "POST";

fn build_secret_payload(secret_name: &str, secret_value: &str) -> String {
    serde_json::json!({
        "name": secret_name,
        "secret": general_purpose::STANDARD.encode(secret_value.as_bytes()),
        "method": "create_or_recreate",
    })
    .to_string()
}

This is distinct from the earlier base64 finding — it's about the verb itself.

3. KeyRotationManager::rotate_key is non-atomic across four management-API writes with no rollback (crates/trusted-server-core/src/request_signing/rotation.rs:61-97)

Sequence:

  1. store_private_key (secret store)
  2. store_public_jwk (config store)
  3. update_current_kid (config store)
  4. update_active_kids (config store)

Failure modes that leave the store inconsistent:

  • Fail at (2): private key lives in the secret store with no corresponding JWK — orphaned key material.
  • Fail at (4): current-kid advances to a kid that isn't in active-kids. The edge starts signing with a kid whose JWK is not advertised in the JWKS discovery document, so downstream verifiers that refresh via active-kids will reject every signed request until the store is manually repaired. This is a signing outage.

Fix: on failure after (1) or (2), attempt best-effort deletion of the freshly written artifacts and surface the rollback status in the returned Report. Ideally, reorder so current-kid only flips after both artifacts AND active-kids are confirmed written:

store_private_key → store_public_jwk → update_active_kids → update_current_kid

That ordering ensures any partial failure leaves the old kid still active and the new kid visible in JWKS but unused — a recoverable state.

4. delete_key has a symmetric partial-failure hole that leaks private key material (crates/trusted-server-core/src/request_signing/rotation.rs:205-227)

If deactivate_key and the JWK delete both succeed but secret_store.delete fails, the private key remains in the secret store while its JWK is gone. A retry then fails at the JWK delete step because check_response rejects 404 (see ❓ below), so the operator can't converge by re-running the command.

Fix: order the secret-store delete before the JWK delete so a partial failure leaves a verifiable-but-not-signable orphan JWK (safer than orphaned key material), and treat 404 from both deletes as success so retries converge.

❓ question

5. Are management-API deletes meant to be idempotent? (crates/trusted-server-adapter-fastly/src/management_api.rs:224-322)

check_response rejects any non-2xx, so a 404 on delete_config_item/delete_secret propagates as an error. For key-lifecycle use, idempotent deletes are usually desirable so a retried delete_key converges after a partial failure. Intentional? A minimal change:

if response.get_status() == StatusCode::NOT_FOUND {
    return Ok(());
}

This interacts directly with finding (4).

6. Status check broadened from specific codes to is_success() (crates/trusted-server-adapter-fastly/src/management_api.rs:90)

Old api_client.rs checked specific status codes (OK/NO_CONTENT); now any 2xx passes. Usually a good thing, but worth confirming it's deliberate.

Non-blocking

🤔 thinking

  • TOCTOU in rotate_key (request_signing/rotation.rs:70-89): reads current-kid and active-kids, then writes unconditionally. Two concurrent rotations can lose one side's update. Rotation is admin-gated and rare, so probably acceptable — but a read-before-write invariant check would be cheap insurance.
  • parse_ed25519_signing_key length heuristic is fragile (request_signing/signing.rs:38-56): len > 32 → base64, else raw. The production path always writes base64 via rotation.rs::store_private_key, so a raw value >32 bytes in production means data corruption — and this code hides it behind a misleading "failed to decode base64" error. Consider requiring base64 unconditionally or gating the raw branch behind #[cfg(test)].
  • Per-call client construction (adapter-fastly/src/platform.rs:68-76, 129-142): every put/delete constructs a fresh FastlyManagementApiClient, which re-opens api-keys and re-registers the management backend. Doc acknowledges it, but for a single rotate_key call that's four back-to-back opens to read the same entry. Follow-up could cache an Arc<FastlyManagementApiClient> on the stores behind a OnceLock.

♻️ refactor

  • Inconsistent admin-endpoint response semantics (request_signing/endpoints.rs:237, :345): handle_verify_signature returns 200 with opaque verified: false on internal errors, while handle_rotate_key/handle_deactivate_key return 500 with error: Some(format!("{}", e)) stringifying the full error-stack report. These are admin endpoints and presumably auth-gated upstream, but either the error detail should be opt-in behind a debug flag or the contract should be documented.
  • KeyRotationManager::new takes impl Into<String> (request_signing/rotation.rs:49): CLAUDE.md's "prefer From over Into" guidance suggests &str would be clearer. Minor.

🏕 camp site

  • SpySecretStore::create unconditionally returns Ok (request_signing/rotation.rs:329-342): a failing-create variant would exercise the rollback path requested in (3). A minimal SpySecretStore::with_create_failure_after(n) would unlock real tests for partial-failure recovery.

⛏ nitpick

  • Cargo.toml dep ordering (crates/trusted-server-adapter-fastly/Cargo.toml): trusted-server-js now sits between urlencoding and the alphabetical dependencies.
  • api_key: String (management_api.rs:109): no zeroize-on-drop. Client is short-lived so low-risk, but the previous Vec<u8> is gone. Flagging for awareness only.

👍 praise

  • Extracting FastlyApiClient into the adapter layer is the right layering move — trusted-server-core is now genuinely platform-agnostic for request signing.
  • parse_active_kids now has focused unit tests covering trimming, empty segments, and whitespace-only segments (request_signing/mod.rs:86-128) — directly addresses a prior review comment.
  • SigningPayload using structured JSON instead of delimiter-based formats is a real security improvement (prevents signature confusion attacks), and the round-trip tests in signing.rs validate it end-to-end.

CI Status

  • cargo fmt --all -- --checkPASS (local)
  • cargo clippy --workspace --all-targets --all-features -- -D warningsPASS (local)
  • cargo test --workspacePASS (local)
  • integration tests — PASS (GitHub)
  • browser integration tests — PASS (GitHub)
  • prepare integration artifacts — PASS (GitHub)

Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up with inline comments properly pinned to diff lines. Two corrections to the previous review:

  1. Retraction on the "handler tests assert nothing" 🔧: those weak match { Ok => log / Err => log } tests are pre-existing in feature/edgezero-pr8-content-rewriting; this PR only touched them mechanically to pass services through. Not blocking for PR 9. Reclassifying as a 🏕 for a follow-up: while you're in here, please replace them with real assertions — noop_services() returns Err deterministically, so an assertion like assert_eq!(resp.get_status(), StatusCode::INTERNAL_SERVER_ERROR) on the Ok(resp) branch would pin the contract.

  2. Remaining blockers stand (see inline):

    • SECRET_UPSERT_METHOD = "PUT" is not a documented Fastly verb — second same-kid rotation will hit the wire and likely 405.
    • rotate_key / delete_key are non-atomic across four management-API writes with no rollback — partial failures cause signing outages or orphaned key material.

The architectural direction remains the right one; these are lifecycle-correctness holes sitting on top of a clean refactor.

const FASTLY_API_HOST: &str = "https://api.fastly.com";
const API_KEYS_STORE: &str = "api-keys";
const API_KEY_ENTRY: &str = "api_key";
const SECRET_UPSERT_METHOD: &str = "PUT";
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — PUT on the secret-store collection endpoint is not a documented Fastly verb

The previous api_client.rs::create_secret used POST on this exact path (the documented create verb). Fastly's secret-store API expresses upsert semantics via method: "create_or_recreate" in the JSON body, not by switching verb. The two tests guarding this (secret_upsert_method_uses_put and create_secret_uses_secret_store_error_for_transport_failures) don't touch the real API contract — one asserts the constant against itself and the other uses a bogus backend name.

As written, the second rotation of any kid — or a same-day re-rotation via generate_date_based_kid, which produces ts-YYYY-MM-DD — will almost certainly return 405/400 from the management API in production.

Fix:

const SECRET_UPSERT_METHOD: &str = "POST";

fn build_secret_payload(secret_name: &str, secret_value: &str) -> String {
    serde_json::json!({
        "name": secret_name,
        "secret": general_purpose::STANDARD.encode(secret_value.as_bytes()),
        "method": "create_or_recreate",
    })
    .to_string()
}

This is distinct from the earlier base64 review finding — this is about the verb itself.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Changed SECRET_UPSERT_METHOD to "POST" and added "method": "create_or_recreate" to build_secret_payload, which is the documented Fastly upsert pattern. Updated the test to assert "POST" and added an assertion that the payload includes the create_or_recreate field.

.read_to_string(&mut body)
.change_context(error_kind())?;

if response.get_status().is_success() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question — broadened status acceptance

The old api_client.rs accepted only specific codes (StatusCode::OK / NO_CONTENT); check_response now accepts any 2xx via is_success(). Usually a good thing (202 Accepted passes), but it's a behavior change worth confirming as deliberate.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deliberate. Accepting any 2xx (via is_success()) is more correct than hardcoding specific codes — it handles 202 Accepted in addition to 200/204. No change made.

) -> Result<(), Report<PlatformError>> {
let path = build_config_item_path(store_id, key);

let mut response = self.make_request("DELETE", &path, None, "application/json", || {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question — should deletes be idempotent?

delete_config_item and delete_secret currently reject any non-2xx, so a 404 on DELETE propagates as an error. For key-lifecycle use, idempotent deletes are usually desirable so a retried delete_key converges after a partial failure. Is the non-idempotent behavior intentional?

Minimal change if not:

if response.get_status() == StatusCode::NOT_FOUND {
    return Ok(());
}

This interacts directly with the delete_key partial-failure finding on rotation.rs.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Both delete_config_item and delete_secret now return Ok(()) when the management API responds with 404, so retried delete_key calls converge after partial failures.

self.update_current_kid(&new_kid)?;
self.update_active_kids(&active_kids)?;
self.update_current_kid(services, &new_kid)?;
self.update_active_kids(services, &active_kids)?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — rotate_key is non-atomic across four management-API writes with no rollback

Ordering:

  1. store_private_key (secret store)
  2. store_public_jwk (config store)
  3. update_current_kid (config store)
  4. update_active_kids (config store)

Failure modes:

  • Fail at (2): private key lives in the secret store with no corresponding JWK — orphaned key material.
  • Fail at (4): current-kid advances to a kid that isn't yet in active-kids. The edge starts signing with a kid whose public JWK is not advertised in the JWKS discovery document, so downstream verifiers that refresh via active-kids will reject every signed request until the store is manually repaired. Signing outage.

Fix — reorder so current-kid flips last, after BOTH artifacts AND active-kids are confirmed written:

store_private_key → store_public_jwk → update_active_kids → update_current_kid

That way, any partial failure leaves the old kid still active and the new kid visible in JWKS but unused — a recoverable state instead of a signing outage. Ideally also attempt a best-effort delete of freshly-written artifacts on error from (1) or (2) and attach the rollback status to the returned Report.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Reordered to: store_private_key → store_public_jwk → update_active_kids → update_current_kid. Added best-effort rollback: if the JWK write (step 2) fails, the private key is deleted; if update_active_kids (step 3) fails, both are deleted. Any partial failure now leaves the old kid still active rather than causing a signing outage.

message: "failed to delete signing key from secret store".into(),
})?;

Ok(())
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — delete_key leaks private key material on symmetric partial failure

If deactivate_key and the JWK delete both succeed but secret_store.delete fails, the private key remains in the secret store while its JWK is gone from the config store. Because check_response rejects 404 (see the delete-idempotency question on management_api.rs), a retry then fails at the JWK-delete step and the operator can't converge by re-running the command.

Fix — order the secret-store delete before the JWK delete so a partial failure leaves a verifiable-but-not-signable orphan JWK (safer than orphaned key material), and treat 404 from both deletes as success so retries converge.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Reordered to: deactivate → delete private key → delete JWK. A partial failure at the secret-store delete now leaves an orphan JWK (verifiable but not signable), which is safer than orphaned key material with no corresponding JWK. Combined with the idempotent-delete fix, a retry on the operator side will converge cleanly.

}
}
if !active_kids.iter().any(|kid| kid == &new_kid) {
active_kids.push(new_kid.clone());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinking — TOCTOU in active-kids read-then-write

current-kid and active-kids are read here, then written unconditionally below. Two concurrent rotations can lose one side's update. Rotation is admin-gated and rare, so probably acceptable — but a read-before-write invariant check ("did current-kid change between our read and our write?") would be cheap insurance against a foot-gun, especially once this PR lands and the management API path is exercised from more than one operator.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. Rotation is admin-gated and rare, so leaving a TOCTOU follow-up rather than adding locking complexity here. Noted for a later PR.

));
Ok(())
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏕 camp site — SpySecretStore::create unconditionally returns Ok

This is the one place a failing-create variant would exercise the rollback path requested in the rotate_key finding above. A minimal SpySecretStore::with_create_failure_after(n) — or a single fail_create_after: AtomicUsize — would unlock real tests for partial-failure recovery once the rollback logic lands.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added SpySecretStore::with_create_failure_after(n) using an AtomicUsize counter. Added two new tests: rotate_key_fails_when_private_key_store_write_fails (uses with_create_failure_after(0)) and delete_key_removes_secret_before_jwk (verifies JWK is absent from the config store after delete_key completes).

@@ -43,8 +55,10 @@ fn parse_ed25519_signing_key(key_bytes: Vec<u8>) -> Result<SigningKey, Report<Tr
Ok(SigningKey::from_bytes(&key_array))
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinking — length heuristic is fragile

len > 32 → base64, else raw. The production path always writes base64 via rotation.rs::store_private_key, so a raw value >32 bytes in production means data corruption — and this code will hide it behind a misleading "failed to decode base64" error instead of a "corrupt signing key" error.

Consider requiring base64 unconditionally and updating the one test that injects raw bytes to encode them, or gating the raw branch behind #[cfg(test)].

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Removed the length heuristic entirely. parse_ed25519_signing_key now requires base64 unconditionally — a non-base64 value surfaces a clear "corrupt key material in secret store" error rather than a misleading decode failure. Changed the parameter to &[u8] as clippy suggested.

.attach("failed to initialize Fastly API client for config store write")?
.update_config_item(store_id.as_ref(), key, value)
.change_context(PlatformError::ConfigStore)
let client = crate::management_api::FastlyManagementApiClient::new()?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinking — per-call client construction

Every put/delete on FastlyPlatformConfigStore and FastlyPlatformSecretStore constructs a fresh FastlyManagementApiClient, which re-opens the api-keys secret store and re-registers the management backend. The doc comment acknowledges this, but for a single rotate_key call that's four back-to-back opens just to read the same api_key entry.

Follow-up (not for this PR): cache an Arc<FastlyManagementApiClient> on the stores behind a OnceLock, so repeated writes in one request share a single backend handle and a single secret-store read.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. Per-call construction noted as a follow-up — caching an Arc<FastlyManagementApiClient> behind a OnceLock on the platform stores. Not changed in this PR.

@prk-Jr prk-Jr changed the base branch from feature/edgezero-pr8-content-rewriting to main April 15, 2026 11:50
Comment thread crates/trusted-server-adapter-fastly/src/management_api.rs Dismissed
Comment thread crates/trusted-server-adapter-fastly/src/management_api.rs Dismissed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wire signing to store write primitives

4 participants