Skip to content

[CSM-1857] Stabilize Azure detector hash_v2 with deterministic iteration#4846

Open
dipto-truffle wants to merge 3 commits intomainfrom
test/csm-1857-hashv2-instability
Open

[CSM-1857] Stabilize Azure detector hash_v2 with deterministic iteration#4846
dipto-truffle wants to merge 3 commits intomainfrom
test/csm-1857-hashv2-instability

Conversation

@dipto-truffle
Copy link
Copy Markdown
Contributor

@dipto-truffle dipto-truffle commented Mar 27, 2026

Summary

  • Root cause: ProcessData in the Azure Entra v2 detector iterated Go maps (clientSecrets, clientIds, tenantIds) non-deterministically, causing the same credential to produce different (clientId, tenantId) pairings across scanner runs. This yielded different RawV2 values, different hash_v2 hashes, and duplicate secret rows in the database (the secondary issue in CSM-1857).
  • Fix: Iterate map keys in sorted order via slices.Sorted(maps.Keys(...)) and clone caller maps with maps.Clone before verification-driven mutations. Same inputs now always produce the same RawV2/hash_v2.
  • Tests: Added unit tests for RawV2 stability (ID count sensitivity, determinism over 50 runs, caller map immutability, hash divergence chain).

Corresponds to trufflesecurity/thog#5936.

Test plan

  • go test ./pkg/detectors/azure_entra/serviceprincipal/v2/ -v — all 8 tests pass
  • TestProcessData_DeterministicRawV2 confirms identical RawV2 across 50 repeated calls with the same inputs
  • TestProcessData_DoesNotMutateCallerMaps confirms caller maps are not modified
  • TestProcessData_RawV2DependsOnIDCount confirms RawV2 is populated only with unambiguous IDs
  • TestProcessData_SameSecretDifferentRawV2 documents the RawV2 divergence when chunk context differs
  • go test ./pkg/detectors/azure_entra/... — all pass

Made with Cursor


Note

Medium Risk
Changes the credential pairing/selection logic used to compute RawV2, which can affect deduplication and reported results for Azure secrets (and is reused by v1). Scope is limited and covered by new determinism and immutability tests.

Overview
Stabilizes Azure Entra service principal hash_v2 generation. ProcessData now iterates clientSecrets, clientIds, and tenantIds in sorted-key order and clones the caller-provided ID maps before performing verification-driven delete operations, eliminating cross-run nondeterminism and input mutation.

Adds a new spv2_determinism_test.go suite to assert deterministic RawV2 across repeated runs, document when RawV2 is omitted due to ambiguous IDs, and ensure ProcessData does not mutate the caller’s maps during verification.

Reviewed by Cursor Bugbot for commit 0a1c6a7. Bugbot is set up for automated code reviews on this repo. Configure here.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 27, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

@dipto-truffle dipto-truffle marked this pull request as ready for review April 1, 2026 21:11
@dipto-truffle dipto-truffle requested a review from a team April 1, 2026 21:11
@dipto-truffle dipto-truffle requested a review from a team as a code owner April 1, 2026 21:11
if r == nil {
// Only include the clientId and tenantId if we're confident which one it is.
if len(clientIds) != 1 {
if len(activeClients) != 1 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be mistaken, but removing the comments feels a bit aggressive; the PR strips out almost all comments in ProcessData. What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Agent-generated] Good call — restored the comments that explain non-obvious intent:

  • "Skip known-invalid client/tenant combinations." (corrected from the misleading original "Skip known invalid tenants")
  • "Tenant doesn't exist." / "Tenant exists, ensure this isn't attempted as a clientId." (cross-map deletion rationale)
  • "Tenant doesn't exist; shouldn't happen given the check above." (defensive-check note)
  • "Tenant is valid but the client ID doesn't exist in it."
  • "The result is verified or there's only one associated client and tenant." (break condition)
  • "Only include the clientId and tenantId if we're confident which one it is." (ambiguous-ID clearing)

Dropped only // Handle errors. since it was immediately above if verificationErr != nil and added no value.

ProcessData in the Azure Entra v2 detector iterated Go maps
non-deterministically, causing the same credential to produce different
(clientId, tenantId) pairings across scanner runs. This yielded
different RawV2 values, different hash_v2 hashes, and duplicate secret
rows in the database (CSM-1857 secondary issue).

Iterate map keys in sorted order via slices.Sorted(maps.Keys(...)) and
clone caller maps with maps.Clone before verification-driven mutations.
Same inputs now always produce the same RawV2/hash_v2.

Made-with: Cursor
The test previously passed verify=false, but all delete calls that
mutate maps are inside the `if verify` block, making the test a no-op.
Use verify=true with a mock HTTP client that triggers tenant-not-found
deletions so the test actually validates the maps.Clone protection.

Made-with: Cursor
Address review feedback: the prior commits removed too many comments
from ProcessData. Restore the ones that explain non-obvious intent
(cross-map deletion rationale, defensive-check notes, ambiguous-ID
clearing logic) while dropping the genuinely redundant ones.

Made-with: Cursor
@dipto-truffle dipto-truffle force-pushed the test/csm-1857-hashv2-instability branch from d684545 to 0a1c6a7 Compare April 7, 2026 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants