[CSM-1857] Stabilize Azure detector hash_v2 with deterministic iteration#4846
Open
dipto-truffle wants to merge 3 commits intomainfrom
Open
[CSM-1857] Stabilize Azure detector hash_v2 with deterministic iteration#4846dipto-truffle wants to merge 3 commits intomainfrom
dipto-truffle wants to merge 3 commits intomainfrom
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
pkg/detectors/azure_entra/serviceprincipal/v2/spv2_determinism_test.go
Outdated
Show resolved
Hide resolved
| if r == nil { | ||
| // Only include the clientId and tenantId if we're confident which one it is. | ||
| if len(clientIds) != 1 { | ||
| if len(activeClients) != 1 { |
Contributor
There was a problem hiding this comment.
I might be mistaken, but removing the comments feels a bit aggressive; the PR strips out almost all comments in ProcessData. What do you think?
Contributor
Author
There was a problem hiding this comment.
[Agent-generated] Good call — restored the comments that explain non-obvious intent:
- "Skip known-invalid client/tenant combinations." (corrected from the misleading original "Skip known invalid tenants")
- "Tenant doesn't exist." / "Tenant exists, ensure this isn't attempted as a clientId." (cross-map deletion rationale)
- "Tenant doesn't exist; shouldn't happen given the check above." (defensive-check note)
- "Tenant is valid but the client ID doesn't exist in it."
- "The result is verified or there's only one associated client and tenant." (break condition)
- "Only include the clientId and tenantId if we're confident which one it is." (ambiguous-ID clearing)
Dropped only // Handle errors. since it was immediately above if verificationErr != nil and added no value.
ProcessData in the Azure Entra v2 detector iterated Go maps non-deterministically, causing the same credential to produce different (clientId, tenantId) pairings across scanner runs. This yielded different RawV2 values, different hash_v2 hashes, and duplicate secret rows in the database (CSM-1857 secondary issue). Iterate map keys in sorted order via slices.Sorted(maps.Keys(...)) and clone caller maps with maps.Clone before verification-driven mutations. Same inputs now always produce the same RawV2/hash_v2. Made-with: Cursor
The test previously passed verify=false, but all delete calls that mutate maps are inside the `if verify` block, making the test a no-op. Use verify=true with a mock HTTP client that triggers tenant-not-found deletions so the test actually validates the maps.Clone protection. Made-with: Cursor
Address review feedback: the prior commits removed too many comments from ProcessData. Restore the ones that explain non-obvious intent (cross-map deletion rationale, defensive-check notes, ambiguous-ID clearing logic) while dropping the genuinely redundant ones. Made-with: Cursor
d684545 to
0a1c6a7
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
ProcessDatain the Azure Entra v2 detector iterated Go maps (clientSecrets,clientIds,tenantIds) non-deterministically, causing the same credential to produce different(clientId, tenantId)pairings across scanner runs. This yielded differentRawV2values, differenthash_v2hashes, and duplicate secret rows in the database (the secondary issue in CSM-1857).slices.Sorted(maps.Keys(...))and clone caller maps withmaps.Clonebefore verification-driven mutations. Same inputs now always produce the sameRawV2/hash_v2.Corresponds to trufflesecurity/thog#5936.
Test plan
go test ./pkg/detectors/azure_entra/serviceprincipal/v2/ -v— all 8 tests passTestProcessData_DeterministicRawV2confirms identical RawV2 across 50 repeated calls with the same inputsTestProcessData_DoesNotMutateCallerMapsconfirms caller maps are not modifiedTestProcessData_RawV2DependsOnIDCountconfirms RawV2 is populated only with unambiguous IDsTestProcessData_SameSecretDifferentRawV2documents the RawV2 divergence when chunk context differsgo test ./pkg/detectors/azure_entra/...— all passMade with Cursor
Note
Medium Risk
Changes the credential pairing/selection logic used to compute
RawV2, which can affect deduplication and reported results for Azure secrets (and is reused by v1). Scope is limited and covered by new determinism and immutability tests.Overview
Stabilizes Azure Entra service principal
hash_v2generation.ProcessDatanow iteratesclientSecrets,clientIds, andtenantIdsin sorted-key order and clones the caller-provided ID maps before performing verification-drivendeleteoperations, eliminating cross-run nondeterminism and input mutation.Adds a new
spv2_determinism_test.gosuite to assert deterministicRawV2across repeated runs, document whenRawV2is omitted due to ambiguous IDs, and ensureProcessDatadoes not mutate the caller’s maps during verification.Reviewed by Cursor Bugbot for commit 0a1c6a7. Bugbot is set up for automated code reviews on this repo. Configure here.