fix(search): preserve value case for non-lowercased bleve fields #2633

Open
dschmidt wants to merge 2 commits into main from fix/search-preserve-value-case

@dschmidt

Summary

  • Replace the unconditional strings.ToLower(v) in the bleve compiler with a small allowlist (Name, Tags, Favorites, Content) that matches the fields whose index mapping uses a lowercasing analyzer.
  • Update the existing author:("John Smith" Jane) compiler tests; those names are not in the allowlist, so the compiler now preserves case.
  • Add an end-to-end regression in bleve/backend_test.go: Title:"Some Title" returns 1 hit, Title:"some title" returns 0 hits.
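
As a minimal sketch of the allowlist described above: the map name lowercaseFields and its four keys come from this PR, while normalizeValue is a hypothetical helper name used only for illustration and is not claimed to be the compiler's actual function.

```go
package main

import (
	"fmt"
	"strings"
)

// lowercaseFields lists the fields whose index mapping uses a lowercasing
// analyzer, as named in the summary above.
var lowercaseFields = map[string]bool{
	"Name":      true,
	"Tags":      true,
	"Favorites": true,
	"Content":   true,
}

// normalizeValue is a hypothetical helper: it folds case only for
// allowlisted fields and returns every other value unchanged.
func normalizeValue(field, value string) string {
	if lowercaseFields[field] {
		return strings.ToLower(value)
	}
	return value
}

func main() {
	fmt.Println(normalizeValue("Name", "Report.pdf"))   // allowlisted: folded
	fmt.Println(normalizeValue("Title", "Some Title"))  // not allowlisted: kept
	fmt.Println(normalizeValue("author", "John Smith")) // not allowlisted: kept
}
```

Under these assumptions only the Name value is folded; the Title and author values pass through with their original case.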

Why

The previous compiler lowercased every query value before emitting a bleve QueryStringQuery. That was a workaround for the four fields whose analyzer lowercases at index time — without the compiler-side fold, a query like Name:Report.pdf wouldn't match the indexed report.pdf. But every other field uses the default keyword analyzer, which preserves case. For those, the unconditional lowercasing silently turned audio.artist:Motörhead, Title:"Some Title", Path:"./Some Dir", etc. into lookups against tokens that didn't exist in the index.

The allowlist approach puts the case-folding decision next to the field mapping it mirrors. If a future field opts into a lowercasing analyzer, it gets added to the allowlist; otherwise it preserves case — which is also what value identity and aggregation display need. Names like deadmau5 or Motörhead stay the way the tag writer wrote them.

Out of scope

  • OpenSearch backend. Its default dynamic mapping uses text (standard analyzer, tokenised + lowercased) with a .keyword multi-field. The equivalent change there requires switching the TermQuery emission to MatchQuery for text fields (so that the analyzer is applied to the query value) and using .keyword for aggregations. That belongs in a separate PR, once aggregations are actually implemented.
  • The dotted-key KQL grammar (audio.artist:…) lives in feat(kql): support dotted keys in property restrictions #2632. The regression test in this PR uses Title as a stand-in and has a FIXME pointing at that PR — once it lands, Title can be swapped for audio.artist.
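
For orientation only, the OpenSearch direction mentioned in the first bullet could produce request bodies of roughly the following shape. This is a sketch, not part of this PR: matchQuery, termsAggregation, and mustJSON are hypothetical helpers, and the aggregation name "artists" is made up for the example. The `match` query and the `terms` aggregation on a `.keyword` sub-field are standard query DSL.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// matchQuery builds a match query body; the query value is analyzed with
// the text field's analyzer, so no compiler-side case folding is needed.
func matchQuery(field, value string) map[string]any {
	return map[string]any{
		"query": map[string]any{
			"match": map[string]any{field: value},
		},
	}
}

// termsAggregation buckets on the untokenised .keyword sub-field so the
// displayed values keep their original case.
func termsAggregation(name, field string) map[string]any {
	return map[string]any{
		"size": 0,
		"aggs": map[string]any{
			name: map[string]any{
				"terms": map[string]any{"field": field + ".keyword"},
			},
		},
	}
}

// mustJSON marshals v and panics on error; acceptable for a sketch.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(matchQuery("Title", "Some Title")))
	fmt.Println(mustJSON(termsAggregation("artists", "audio.artist")))
}
```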

The bleve compiler lowercased every query value (except Hidden)
before handing it to the engine. This matched the index tokens
for fields whose analyzer folds case — Name, Tags, Favorites,
Content — but silently broke matching for every other field,
whose default keyword analyzer preserves case. A query like
Title:"Some Title" parsed fine, lowercased to "some title", and
missed the indexed token "Some Title".

Replace the blanket lowercasing with an allowlist of the four
fields whose index mapping actually uses a lowercasing analyzer.
Every other field now passes through unchanged, which keeps
values like "deadmau5" or "Motörhead" intact instead of
normalising them to a case the tag writer didn't choose.

Copilot AI left a comment

Pull request overview

Adjust Bleve query compilation to avoid case-folding values for fields whose index mapping preserves case, fixing mismatches for keyword-analyzed fields.

Changes:

  • Replace unconditional query-value lowercasing with an allowlist matching fields that lowercase at index time.
  • Update compiler unit tests to reflect preserved case for non-allowlisted fields (e.g., author).
  • Add an end-to-end regression test asserting case-sensitive behavior for Title.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File | Description
services/search/pkg/query/bleve/compiler.go | Introduces lowercaseFields allowlist and applies lowercasing only for those mapped fields.
services/search/pkg/query/bleve/compiler_test.go | Updates expected compiled query strings to preserve case for author values.
services/search/pkg/bleve/backend_test.go | Adds regression test ensuring case is preserved for fields not marked lowercase (using Title).

Comment thread services/search/pkg/query/bleve/compiler.go Outdated
Copilot review pointed out that the comment claimed pre-lowercasing
makes non-analyzed query types (wildcard, fuzzy) match for every
allowlisted field. That is true for Name/Tags/Favorites, whose
lowercaseKeyword analyzer emits a single lowercased token, but the
Content analyzer also stems terms — so the guarantee doesn't hold
there. Drop the specific claim and keep the comment to the intent:
stay consistent with the field's analyzer.
@sonarqubecloud

Quality Gate failed

Failed conditions
C Maintainability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Copilot AI left a comment

Comment on lines +17 to +21
var lowercaseFields = map[string]bool{
"Name": true,
"Tags": true,
"Favorites": true,
"Content": true,

Copilot AI Apr 20, 2026

lowercaseFields is being used as a set; in this repo the common pattern is map[string]struct{} (e.g. services/thumbnails/pkg/thumbnail/mimetypes.go:7) to avoid boolean values and make membership checks explicit. Consider switching to map[string]struct{} and checking membership via _, ok := lowercaseFields[k].
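
A minimal sketch of the set-style membership check this comment proposes. The map name and keys are taken from the PR; shouldLowercase and foldValue are hypothetical helper names introduced for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// lowercaseFields is a set: the empty struct value carries no data,
// so only key presence is meaningful.
var lowercaseFields = map[string]struct{}{
	"Name":      {},
	"Tags":      {},
	"Favorites": {},
	"Content":   {},
}

// shouldLowercase makes the membership check explicit via the
// comma-ok form of a map lookup.
func shouldLowercase(field string) bool {
	_, ok := lowercaseFields[field]
	return ok
}

// foldValue lowercases value only when field is in the set.
func foldValue(field, value string) string {
	if shouldLowercase(field) {
		return strings.ToLower(value)
	}
	return value
}

func main() {
	fmt.Println(shouldLowercase("Tags"), shouldLowercase("Title"))
	fmt.Println(foldValue("Tags", "Deadmau5"), foldValue("Title", "Deadmau5"))
}
```

With map[string]bool a missing key and an explicit false are indistinguishable at the call site; the struct{} form removes that second state, which is the design point of the suggestion.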

Suggested change
- var lowercaseFields = map[string]bool{
-     "Name": true,
-     "Tags": true,
-     "Favorites": true,
-     "Content": true,
+ var lowercaseFields = map[string]struct{}{
+     "Name": {},
+     "Tags": {},
+     "Favorites": {},
+     "Content": {},
