fix(search): preserve value case for non-lowercased bleve fields #2633
The bleve compiler lowercased every query value (except Hidden) before handing it to the engine. This matched the index tokens for fields whose analyzer folds case — Name, Tags, Favorites, Content — but silently broke matching for every other field, whose default keyword analyzer preserves case. A query like Title:"Some Title" parsed fine, lowercased to "some title", and missed the indexed token "Some Title". Replace the blanket lowercasing with an allowlist of the four fields whose index mapping actually uses a lowercasing analyzer. Every other field now passes through unchanged, which keeps values like "deadmau5" or "Motörhead" intact instead of normalising them to a case the tag writer didn't choose.
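The allowlist idea described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `normalizeValue` is a hypothetical helper name, and only the `lowercaseFields` map and its four keys come from the PR itself.

```go
package main

import (
	"fmt"
	"strings"
)

// lowercaseFields mirrors the allowlist from the PR: the fields whose
// index mapping uses a lowercasing analyzer.
var lowercaseFields = map[string]bool{
	"Name":      true,
	"Tags":      true,
	"Favorites": true,
	"Content":   true,
}

// normalizeValue is a hypothetical helper (an assumption for this sketch).
// It folds case only for allowlisted fields and passes every other
// value through unchanged.
func normalizeValue(field, value string) string {
	if lowercaseFields[field] {
		return strings.ToLower(value)
	}
	return value
}

func main() {
	fmt.Println(normalizeValue("Name", "Report.pdf"))  // report.pdf
	fmt.Println(normalizeValue("Title", "Some Title")) // Some Title
}
```

With this shape, a field that is missing from the map defaults to case preservation, so a new field only needs an entry when its mapping opts into a lowercasing analyzer.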
Pull request overview
Adjust Bleve query compilation to avoid case-folding values for fields whose index mapping preserves case, fixing mismatches for keyword-analyzed fields.
Changes:
- Replace unconditional query-value lowercasing with an allowlist matching fields that lowercase at index time.
- Update compiler unit tests to reflect preserved case for non-allowlisted fields (e.g., `author`).
- Add an end-to-end regression test asserting case-sensitive behavior for `Title`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| services/search/pkg/query/bleve/compiler.go | Introduces lowercaseFields allowlist and applies lowercasing only for those mapped fields. |
| services/search/pkg/query/bleve/compiler_test.go | Updates expected compiled query strings to preserve case for author values. |
| services/search/pkg/bleve/backend_test.go | Adds regression test ensuring case is preserved for fields not marked lowercase (using Title). |
Copilot review pointed out that the comment claimed pre-lowercasing makes non-analyzed query types (wildcard, fuzzy) match for every allowlisted field. That is true for Name/Tags/Favorites, whose lowercaseKeyword analyzer emits a single lowercased token, but the Content analyzer also stems terms — so the guarantee doesn't hold there. Drop the specific claim and keep the comment to the intent: stay consistent with the field's analyzer.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
```go
var lowercaseFields = map[string]bool{
	"Name":      true,
	"Tags":      true,
	"Favorites": true,
	"Content":   true,
```
lowercaseFields is being used as a set; in this repo the common pattern is map[string]struct{} (e.g. services/thumbnails/pkg/thumbnail/mimetypes.go:7) to avoid boolean values and make membership checks explicit. Consider switching to map[string]struct{} and checking membership via _, ok := lowercaseFields[k].
Suggested change:

```diff
-var lowercaseFields = map[string]bool{
-	"Name":      true,
-	"Tags":      true,
-	"Favorites": true,
-	"Content":   true,
+var lowercaseFields = map[string]struct{}{
+	"Name":      {},
+	"Tags":      {},
+	"Favorites": {},
+	"Content":   {},
```
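For reference, the set idiom the review suggests could look like the sketch below. The names `lowercaseSet` and `foldIfLowercased` are hypothetical and chosen only for this illustration; the PR may wire the membership check differently.

```go
package main

import (
	"fmt"
	"strings"
)

// lowercaseSet uses empty structs as values, so the map acts purely
// as a set and carries no boolean that could accidentally be false.
var lowercaseSet = map[string]struct{}{
	"Name":      {},
	"Tags":      {},
	"Favorites": {},
	"Content":   {},
}

// foldIfLowercased (hypothetical name) lowercases the value only when
// the field is a member of the set, using the comma-ok lookup form.
func foldIfLowercased(field, value string) string {
	if _, ok := lowercaseSet[field]; ok {
		return strings.ToLower(value)
	}
	return value
}

func main() {
	fmt.Println(foldIfLowercased("Tags", "Rock"))      // rock
	fmt.Println(foldIfLowercased("author", "Jane"))    // Jane
}
```

The behavior matches the `map[string]bool` version; the difference is that membership is expressed by key presence alone.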




Summary
- Replace the unconditional `strings.ToLower(v)` in the bleve compiler with a small allowlist (`Name`, `Tags`, `Favorites`, `Content`) that matches the fields whose index mapping uses a lowercasing analyzer.
- Update the `author:("John Smith" Jane)` compiler tests; `author` is not in the allowlist, so the compiler now preserves case.
- Add a regression test in `bleve/backend_test.go`: `Title:"Some Title"` returns 1 hit, `Title:"some title"` returns 0 hits.

Why
The previous compiler lowercased every query value before emitting a bleve `QueryStringQuery`. That was a workaround for the four fields whose analyzer lowercases at index time: without the compiler-side fold, a query like `Name:Report.pdf` wouldn't match the indexed `report.pdf`. But every other field uses the default `keyword` analyzer, which preserves case. For those, the unconditional lowercasing silently turned `audio.artist:Motörhead`, `Title:"Some Title"`, `Path:"./Some Dir"`, etc. into lookups against tokens that didn't exist in the index.

The allowlist approach puts the case-folding decision next to the field mapping it mirrors. If a future field opts into a lowercasing analyzer, it gets added to the allowlist; otherwise it preserves case, which is also what value identity and aggregation display need. Names like `deadmau5` or `Motörhead` stay the way the tag writer wrote them.

Out of scope
- … `text` (standard analyzer, tokenised + lowercased) with a `.keyword` multi-field. The equivalent change there requires switching the `TermQuery` emission to `MatchQuery` for text fields (so the analyser applies) and using `.keyword` for aggregations. Separate PR, once aggregations are actually implemented.
- … (`audio.artist:…`) lives in feat(kql): support dotted keys in property restrictions #2632. The regression test in this PR uses `Title` as a stand-in and has a FIXME pointing at that PR; once it lands, `Title` can be swapped for `audio.artist`.