fix: Do not escape regex in d2:validatePattern [DHIS2-21359] #100
Open
enricocolasante wants to merge 3 commits into main
## Fix: preserve regex backslash escapes in `d2:validatePattern` (DHIS2-21359)

### Problem

`d2:validatePattern` was silently misinterpreting regex patterns that contained backslash-escaped characters inside character classes.

String literals in expressions are stored as `Utf8StringNode`, whose `decode()` method strips backslashes from escape sequences (`\X` → `X`). For most string usage this is correct, but a regex pattern handed to `d2:validatePattern` should reach the regex engine with its backslashes intact.

The result was that a pattern with backslash-escaped apostrophes and a hyphen inside a character class (where the apostrophes include typographic U+2018/U+2019) would be decoded to `[…'-‘…]`. Java then reads `'` (U+0027) `-` U+2018 as a character range spanning code points 39–8216, which accidentally covers the entire Cyrillic block (e.g. И = U+0418 = 1048). So inputs that should not match, like `Иван`, incorrectly returned `true`.

A naive fix of passing the raw string (with backslashes) directly to the regex engine solves the JVM case but breaks JS: the Kotlin/JS regex engine runs in unicode mode and rejects unknown backslash escapes such as `\'`, `` \` ``, and `\` followed by U+2018 as invalid.

### Fix

Added `Utf8StringNode.decodeToRegex()` alongside the existing `decode()`. It processes the raw expression-string value specifically for regex use:

- Strips the backslash from escapes that have no regex meaning (`\'` → `'`, `` \` `` → `` ` ``, `\` + U+2018 → U+2018, etc.)
- Keeps regex escapes intact (`\d`, `\w`, `\s`, `\\`, `\uXXXX`, `\xXX`, etc.)
- Keeps `\-` as `\-`: both Java and JS unicode-mode regex recognise this as a literal hyphen inside a character class, preventing it from being interpreted as a range operator between two adjacent characters

`Calculator.evalToRawString` now calls `decodeToRegex` for `STRING` nodes and recursively unwraps `ARGUMENT`/`PAR` nodes. Non-literal arguments (variables, sub-expressions) fall back to `evalToString` so existing behaviour is unchanged.

### Test plan

- `testValidatePattern_NoMatch`: asserts `Иван` (Cyrillic) does not match a Latin/extended-Latin name pattern, on JVM and JS
- `testValidatePattern_Match`: asserts `John` (ASCII Latin) does match the same pattern, on JVM and JS
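For reviewers who want to reproduce the accidental range on the JVM, here is a minimal Java sketch. It is not the code in this PR: the class `RegexDecodeSketch`, the methods `naiveDecode` and `decodeForRegex`, and the pattern in `RAW` are all hypothetical, and the set of escapes that `decodeForRegex` unescapes is an assumption based on the examples listed under Fix.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; names and the raw pattern are assumptions.
final class RegexDecodeSketch {

    static final String LEFT_QUOTE = "\u2018"; // typographic apostrophe U+2018

    // Raw expression-string value: ^[A-Za-z\'\-\<U+2018> ]+$
    static final String RAW = "^[A-Za-z\\'\\-\\" + LEFT_QUOTE + " ]+$";

    // Mimics the described decode() behaviour: every \X becomes X.
    static String naiveDecode(String raw) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                out.append(raw.charAt(++i)); // drop the backslash
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Assumed regex-aware variant: only quote-like escapes lose their
    // backslash; everything else (\-, \d, \\, ...) is passed through as-is.
    static String decodeForRegex(String raw) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                char next = raw.charAt(++i);
                boolean quoteLike = next == '\'' || next == '`'
                        || next == '\u2018' || next == '\u2019';
                if (!quoteLike) {
                    out.append('\\'); // keep the escape for the regex engine
                }
                out.append(next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    static boolean matches(String pattern, String input) {
        return Pattern.compile(pattern).matcher(input).matches();
    }

    public static void main(String[] args) {
        String ivan = "\u0418\u0432\u0430\u043D"; // "Иван" in Cyrillic

        // Naive decoding yields ['-<U+2018>], a range over code points 39..8216.
        System.out.println(matches(naiveDecode(RAW), ivan));    // true (the bug)

        // Keeping \- makes the hyphen a literal, so the range disappears.
        System.out.println(matches(decodeForRegex(RAW), ivan)); // false
        System.out.println(matches(decodeForRegex(RAW), "John")); // true
    }
}
```

The sketch only exercises `java.util.regex`; it says nothing about how the Kotlin/JS engine treats the same strings, which is covered by the JS test targets in the test plan.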