Another template tokenizer implementation by jg-rp · Pull Request #2071 · Shopify/liquid

jg-rp · 2026-04-06T18:27:57Z

This pull request demonstrates an alternative template tokenizer implementation.

Like the tokenizer from #2056, we use String#getbyte, String#byteindex and String#byteslice instead of a StringScanner.
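
For readers unfamiliar with the technique, here is a minimal sketch of byte-oriented scanning using those three methods. It is not the code from this PR or from #2056: the method name `byte_tokenize` and the constants are hypothetical, and the handling of malformed markup is deliberately simplified (an unterminated `{{` or `{%` is emitted with the rest of the input as a single token). It assumes Ruby 3.2 or later, where `String#byteindex` is available.

```ruby
OPEN_CURLY = 123 # "{".ord
PERCENT = 37     # "%".ord

# Hypothetical helper for illustration only.
# @param source [String]
# @return [Array<String>]
def byte_tokenize(source)
  tokens = []
  pos = 0
  size = source.bytesize

  while pos < size
    # Find the next "{" that is followed by "{" or "%".
    start = source.byteindex("{", pos)
    while start
      following = source.getbyte(start + 1)
      break if following == OPEN_CURLY || following == PERCENT

      start = source.byteindex("{", start + 1)
    end

    if start.nil?
      # No more markup: the remainder is plain text.
      tokens << source.byteslice(pos, size - pos)
      break
    end

    # Plain text before the markup, if any.
    tokens << source.byteslice(pos, start - pos) if start > pos

    closer = source.getbyte(start + 1) == PERCENT ? "%}" : "}}"
    stop = source.byteindex(closer, start + 2)

    if stop.nil?
      # Simplification: treat unterminated markup as one trailing token.
      tokens << source.byteslice(start, size - start)
      break
    end

    tokens << source.byteslice(start, stop + 2 - start)
    pos = stop + 2
  end

  tokens
end
```

Because every offset passed to `byteindex` sits immediately after a single-byte ASCII character, the scan stays on character boundaries even when the template contains multibyte text.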

On an M2 Mac Mini, running PHASE=tokenize bundle exec rake benchmark:strict2 gives:

```
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
           tokenize:   511.000 i/100ms
Calculating -------------------------------------
           tokenize:      5.112k (± 0.1%) i/s  (195.61 μs/i) -    102.711k in  20.091372s
```

The same benchmark for Liquid v5.12.0:

```
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
           tokenize:   265.000 i/100ms
Calculating -------------------------------------
           tokenize:      2.658k (± 0.1%) i/s  (376.29 μs/i) -     53.265k in  20.043212s
```

And for #2056:

```
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
           tokenize:   454.000 i/100ms
Calculating -------------------------------------
           tokenize:      4.567k (± 0.1%) i/s  (218.96 μs/i) -     91.708k in  20.080724s
```
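
To put the three results side by side, the relative throughput can be worked out from the iterations-per-second figures reported above. The snippet below is only arithmetic on those numbers; the method name `speedups` is a hypothetical helper.

```ruby
# Iterations per second, copied from the benchmark output above.
RESULTS = {
  "this PR" => 5112.0,
  "#2056" => 4567.0,
  "v5.12.0" => 2658.0,
}.freeze

# Hypothetical helper: throughput of each entry relative to a baseline entry.
def speedups(results, baseline)
  base = results.fetch(baseline)
  results.transform_values { |ips| ips / base }
end

speedups(RESULTS, "v5.12.0").each do |name, ratio|
  puts format("%-8s %.2fx", name, ratio)
end
```

With YJIT enabled on this machine, that works out to roughly 1.92x the throughput of v5.12.0 for this PR and 1.72x for #2056, which makes this PR about 12% faster than #2056.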

Unlike v5.12.0 and #2056, the tokenizer in this PR also outperforms the old splitting regex approach from v5.6.0 when benchmarking without YJIT.
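
For context, a splitting regex tokenizer works by calling `String#split` with a pattern whose capture group keeps the delimiters (the markup) in the result. The sketch below is an approximation written for illustration: `SPLIT_RE` and `split_tokenize` are hypothetical names, and the pattern is a simplified stand-in that may differ from the constants Liquid actually used in v5.6.0.

```ruby
# Simplified, hypothetical pattern: a complete tag, a complete (or
# half-closed) output, or a bare "{%" / "{{" with no closing delimiter.
SPLIT_RE = /(\{%.*?%\}|\{\{.*?\}\}?|\{%|\{\{)/m

# @param source [String]
# @return [Array<String>]
def split_tokenize(source)
  # The capture group makes String#split include the matched markup.
  # Adjacent markup produces empty strings between matches, so drop them.
  source.split(SPLIT_RE).reject(&:empty?)
end

p split_tokenize("Hello {{ name }}!{% if x %}y{% endif %}")
```

The whole template is handed to the regex engine in one call, which is why this approach is less dependent on YJIT than a tokenizer written as a Ruby-level loop.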

I did want to suggest a StringScanner implementation like this:

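```ruby
require "strscan"

# Start of an output (`{{`) or a tag (`{%`).
RE_START = /\{[{%]/
# An output: `{{` up to the first `}` (second `}` optional), or `{{ ... %}`.
RE_OUT_BODY = /\{\{[^}]*\}\}?|\{\{.*%\}/
# A tag: `{%` ... `%}` with no `%` or `}` in between.
RE_TAG_BODY = /\{%[^%}]*%\}/

# @param scanner [StringScanner]
# @return [Array[String]]
def tokenize(scanner)
  tokens = [] # : Array[String]

  loop do
    case scanner.peek(2)
    when "{{"
      if (match = scanner.scan(RE_OUT_BODY))
        tokens << match
      else
        # No closing delimiter: emit the bare `{{` and carry on.
        tokens << "{{"
        scanner.pos += 2
      end
    when "{%"
      if (match = scanner.scan(RE_TAG_BODY))
        tokens << match
      else
        # No closing delimiter: emit the bare `{%` and carry on.
        tokens << "{%"
        scanner.pos += 2
      end
    else
      if (match = scanner.scan_until(RE_START))
        # Drop the two delimiter bytes from the text token and rewind
        # so the next iteration sees the `{{` or `{%`.
        tokens << match.chop!.chop!
        scanner.pos -= 2
      else
        tokens << scanner.rest unless scanner.eos?
        break
      end
    end
  end

  tokens
end
```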

This is a performance improvement over v5.12.0, but it does not come close to the byte scanning approach, and it doesn't have the same performance characteristics without YJIT.

An alternative template tokenizer implementation

096d52c

jg-rp marked this pull request as ready for review April 7, 2026 07:20

jg-rp wants to merge 1 commit into Shopify:main from jg-rp:new-tokenizer
