Commit b70388c

naitoh and kou authored
Use StringScanner#peek_byte to get double or single quotation mark (#227)
## Why?

`StringScanner#peek_byte` is fast because it does not allocate a String object.

## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.4/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     19.753      19.888        35.641       35.928 i/s -     100.000 times in 5.062402s 5.028121s 2.805792s 2.783339s
                 sax     30.349      30.978        53.485       57.885 i/s -     100.000 times in 3.295012s 3.228103s 1.869671s 1.727567s
                pull     34.170      35.436        61.713       66.534 i/s -     100.000 times in 2.926534s 2.821955s 1.620404s 1.502996s
              stream     33.121      35.268        60.751       63.276 i/s -     100.000 times in 3.019222s 2.835443s 1.646065s 1.580374s

Comparison:
                              dom
         after(YJIT):        35.9 i/s
        before(YJIT):        35.6 i/s - 1.01x  slower
               after:        19.9 i/s - 1.81x  slower
              before:        19.8 i/s - 1.82x  slower

                              sax
         after(YJIT):        57.9 i/s
        before(YJIT):        53.5 i/s - 1.08x  slower
               after:        31.0 i/s - 1.87x  slower
              before:        30.3 i/s - 1.91x  slower

                             pull
         after(YJIT):        66.5 i/s
        before(YJIT):        61.7 i/s - 1.08x  slower
               after:        35.4 i/s - 1.88x  slower
              before:        34.2 i/s - 1.95x  slower

                           stream
         after(YJIT):        63.3 i/s
        before(YJIT):        60.8 i/s - 1.04x  slower
               after:        35.3 i/s - 1.79x  slower
              before:        33.1 i/s - 1.91x  slower
```

- YJIT=ON : 1.01x - 1.08x faster
- YJIT=OFF : 1.00x - 1.06x faster

Co-authored-by: Sutou Kouhei <[email protected]>
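The allocation claim in the commit message can be checked with a rough sketch like the one below. It is not part of the commit: the `allocations` helper is a hypothetical name, and it relies on CRuby's `GC.stat(:total_allocated_objects)` counter. `StringScanner#peek(1)` stands in for any String-returning lookahead, and `peek_byte` is only exercised when the installed strscan provides it.

```ruby
require "strscan"

# Hypothetical helper: number of objects allocated while the block runs.
# Assumes CRuby, where GC.stat exposes a monotonically increasing counter.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

scanner = StringScanner.new(%q{"value"})
n = 10_000

# peek(1) returns a new one-character String on every call.
with_peek = allocations { n.times { scanner.peek(1) } }
puts "peek(1):   #{with_peek} objects allocated"

# peek_byte returns an Integer, so no String is created per call.
# Older strscan versions do not have it, hence the guard.
if scanner.respond_to?(:peek_byte)
  with_peek_byte = allocations { n.times { scanner.peek_byte } }
  puts "peek_byte: #{with_peek_byte} objects allocated"
end
```

The exact counts depend on the Ruby build, so the sketch prints them rather than assuming specific numbers.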
1 parent bb0bedd commit b70388c

2 files changed: +28 -2 lines changed

lib/rexml/parsers/baseparser.rb

+20 -2

@@ -766,6 +766,25 @@ def process_instruction
         [:processing_instruction, name, content]
       end
 
+      if StringScanner::Version < "3.1.1"
+        def scan_quote
+          @source.match(/(['"])/, true)&.[](1)
+        end
+      else
+        def scan_quote
+          case @source.peek_byte
+          when 34 # '"'.ord
+            @source.scan_byte
+            '"'
+          when 39 # "'".ord
+            @source.scan_byte
+            "'"
+          else
+            nil
+          end
+        end
+      end
+
       def parse_attributes(prefixes)
         attributes = {}
         expanded_names = {}
@@ -785,11 +804,10 @@ def parse_attributes(prefixes)
             message = "Missing attribute equal: <#{name}>"
             raise REXML::ParseException.new(message, @source)
           end
-          unless match = @source.match(/(['"])/, true)
+          unless quote = scan_quote
             message = "Missing attribute value start quote: <#{name}>"
             raise REXML::ParseException.new(message, @source)
           end
-          quote = match[1]
           start_position = @source.position
           value = @source.read_until(quote)
           unless value.chomp!(quote)
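To make the behavior of `scan_quote` easier to see outside the parser, here is a minimal standalone sketch written against a bare `StringScanner` instead of `REXML::Source`. It is an illustration, not the committed code: the `scanner` parameter, the `respond_to?` guard (used here in place of the commit's version check), and the `read_attribute_value` helper with its `ArgumentError` are all assumptions made for the example.

```ruby
require "strscan"

# Returns the quote character at the current position and consumes it,
# or returns nil and leaves the position untouched.
def scan_quote(scanner)
  if scanner.respond_to?(:peek_byte)
    case scanner.peek_byte
    when 34 # '"'.ord
      scanner.scan_byte
      '"'
    when 39 # "'".ord
      scanner.scan_byte
      "'"
    end
  else
    # Fallback for strscan versions without peek_byte: regexp scan.
    scanner.scan(/['"]/)
  end
end

# Hypothetical helper: read a quoted attribute value the way
# parse_attributes does after the "=" has been consumed.
def read_attribute_value(scanner)
  quote = scan_quote(scanner)
  raise ArgumentError, "Missing attribute value start quote" unless quote
  value = scanner.scan_until(Regexp.new(Regexp.escape(quote)))
  raise ArgumentError, "Missing attribute value end quote" unless value
  value.chomp(quote)
end

puts read_attribute_value(StringScanner.new(%q{"a b" c='d'}))
```

Both branches return the same one-character String, so the caller can keep passing `quote` to the code that reads up to the closing quote.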

lib/rexml/source.rb

+8

@@ -158,6 +158,14 @@ def position=(pos)
       @scanner.pos = pos
     end
 
+    def peek_byte
+      @scanner.peek_byte
+    end
+
+    def scan_byte
+      @scanner.scan_byte
+    end
+
     # @return true if the Source is exhausted
     def empty?
       @scanner.eos?
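The two new `Source` methods are thin delegators, so the parser never touches the scanner directly. A small sketch of that wrapper pattern follows, under stated assumptions: `MiniSource` is a hypothetical stand-in rather than `REXML::Source`, and the `String#getbyte` fallback for older strscan versions is added only so the example runs anywhere (the commit handles old versions in the parser with a version check instead).

```ruby
require "strscan"

# Hypothetical stand-in for REXML::Source with only the parts this diff touches.
class MiniSource
  def initialize(text)
    @scanner = StringScanner.new(text)
  end

  def position
    @scanner.pos
  end

  # Byte at the current position as an Integer (nil at the end); does not advance.
  def peek_byte
    if @scanner.respond_to?(:peek_byte)
      @scanner.peek_byte
    else
      @scanner.string.getbyte(@scanner.pos) # assumed fallback, not in the commit
    end
  end

  # Same byte as peek_byte, but also advances the position by one.
  def scan_byte
    if @scanner.respond_to?(:scan_byte)
      @scanner.scan_byte
    else
      byte = @scanner.string.getbyte(@scanner.pos) # assumed fallback
      @scanner.pos += 1 if byte
      byte
    end
  end
end

source = MiniSource.new("'x'")
puts source.peek_byte # byte value of the opening quote, position unchanged
puts source.scan_byte # same byte, position now advanced
puts source.position
```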
