-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Further improving fast date parsing #19519
WIP: Further improving fast date parsing #19519
Conversation
I'm also experimenting with some ideas to apply this performance gain to all date formats. Feel free to merge in the mean time. |
1072d8e
to
91c5976
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@omus were you able to accomodate the changes required for TimeZones? I see that things still deal with NTupl{7,Int}s...
I have a new commit, should try and merge it here.
j = 1 | ||
while (maxchars <= 0 || j <= maxchars) && i <= len | ||
max_pos = maxchars <= 0 ? len : min(chr2ind(str, ind2chr(str,i) + maxchars - 1), len) | ||
while i <= max_pos | ||
c, ii = next(str, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can actually also put @inbounds
on this line and pretty much everywhere next
is called to squeeze out a few more nanoseconds!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see you have it for parsing numbers
I think this can be done by something like adding a parameter to |
@shashi I haven't yet dealt with parsing support for TimeZones. I have some ideas including what you suggested but I'll need to do more experimentation to find out what works best. |
|
||
@label error | ||
raise && _raiseexception(t, err_idx, state) | ||
raise && throw(gen_exception(t, err_idx, state)) | ||
return R() | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a tryparsenext
in Base... Something like CSV.jl could use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this tryparse_internal
should become tryparsenext
and return the state as the second argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But of course that's a project of its own, changing all tryparse
s in Base to work that way.
j = 1 | ||
while (maxchars <= 0 || j <= maxchars) && i <= len | ||
max_pos = maxchars <= 0 ? len : min(chr2ind(str, ind2chr(str,i) + maxchars - 1), len) | ||
while i <= max_pos | ||
c, ii = next(str, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see you have it for parsing numbers
Added fundamentals of adding custom slots to |
Nice! :) Is ZonedDateTime parsing itself fast? |
ZonedDateTime parsing is about 10x faster. There are probably some additional things I can do outside of base to improve the performance more. |
I've got one more change I'd like to do which will be done on Monday. |
```julia julia> s = "2010-10-11T12:13:14.15" "2010-10-11T12:13:14.15" julia> df_default = Base.Dates.DateFormat("yyyy-mm-ddTHH:MM:SS.s") dateformat"yyyy-mm-ddTHH:MM:SS.s" julia> df_custom = Base.Dates.DateFormat("yyyy-mm-dd\\THH:MM:SS.s") dateformat"yyyy-mm-ddTHH:MM:SS.s" julia> @benchmark parse(DateTime,$s,$df_default) BenchmarkTools.Trial: memory estimate: 0.00 bytes allocs estimate: 0 -------------- minimum time: 127.267 ns (0.00% GC) median time: 135.160 ns (0.00% GC) mean time: 147.785 ns (0.00% GC) maximum time: 2.450 μs (0.00% GC) -------------- samples: 10000 evals/sample: 886 time tolerance: 5.00% memory tolerance: 1.00% julia> @benchmark parse(DateTime,$s,$df_custom) BenchmarkTools.Trial: memory estimate: 0.00 bytes allocs estimate: 0 -------------- minimum time: 71.947 ns (0.00% GC) median time: 74.662 ns (0.00% GC) mean time: 82.548 ns (0.00% GC) maximum time: 2.529 μs (0.00% GC) -------------- samples: 10000 evals/sample: 976 time tolerance: 5.00% memory tolerance: 1.00% ```
I've added support for custom parsers which now makes DateTime parsing using ISODateTimeFormat 230x faster. julia> @benchmark DateTime("2010-10-10T10:10:10.10")
BenchmarkTools.Trial:
memory estimate: 0.00 bytes
allocs estimate: 0
--------------
minimum time: 70.023 ns (0.00% GC)
median time: 77.467 ns (0.00% GC)
mean time: 109.659 ns (0.00% GC)
maximum time: 9.145 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 976
time tolerance: 5.00%
memory tolerance: 1.00% |
@shashi feel free to merge in my changes again. Code should be ready for review. |
should we change the target branch of this pr? |
Sure. Do I need to remove the "fixes" / "supersedes" notes before merging? |
good question, don't think so - i believe github only auto closes things that merge to master |
@@ -66,11 +66,15 @@ b2 = "96/Feb/1" | |||
# Here we've specifed a text month name, but given a number | |||
b3 = "96/2/15" | |||
@test_throws ArgumentError Dates.DateTime(b3,f) | |||
let err = try DateTime("2012-02-20T09:09:3.43i9") catch err; err end | |||
try | |||
Dates.tryparse_internal(DateTime, "2012-02-20T09:09:3.43i9", Dates.ISODateTimeFormat, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a @test false
here if the exception should happen all the time, or if it doesn't throw then this would pass silently
👍 |
Iterating on #18000 and #19545. Speed up is currently 140x.
before:
Shashi's update #19545
This PR:
fixes #15888, #13644
supersedes #18000, #19545