Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Further improving fast date parsing #19519

Merged
merged 5 commits into from
Dec 20, 2016

Conversation

omus
Copy link
Member

@omus omus commented Dec 7, 2016

Iterating on #18000 and #19545. Speed up is currently 140x.

before:

julia> @benchmark DateTime("2010-10-10T10:10:10.10")
BenchmarkTools.Trial: 
  memory estimate:  3.11 kb
  allocs estimate:  85
  --------------
  minimum time:     16.768 μs (0.00% GC)
  median time:      17.943 μs (0.00% GC)
  mean time:        20.198 μs (1.78% GC)
  maximum time:     3.734 ms (96.53% GC)
  --------------
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

Shashi's update #19545

julia> @benchmark DateTime("2010-10-10T10:10:10.10")
BenchmarkTools.Trial: 
  memory estimate:  0.00 bytes
  allocs estimate:  0
  --------------
  minimum time:     138.344 ns (0.00% GC)
  median time:      140.283 ns (0.00% GC)
  mean time:        157.004 ns (0.00% GC)
  maximum time:     1.683 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     861
  time tolerance:   5.00%
  memory tolerance: 1.00%

This PR:

julia> @benchmark DateTime("2010-10-10T10:10:10.10")
BenchmarkTools.Trial: 
  memory estimate:  0.00 bytes
  allocs estimate:  0
  --------------
  minimum time:     118.559 ns (0.00% GC)
  median time:      125.402 ns (0.00% GC)
  mean time:        138.223 ns (0.00% GC)
  maximum time:     2.867 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     907
  time tolerance:   5.00%
  memory tolerance: 1.00%

fixes #15888, #13644
supersedes #18000, #19545

@omus
Copy link
Member Author

omus commented Dec 7, 2016

I'm also experimenting with some ideas to apply this performance gain to all date formats. Feel free to merge in the mean time.

@simonbyrne
Copy link
Contributor

@shashi?

@kshyatt kshyatt added performance Must go faster dates Dates, times, and the Dates stdlib module labels Dec 7, 2016
@omus omus force-pushed the cv/dates-fast-parsing branch 2 times, most recently from 1072d8e to 91c5976 Compare December 14, 2016 22:57
@omus omus changed the title Further improving on #18000 WIP: Further improving fast date parsing Dec 14, 2016
@omus
Copy link
Member Author

omus commented Dec 14, 2016

Modified direction of the PR to work off of #19545. Mostly I am cleaning up the code without losing the speedup. I will probably also address #16928 here and may also include some custom code for specific formats.

Copy link
Contributor

@shashi shashi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@omus were you able to accomodate the changes required for TimeZones? I see that things still deal with NTupl{7,Int}s...

I have a new commit, should try and merge it here.

j = 1
while (maxchars <= 0 || j <= maxchars) && i <= len
max_pos = maxchars <= 0 ? len : min(chr2ind(str, ind2chr(str,i) + maxchars - 1), len)
while i <= max_pos
c, ii = next(str, i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can actually also put @inbounds on this line and pretty much everywhere next is called to squeeze out a few more nanoseconds!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see you have it for parsing numbers

@shashi
Copy link
Contributor

shashi commented Dec 15, 2016

I think this can be done by something like adding a parameter to DatePart{char, T} where T is the output type, in this case the type of the result in _tryparse can still be determined... You might have other ideas.

@omus
Copy link
Member Author

omus commented Dec 15, 2016

@shashi I haven't yet dealt with parsing support for TimeZones. I have some ideas including what you suggested but I'll need to do more experimentation to find out what works best.


@label error
raise && _raiseexception(t, err_idx, state)
raise && throw(gen_exception(t, err_idx, state))
return R()
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a tryparsenext in Base... Something like CSV.jl could use it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this tryparse_internal should become tryparsenext and return the state as the second argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But of course that's a project of its own, changing all tryparses in Base to work that way.

j = 1
while (maxchars <= 0 || j <= maxchars) && i <= len
max_pos = maxchars <= 0 ? len : min(chr2ind(str, ind2chr(str,i) + maxchars - 1), len)
while i <= max_pos
c, ii = next(str, i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see you have it for parsing numbers

@omus
Copy link
Member Author

omus commented Dec 16, 2016

Added fundamentals of adding custom slots to TimeType parsing (supports parsing of ZonedDateTime type). Changes don't impact performance of Date or DateTime

@shashi
Copy link
Contributor

shashi commented Dec 17, 2016

Nice! :)

Is ZonedDateTime parsing itself fast?

@omus
Copy link
Member Author

omus commented Dec 17, 2016

ZonedDateTime parsing is about 10x faster. There are probably some additional things I can do outside of base to improve the performance more.

@omus
Copy link
Member Author

omus commented Dec 17, 2016

I've got one more change I'd like to do which will be done on Monday.

omus added 3 commits December 19, 2016 09:00
```julia
julia> s = "2010-10-11T12:13:14.15"
"2010-10-11T12:13:14.15"

julia> df_default = Base.Dates.DateFormat("yyyy-mm-ddTHH:MM:SS.s")
dateformat"yyyy-mm-ddTHH:MM:SS.s"

julia> df_custom = Base.Dates.DateFormat("yyyy-mm-dd\\THH:MM:SS.s")
dateformat"yyyy-mm-ddTHH:MM:SS.s"

julia> @benchmark parse(DateTime,$s,$df_default)
BenchmarkTools.Trial:
  memory estimate:  0.00 bytes
  allocs estimate:  0
  --------------
  minimum time:     127.267 ns (0.00% GC)
  median time:      135.160 ns (0.00% GC)
  mean time:        147.785 ns (0.00% GC)
  maximum time:     2.450 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     886
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark parse(DateTime,$s,$df_custom)
BenchmarkTools.Trial:
  memory estimate:  0.00 bytes
  allocs estimate:  0
  --------------
  minimum time:     71.947 ns (0.00% GC)
  median time:      74.662 ns (0.00% GC)
  mean time:        82.548 ns (0.00% GC)
  maximum time:     2.529 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     976
  time tolerance:   5.00%
  memory tolerance: 1.00%

```
@omus
Copy link
Member Author

omus commented Dec 19, 2016

I've added support for custom parsers which now makes DateTime parsing using ISODateTimeFormat 230x faster.

julia> @benchmark DateTime("2010-10-10T10:10:10.10")
BenchmarkTools.Trial: 
  memory estimate:  0.00 bytes
  allocs estimate:  0
  --------------
  minimum time:     70.023 ns (0.00% GC)
  median time:      77.467 ns (0.00% GC)
  mean time:        109.659 ns (0.00% GC)
  maximum time:     9.145 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     976
  time tolerance:   5.00%
  memory tolerance: 1.00%

@omus
Copy link
Member Author

omus commented Dec 19, 2016

@shashi feel free to merge in my changes again. Code should be ready for review.

@tkelman
Copy link
Contributor

tkelman commented Dec 19, 2016

should we change the target branch of this pr?

@omus
Copy link
Member Author

omus commented Dec 19, 2016

Sure. Do I need to remove the "fixes" / "supersedes" notes before merging?

@tkelman
Copy link
Contributor

tkelman commented Dec 19, 2016

good question, don't think so - i believe github only auto closes things that merge to master

@omus omus changed the base branch from master to sh/fast_date_parsing December 19, 2016 15:36
@@ -66,11 +66,15 @@ b2 = "96/Feb/1"
# Here we've specifed a text month name, but given a number
b3 = "96/2/15"
@test_throws ArgumentError Dates.DateTime(b3,f)
let err = try DateTime("2012-02-20T09:09:3.43i9") catch err; err end
try
Dates.tryparse_internal(DateTime, "2012-02-20T09:09:3.43i9", Dates.ISODateTimeFormat, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a @test false here if the exception should happen all the time, or if it doesn't throw then this would pass silently

@shashi shashi merged commit fdeef5d into JuliaLang:sh/fast_date_parsing Dec 20, 2016
@shashi
Copy link
Contributor

shashi commented Dec 20, 2016

👍

@omus omus deleted the cv/dates-fast-parsing branch December 20, 2016 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dates Dates, times, and the Dates stdlib module performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants