Skip to content

DTable reduce (on grouped dtable) performance improvements #294

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 22, 2021

Conversation

krynju
Copy link
Member

@krynju krynju commented Oct 17, 2021

(merge before the tables.jl pr - will need to adjust that one once this is merged)

Noticed the grouped reduce is super slow for no real reason (normal reduce is super quick), remembered I just wrapped over normal reduce.
Was a good idea, but it was allocating a lot more than it should and just doing unnecessary work which then had to be transformed later anyway.

# THIS PR
# https://github.com/JuliaParallel/Dagger.jl/pull/294

PS C:\Users\krynjupc\.julia\dev\Dagger> julia -t16 .\dtable_bench2.jl
first groupby (compile overhead)
 46.232115 seconds (291.47 M allocations: 17.844 GiB, 37.25% gc time, 20.10% compilation time)
second groupby
 46.863056 seconds (278.45 M allocations: 17.126 GiB, 50.37% gc time)
first reduce (compile included)
 17.210092 seconds (41.67 M allocations: 2.415 GiB, 54.87% gc time, 17.93% compilation time)
second reduce
 11.101848 seconds (32.15 M allocations: 1.903 GiB, 57.17% gc time)
groupby reduce
 99.229038 seconds (330.47 M allocations: 20.138 GiB, 68.62% gc time)

# CURRENT MASTER
# https://github.com/JuliaParallel/Dagger.jl/pull/291

PS C:\Users\krynjupc\.julia\dev\Dagger> julia -t16 .\dtable_bench2.jl
first groupby (compile overhead)
 43.010412 seconds (294.32 M allocations: 18.008 GiB, 38.43% gc time, 23.85% compilation time)
second groupby
 45.847915 seconds (278.09 M allocations: 17.156 GiB, 49.82% gc time)
first reduce (compile included)
 58.989759 seconds (136.16 M allocations: 8.198 GiB, 60.60% gc time, 14.50% compilation time)
second reduce
 71.114289 seconds (156.93 M allocations: 8.890 GiB, 58.00% gc time)
groupby reduce
227.729765 seconds (575.97 M allocations: 32.183 GiB, 64.63% gc time, 0.08% compilation time)

@codecov-commenter
Copy link

Codecov Report

Merging #294 (4229f91) into master (defe1b1) will not change coverage.
The diff coverage is 0.00%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master    #294   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files          39      39           
  Lines        3184    3189    +5     
======================================
- Misses       3184    3189    +5     
Impacted Files Coverage Δ
src/table/operations.jl 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update defe1b1...4229f91. Read the comment docs.

@krynju krynju force-pushed the kr/dtable-reduce-performance-2 branch from 617dbd7 to 95d7f9b Compare October 18, 2021 17:46
@krynju
Copy link
Member Author

krynju commented Oct 18, 2021

@jpsamaroo This is good to go now. Just added some thing to better resolve columnnames - need a vector of symbols there in the, so had to add some additional code to handle that

@jpsamaroo jpsamaroo merged commit da1feab into JuliaParallel:master Oct 22, 2021
@jpsamaroo
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants