Compiler support for optimizing PersistentDict (JuliaLang#51993)
This is part of the work to address JuliaLang#51352 by attempting to allow the
compiler to perform SROA on persistent data structures like
`PersistentDict` as if they were regular immutable data structures.
These sorts of data structures have very complicated internals (with
lots of mutation, memory sharing, etc.), but a relatively simple
interface. As such, it is unlikely that our compiler will have
sufficient power to optimize this interface by analyzing the
implementation.
We thus need to come up with some other mechanism that gives the
compiler license to perform the requisite optimization. One way would be
to just hardcode `PersistentDict` into the compiler, optimizing it like
any of the other builtin datatypes. However, this is of course very
unsatisfying. At the other end of the spectrum would be something like a
generic rewrite rule system (e-graphs anyone?) that would let the
PersistentDict implementation declare its interface to the compiler and
the compiler would use this for optimization (in a perfect world, the
actual rewrite would then be checked using some sort of formal methods).
I think that would be interesting, but we're very far from even being
able to design something like that (at least in Base - experiments with
external AbstractInterpreters in this direction are encouraged).
This PR tries to come up with a reasonable middle ground, where the
compiler gets some knowledge of the protocol hardcoded without having to
know about the implementation details of the data structure.
The basic idea is that `Core` provides some magic generic functions
that implementations can extend. Semantically, they are not special.
They dispatch as usual, and implementations are expected to work
properly even in the absence of any compiler optimizations.
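As a rough illustration of "extend a generic function and dispatch as usual", here is a minimal sketch. The module name `ToyKeyValue` and the `AssocList` type are hypothetical stand-ins invented for this example; they are not the module or data structure added by this PR, and nothing here relies on compiler support.

```julia
# Hypothetical stand-in for a protocol module; `ToyKeyValue` is an assumption
# made for illustration and is not the module added by this PR.
module ToyKeyValue
    function get end   # generic function declared with no methods
    function set end
end

# A toy immutable association list. `entries` is never mutated after construction.
struct AssocList{K,V}
    entries::Vector{Pair{K,V}}
end
AssocList{K,V}() where {K,V} = AssocList{K,V}(Pair{K,V}[])

# Implementations extend the protocol functions with ordinary methods.
function ToyKeyValue.set(d::AssocList{K,V}, key::K, val::V) where {K,V}
    # `filter` returns a fresh vector, so `d` itself is left untouched.
    rest = filter(p -> !isequal(p.first, key), d.entries)
    push!(rest, key => val)
    return AssocList{K,V}(rest)
end

function ToyKeyValue.get(d::AssocList{K,V}, key::K) where {K,V}
    for p in d.entries
        isequal(p.first, key) && return p.second
    end
    throw(KeyError(key))
end

# Plain dispatch, no optimization involved.
empty_list = AssocList{Symbol,Int}()
d = ToyKeyValue.set(empty_list, :a, 1)
value = ToyKeyValue.get(d, :a)
```

The toy type works correctly whether or not any optimizer knows about it, which is the property the protocol asks of real implementations.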
However, the compiler is semantically permitted to perform structural
optimization using these magic generic functions. In the concrete case,
this PR introduces the `KeyValue` interface which consists of two
generic functions, `get` and `set`. The core optimization is that the
compiler is allowed to rewrite any occurrence of `get(set(x, k, v), k)`
into `v` without additional legality checks. In particular, the compiler
performs no type checks, conversions, etc. The higher level
implementation code is expected to do all that.
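To make the shape of the rewrite concrete, the following is a toy sketch that performs it on surface-syntax `Expr` trees. This is only an analogy: the optimization in this PR lives in the compiler's SROA pass and works on compiler IR, not on `Expr`. The helper name `rewrite_get_set` is hypothetical, and the sketch only treats two keys as equal when they are the same symbol.

```julia
# Toy peephole pass over surface syntax; `rewrite_get_set` is a hypothetical
# helper written for illustration, not part of the compiler.
function rewrite_get_set(ex)
    ex isa Expr || return ex                                # leaves are unchanged
    ex = Expr(ex.head, map(rewrite_get_set, ex.args)...)    # rewrite children first
    if ex.head === :call && length(ex.args) == 3 && ex.args[1] === :get
        inner, k = ex.args[2], ex.args[3]
        if inner isa Expr && inner.head === :call && length(inner.args) == 4 &&
           inner.args[1] === :set && inner.args[3] === k
            return inner.args[4]                            # get(set(x, k, v), k) => v
        end
    end
    return ex
end

rewritten = rewrite_get_set(:(get(set(x, k, v), k)))   # keys match
untouched = rewrite_get_set(:(get(set(x, k, v), j)))   # keys differ, left alone
```

Note that the sketch, like the rule described above, never inspects the types of `x`, `k`, or `v`; any conversion has to happen before the value reaches `set`.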
This approach closely matches the general direction we've been taking in
external AbstractInterpreters for embedding additional semantics and
optimization opportunities into Julia code (although we generally use
methods there, rather than full generic functions), so I think we have
some evidence that this sort of approach works reasonably well.
Nevertheless, this is certainly an experiment and the interface is
explicitly declared unstable.
## Current Status
This is fully working and implemented, but the optimization currently
bails on anything but the simplest cases. Filling all those cases in is
not particularly hard, but should be done along with a more invasive
refactoring of SROA, so we should figure out the general direction here
first and then we can finish all that up in a follow-up cleanup.
## Obligatory benchmark
Before:
```
julia> using BenchmarkTools
julia> function foo()
           a = Base.PersistentDict(:a => 1)
           return a[:a]
       end
foo (generic function with 1 method)
julia> @benchmark foo()
BenchmarkTools.Trial: 10000 samples with 993 evaluations.
Range (min … max): 32.940 ns … 28.754 μs ┊ GC (min … max): 0.00% … 99.76%
Time (median): 49.647 ns ┊ GC (median): 0.00%
Time (mean ± σ): 57.519 ns ± 333.275 ns ┊ GC (mean ± σ): 10.81% ± 2.22%
▃█▅ ▁▃▅▅▃▁ ▁▃▂ ▂
▁▂▄▃▅▇███▇▃▁▂▁▁▁▁▁▁▁▁▂▂▅██████▅▂▁▁▁▁▁▁▁▁▁▁▂▃▃▇███▇▆███▆▄▃▃▂▂ ▃
32.9 ns Histogram: frequency by time 68.6 ns <
Memory estimate: 128 bytes, allocs estimate: 4.
julia> @code_typed foo()
CodeInfo(
1 ─ %1 = invoke Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}(Base.HashArrayMappedTries.undef::UndefInitializer, 1::Int64)::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│ %2 = %new(Base.HashArrayMappedTries.HAMT{Symbol, Int64}, %1, 0x00000000)::Base.HashArrayMappedTries.HAMT{Symbol, Int64}
│ %3 = %new(Base.HashArrayMappedTries.Leaf{Symbol, Int64}, :a, 1)::Base.HashArrayMappedTries.Leaf{Symbol, Int64}
│ %4 = Base.getfield(%2, :data)::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│ %5 = $(Expr(:boundscheck, true))::Bool
└── goto #5 if not %5
2 ─ %7 = Base.sub_int(1, 1)::Int64
│ %8 = Base.bitcast(UInt64, %7)::UInt64
│ %9 = Base.getfield(%4, :size)::Tuple{Int64}
│ %10 = $(Expr(:boundscheck, true))::Bool
│ %11 = Base.getfield(%9, 1, %10)::Int64
│ %12 = Base.bitcast(UInt64, %11)::UInt64
│ %13 = Base.ult_int(%8, %12)::Bool
└── goto #4 if not %13
3 ─ goto #5
4 ─ %16 = Core.tuple(1)::Tuple{Int64}
│ invoke Base.throw_boundserror(%4::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}, %16::Tuple{Int64})::Union{}
└── unreachable
5 ┄ %19 = Base.getfield(%4, :ref)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│ %20 = Base.memoryref(%19, 1, false)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│ Base.memoryrefset!(%20, %3, :not_atomic, false)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
└── goto #6
6 ─ %23 = Base.getfield(%2, :bitmap)::UInt32
│ %24 = Base.or_int(%23, 0x00010000)::UInt32
│ Base.setfield!(%2, :bitmap, %24)::UInt32
└── goto #7
7 ─ %27 = %new(Base.PersistentDict{Symbol, Int64}, %2)::Base.PersistentDict{Symbol, Int64}
└── goto #8
8 ─ %29 = invoke Base.getindex(%27::Base.PersistentDict{Symbol, Int64}, :a::Symbol)::Int64
└── return %29
```
After:
```
julia> using BenchmarkTools
julia> function foo()
           a = Base.PersistentDict(:a => 1)
           return a[:a]
       end
foo (generic function with 1 method)
julia> @benchmark foo()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 2.459 ns … 11.320 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.460 ns ┊ GC (median): 0.00%
Time (mean ± σ): 2.469 ns ± 0.183 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ █ ▁ █ ▂
█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█ █
2.46 ns Histogram: log(frequency) by time 2.47 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @code_typed foo()
CodeInfo(
1 ─ return 1
```
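For a quick check without BenchmarkTools, a sketch along these lines should be enough to see whether the allocation is eliminated. It assumes a Julia build in which `Base.PersistentDict` is available; the reported numbers depend on whether that build includes this optimization, so no output is shown here.

```julia
function foo()
    a = Base.PersistentDict(:a => 1)
    return a[:a]
end

foo()                              # first call triggers compilation
bytes = @allocated foo()           # heap bytes allocated by a single call
typed = code_typed(foo, Tuple{})   # optimized IR, as shown by `@code_typed foo()`

println("allocated bytes: ", bytes)
println(first(typed))
```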