fix performance issue of `@nospecialize`-d keyword function calls
This commit fixes and improves the performance of calls to keyword
functions whose argument types are not fully known but are `@nospecialize`-d.
The final result looks like this (this particular example is taken from
our Julia-level compiler implementation):
```julia
abstract type CallInfo end
struct NoCallInfo <: CallInfo end

struct NewInstruction
    stmt::Any
    type::Any
    info::CallInfo
    line::Union{Int32,Nothing} # if nothing, copy the line from previous statement in the insertion location
    flag::Union{UInt8,Nothing} # if nothing, IR flags will be recomputed on insertion
    function NewInstruction(@nospecialize(stmt), @nospecialize(type), @nospecialize(info::CallInfo),
                            line::Union{Int32,Nothing}, flag::Union{UInt8,Nothing})
        return new(stmt, type, info, line, flag)
    end
end

@nospecialize
function NewInstruction(newinst::NewInstruction;
    stmt=newinst.stmt,
    type=newinst.type,
    info::CallInfo=newinst.info,
    line::Union{Int32,Nothing}=newinst.line,
    flag::Union{UInt8,Nothing}=newinst.flag)
    return NewInstruction(stmt, type, info, line, flag)
end
@specialize

using BenchmarkTools

struct VirtualKwargs
    stmt::Any
    type::Any
    info::CallInfo
end
vkws = VirtualKwargs(nothing, Any, NoCallInfo())
newinst = NewInstruction(nothing, Any, NoCallInfo(), nothing, nothing)
runner(newinst, vkws) = NewInstruction(newinst; vkws.stmt, vkws.type, vkws.info)
@benchmark runner($newinst, $vkws)
```
> on master
```
BenchmarkTools.Trial: 10000 samples with 186 evaluations.
Range (min … max): 559.898 ns … 4.173 μs ┊ GC (min … max): 0.00% … 85.29%
Time (median): 605.608 ns ┊ GC (median): 0.00%
Time (mean ± σ): 638.170 ns ± 125.080 ns ┊ GC (mean ± σ): 0.06% ± 0.85%
█▇▂▆▄ ▁█▇▄▂ ▂
██████▅██████▇▇▇██████▇▇▇▆▆▅▄▅▄▂▄▄▅▇▆▆▆▆▆▅▆▆▄▄▅▅▄▃▄▄▄▅▃▅▅▆▅▆▆ █
560 ns Histogram: log(frequency) by time 1.23 μs <
Memory estimate: 32 bytes, allocs estimate: 2.
```
> on this commit
```
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 3.080 ns … 83.177 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 3.098 ns ┊ GC (median): 0.00%
Time (mean ± σ): 3.118 ns ± 0.885 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▅▇█▆▅▄▂
▂▄▆▆▇████████▆▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▁▂▂▂▁▂▂▂▂▂▂▁▁▂▁▂▂▂▂▂▂▂▂▂ ▃
3.08 ns Histogram: frequency by time 3.19 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
```
So in this particular case the median time drops from about 606 ns to about
3.1 ns, a speedup of roughly 200x, and both allocations disappear.
This is because this commit allows the call to the keyword sorter to be
inlined and the `NamedTuple` call to be removed.
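To make "keyword sorter" concrete, here is a hand-written, hypothetical analogue of the shape of keyword-call lowering: the caller packs the keyword arguments into a `NamedTuple`, and a sorter method pulls each name out (falling back to a default), rejects unknown names, and forwards everything positionally to a body method. `Point`, `point_body` and `point_sorter` are made-up names, and Julia's actual lowered code differs in detail; the sketch only illustrates the `NamedTuple` packing and unpacking that the optimizer has to see through.

```julia
# Hypothetical, hand-written analogue of keyword-argument lowering.
# Julia's real lowering differs in detail; this only shows its general shape.
struct Point
    x::Int
    y::Int
end

# Positional "body" method: does the actual work.
point_body(p::Point, x, y) = Point(x, y)

# "Keyword sorter": receives the keyword arguments packed into a NamedTuple,
# fills in defaults taken from `p`, rejects unknown names, then forwards
# everything positionally to the body method.
function point_sorter(kws::NamedTuple, p::Point)
    x = haskey(kws, :x) ? kws.x : p.x
    y = haskey(kws, :y) ? kws.y : p.y
    # Any names left over after removing the supported ones are unsupported
    # keywords (`Base.structdiff` is an internal, unexported helper).
    extra = Base.structdiff(kws, (x = nothing, y = nothing))
    isempty(extra) || throw(ArgumentError("unsupported keywords: $(keys(extra))"))
    return point_body(p, x, y)
end

# A call like `Point(p; x = 3)` corresponds roughly to:
p = Point(1, 2)
q = point_sorter((; x = 3), p)   # Point(3, 2)
```

When every argument type is concrete, inference can usually see through this pattern; the commit targets the case where the packed values are typed `Any`.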
Specifically, this commit consists of the following improvements:
- Add an early-return case for `structdiff`:
  This change improves return type inference when the compared
  `NamedTuple`s are type unstable but their names do not differ,
  e.g. given two `NamedTuple{(:a,:b),T} where T<:Tuple{Any,Any}`s.
  In such cases the optimizer removes the `structdiff` call and the
  subsequent `pairs` call, allowing the keyword sorter to be inlined.
- Tweak the core `NamedTuple{names}(args::Tuple)` constructor so that it
  directly emits a `:splatnew` allocation instead of redirecting to the
  general `NamedTuple` constructor, which can be confused by an abstract
  input tuple type.
- Improve the accuracy of `nfields_tfunc` for abstract `NamedTuple` types.
  This lets `inline_splatnew` handle more abstract `NamedTuple`s, in
  particular those whose names are fully known but whose field tuple
  type is abstract.
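The runtime behaviour behind these three items can be poked at from ordinary Julia code. This is only a sketch: it exercises runtime results, not the inference or optimizer changes themselves. `Base.structdiff` is an internal, unexported helper whose behaviour is assumed from current releases, and `names_only_nfields` is a hypothetical helper written here to show that the field count of a `NamedTuple` type follows from its names alone, even when its field tuple type is abstract.

```julia
# 1. `structdiff` on NamedTuples with identical names leaves nothing behind,
#    so there are no unsupported keywords left to report.
#    (`Base.structdiff` is internal and unexported.)
kws      = (stmt = nothing, type = Any)
defaults = (stmt = 1, type = Int)
leftover = Base.structdiff(kws, defaults)                    # empty NamedTuple()
partial  = Base.structdiff((a = 1, b = 2, c = 3), (b = 0,))  # (a = 1, c = 3)

# 2. The `NamedTuple{names}(args::Tuple)` constructor pairs a tuple of values
#    with a fixed tuple of names.
nt = NamedTuple{(:stmt, :type)}((nothing, Any))              # (stmt = nothing, type = Any)

# 3. Hypothetical helper: the field count of `NamedTuple{names,T}` depends
#    only on `names`, even when the field tuple type `T` is abstract.
names_only_nfields(::Type{<:NamedTuple{names}}) where {names} = length(names)
AbstractNT = NamedTuple{(:a, :b), T} where T<:Tuple{Any,Any}
n = names_only_nfields(AbstractNT)                           # 2
```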
Together, these improvements allow our SROA pass to optimize away the
`NamedTuple` and `tuple` calls generated for keyword argument handling.
For example, the IR for a call to the example `NewInstruction` constructor
is now well optimized:
```julia
julia> Base.code_ircode((NewInstruction,Any,Any,CallInfo)) do newinst, stmt, type, info
NewInstruction(newinst; stmt, type, info)
end |> only
2 1 ── %1 = Base.getfield(_2, :line)::Union{Nothing, Int32} │╻╷ Type##kw
│ %2 = Base.getfield(_2, :flag)::Union{Nothing, UInt8} ││┃ getproperty
│ %3 = (isa)(%1, Nothing)::Bool ││
│ %4 = (isa)(%2, Nothing)::Bool ││
│ %5 = (Core.Intrinsics.and_int)(%3, %4)::Bool ││
└─── goto #3 if not %5 ││
2 ── %7 = %new(Main.NewInstruction, _3, _4, _5, nothing, nothing)::NewInstruction NewInstruction
└─── goto #10 ││
3 ── %9 = (isa)(%1, Int32)::Bool ││
│ %10 = (isa)(%2, Nothing)::Bool ││
│ %11 = (Core.Intrinsics.and_int)(%9, %10)::Bool ││
└─── goto #5 if not %11 ││
4 ── %13 = π (%1, Int32) ││
│ %14 = %new(Main.NewInstruction, _3, _4, _5, %13, nothing)::NewInstruction│││╻ NewInstruction
└─── goto #10 ││
5 ── %16 = (isa)(%1, Nothing)::Bool ││
│ %17 = (isa)(%2, UInt8)::Bool ││
│ %18 = (Core.Intrinsics.and_int)(%16, %17)::Bool ││
└─── goto #7 if not %18 ││
6 ── %20 = π (%2, UInt8) ││
│ %21 = %new(Main.NewInstruction, _3, _4, _5, nothing, %20)::NewInstruction│││╻ NewInstruction
└─── goto #10 ││
7 ── %23 = (isa)(%1, Int32)::Bool ││
│ %24 = (isa)(%2, UInt8)::Bool ││
│ %25 = (Core.Intrinsics.and_int)(%23, %24)::Bool ││
└─── goto #9 if not %25 ││
8 ── %27 = π (%1, Int32) ││
│ %28 = π (%2, UInt8) ││
│ %29 = %new(Main.NewInstruction, _3, _4, _5, %27, %28)::NewInstruction │││╻ NewInstruction
└─── goto #10 ││
9 ── Core.throw(ErrorException("fatal error in type inference (type bound)"))::Union{}
└─── unreachable ││
10 ┄ %33 = φ (#2 => %7, #4 => %14, #6 => %21, #8 => %29)::NewInstruction ││
└─── goto #11 ││
11 ─ return %33 │
=> NewInstruction
```
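For a quick look at the same pattern without `BenchmarkTools` or the `NewInstruction` types, the following minimal sketch uses only Base. `withkw` and `caller` are hypothetical names. The snippet prints, rather than asserts, the allocation count and the IR, because both depend on the Julia version and on whether this commit is included; `Base.code_ircode` is only available on newer Julia versions, hence the `isdefined` guard.

```julia
# Hypothetical minimal reproduction: a keyword function whose arguments are
# not specialized on, called from a function that forwards its arguments as
# keywords (using the `; a, b` shorthand for `; a = a, b = b`).
@nospecialize
function withkw(x; a = 1, b = 2)
    return (x, a, b)
end
@specialize

caller(x, a, b) = withkw(x; a, b)

caller(nothing, Any, 1.0)                       # run once so compilation is excluded
println(@allocated caller(nothing, Any, 1.0))   # bytes allocated; version-dependent

# Where available, the optimized IR shows whether the keyword sorter was
# inlined and the NamedTuple construction eliminated.
if isdefined(Base, :code_ircode)
    display(only(Base.code_ircode(caller, (Any, Any, Any))))
end
```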