Simplify computation of return type in broadcast #39295
Conversation
Since we rely on inference, we can use `_return_type` directly instead of going through complex machinery. As suggested by @mbauman at #39185 (comment).
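For readers skimming the thread, here's a minimal sketch of the idea (my own illustration, not the PR's exact code; `_broadcast_getindex`, `Base._return_type`, and `Base.promote_typejoin_union` are internals and may change):

```julia
using Base.Broadcast: broadcasted, instantiate, _broadcast_getindex, combine_eltypes

bc = instantiate(broadcasted(+, [1.0, 2.0], 1))  # lazy form of `[1.0, 2.0] .+ 1`

# Existing machinery: combine the args' eltypes, then infer the return of bc.f.
ElType_old = combine_eltypes(bc.f, bc.args)

# Proposed shortcut: ask inference directly what indexing into the Broadcasted returns.
ElType_new = Base.promote_typejoin_union(
    Base._return_type(_broadcast_getindex, Tuple{typeof(bc), Int}))

ElType_old === ElType_new === Float64  # true for this example
```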
```diff
-g() = (a = 1; Broadcast.combine_eltypes(x -> x + a, (1.0,)))
-@test @inferred(g()) === Float64
+g() = (a = 1; x -> x + a)
+@test @inferred(broadcast(g(), 1.0)) === 2.0
```
@pabloferz @Sacha0 Since you worked on these tests (this one and the one below), could you confirm that the new ones cover the same use cases as the old ones? That wasn't completely clear to me.
Regrettably, sufficient time has elapsed since I looked at these tests that I no longer have much memory of them. Sorry Milan! :)
Does this have an impact on inference & codegen time? The broadcast infrastructure is already a big piece of the latency for many packages; just curious whether this makes it better or worse.
Here's a small benchmark (fresh session for each command).

Master:

```julia
julia> @time exp.(x);
  0.071312 seconds (207.54 k allocations: 12.900 MiB, 99.55% compilation time)

julia> @time exp.(x);
  0.073889 seconds (207.54 k allocations: 12.900 MiB, 99.48% compilation time)

julia> @time exp.(x);
  0.072427 seconds (207.54 k allocations: 12.900 MiB, 99.54% compilation time)
```

PR:

```julia
julia> @time exp.(x);
  0.075400 seconds (223.03 k allocations: 13.804 MiB, 99.46% compilation time)

julia> @time exp.(x);
  0.071174 seconds (223.03 k allocations: 13.804 MiB, 99.56% compilation time)

julia> @time exp.(x);
  0.077204 seconds (223.03 k allocations: 13.804 MiB, 99.58% compilation time)
```

So there are a few more allocations, and it might be a bit slower, but it's not super clear. Do you have ideas about other possible benchmarks?
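(The comments above don't show how `x` was defined, so here's a hypothetical reproduction setup; each `@time` line has to come from a fresh session for the compilation percentage to make sense:)

```julia
# Hypothetical setup: assume `x` is a plain Float64 vector.
x = rand(1000)
@time exp.(x);  # first call in a fresh session is dominated by compilation
```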
Maybe one where …
Here's what I get for slightly more complex cases (still with a fresh session for each pair of commands).

Master:

```julia
julia> @time x .+ 1;
  0.062047 seconds (165.10 k allocations: 10.133 MiB, 99.52% compilation time)

julia> @time Float32.(x) .+ x .+ 1;
  0.132495 seconds (281.68 k allocations: 16.544 MiB, 99.32% compilation time)

julia> @time x .+ 1;
  0.063509 seconds (165.10 k allocations: 10.133 MiB, 99.48% compilation time)

julia> @time Float32.(x) .+ x .+ 1;
  0.137296 seconds (281.68 k allocations: 16.544 MiB, 99.36% compilation time)

julia> @time x .+ 1;
  0.061300 seconds (165.10 k allocations: 10.133 MiB, 99.34% compilation time)

julia> @time Float32.(x) .+ x .+ 1;
  0.136219 seconds (281.68 k allocations: 16.544 MiB, 99.40% compilation time)
```

PR:

```julia
julia> @time x .+ 1;
  0.065680 seconds (180.26 k allocations: 11.009 MiB, 99.58% compilation time)

julia> @time Float32.(x) .+ x .+ 1;
  0.134870 seconds (296.66 k allocations: 17.251 MiB, 99.40% compilation time)

julia> @time x .+ 1;
  0.065040 seconds (180.26 k allocations: 11.009 MiB, 99.35% compilation time)

julia> @time Float32.(x) .+ x .+ 1;
  0.141318 seconds (296.66 k allocations: 17.251 MiB, 99.39% compilation time)

julia> @time x .+ 1;
  0.068847 seconds (180.26 k allocations: 11.009 MiB, 99.54% compilation time)

julia> @time Float32.(x) .+ x .+ 1;
  0.135004 seconds (296.66 k allocations: 17.251 MiB, 99.32% compilation time)
```

So still a slight increase in allocations. But I've found a more serious problem: the CI failure is due to … and … I'm not sure we can actually get rid of these without reinventing most of `combine_eltypes`. What do you think? BTW, I'm surprised that `combine_eltypes` is used to determine the type of the result (even when the result is not empty), as it relies on inference.
SparseArrays may have some legacy issues with the way it forms the eltype. I think this PR seems reasonable. It should take similar time, since we're about to infer into the methods (for the runtime code path) anyway.
```diff
@@ -901,7 +888,8 @@ copy(bc::Broadcasted{<:Union{Nothing,Unknown}}) =
 const NonleafHandlingStyles = Union{DefaultArrayStyle,ArrayConflict}

 @inline function copy(bc::Broadcasted{Style}) where {Style}
-    ElType = combine_eltypes(bc.f, bc.args)
+    ElType = promote_typejoin_union(Base._return_type(_broadcast_getindex,
+                                                      Tuple{typeof(bc), Int}))
```
I think this needs to be:

```diff
-Tuple{typeof(bc), Int}))
+Tuple{typeof(bc), ndims(bc) == 1 ? eltype(axes(bc)[1]) : CartesianIndex{ndims(bc)}})
```
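For context, a quick illustration (my own, using internal API) of why a plain `Int` index is wrong in general: a multidimensional `Broadcasted` is indexed with a `CartesianIndex`:

```julia
using Base.Broadcast: broadcasted, instantiate

bc = instantiate(broadcasted(+, ones(2, 2), 1))   # 2-d lazy broadcast
eltype(eachindex(bc))                             # CartesianIndex{2}, not Int
ndims(bc) == 1 ? eltype(axes(bc)[1]) : CartesianIndex{ndims(bc)}  # CartesianIndex{2}
```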
Or is it `Base._return_type(iterate, Base._return_type(eachindex, Tuple{typeof(bc)}))`?
Oops, dropped a function. I meant:

```julia
index_type(bc) = iterate(eachindex(bc))[1]
Base._return_type(index_type, Tuple{typeof(bc)})
```
Putting that all together:

```diff
-Tuple{typeof(bc), Int}))
+_broadcast_getindex_eltype(bc) = _broadcast_getindex(bc, iterate(eachindex(bc))[1])
+ElType = promote_typejoin_union(Base._return_type(_broadcast_getindex_eltype, Tuple{typeof(bc)}))
```
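For concreteness, a self-contained version of that suggestion that can be tried at the REPL (the helper is renamed here because `Broadcast._broadcast_getindex_eltype` already exists with a different meaning; all of this is internal API):

```julia
using Base.Broadcast: broadcasted, instantiate, _broadcast_getindex

# Hypothetical helper name; same shape as the suggestion above.
bc_eltype_probe(bc) = _broadcast_getindex(bc, iterate(eachindex(bc))[1])

bc = instantiate(broadcasted(+, ones(2, 2), 1))
ElType = Base.promote_typejoin_union(
    Base._return_type(bc_eltype_probe, Tuple{typeof(bc)}))  # Float64
```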
I'm not entirely sure this is better than the existing code, which does pretty much the same calls but bases them on calling `eltype` instead of inference, which has at least different tradeoffs, for better or worse 🤔
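Side by side, the part that actually differs (the final inference query is the same in both designs; `eltypes` and `combine_eltypes` are internals):

```julia
using Base.Broadcast: combine_eltypes, eltypes

args = (rand(2, 2), 1)
eltypes(args)              # Tuple{Float64, Int64}, obtained by calling eltype()
combine_eltypes(+, args)   # Float64: infer `+` on those eltypes

# The PR instead runs one inference query through the whole indexing call,
# never consulting eltype() — see the sketch near the top of the thread.
```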
Do we want to try to proceed with this PR / design (inferring `iterate`), or keep the current one (calling `eltype`)? The tests themselves lend themselves fairly nicely to compile-time benchmarking. E.g., …
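One hedged guess at such a benchmark (not necessarily the example elided above): force compilation inside the timing with `@eval` in a fresh session:

```julia
# Guessed compile-time benchmark; `x` is assumed to be a Float64 vector.
x = rand(1000)
@time @eval Float32.(x) .+ x .+ 1;
```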
Bump?
Note the currently open question of whether this is actually better or worse (#39295 (comment)).
Can we wake this up? My local benchmark shows that this PR is able to reduce the time cost of … by about 10%–13%.
The time it takes to run tests isn't usually a very interesting benchmark, as tests are a very atypical coding pattern. Do you have evidence that this PR improves performance (or compile times) in real use cases? This isn't to say that I'm opposed to merging it.
Well, I have no further evidence; I just followed the advice above from @mbauman to benchmark.
The PR is currently wrong, though I have a suggestion above to fix it. The remaining question, however, as before, is whether we want this design change (#39295 (comment)).