Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

give sum(T::Type, xs) = zero(T) when xs is empy #20561

Closed
wants to merge 2 commits into from

Conversation

felixrehren
Copy link
Contributor

@felixrehren felixrehren commented Feb 10, 2017

From #5311, we allow notation like sum(T::Type, xs) to cast the return type of the sum. Currently

julia> sum(Float64, [])
ERROR: ArgumentError: reducing over an empty collection is not allowed

With this PR:

julia> sum(Float64, [])
0.0

This lets you avoid writing boilerplate if isempty(xs) ... else ... code around what should be a simple sum(T, xs), by handling the case when xs is empty in the natural way.

This is achieved by extending mr_empty, which dispatches on f::Callable, op, T::Type. Here Callable = Union{Type, Function}. When f::Function, there is currently nothing we can say in general (some special cases are defined, like identity and abs2). When f::Type, however, the user has indicated what resulting type he would like: therefore, when op = +, return zero(f), and when op = *, return one(f). This also enhances the behaviour of sum(T::Type, xs) and prod(T::Type, xs).

Note this introduces a new error possibility:

julia> sum(String, [])
ERROR: MethodError: no method matching zero(::Type{String})

but I feel like anyone who tries to sum strings or any type T without a zero(T) has only himself to blame. And of course

julia> prod(String, [])
""

We allow users to call `sum(T::Type, xs)`. Currently, e.g. `sum(Float64, [])` gives an error because the `eltype` of `[]` is `Any` and so inference fails. But the fact that the user supplied a type means that we know the intended `eltype`, so we can use it to set e.g. `sum(Float64, []) = zero(Float64)`.
@ararslan
Copy link
Member

I don't think the behavior sum(T, Any[]) == zero(T) is sound. If the element type of the input can be anything, it's not safe to assume that one could convert any given element to T if the input collection was non-empty. So I don't think it makes sense to make that assumption in the empty case.

@pabloferz
Copy link
Contributor

Also, if you're going to write sum(Float64, []) you might just write sum(Float64[]) instead, which already works.

@oscardssmith
Copy link
Member

The place where this is better is stuff like sum(Float64, x) where x is an array of Float32's. This would allow the removal of current sum promotion magic

@tkelman tkelman added the needs tests Unit tests are required for this change label Feb 10, 2017
@felixrehren
Copy link
Contributor Author

felixrehren commented Feb 10, 2017

@ararslan The caller of sum(T, xs) is expecting eltype(xs) to be convertible to T. It's their responsibility to make sure that expectation is right, otherwise they'll be hit by errors whenever xs is nonempty. So the assumption doesn't hold in the empty case in general -- but, conditional on the caller having decided this is what they need, I think it's reasonable. [edit: callee -> caller, got it backwards before ...]

@pabloferz Agreed, of course. And someone who was going to write sum(Float64[]) in the first place might then just write zero(Float64), or just go straight for 0.0... It's useful for the cases when you write sum(Float64, xs), and sometimes xs may be empty. It lets you avoid writing an additional boilerplate if isempty(xs) ... else ... case (edited to suggest that above, too).

@oscardssmith I'm not familiar with that, please could you explain?

@tkelman will add tests if some consensus on usefulness emerges 👍

@ararslan
Copy link
Member

conditional on the callee having decided this is what they need, I think it's reasonable.

That can't be assumed in general, especially when writing generic code. As careful as people try to be with types, type inference can still give you an element type of Any in some cases, e.g. with generators. It's better to deal with those cases explicitly using isempty than to get out a potentially incorrect return type.

This would allow the removal of current sum promotion magic

I don't think that's true at all? This PR concerns only the empty case, plus I'd actually argue that this behavior results in more magic.

@JeffBezanson
Copy link
Member

type inference can still give you an element type of Any in some cases, e.g. with generators

Which I wish we didn't do! But since we have that behavior, we need to mitigate it by reducing the impact of getting a wider-typed array: Any[1,2] and Int[1,2] should behave similarly wherever possible.

So I'd be fine with this, except I believe it makes some cases of sum type-unstable: for example sum(Integer, Int8[1,2]) gives an Int32, but zero(Integer) gives an Int. So the type would depend on whether the argument is empty. Also, if you pass Any as the type you'll get no method matching zero(::Type{Any}) instead of the usual empty-reduce error message.

@nalimilan
Copy link
Member

This behavior makes sense to me too. The fact that the input can contain elements of an incorrect type is irrelevant: sum(Int, ...) will either return an Int or fail if conversion isn't possible. So why not extend this to the case where the input array is empty? This is particularly helpful because we rely on inference for empty arrays: specifying the desired return type to sum will protect your from an unexpected eltype of an empty array.

@felixrehren
Copy link
Contributor Author

felixrehren commented Feb 12, 2017

@JeffBezanson That's interesting, I hadn't thought of that. Where does the Int promotion happen? (I can't follow it here :/) It looks like mapreduce consistently converts its Int8 arguments to Int32, even when the system is 64 bit (here: on juliabox.com / Julia 0.5). I don't know why typeof(zero(Integer)) = Int64, typeof(Integer(zero(Int8))) = Int8, but typeof(sum(Int8[])) == Int32. Could this PR access the same promotion rules, to be compatible with this behaviour?

Of course this type-instability can be avoided by writing if isempty(xs) ... else .... Still, would like to get this type-stable on its own 👍

I agree, the error message is less appropriate when someone writes sum(T,xs) for empty xs and a type T without zero. We would want to add sum(T, itr) to the docstring -- would this be sufficiently informative?

sum(T, itr)

Casts the elements of itr to type T and returns the sum. If itr is empty, returns zero(T).

@martinholters
Copy link
Member

This needs to be considered together with #20560, as presently e.g. typeof(sum(Int8, [1,2,3])) == Int32. So typeof(sum(Int8, [])) == Int8 would be an unfortunate introduction of type-instability. Furthermore, solving #18423 would allow us to forbid summing an empty collection without need for the isempty boilerplate, and the given neutral element could decide the return type. So I don't think this PR makes much sense at the moment, but could be reconsidered after #20560 is settled.

@JeffreySarnoff
Copy link
Contributor

just wondering -- does this suggest prod(T, []) == one(T)?

@felixrehren
Copy link
Contributor Author

Yup!

@tpapp
Copy link
Contributor

tpapp commented Jan 4, 2019

It would be great to revive this PR, or find a solution of some form. I understand that writing type inferrable code is ideal, but sometimes it is not possible or worth it, then one has to special-case sum anyway. I frequently find doing something like

robustzero(_) = 0.0
robustzero(::Type{T}) where {T <: Real} = zero(T)
isempty(v) ? robustzero(eltype(T)) : sum(v)

or similar, eg with nested AD applications where getting inference to work can be tricky.

@JeffreySarnoff
Copy link
Contributor

maybe

Base.sum(::Type{R}, x::AbstractArray) where {R<:Real} = isempty(x) ? sum(zero(R)) : sum(x)

@tpapp
Copy link
Contributor

tpapp commented Jan 4, 2019

Possibly, but I would prefer to go to the fallback only when x has no known element type that has a zero. Eg

sum(Int, Float64[]) === 0.0
sum(Int, Any[]) === 0

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Jan 4, 2019

Base.sum(::Type{R1},x::Array{R2,1}) where {R1<:Real, R2<:Real} = isempty(x) ? sum(zero(R2)) : sum(x)

Base.sum(::Type{R1},x::Array{R2,1}) where {R1<:Real, R2<:Any} = isempty(x) ? sum(zero(R1)) : sum(x)

julia> sum(Int, [1.0, 2.0])
3.0

julia> sum(Int, Float32[])
0.0f0

julia> sum(Int, [])
0

julia> sum(Float32, [])
0.0f0

@tpapp
Copy link
Contributor

tpapp commented Jan 4, 2019

Neat. Now if only we could replace Real with "all types for which zero is defined"; but that may be tedious to maintain in the long run.

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Jan 4, 2019

Does this approach meet the desired outcomes mentioned early in this thread?
(particularly #20561 (comment))

Base.sum(::Type{R1},x::Array{R2,1}) where {R1<:Number, R2<:Number} =
    isempty(x) ? sum(zero(R2)) : sum(x)

Base.sum(::Type{R1},x::Array{R2,1}) where {R1<:Number, R2<:Any} =
    isempty(x) ? sum(zero(R1)) : sum(x)

Base.prod(::Type{R1},x::Array{R2,1}) where {R1<:Number, R2<:Number} =
    isempty(x) ? prod(one(R2)) : prod(x)

Base.prod(::Type{R1},x::Array{R2,1}) where {R1<:Number, R2<:Any} =
    isempty(x) ? prod(one(R1)) : prod(x)


julia> typeof(sum(Int8, [])), typeof(sum(Complex{Float32}, []))
(Int64, Complex{Float32})

julia> typeof(sum(Int8, Rational{Int16}[])), typeof(sum(Complex{Float32}, Complex{Float16}[]))
(Rational{Int16}, Complex{Float16})

julia> typeof(sum(Int8, [0.0])), typeof(sum(Complex{Float32}, [0//1]))
(Float64, Rational{Int64})

@tkf
Copy link
Member

tkf commented Jun 12, 2020

Now that we have init in sum etc. #36188, maybe we can close this PR? In general, it's better to deal with values than types in Julia API.

@tkf tkf added the fold sum, maximum, reduce, foldl, etc. label Jun 13, 2020
@musm
Copy link
Contributor

musm commented Dec 15, 2020

closed since we now have #36188

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fold sum, maximum, reduce, foldl, etc. needs tests Unit tests are required for this change
Projects
None yet
Development

Successfully merging this pull request may close these issues.