
Iron out ProjectTo #467

Closed
1 of 4 tasks
mzgubic opened this issue Jul 5, 2021 · 11 comments

Comments

@mzgubic
Member

mzgubic commented Jul 5, 2021

Following JuliaDiff/ChainRulesCore.jl#385 and #459, there are still some things to iron out, namely:

  • How to test that the returned types are correct. An attempt is at ChainRulesTestUtils.jl#186 (WIP: Add the option to test tangent types), but we can do better
  • Expand the documentation on how to use ProjectTo when writing rules.
  • Scalar rules, e.g. @scalar_rule x \ y (-(Ω / x), one(y) / x). Do we change the @scalar_rule macro, or do we just write out the rules that need to be projected? Is there a performance impact (I've tried for * and adding project appears to have no effect)?
  • The rule for ::typeof(sum), f, xs::AbstractArray breaks inference when an array of arrays is passed. Can we solve this?

Perhaps we want to solve some of these before merging the two PRs?
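
For reference, the pattern the documentation point is about looks roughly like this (a sketch only, with a projected * rule written out by hand; not necessarily how the rule in ChainRules.jl is actually implemented):

using ChainRulesCore, LinearAlgebra

function ChainRulesCore.rrule(::typeof(*), A::AbstractMatrix{<:Real}, B::AbstractMatrix{<:Real})
    project_A = ProjectTo(A)  # built from the primal, captures eltype and structure
    project_B = ProjectTo(B)
    function times_pullback(ȳ)
        dA = project_A(ȳ * B')  # projection keeps dA in A's cotangent space
        dB = project_B(A' * ȳ)
        return NoTangent(), dA, dB
    end
    return A * B, times_pullback
end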

@oxinabox
Member

oxinabox commented Jul 5, 2021

Expand the documentation on how to use ProjectTo when writing rules.

I will do this, in a follow-up PR.

Scalar rules, e.g. @scalar_rule x \ y (-(Ω / x), one(y) / x). Do we change the @scalar_rule macro, or do we just write out the rules that need to be projected? Is there a performance impact (I've tried for * and adding project appears to have no effect)?

@sethaxen may know.
I thought after JuliaDiff/ChainRulesCore.jl#170 we were already doing the right thing always.
But it seems we might not be.
I suspect if we project in the @scalar_rule, a projection like that should optimize out entirely.
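
For concreteness, the "just write out the rules" option for the quoted \ rule could look something like this (a sketch restricted to Real arguments, to sidestep complex-conjugation details):

using ChainRulesCore

function ChainRulesCore.rrule(::typeof(\), x::Real, y::Real)
    Ω = x \ y
    project_x, project_y = ProjectTo(x), ProjectTo(y)
    function backslash_pullback(ΔΩ)
        # same partials as in the @scalar_rule, with each cotangent projected,
        # e.g. so that an x::Float32 receives a Float32 cotangent
        ∂x = project_x(-(Ω / x) * ΔΩ)
        ∂y = project_y(one(y) / x * ΔΩ)
        return NoTangent(), ∂x, ∂y
    end
    return Ω, backslash_pullback
end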

Perhaps we want to solve some of these before merging the two PRs?

Nah, we do not need to

@mzgubic
Member Author

mzgubic commented Jul 6, 2021

I thought after JuliaDiff/ChainRulesCore.jl#170 we were already doing the right thing always.
But it seems we might not be.

I think that PR makes sure that the result is right for complex numbers.
Here we are interested in the case where we operate on r::Real and c::Complex, e.g. r * c and we want the cotangent for r to also be Real.
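
Concretely, the case in question is something like this (assuming the projection behaviour from the merged PRs):

julia> using ChainRulesCore

julia> r, c = 2.0, 1.0 + 1.0im;

julia> dr = (1.0 + 0.0im) * conj(c)  # cotangent for r from r * c, before projection
1.0 - 1.0im

julia> ProjectTo(r)(dr)  # projecting onto the reals keeps only the real part
1.0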

@sethaxen
Member

sethaxen commented Jul 6, 2021

I thought after JuliaDiff/ChainRulesCore.jl#170 we were already doing the right thing always.
But it seems we might not be.

I think that PR makes sure that the result is right for complex numbers.
Here we are interested in the case where we operate on r::Real and c::Complex, e.g. r * c and we want the cotangent for r to also be Real.

Yes this is correct. That PR stopped short of projecting, since we had not yet agreed to do that (though a few other rules sneakily project already).

@mcabbott
Member

mcabbott commented Jul 6, 2021

Things I find surprising [Edit -- added 8-12]:

julia> using SparseArrays, FillArrays, ChainRulesCore

julia> ProjectTo([1,2.0]')(ones(1,2))  # 1. fails to preserve adjoint vector
1×2 adjoint(::Matrix{Float64}) with eltype Float64:
 1.0  1.0

julia> ProjectTo(Symmetric(rand(2,2)))([1 2; 100 0])  # 2. not a projection onto symmetric part
2×2 Symmetric{Float64, Matrix{Float64}}:
 1.0  2.0
 2.0  0.0

julia> ProjectTo(UpperTriangular(rand(2,2)))(Diagonal([1,2]))  # 3. wasn't there a plan to allow smaller spaces?
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  0.0
  ⋅   2.0

julia> ProjectTo(rand(2,2)')(ones(2,2))  # 4. seems like an unnecessary matrix copy, why is this a good idea?
2×2 adjoint(::Matrix{Float64}) with eltype Float64:
 1.0  1.0
 1.0  1.0

julia> ProjectTo(rand(1, 3))(Fill(2.0, 1, 3)) # 5. seems like an unnecessary materialization, why?
1×3 Matrix{Float64}:
 2.0  2.0  2.0

julia> ProjectTo(true)(2.3)  # 6. if Bool is categorical, this should be some Zero
ERROR: InexactError: Bool(2.3)

julia> ProjectTo(1)(2.3)  # 7. but integers aren't, so this should work? 
ERROR: InexactError: Int64(2.3)

julia> ProjectTo(ones(2,5))(ones(10)) == ones(10) # 8. isn't reshaping precisely why you save size, not just type?
true

julia> ProjectTo(ones(2,5))(ones(5,2)) == ones(5,2)  # 9. but perhaps this should be an error, axes too different?
true

julia> s = sprand(3,10,0.2); p = ProjectTo(s); p(ones(3,10))  # 10. wasn't this example most of the reason not to store only the type?
ERROR: MethodError:

julia> ProjectTo([1 2; 3 4])((1,2,3,4)) # 11. would repair  Zygote.gradient(x -> +(x...), [1 2; 3 4])
ERROR: MethodError ...

julia> ProjectTo([1 2 3])(ZeroTangent())  # 12. why materialise, instead of propagating Zero?
1×3 Matrix{Int64}:
 0  0  0

The merged PR JuliaDiff/ChainRulesCore.jl#385 seems like a more elaborate cousin of https://gist.github.com/mcabbott/8a84086cc604d34b5e8dff2eb3839f3a . (Which is a too-minimal example intended only to sketch the idea compactly.) That has one more stage of indirection, in that projector(x) returns a Project{T} which acts on dx. The purpose of that is that, if this projection is to be applied to all cotangents (as now seems true, but wasn't when I last paid attention), then I believe it sometimes wants to allow a slightly wider set of types for dx than typeof(x) (as for 4, 5, 7 above). Which is encoded there by having projector(x) return a Project{T} with abstract type T, the widest deemed acceptable.

(One could of course instead make ProjectTo{T} with concrete T allow wider types for dx, but IMO the separation of concerns made things a bit clearer -- with abstract types, "Project{T} always produces dx::Union{T,Zero}" is a simple description of what it does, and by running projector(x) you can see what is deemed acceptable.)
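
In code, the gist's idea is roughly the following (hypothetical names, heavily simplified):

struct Project{T} end  # T is the widest type this projector is willing to produce

projector(x::Real) = Project{Real}()
projector(x::AbstractArray{<:Real}) = Project{AbstractArray{<:Real}}()

(::Project{T})(dx::T) where {T} = dx       # already acceptable: pass through untouched,
                                           # e.g. a Fill or Diagonal of reals stays as it is
(::Project{Real})(dx::Complex) = real(dx)  # otherwise, project into the acceptable set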

@oxinabox
Member

oxinabox commented Jul 6, 2021

projector(x) return a Project{T} with abstract type T, the widest deemed acceptable.

That is what we have, except it is spelt ProjectTo(x) rather than projector(x).
And note that the widest deemed acceptable is almost always the same as the type of x
(not strictly always, e.g. Int -> Float),
for an appropriate definition of "wide" that is not really based on the type hierarchy, but is a statement about the subspace:
things need to have zeros and other structural elements in the same places.

The thing that is missing is that we don't have the overloads (yet) to allow subspaces of your subspace not to be projected.
We should just go and add those.
It's non-breaking.

julia> ProjectTo([1,2.0]')(ones(1,2))  # fails to preserve adjoint vector
1×2 adjoint(::Matrix{Float64}) with eltype Float64:
 1.0  1.0

This seems right, what is not being preserved?

julia> ProjectTo(Symmetric(rand(2,2)))([1 2; 100 0])  # not a projection onto symmetric part
2×2 Symmetric{Float64, Matrix{Float64}}:
 1.0  2.0
 2.0  0.0

What do you mean? What output is expected?
Do you think we should be averaging them or something?

julia> ProjectTo(UpperTriangular(rand(2,2)))(Diagonal([1,2]))  # wasn't there a plan to allow smaller spaces?
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  0.0
  ⋅   2.0

Yeah, I think we should go and define a partial ordering on matrix types and allow that.

structured_mats = (:(Matrix{T}), :(UpperTriangular{T}), :(Diagonal{T}))  # ordered widest to narrowest
for ii in 1:lastindex(structured_mats)
    A = structured_mats[ii]
    for jj in ii:lastindex(structured_mats)
         B = structured_mats[jj]
         @eval (::ProjectTo{$A})(dx::$B) where {T} = dx  # dx already lies within the target subspace
    end
end

Similarly for Symmetric, Diagonal, Array, and a few others.

julia> ProjectTo(rand(2,2)')(ones(2,2))  # seems like an unnecessary matrix copy, why is this a good idea?
2×2 adjoint(::Matrix{Float64}) with eltype Float64:
 1.0  1.0
 1.0  1.0

This should be a noncopying operation.
https://github.com/JuliaDiff/ChainRulesCore.jl/blob/4b33290ff36d40cd2c8b9f0feb164db78ae75b72/src/projection.jl#L128-L130
It seems that it isn't (from running BenchmarkTools).
That needs some debugging.

julia> ProjectTo(true)(2.3)  # if Bool is categorical, this should be some Zero
ERROR: InexactError: Bool(2.3)

julia> ProjectTo(1)(2.3)  # but integers aren't, so this should work? 
ERROR: InexactError: Int64(2.3)

Yeah, these are probably bugs.
I think ProjectTo(x::Integer) should probably be defined as ProjectTo(float(x)).
If one is calling it at all, then one must be treating the integer as embedded in a subspace of the reals.
Arguably we could allow that for Bool too: one almost never treats it that way, but anyone calling project must be.
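
i.e. roughly this (a sketch of the suggestion, not current behaviour):

using ChainRulesCore

ChainRulesCore.ProjectTo(x::Integer) = ProjectTo(float(x))
# with that, ProjectTo(1)(2.3) would return 2.3 instead of throwing InexactError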

@mcabbott
Member

mcabbott commented Jul 6, 2021

not strictly always, e.g. Int->Float

That's why I thought it clearer that the Project{T} struct be labelled by what it is willing to produce, and that the projector constructor have the logic of going from x to an acceptable T. Easier to inspect and to reason about. You could build the same logic into constructors of ProjectTo; I guess that would still be pretty clear... it's building it into the methods of p::ProjectTo that would be a bit opaque, IMO, but I think you are proposing the constructor.

allow subspaces of your subspace not to be projected

But if T is concrete, it has no subspaces. So what exactly describes the subspace in question? Is it just something we should remember, a comment, or a definition in a test suite which will nag us later? This is why I like the abstract type idea: then it's in code. But perhaps it has other flaws I haven't seen yet.

This seems right, what is not being preserved?

The result ought to be <: AdjointAbsVec, not a matrix. Preserving this is one of the motivating cases of this whole business, I thought:

x = [1,2,3]
x' * x isa Number
ProjectTo(float(x)')(ones(1,3)) * x isa Vector

What output is expected?

I don't think that gradient(A -> A[2,1], Symmetric(rand(2,2))) should be zero. For cotangents at least, I am pretty sure you want the symmetric part of the matrix. I tried to dig into this a bit in FluxML/Zygote.jl#965 (comment); I think there are bugs in finite-differencing but didn't sort them out. (For tangents I am less sure.)

This should be a noncopying operation

But it can't be, the layout in physical memory is different. Which is why the appropriate projector for AdjointAbsMat is probably to AbstractMatrix. (Ditto PermutedDimsArray, except that eager permutedims is even more expensive.)

@oxinabox
Member

oxinabox commented Jul 6, 2021

That's why I thought it clearer that the Project{T} struct be labelled by what it is willing to produce,

It is.
AFAICT everything is as described, at the conceptual level.
Except we call it (::ProjectTo) rather than projector,
and we work with subspaces rather than subtypes, as the subtype tells us nothing useful.

But if T is concrete, it has no subspaces.

Subspace, not subtype https://en.wikipedia.org/wiki/Linear_subspace

Diagonal is a subspace of Symmetric, which is a subspace of dense matrices.
Diagonal is a concrete type representing that you are in the diagonal subspace.
Symmetric is a concrete type, representing being in the Symmetric subspace.
Array is (one) concrete type representing being in a dense matrix.

The point of a subspace is that it matches the structural requirements of the "super"space, and (if strict) adds some more.
The main structural requirement is zeros in the right places, but others are symmetry, skew-symmetry, conjugate symmetry, etc.
The mathematical definition is strong; I think it might be all we need.

It captures what you were saying about "smaller", with Diagonal < Symmetric.

This is why I like the abstract type idea, then it's in code

Subspaces in Julia are represented with wrapper types, not with the type hierarchy.
It's a partial ordering, so I think it can't be.
Annoyingly, Base.parent on a wrapper type is not certain to return the "super"space.

It will end up in code when we go and define the identity projections.
Possibly we should do this via defining
issubspace(A, B)::Bool for different combinations,
then looping over our combinations like I showed above.
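
A possible shape for that (issubspace is hypothetical, not an existing function):

using LinearAlgebra

# is every element of the first space also an element of the second?
issubspace(::Type{<:Diagonal},  ::Type{<:Diagonal})  = true
issubspace(::Type{<:Diagonal},  ::Type{<:Symmetric}) = true
issubspace(::Type{<:Diagonal},  ::Type{<:Matrix})    = true
issubspace(::Type{<:Symmetric}, ::Type{<:Matrix})    = true
issubspace(::Type, ::Type) = false  # fallback: assume not a subspace

# the identity projections would then be generated for each pair where
# issubspace(B, A) holds, as in the loop sketched earlier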

Though not all subspaces are captured in the type, SparseArrays being the example of ones that are not.
Those have to represent subspaces with the zeros where they are.
As Stefan said: "The alternative is too awful".
But they don't really play well with anything else anyway, so it doesn't matter that it is hard to define relations for them, since their spaces are so arbitrary.
They don't even play well with themselves -- projecting one SparseArray onto another will still need to do work.

But it can't be, the layout in physical memory is different. Which is why the appropriate projector for AdjointAbsMat is probably to AbstractMatrix. (Ditto PermutedDimsArray, except that eager permutedims is even more expensive.

Right.
We could treat AbstractMatrix as in effect another representation of a dense matrix -- since we don't know that it isn't.
We can't treat it as anything else.
Dense matrices are basically structureless.
That could be a useful way to name it, and we could name the projector for anything that is dense ProjectTo{AbstractMatrix} (including, say, ProjectTo(ones(2,2))).
And we should be able to have basically everything be projected onto it without change, since it is the "super"-space of everything.
Similarly we can name the projectors for Adjoint{<:Array} and PermutedDimsArray{<:Array} as ProjectTo{AbstractMatrix}, since they also place no restriction on the structure.

But we can not call, say, Adjoint{<:UpperTriangular} this (or Adjoint{<:AbstractArray} more generally), as that does have structure that we need to keep track of.

 I don't think that `gradient(A -> A[2,1], Symmetric(rand(2,2)))` should be zero. 

Ok, yeah, that sounds bad. That should be fixed if that is happening.

@mcabbott
Member

mcabbott commented Jul 6, 2021

Subspace, not subtype

I am aware! My proposal is to see whether the former could usefully be encoded in the latter. If it could, this might be a pretty scheme. As I said, maybe this has holes in it, but give me credit for having thought for more than a minute here.

Similarly we can name Adjoint{<:Array}, and PermutedDimsArray{<:Array} as ProjectTo{AbstractMatrix} since they also place no restriction on the structure

Precisely. With the one exception of adjoint vectors, my example 1 above. And maybe they preserve axes(x).

Under the abstract proposal, the projection operator for x::Diagonal{Int, Vector{Int}} might be P...{Diagonal{Real, AbstractVector{Real}}} which would allow for say Diagonal{Float64, Fill{...}}.

The projection operator for x::Symmetric{Float64} may need to be something like P...{Union{Symmetric{Real}, Hermitian{Real}, Diagonal{Real}}}. Maybe that's a bit ugly. Maybe we decide those aren't acceptable, and make it P...{Symmetric{Real}}, which can be implemented something like issymmetric(dx) ? Symmetric(dx) : Symmetric((dx .+ dx') ./ 2) --- this may produce a Symmetric(Diagonal(...)) but maybe that's OK, it still doesn't materialise.
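
For example, applying that suggested implementation to the non-symmetric cotangent from example 2 above:

using LinearAlgebra

dx = [1.0 2.0; 100.0 0.0]
issymmetric(dx) ? Symmetric(dx) : Symmetric((dx .+ dx') ./ 2)  # == Symmetric([1.0 51.0; 51.0 0.0])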

@oxinabox
Member

oxinabox commented Jul 6, 2021

but give me credit for having thought for more than a minute here.

Sorry, communicating in writing is hard.
You did say concrete types have no subspaces, which really made it sound like you were talking about types.

My proposal is to see whether the former could usefully be encoded in the latter.

I think we are talking about the same thing then.
I'll put up a PR to add the missing identity projections for subspaces in an hour or two.
And then we can see if we are indeed talking about the same thing.
I don't think this is at all a breaking or conceptual change to the current design.

@mcabbott
Member

mcabbott commented Jul 6, 2021

Sure. What I'm trying to reverse-engineer is sort of what the conceptual idea of the current design actually is. I was hoping there was a list of potentially awkward examples somewhere which guided this. It seems to always store the concrete type of x, and force dx to match this precisely, which is what strikes me as too rigid; this behaviour is the opposite extreme to FluxML/Zygote.jl#965. Altering it to store abstract types like ProjectTo{AbstractMatrix{Real}} would be one way to relax this; whether it's a conceptual change or not doesn't seem important.

It seems to also store, but never use, a bunch of other stuff; I haven't understood the scope of what that's for (nor how to invent awkward edge cases for it).

@mcabbott
Member

I think this can be closed. Open questions are tracked at https://github.com/JuliaDiff/ChainRulesCore.jl/labels/ProjectTo
