Updated to reflect new LearnBase interface for getobs and ObsDim #50
You could temporarily check in a |
So one issue that I've run into with MLDataPattern.jl. There are some cases (e.g.

    LearnBase.getobs(subset::DataSubset, idx) =
        getobs(subset.data, _view(subset.indices, idx); obsdim = subset.obsdim)

because it does not make sense that

    function LearnBase.getobs(subset::DataSubset, idx; obsdim = default_obsdim(subset))
        @assert obsdim === subset.obsdim
        return getobs(subset.data, _view(subset.indices, idx); obsdim = obsdim)
    end

Or I can just ignore

Both options seem messy to me. Since the

    LearnBase.default_obsdim(subset::DataSubset) = default_obsdim(subset.data)
    LearnBase.getobs(subset::DataSubset, idx; obsdim = default_obsdim(subset)) =
        getobs(subset.data, _view(subset.indices, idx); obsdim = obsdim)

There are two reasons for this:

UPDATE: I see now that perhaps the reason for storing

    function DataSubset{T,I,O}(data::T, indices::I, obsdim::O) where {T,I,O}
        if T <: Tuple
            error("inner constructor should not be called using a Tuple")
        end
        1 <= minimum(indices) || throw(BoundsError(data, indices))
        maximum(indices) <= nobs(data, obsdim) || throw(BoundsError(data, indices))
        new{T,I,O}(data, indices, obsdim)
    end

While this kind of check is a nice convenience, I think it just increases the code complexity of MLDataPattern.jl, and it is not necessary. If |
Sorry for the delay. I haven't used MLDataPattern for quite a long time, so I'm not as familiar with it as you are and I don't know which is better. Generally, we want to make things simpler during this refactoring. If making it convenient for users makes it complex and thus harder to maintain for developers, I personally prefer the complex version just to make it more intuitive to use.
Given that usage like |
Hi, any update on this? When will [email protected] be supported? Can I help with this somehow? |
Yes, this PR effort is something I haven't had time to push through. I will get it to a stage this week where someone else can pick up the torch (with an explanation of what needs to be done). |
For the views like |
Sounds good, although I consider using eval in a loop to define them a bit of an abuse that makes the code hard to read. I would rather have some lightweight supertype which would be part of MLDataPattern and we would define the behavior needed for |
One of the things I would like to determine is how much of the non-getobs/nobs code can be removed (or at least is there value in keeping it). If there is a need to keep a lot of the shared interfaces, then we can define a super type. Instead of inheriting from |
So it should be |
Yeah basically if there is an abstract type like |
Can someone rebase? |
c777c32 to e5962d5
Hi @racinmat, what is the status of this? |
@racinmat had to hand back off to me due to work commitments. I have been busy all semester, but I will work on this during break. |
Yes, in the end I had less time for this, so I did what I could, but I won't be able to finish this, sorry. |
This is a WIP in response to JuliaML/LearnBase.jl#44. The changes to the LearnBase.jl interface reduce the MLDataPattern.jl codebase significantly. Most notably, we are able to avoid a lot of the routing logic for `getobs`.

Since there are other JuliaML packages that will need similar updates, I want to summarize the highlights of what this transition entails:

- Remove the `convert(LearnBase.ObsDimension, obsdim)` statements; `obsdim` now has no type restrictions.
- [...] `obsdim` type. This is no longer necessary, and you can directly call `selectdim(A, obsdim, idx)` from Base (note the argument order: dimension first, then index). In some crude testing, I found no performance regressions from doing this. `selectdim` essentially does the views + colon tricks that the generated functions used to do.
- Delete `getobs(x)`, `getobs(x, obsdim)`, and `nobs(x)` from the code. The only interface functions to implement are `getobs(x, idx, obsdim)` and `nobs(x, obsdim)` (note: this may change slightly based on the LearnBase.jl PR, but deleting the above methods is still valid).

There are still some parts I haven't addressed, like `gettarget` and `BatchView`. There are no technical challenges here; I just haven't gotten to them. We'll want to get those changes in too before merging.
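As a rough illustration of the `selectdim` point above, here is a minimal sketch for plain arrays. The names `my_nobs` and `my_getobs` are hypothetical stand-ins chosen for this example, not the LearnBase.jl or MLDataPattern.jl API, and the positional `obsdim` argument is an assumption that may not match the final interface.

```julia
# Hypothetical stand-ins for the interface functions; illustrative names only.

# Number of observations along dimension `obsdim`.
my_nobs(A::AbstractArray, obsdim::Integer) = size(A, obsdim)

# View of the observation(s) at `idx` along `obsdim`.
# Base's argument order is selectdim(A, d, i): dimension first, then index.
my_getobs(A::AbstractArray, idx, obsdim::Integer) = selectdim(A, obsdim, idx)

X = reshape(collect(1:12), 3, 4)   # 3 features x 4 observations

n    = my_nobs(X, 2)          # observations are the columns
col2 = my_getobs(X, 2, 2)     # same as view(X, :, 2)
rows = my_getobs(X, 1:2, 1)   # same as view(X, 1:2, :)
```

Because `selectdim` returns a view, no data is copied, and `obsdim` can be an ordinary integer rather than a dedicated dimension type.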