-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to use NullableArrays and CategoricalArrays #145
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
import Compat.String | ||
|
||
import DataStructures: OrderedDict | ||
|
||
import Base: eltype, show, convert, isascii, | ||
import Base: eltype, show, convert, isascii, isnull, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need import
for this, merely using
AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is because RCall also has a isnull method.
@test rcopy(Nullable, RObject(1)).value == 1 | ||
@test rcopy(Nullable, RObject("abc")).value == "abc" | ||
@test rcopy(RObject(Nullable(1))) == 1 | ||
@test rcopy(Nullable, RObject(Nullable(1, true))).isnull |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should really avoid accessing the isnull
field directly, as it's likely to go away soon (JuliaLang/julia#18510). Use the accessor instead. (This also applies in other places.)
@test rcopy(DataArray,"c(NA,'NA')").na == @data([NA,"NA"]).na | ||
@test_throws ErrorException rcopy(DataArray,"as.factor(c('a','a','c'))") | ||
@test rcopy(PooledDataArray,"as.factor(c('a','a','c'))").pool == ["a","c"] | ||
@test rcopy(NullableArray,"c(NA,TRUE)").isnull == NullableArray([true,true], [true,false]).isnull |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here for NullableArray
. You can use isequal
to test for equality of both contents and missingness, which is even better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any method to compare the following withoout values
?
isequal(NullableArray([1,2,3]), collect(1:3)) # true
isequal(NullableArray([1,2,3]).values, collect(1:3)) # false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you need to do NullableArray(1:3)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isequal(NullableArray(1:3), 1:3)
or isequal(NullableArray(1:3), collect(1:3))
both return false..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant isequal(rcopy(NullableArray,"c(NA,TRUE)"), NullableArray([true,true], [true,false])and so on (i.e. convert both sides to a
NullableArray`).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops.. i see
function sexp{T<:Compat.String,N,R<:Integer}(v::$typ{T,N,R}) | ||
rv = protect(sexp(v.refs)) | ||
try | ||
for (i,isna) = enumerate(v.refs .== 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v.refs .== 0
will allocate a vector. Better do == 0
inside the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for catching it
rv[i] = naeltype(eltype(rv)) | ||
end | ||
end | ||
setattrib!(rv, Const.LevelsSymbol, sexp(v.pool.index)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use CategoricalArrays.index(v)
, ordered(v)
and levels(v)
. Private fields should really not be accessed directly as they could change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CategoricalArrays.index
does not work?
julia> v = CategoricalArray(repeat(["a", "b"], inner = 5));
julia> CategoricalArrays.levels(v)
2-element Array{String,1}:
"a"
"b"
julia> CategoricalArrays.index(v)
ERROR: MethodError: no method matching index(::CategoricalArrays.CategoricalArray{String,1,UInt32})
Closest candidates are:
index(::CategoricalArrays.CategoricalPool{T,R<:Integer,V}) at /Users/Randy/.julia/v0.5/CategoricalArrays/src/pool.jl:161
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, indeed, at first I didn't intend it to be used outside of the package. For now the best approach is CategoricalArrays.index(v.pool)
. I'll try to see if I can improve the API in next releases.
TODO