-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improvements to serializer #21514
improvements to serializer #21514
Conversation
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @jrevels |
base/serialize.jl
Outdated
end | ||
resolve_ref_immediately(s, ndt) | ||
|
||
# deserialize_typename can assume unwrap_unionall(wrapper).parameters is set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't it also assume that the layout, mutabl, and types fields are valid, and may start propagating bad assumption about ndt
as soon as the ccall
returned the incomplete object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code as written doesn't directly assume that --- i.e. deserialize_typename
itself doesn't look at that info. We just need to make sure the type isn't used for anything serious until it's initialized. Hopefully null pointer accesses would indicate if we violate that. The boolean flags could be initialized earlier though, at least.
src/datatype.c
Outdated
@@ -79,14 +79,12 @@ jl_datatype_t *jl_new_abstracttype(jl_value_t *name, jl_datatype_t *super, jl_sv | |||
return jl_new_datatype((jl_sym_t*)name, super, parameters, jl_emptysvec, jl_emptysvec, 1, 0, 0); | |||
} | |||
|
|||
jl_datatype_t *jl_new_uninitialized_datatype(void) | |||
JL_DLLEXPORT jl_datatype_t *jl_new_uninitialized_datatype(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will also need to be exported in julia_internal.h
ref #308 |
b1b465d
to
ca310e8
Compare
Hmm very interesting failure here. Log backed up: https://gist.github.com/JeffBezanson/e47083278bf39cdfbce276a5e044055a |
From the log failure, it looks like something must have mutated some const field of a DataType type after allocating it............ |
Yeah, I could believe that. I'm convinced I can hack around it somehow. |
I'm not convinced that we should. The existence of this function means that any attempt to inspect the intermediate state of the deserializer (in the debugger, via profiling, if it has an IO error during deserialization) will immediately corrupt the entire process. From experience with solving this for the incremental serializer in dump.c, this approach is also simply not workable – the ordering of what it needs to try to do is simply not feasible to do recursively. Roughly speak, the issue is that you need to defer ever sending the Just to be clear though, I'm really happy with the rest of the changes here. I'm just not seeing the motivation for introducing the function that returns uninitialized datatypes. It just sounds really likely to introduce lots of other memory corruption and other issues. (based on my experience with having to solve the dual of this problem in dump.c) |
How does this actually happen, though? It seems that would require e.g. an instance of the new DataType to exist inside the Anyway, I can certainly put the TypeName/wrapper stuff back how it was, and keep the rest of this. |
Is this any more complicated than: type T
x::T
end The secondary problem it runs into is a case like:
which then ends up requiring deserializing the back-reference before the object itself. |
This one is just a garden variety reference cycle --- T.types points directly back to T, without needing to look at the type layout. This case actually works on this branch, but fails on master. Here's a test case (run on master):
More complicated field types could be a problem, but in all of these cyclic cases everything is a pointer, which we (somewhat by accident) correctly conclude from layout being NULL. So I think this is at least an improvement over what we have now. |
I understand that, and can make it arbitrarily more complicated if you would like. But I'm not sure that "accidentally getting this case right" is necessarily a positive feature, "when probably getting similar cases catastrophically wrong" is the tradeoff. Changing this because we accidentally getting this case right is also pretty likely to get in the way of merging #18632 (unboxing more immutables) which needs to change the implementation of the layout computation. |
Well, I don't want you to make it arbitrarily more complicated, but I'd settle for an example of a case that is catastrophically wrong. |
ca310e8
to
a89d6b3
Compare
Ok, controversial part removed. |
a89d6b3
to
f475c7d
Compare
A couple more improvements: reorganized the tags table, added a short (UInt16) backref, added short-Int64 (for Int64s with values in Int32 range), and split the Int tag to Int32- and Int64-specific ones, which will make files slightly more portable between 32- and 64-bit machines. |
f475c7d
to
9dca3ab
Compare
9dca3ab
to
28e2d51
Compare
base/serialize.jl
Outdated
|
||
const NTAGS = length(TAGS) | ||
|
||
function sertag(v::ANY) | ||
ptr = pointer_from_objref(v) | ||
ptags = convert(Ptr{Ptr{Void}}, pointer(TAGS)) | ||
@inbounds for i in 1:NTAGS | ||
ptr == unsafe_load(ptags,i) && return (i+1)%Int32 | ||
@inbounds for i in 1:(NTAGS-78) # constant ints & reserved slots never returned here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any convenient way to avoid hardcoding the 78?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I can metaprogram my way out of that.
28e2d51
to
c8a2539
Compare
@nanosoldier |
base/serialize.jl
Outdated
|
||
# tags >= this just represent themselves, their whole representation is 1 byte | ||
const VALUE_TAGS = sertag(()) | ||
const ZERO_TAG = sertag(0) | ||
const ZERO32_TAG = Int32(NTAGS-65) | ||
const ZERO64_TAG = Int64(NTAGS-32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could make use of n_int_literals here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
down with magic numbers!
c8a2539
to
bd804ee
Compare
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @jrevels |
This caused SQLite.jl to start failing its tests - https://github.com/JuliaCI/pkg.julialang.org/blob/08d51a803d9ecb17a71f2a40b1b5c0afff95f42c/logs/SQLite_0.6.log cc @quinnj Also Sugar.jl https://github.com/JuliaCI/pkg.julialang.org/blob/08d51a803d9ecb17a71f2a40b1b5c0afff95f42c/logs/Sugar_0.6.log cc @SimonDanisch |
Hmmm....I thought I removed the dependency on past serializer versions. I'll take a look; thanks for the ping. |
0.6 will necessarily change the serialization format, so I'm taking the opportunity to do some other improvements.
This improves the design a bit to remove weird hacks, fix a TODO item, and allow using backreferences for all objects, particularly Symbols and DataTypes. One common inefficiency before was serializing types repeatedly for all their instances.
Next, this preserves shared references for all mutable values. There is some potential for slowdowns there, but I think it's worth it for being both more accurate and more efficient when there are many references to a large object.
Also adds some more common values to the tags table, and adds tags for smaller string lengths and backrefs.
This changes the format, but is API compatible except in a rare corner case (when you want to define
serialize
and support shared references, but also want to use something other than your value's own type as the identifying tag).@nanosoldier
runbenchmarks("serialization", vs = ":master")