-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make cfunction optmization in codegen valid #19801
Conversation
Since the |
c904726
to
37c85de
Compare
Tests added. I'll probably leave the condition as is for now.... |
@@ -1635,7 +1635,7 @@ static jl_cgval_t emit_ccall(jl_value_t **args, size_t nargs, jl_codectx_t *ctx) | |||
frt = jl_tparam0(frt); | |||
Value *llvmf = NULL; | |||
JL_TRY { | |||
llvmf = jl_cfunction_object((jl_function_t*)f, frt, (jl_tupletype_t*)fargt); | |||
llvmf = jl_cfunction_object((jl_function_t*)f, frt, (jl_tupletype_t*)fargt, ctx->world); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can also check here that min_age == max_age, to see if the backedge is right
JL_GC_PUSH1(&argt); | ||
if (jl_is_tuple(argt)) { | ||
// TODO: maybe deprecation warning, better checking | ||
argt = (jl_value_t*)jl_apply_tuple_type_v((jl_value_t**)jl_data_ptr(argt), jl_nfields(argt)); | ||
} | ||
Function *llvmf = jl_cfunction_object(f, rt, (jl_tupletype_t*)argt); | ||
Function *llvmf = jl_cfunction_object(f, rt, (jl_tupletype_t*)argt, | ||
ptls->world_age); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. Is this actually right? Now that I think back on some of my original work on cfunction, I think I may have contemplated they should always appear to operate in the newest world. And that it would update the JIT code as needed to make that true. Clearly I forgot about that plan as I finished up the PR though last year. But it's a new year here now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's very important to make sure that we can still do the same cfunction
optimization that we are doing now, i.e. cfunction
on known types/functions should return a constant (or relocated) pointer. How can you do it if cfunction
always returns the function in the latest world?
Merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should do it this way. (see comments)
I disagree. I think it is agreed on that trying to observe a change after |
That is usually true, which makes it a good heuristic, but I think that's just an artifact, not a principle design goal. I would instead state this principle in the reverse as "world age should not change within a dynamic scope, due to eval, or when adding a method definition, etc." However, I would also point out that To see why it should be preferred to use the newest age, consider some properties of the static compilation case:
One corollary of all this is that it should probably be hard to "capture" the world counter as a first-class value. It is just simpler to not have any constructs for explicitly capturing world age (other than a task). |
I would find it surprising if, in
results |
not entirely. that's why I wrote: "This further means that ccall(cfunction(f)) cannot be the same as f(), so that option isn't possible." |
Maybe it can't be exactly the same, but can it at least have the property described in my comment? |
Nope, that actually derives the opposite result. If
Yes, but to be more pedantic however, it's factored as |
But the world number is implicitly an argument to method lookup itself, and therefore must also be an argument to cfunction. I don't see how |
I'm not sure what you mean by "implicitly" in this context. Method lookup takes an explicit argument of the world number. It is explicitly passed to However, The world age counter is a bit different, since we specialize on it very heavily, it's not really realistic to hope for the best (too many places need to assert if it isn't set correctly.) So we need to deal with it, and so there's 3 general option categories:
Inference does a complex little dance to build equivalence classes of world ages. I don't see this argument really going anywhere though. I think this is sort of an empty argument, since it quickly becomes a tautology between the choice to do static inference vs. dynamic de-optimization (e.g. given that's how we want to get efficient, statically compile code, we specialize code on world age in this way. And because that makes it efficient, we use it to statically compile code before running.) I already mentioned, however, that the difference here is between whether the lookup result is a closure over the world age (e.g. |
Ok, I think we're getting somewhere: the difference is that with cfunction, the lookup and call are not atomic. First, looking only at my specific example:
is there anything that can happen between lines 2 and 3 that renders it invalid to look up Next, I'll grant that you might call You seem to be saying that a callback can't see the world counter in the TLS. I don't see why --- can't a callback still look at the TLS just like any other code? |
I won't against such pattern but it's worth pointing out that such implementation will almost certainly break the usage of running the function on an unmanaged thread, which is the current recommended/documented way to interact with threaded callback and is used by |
Which part are you referring to? The trampoline, or the second part of the bit you quoted? |
The trampoline I was also trying to reply to whether the usage should be supported so I copied too much..... For the usage pattern, I feel like it might be useful to have a version of |
No, in this case the trouble would be outside
You say "not easy", I say, that means have to write our IO scheduler to create a dedicated Task to run all callbacks.
That's probably true for case 3. Although it's pretty easy to optimize, since the code that will get run is a function of the cfunction signature and not the runtime context, that state is easy to track with a backedge and rewrite (dynamic patching) when required. For case 1, you either have to hard
Again, would have to rewrite the scheduler, and it'd be tricky to use
TLS is sometimes not present ( |
I think we currently only allow ccall on bitstypes loaded from constants? I think we can depend upon that not getting invalided under all of the three above scenarios. We already have to detect this situation in the To have proper unmanaged thread support, we will probably need some way of realizing a new Task on that stack. It's probably possible we could fold that into |
Isn't it true that either with or without this PR, we have the problem of the world changing between getting a pointer from |
That's correct, this just tries to better check that we have optimized the code to capture the expected world. That's also why I'm arguing that |
Ok, I think I'm starting to get a handle on this then. My understanding of your preferred approach:
If the first point is assumed, I think the rest follows pretty naturally. If you're going to enter a new dynamic scope, it might as well be in the newest world. The first point is the surprising part though. It's not immediately intuitive that entering a cfunction would be a new dynamic scope. I think in a perfect world we wouldn't want that behavior, but the argument seems to be that staying in the same dynamic scope is impossible since there's no reliable way to pass the world number through. If we could get at the world number, the cfunction could contain code to check the world number and branch out if it needs to be recompiled. But that check is a bit expensive, so we'd rather not bother. Is that a fair summary? |
Yes, that's a good summary.
This is the bit I would actually disagree with (well, except the "surprising" and "non intuitive" bits). After recently reviewing all of the places in base where we call back into Julia code, I discovered that we should always change to the newest dynamic scope. I honestly wasn't expecting that. But it looks like anywhere I didn't do this was simply an omission due to starting from the wrong assumption about this (and because those places are generally only reached while handling toplevel expressions, it's harder to construct a case where you could notice the difference). I don't know that I have a full mental model of why this seems to be the case so consistently. The best notion I have right now to explain it is that it ends up being highly desirable to distinguish between call-backs vs. call-forwards (c.f the I think that just leaves the behavior of the If this was a perfect world, I would propose a dual system, where a call (any call, not just cfunction) automatically transformed from call-forward to call-back depending on whether the function argument was derived from below or above on the call-stack. But I think that's probably mostly nonsense. |
It could also be the world on construction (or whatever captured world) since that's the only solution I can think of that doesn't break |
I believe it's because that's what every dynamic language do so doing that and not doing any static optimization would seem to be "correct" and is what most people expects the fix of #265 to do. In another world, it seems correct since most people won't want to run the old method when a newer one is available (I won't be surprised if they do in the future though). In this sense, I think all three versions could match what one expect in terms of the world the callback is executed in (basically anything newer than the one on construction should be fine). However, since most people also expect the
What's "call-forward" vs "call-back"? What's the difference between the |
That's not a good expectation, since it's neither documented nor accurate. |
x-ref my proposal for adding a Callback type with approximately identical correspondence such that |
@vtjnash: what's going on with this? |
someone should merge #20167 and close this |
The backedge in inference if obviously very pessimistic but I doubt it will cause too much trouble. Also need tests.
This makes sure that
cfunction
(orjl_function_ptr
) always returns a function pointer that captures the calling world and makes sure that both the runtime implementation and codegen agrees on this.Add to 0.6 milestone since this should be ready as is (missing tests and further optimizations) and should make it easier to handle cases like #19790 in packages.