Category refinement sometimes changes the hash of parents #14471
Comments
Attachment: trac_14471_demonstrate.patch.gz Initial patch |
comment:2
Attachment: debug_ignored_exceptions.patch.gz The attachment: debug_ignored_exceptions.patch is a backport from Python-3.3 to give better diagnosis of ignored exceptions. With it, I get
so the KeyError really comes from the WeakValueDictionary remove callback. |
comment:3
Adding some debugging in I haven't figured out exactly which cache in the Sage library is the culprit, maybe Simon knows? |
comment:4
My first guess would be that the cache at play is a "cached_method" or a "cached_function" cache, probably the one from
aren't quite of the shape you're finding. A quick analysis; This is
As you can see, We also see that The bug we're seeing indicates that we end up with:
Since the My guess is that this happens IN cyclic garbage removal, where callbacks do get separated from deletions. Python tries hard to avoid calling such callbacks unnecessarily or in situations where it's unsafe to do so, but perhaps cleaning up extremely complicated cyclic structures (which we do create) could fool the detection. If this is right, then these KeyErrors should be relatively harmless: they would be happening on a dictionary that is slated for demolition anyway. That would suggest we could just catch and ignore the KeyError in the WeakValueDictionary removal function. It would be good to identify the scenario that justifies doing so, however. |
comment:5
I agree and verified with If I change the remove() callback to
then I get
So the weakvaluedict (self.data) is in a bad state... |
comment:6
Replying to @vbraun:
Something is going wrong of course, though. Something seems to be mutating self.data in between. The only methods that seem to be mutating the underlying Some of the "iterator" methods on |
comment:7
Replying to @nbruin:
Fair enough, but even after I catch the KeyError the I agree that this smells like something else is mutating |
gdb/cython backtrace |
Attachment: sage_crash_NCNONO.log Attachment: sage_crash_IfVSEK.log gdb/cython backtrace in a debug build |
comment:8
I've added an |
comment:9
Replying to @vbraun:
That is weird, and sounds to me like That might give you some indication of what exactly is wrong with the entry and hence where the problem might be coming from. One thing your traceback is confirming:
One possible scenario:
Your diagnosis that
(how many ignored KeyErrors you get depends a bit on memory layout etc.) This is definitely a bug in the Python library. The problem is that This is now Python issue 17816. Python 3 does not seem to have this problem and it may not be the problem we're running into here. |
comment:10
I've looked at the Python 3 code and there weakref keeps a separate list of keys to remove to work around this issue. Any further non-Sage related discussion should probably move to the Python bugtracker. As for Sage, should we patch Python for this? I hit this bug in #14469 and it's likely that we'll trip over it again. |
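The deferred-removal idea mentioned here can be sketched roughly as follows. This is a toy model written for illustration, not the CPython source: the class `DeferredRemovalWeakValueDict`, the helper class `Value`, and all attribute names are hypothetical, and the demo assumes CPython's reference counting, where a weakref callback fires as soon as the last strong reference goes away.

```python
import weakref


class Value:
    """Plain object; instances can be weakly referenced."""


class DeferredRemovalWeakValueDict:
    """Toy weak-value mapping (hypothetical, not CPython's implementation)."""

    def __init__(self):
        self.data = {}                # key -> weakref to the value
        self._pending_removals = []   # keys whose values died mid-iteration
        self._iterating = 0           # number of iterations in progress

    def __setitem__(self, key, value):
        self_ref = weakref.ref(self)  # avoid a strong cycle through the callback

        def remove(wr, key=key):
            this = self_ref()
            if this is None:
                return
            if this._iterating:
                # Mutating self.data now would break the running iteration.
                this._pending_removals.append(key)
            elif this.data.get(key) is wr:
                # Only delete if the slot still holds this dead reference.
                del this.data[key]

        self.data[key] = weakref.ref(value, remove)

    def _commit_removals(self):
        # Apply the removals that were postponed during iteration.
        while self._pending_removals:
            key = self._pending_removals.pop()
            wr = self.data.get(key)
            if wr is not None and wr() is None:
                del self.data[key]

    def keys(self):
        self._iterating += 1
        try:
            for key, wr in self.data.items():
                if wr() is not None:   # skip entries that died meanwhile
                    yield key
        finally:
            self._iterating -= 1
            if not self._iterating:
                self._commit_removals()

    def __len__(self):
        return len(self.data)


values = {i: Value() for i in range(5)}
d = DeferredRemovalWeakValueDict()
for i in range(5):
    d[i] = values[i]

seen = []
for k in d.keys():
    seen.append(k)
    values.pop(3, None)   # value 3 dies while we are still iterating

remaining = len(d)        # the dead entry is dropped once iteration ends
```

In this sketch the callback never touches the underlying dict while an iteration is running, so the loop completes without a "dictionary changed size during iteration" error and the dead entry is cleaned up afterwards.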
comment:11
Replying to @vbraun:
I would suggest a paranoid
The I tried. Making the test
also makes the error go away (but is less paranoid than the option above), but indeed, making the test
does not!! So there is something wrong with self.data.keys(). Indeed, putting
in there makes for "ignored runtime error". On the other hand, if I run the example above with the So:
One difference between This could happen if the MRO of some class changed to insert/remove |
comment:12
Replying to @vbraun:
This entry is produced by the
  sage: f(x,y) = x^2+y
  sage: m = matrix([[f,f*f],[f^3,f^4]]); m
  [    (x, y) |--> x^2 + y (x, y) |--> (x^2 + y)^2]
  [(x, y) |--> (x^2 + y)^3 (x, y) |--> (x^2 + y)^4]
  sage: m(1,2)
  [ 3  9]
  [27 81]
  sage: m(y=2,x=1)
  [ 3  9]
  [27 81]
  sage: m(2,1)
  [  5  25]
  [125 625]
+ sage: D=UniqueRepresentation.__classcall__.get_cache()
+ sage: [k for k in D.data.keys() if k not in D.data]
+ []
With the added doctest, we get different behaviour depending on whether
changes the hash of one of the key components involved; probably the "callable function ring", i.e.,
that last one's really bad! The |
comment:13
I think this illustrates the problem rather well:
We have that
Obviously, when class is not fixed, it's a bad ingredient for a hash. Given that
we do need to do something about hashing, since Given that this code stems from #5930, it predates these dynamic classes voodoo by quite a bit, so the fact that it doesn't operate well with it is a bug in the dynamic classes stuff. A job for Simon, Nicholas and the gang! Furthermore, note that EDIT: The last bit is not the case. |
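A minimal sketch of why the class is a bad hash ingredient once it can change. The classes `Parent` and `RefinedParent` are hypothetical stand-ins, not Sage code; assigning to `__class__` merely mimics what category refinement does to a parent with a dynamic class.

```python
class Parent:
    """Hypothetical stand-in for a parent whose hash involves its class."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Fragile: the class is not an invariant of the object.
        return hash((self.__class__, self.name))

    def __eq__(self, other):
        return isinstance(other, Parent) and self.name == other.name


class RefinedParent(Parent):
    """Hypothetical stand-in for the class after category refinement."""


p = Parent("callable function ring")
cache = {p: "cached value"}
hash_before = hash(p)

p.__class__ = RefinedParent        # mimic refinement swapping the class
hash_after = hash(p)
hash_changed = hash_before != hash_after

# Same diagnostic as in the doctest above: keys the dict can no longer find.
# Typically this is [p]; CPython's identity shortcut can occasionally still
# find the entry if the new hash happens to probe the same slot.
stale_keys = [k for k in cache.keys() if k not in cache]
```

The entry is still stored in `cache`, but it is filed under the old hash, so a lookup or deletion by key usually misses it. That the outcome depends on slot layout is consistent with the intermittent KeyErrors reported in this thread.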
comment:15
I did a very quick search for lines in source containing both |
comment:16
Isn't this super-dangerous? Changing the hash can potentially give you wrong results. Also, I thought it is already wrong to make the bare class (not |
comment:17
Replying to @vbraun:
Experience shows you don't have to change hashes for that, but yes, feel free to up the severity of the ticket if you feel strongly about this.
I think the system just gets tricked into that here and I think it's unavoidable: If I remember correctly, the category initialization gets delayed for efficiency reasons. Fixing that would probably cause unacceptable loss of performance in other places. |
comment:18
Replying to @nbruin:
Is that actually true? This is about constructing a parent, after all. Though maybe somebody who actually wrote the category code could chime in and tell us what he/she envisioned and where it is documented (j/k). |
comment:19
See The difference of |
comment:20
PS: A further difference is that finding the correct category is a bit more involved than in other examples. It may be an algebra or a module, depending on whether we talk about general or square matrices. And concerning module, this involves checking whether the base ring is a field, which also tends to be slow. And all this effort would be in vain, because the category framework isn't used for matrix spaces in the elliptic curve code. |
comment:21
But the problem is |
comment:22
Replying to @vbraun:
As far as I know, I never touched
I hope so. |
Attachment: trac_14471-review.patch.gz |
comment:33
I think it would be a good idea to show that the hash actually did change. I extended the new test accordingly, in a review patch. Positive review, then! |
comment:34
No, I spoke too soon. Now, as the debug flag is set, it makes sense to run the full doctests. After all, it could be that the debug flag uncovers a bug. |
comment:35
Doctests pass on my machine, of course |
comment:36
Replying to @vbraun:
Confirmed! |
comment:37
The PDF documentation doesn't build due to the use of single instead of double backticks:
|
Attachment: trac_14471-pdf-fix.patch.gz Initial patch |
comment:38
Fixed. |
Merged: sage-5.12.beta0 |
Some objects with dynamical classes use hash(self.__class__). Since this is not an invariant of dynamical classes, bad things will happen. One such instance is that, under some circumstances, a WeakValueDictionary remove callback is being called but cannot find the object in the dictionary since the hash changed.
Apply
CC: @simon-king-jena @nbruin @nthiery @hivert
Component: memleak
Author: Volker Braun
Reviewer: Simon King
Merged: sage-5.12.beta0
Issue created by migration from https://trac.sagemath.org/ticket/14471