Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A coercion-related memory leak #15424

Closed
simon-king-jena opened this issue Nov 15, 2013 · 29 comments
Closed

A coercion-related memory leak #15424

simon-king-jena opened this issue Nov 15, 2013 · 29 comments

Comments

@simon-king-jena
Copy link
Member

Yet another leak (this test is with #15303):

sage: import gc
sage: K = IntegerModRing(111115)
sage: C = type(K)
sage: len([1 for bla in gc.get_objects() if isinstance(bla,C)])
1
sage: K.has_coerce_map_from(ZZ)
True
sage: del K
sage: _ = gc.collect()
sage: len([1 for bla in gc.get_objects() if isinstance(bla,C)]) # no leak
0
sage: a = K.get_action(ZZ, op=operator.add, self_on_left=True)
sage: del K,a
sage: _ = gc.collect()
sage: len([1 for bla in gc.get_objects() if isinstance(bla,C)])
0
sage: K = IntegerModRing(111115)
sage: a = K.get_action(ZZ, op=operator.mul, self_on_left=True)
sage: b = ZZ.get_action(K, op=operator.mul, self_on_left=False)
sage: del K,a,b
sage: _ = gc.collect()
sage: len([1 for bla in gc.get_objects() if isinstance(bla,C)]) # no leak
0
sage: K = IntegerModRing(111115)
sage: x = K.one()
sage: del K,x
sage: _ = gc.collect()
sage: len([1 for bla in gc.get_objects() if isinstance(bla,C)]) # no leak
0
sage: K = IntegerModRing(111115)
sage: x = K.one()*2
sage: del K,x
sage: _ = gc.collect()
sage: len([1 for bla in gc.get_objects() if isinstance(bla,C)]) # LEAK
1

This is astonishing. Wouldn't one think that getting (thus, caching) a coercion or getting an action should trigger a leak, or creating an element? No, we actually need to multiply two elements to find the leak.

Surprising:

sage: K = IntegerModRing(111115)
sage: K.get_action(ZZ, op=operator.mul, self_on_left=True) is None
True
sage: ZZ.get_action(K, op=operator.mul, self_on_left=True) is None
True

Shouldn't there be an action? OK, perhaps not, since coercion is used for the multiplication:

sage: cm = sage.structure.element.get_coercion_model()
sage: cm.explain(K,ZZ, op=operator.mul)
Coercion on right operand via
    Natural morphism:
      From: Integer Ring
      To:   Ring of integers modulo 111115
Arithmetic performed after coercions.
Result lives in Ring of integers modulo 111115
Ring of integers modulo 111115

But if coercion is used, then why is establishing a coercion not enough to trigger the leak? In a new session:

sage: import gc
sage: K = IntegerModRing(111115)
sage: C = type(K)
sage: phi = K.coerce_map_from(ZZ)
sage: x = phi(2)
sage: del K,phi,x
sage: _ = gc.collect()
sage: len([1 for bla in gc.get_objects() if isinstance(bla,C)])
0

CC: @nbruin

Component: memleak

Keywords: coercion model, weak reference

Reviewer: Simon King

Issue created by migration from https://trac.sagemath.org/ticket/15424

@nbruin
Copy link
Contributor

nbruin commented Nov 16, 2013

comment:1

I tried to see where the reference can be coming from:

%cpaste
import gc
from sage.structure.coerce_dict import *
def all_referrers(c,X):
    found_IDs=set()
    R=[]
    new=gc.get_referrers(c)
    found_IDs.add(id(new))
    found_IDs.add(id(R))
    found_IDs.add(id(globals()))    
    while len(new)>0:
        r=new.pop()
        if id(r) in found_IDs or type(r) not in X:
            print "skipping",type(r)
            continue
        R.append(r)
        found_IDs.add(id(r))
        new.extend(gc.get_referrers(r))
        print "type(r)=%s len(R)=%s len(new)=%s"%(type(r),len(R),len(new))
    return R

def getR():
    K = IntegerModRing(111115)
    C = type(K)
    phi = K.coerce_map_from(ZZ)
    del phi
    #vary this line:
    x = K.one()*2
    #del K,x
    _ = gc.collect()
    X=set([list,dict,tuple,
           RingHomset_generic_with_category,
           sage.rings.finite_rings.integer_mod.Integer_to_IntegerMod,
           TripleDict, MonoDict, C])
    R=all_referrers(list(c for c in gc.get_objects() if isinstance(c,C))[0],X)
    return R
--
RingHomset_generic_with_category=type(ZZ.Hom(QQ))
R=getR()

and I found that with x=K.one()*2 there is also a TripleDict that show up. It's fairly big (44 entries) and the entries all seem to be of the form D[domain,codomain,None]=morphism from domain to codomain (or None entries). Since the morphism has a strong reference to the codomain, this would keep our ring alive [it doesn't seem like the kind of dictionary that can afford to be weak on its values]. Judging from the entries, this dictionary is a global one. I think there is a ticket somewhere that mentions the same phenomenon (also doing arithmetic), in the context of finite fields. Somewhere from the era when we started to work on #715 in earnest.

@simon-king-jena
Copy link
Member Author

comment:2

Replying to @nbruin:

and I found that with x=K.one()*2 there is also a TripleDict that show up. It's fairly big (44 entries) and the entries all seem to be of the form D[domain,codomain,None]=morphism from domain to codomain (or None entries). Since the morphism has a strong reference to the codomain, this would keep our ring alive [it doesn't seem like the kind of dictionary that can afford to be weak on its values]. Judging from the entries, this dictionary is a global one.

Hm. Is there any global TripleDict beside the one in sage.categories.homset? With grep, I found none. But in this case, the values would not be morphisms but homsets.

@simon-king-jena
Copy link
Member Author

comment:3

I just notice: #14711 is still in need of review. I am currently building the branch from there, to see if it fixes the problem from here.

@simon-king-jena
Copy link
Member Author

comment:4

Replying to @simon-king-jena:

I just notice: #14711 is still in need of review. I am currently building the branch from there, to see if it fixes the problem from here.

No, it does not.

@nbruin
Copy link
Contributor

nbruin commented Nov 16, 2013

comment:5

OK, looking at the referrers to the TripleDict I get a sage.structure.coerce.CoercionModel_cache_maps sage.structure.coerce.CoercionModel_cache_maps, so I guess it's not a "global" cache but one stored on an object that has an awfully long lifespan.

Correction: the values are either None or a pair of maps. So I guess the keys are not domain and codomain but parents of pairs of elements and the pair of maps are the ones that map into the common parent.

So my guess is that this cache is used when determining a common parent and then stores the maps needed to get to that. It seems to me we can mitigate this leak considerably if we DON'T store the maps here if we find a coercion from one into the other would do the job: instead store a symbolic "coerce_to_first" or "coerce_to_second" value in there.

In cases where there is a genuine third parent into which we're mapping, it's a little safer: then the codomain at least doesn't keep alive the keys. It's unclear to me how long we should be keeping the common overparent in that case. This cache will mean its life is bounded below by the shortest life time of the two "covered" parents.

@simon-king-jena
Copy link
Member Author

Attachment: chain.png

A reference chain preventing an integer mod ring from garbage collection

@simon-king-jena
Copy link
Member Author

comment:6

The attached picture shows that the offending reference chain starts in the module sage.functions.other (which I have never even heard of) and proceeds (via some __dict__ of 73 items) to CoercionModel_cache_maps, which then references a TripleDict.

@simon-king-jena
Copy link
Member Author

comment:7

Indeed, sage.functions.other contains the line coercion_model = sage.structure.element.get_coercion_model(). But I think keeping a reference to the coercion model must be legal. Hence I need to look further.

@simon-king-jena
Copy link
Member Author

comment:8

Ouch. I thought that in our coercion model, maps and actions are cached on the level of parents. Now I see that they additionally are cached in the coercion model!

Apparently this additional cache is not always used. For example, it is not used if you just do K.coerce_map_from(ZZ). But it seems that a coercion map is stored in the additional cache if you do a multiplication.

And then you have a strongly referenced TripleDict, a key (ZZ,K,None) and a morphism phi:ZZ->K as value. Even with #14711, phi would strongly reference K. ZZ is immortal. Hence, the callback for the item (ZZ,K,None):phi will never be called.

What shall we do about it? Isn't it the case that we are only storing maps that already are cached on the level of parents? Then it would be safe to just store a weak reference to this map, and the problem was solved.

@simon-king-jena
Copy link
Member Author

comment:9

PS: Similarly for CoercionModel_cache_maps._action_maps.

@simon-king-jena
Copy link
Member Author

comment:10

PPS: Which means that we need to make maps weakrefable (actions already are weakrefable).

@simon-king-jena
Copy link
Member Author

Branch: u/SimonKing/ticket/15424

@simon-king-jena
Copy link
Member Author

Author: Simon King

@simon-king-jena
Copy link
Member Author

Commit: b052e5e

@simon-king-jena
Copy link
Member Author

comment:12

I did not run the complete doctests, but I have added a new test that shows that I have fixed the problem. It is orthogonal to #14711.

As I said above: The idea is to only keep a weak reference from coercion model to coerce maps, which needs to make morphisms weakrefable, and which should be safe because the maps are cached on their dodomain anyway.


New commits:

[b052e5e](https://github.com/sagemath/sagetrac-mirror/commit/b052e5e)Coercion model should only store weak references to coerce maps

@simon-king-jena
Copy link
Member Author

Changed keywords from none to coercion model, weak reference

@simon-king-jena
Copy link
Member Author

comment:13

Argh. Schemes seem to hate me.

Almost always when I implement something that fixes a problem, it turns out that some examples from sage.schemes will fail, because the elliptic curve code relies on what I intended to fix.

But this time it is extreme:

sage -t src/sage/schemes/elliptic_curves/ell_rational_field.py  # 652 doctests failed

@simon-king-jena
Copy link
Member Author

Work Issues: Convince elliptic curves to not fail

@nbruin
Copy link
Contributor

nbruin commented Nov 16, 2013

comment:15

Don't store weakrefs as values in TripleDict and MonoDict. That's what we have weakvalues=True for. Or are you afraid the callback is too expensive to handle?

@simon-king-jena
Copy link
Member Author

comment:16

Replying to @nbruin:

Don't store weakrefs as values in TripleDict and MonoDict. That's what we have weakvalues=True for.

Oops, I forgot about this.

Or are you afraid the callback is too expensive to handle?

No. CoercionModel_cache_maps._coercion_maps stores tuples (namely: pairs of coercion maps, in both directions, hence, often (mor,None) or (None,mor)). We can't weakref the tuples. However, we could have a weakrefable extension type with two cdef attributes, so that we can store this instead.

@simon-king-jena
Copy link
Member Author

comment:17

Replying to @simon-king-jena:

We can't weakref the tuples. However, we could have a weakrefable extension type with two cdef attributes, so that we can store this instead.

No, this won't work, because there would be no reference back to the pair, and thus it would be immediately garbage collected.

@nbruin
Copy link
Contributor

nbruin commented Nov 16, 2013

comment:18

Found it: #14058 seems relevant. Especially read the sage-devel thread referenced there. We've thought quite a bit about this stuff before. Especially with the newly MUCH better lookup performance of TripleDict and MonoDict, we may be able to afford some more lookup indirection.

@simon-king-jena
Copy link
Member Author

comment:19

Replying to @nbruin:

Found it: #14058 seems relevant. Especially read the sage-devel thread referenced there. We've thought quite a bit about this stuff before. Especially with the newly MUCH better lookup performance of TripleDict and MonoDict, we may be able to afford some more lookup indirection.

OK, then let us see if it fixes the problem.

@simon-king-jena
Copy link
Member Author

comment:20

Yep, #14058 fixes the problem. Then I give a review to this ticket, saying that it is a duplicate and please be resolved as such.

@simon-king-jena
Copy link
Member Author

Changed work issues from Convince elliptic curves to not fail to none

@simon-king-jena
Copy link
Member Author

Changed author from Simon King to none

@simon-king-jena
Copy link
Member Author

Reviewer: Simon King

@jdemeyer
Copy link
Contributor

Changed branch from u/SimonKing/ticket/15424 to none

@jdemeyer
Copy link
Contributor

Changed commit from b052e5e to none

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants