FIX dont pickle __builtins__ of dynamic modules #325

Merged: 15 commits merged into cloudpipe:master from the dynamic-module-builtins branch on Jan 29, 2020

Conversation

pierreglaser
Member

Closes #316
save_module saves all entries of a dynamic module's __dict__, including __builtins__. If __builtins__ gets populated with an unpicklable object, save_module fails. This PR proposes to discard __builtins__ at pickling time and re-create a __builtins__ entry at unpickling time using the builtins module. The two should correspond according to the CPython docs, although this is not guaranteed.
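
The approach described above can be sketched roughly as follows. This is an illustration using only the standard library, not cloudpickle's actual code: `module_state` and `rebuild_module` are hypothetical helper names, and the restored `__builtins__` entry is assumed to be the `__dict__` of the `builtins` module.

```python
import builtins
import pickle
import types


def module_state(mod):
    """Return the module name and its namespace without '__builtins__'."""
    state = {k: v for k, v in vars(mod).items() if k != "__builtins__"}
    return mod.__name__, state


def rebuild_module(name, state):
    """Recreate a dynamic module and give it a fresh '__builtins__' entry."""
    mod = types.ModuleType(name)
    mod.__dict__.update(state)
    mod.__dict__["__builtins__"] = builtins.__dict__
    return mod


# Build a dynamic module; exec() inserts '__builtins__' into its namespace.
dyn = types.ModuleType("dyn_mod")
exec("x = 1\ny = x + 1", dyn.__dict__)

name, state = module_state(dyn)
payload = pickle.dumps((name, state))       # no '__builtins__' in the payload
restored = rebuild_module(*pickle.loads(payload))
exec("z = len([x, y])", restored.__dict__)  # builtin names resolve again
```

Whatever ended up in the original `__builtins__` entry never reaches the pickler, so an unpicklable object stored there can no longer make the dump fail.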

@ogrisel

@codecov

codecov bot commented Jan 20, 2020

Codecov Report

Merging #325 into master will increase coverage by 34.9%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #325      +/-   ##
==========================================
+ Coverage   58.12%   93.03%   +34.9%     
==========================================
  Files           2        2              
  Lines         855      861       +6     
  Branches      175      178       +3     
==========================================
+ Hits          497      801     +304     
+ Misses        324       37     -287     
+ Partials       34       23      -11
Impacted Files Coverage Δ
cloudpickle/cloudpickle.py 92.06% <100%> (+12.54%) ⬆️
cloudpickle/cloudpickle_fast.py 95.67% <100%> (+95.67%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4e57a45...4e07d81.

Contributor

@ogrisel ogrisel left a comment

It would be great to add a new test that simulates the behavior of the non-picklable builtin injected by scipy.
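
A test along these lines might look like the sketch below. It is not the test added in this PR: a `threading.Lock` stands in for the non-picklable object (the scipy case involves a PyCapsule), plain `pickle` stands in for cloudpickle, and `can_pickle` is a hypothetical helper.

```python
import builtins
import pickle
import threading
import types


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


# A dynamic module with one ordinary attribute.
mod = types.ModuleType("dyn_mod")
exec("answer = 42", mod.__dict__)

# Use a private copy of the builtins so the real ones are left untouched,
# then inject an object that cannot be pickled.
private_builtins = dict(vars(builtins))
private_builtins["_unpicklable_obj"] = threading.Lock()
mod.__dict__["__builtins__"] = private_builtins

full_namespace = dict(vars(mod))
without_builtins = {
    k: v for k, v in full_namespace.items() if k != "__builtins__"
}

print(can_pickle(full_namespace))
print(can_pickle(without_builtins))
```

The first call reports a failure and the second a success, which is the behavior such a test would pin down.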

@pierreglaser
Member Author

pierreglaser commented Jan 20, 2020

Let's not worry about the PyPy failure for now. I found the cause and opened #326 to fix it.

@pierreglaser
Member Author

OK, the only failing builds are Python 3 on Windows and PyPy3, and the reasons are unrelated to this PR.

@pierreglaser
Member Author

I spoke too soon: there is an issue with the new test on PyPy. Let me check.

@ogrisel
Contributor

ogrisel commented Jan 24, 2020

See also: #316 (comment) . We need to make sure that we will not cause breakage when pickling functions that actually rely on pybind11.

@pierreglaser
Member Author

pierreglaser commented Jan 24, 2020

We need to make sure that we will not cause breakage when pickling functions that actually rely on pybind11.

I can add a test. My gut feeling, though, is that should a function need to use this construct, its serialized version will contain instructions to import the necessary scipy modules, which will take care of re-populating __builtins__ for us (cf. our IRL discussion).

@ogrisel
Contributor

ogrisel commented Jan 24, 2020

Yes, I would assume that one would not define new dynamic functions/classes based on pybind11 at runtime in the __main__ module, but that those functions would be importable by name from their module. But as I am not familiar with the inner workings of pybind11, I am worried we might miss some edge cases.

@pierreglaser
Member Author

pierreglaser commented Jan 24, 2020

Also, let's recall that before this, serializing functions relying on pybind11 and PyCapsule objects would simply fail (PyCapsule objects being unpicklable). So I am not sure if we will really introduce any regression with this.

@ogrisel
Contributor

ogrisel commented Jan 24, 2020

I agree.

On the other hand, what about picklable objects inserted dynamically (not at import time but when calling a function, for instance) into the __builtins__ dict? We used to support this, but now dynamic functions/classes that relied on it will be broken.

One could argue that inserting things in the __builtins__ dynamically is bad development practice but I wonder if we will not break legitimate use cases (e.g. profiler tools maybe?).

@ogrisel
Contributor

ogrisel commented Jan 24, 2020

A minimally invasive fix for #316 would be to continue pickling a copy of the __builtins__ dict for which we would have filtered out the values that are PyCapsule instances.

We should probably run some benchmarks if we implement this solution.
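
The filtering alternative suggested above could be sketched like this. It is only an illustration under stated assumptions: `filter_builtins` is a hypothetical helper, the types to exclude are passed in by the caller, and a lock type stands in for the PyCapsule type so that the example runs with the standard library alone.

```python
import builtins
import pickle
import threading


def filter_builtins(builtins_dict, excluded_types):
    """Return a shallow copy without values that are instances of excluded_types."""
    return {
        key: value
        for key, value in builtins_dict.items()
        if not isinstance(value, excluded_types)
    }


# Stand-in for the capsule type: a lock is also rejected by pickle.
LockType = type(threading.Lock())

# A small mapping playing the role of a polluted __builtins__ dict.
polluted = {
    "len": builtins.len,
    "injected": 1,
    "_capsule_like": threading.Lock(),
}

filtered = filter_builtins(polluted, (LockType,))
payload = pickle.dumps(filtered)  # succeeds once the offending value is gone
```

Unlike dropping `__builtins__` entirely, this keeps picklable values that were injected dynamically, at the cost of scanning the dict on every dump, which is why benchmarks would matter.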

@pierreglaser
Member Author

pierreglaser commented Jan 24, 2020

On the other hand, what about picklable objects inserted dynamically (not at import time but when calling a function, for instance) into the __builtins__ dict? We used to support this, but now dynamic functions/classes that relied on it will be broken.

I see. Generally, objects in __builtins__ are not first-class citizens in cloudpickle, though. For instance, we don't consider them when extracting the globals of a function. __builtins__ pickling happens when pickling dynamic modules, but I am not sure it can happen anywhere else. I might be missing something, though.

Code snippet currently breaking at unpickling time on cloudpickle master:

(pickling time)

In [1]: import cloudpickle
   ...: var = 1
   ...: __builtins__.__dict__['same_var'] = var
   ...:
   ...: def f():
   ...:     return same_var
   ...:
   ...: with open('test_pk.pkl', 'wb') as fh:
   ...:     cloudpickle.dump(f, fh)

(unpickling time)

In [7]: import cloudpickle
   ...: with open('test_pk.pkl', 'rb') as fh:
   ...:     f = cloudpickle.load(fh)
   ...: f()  # raises NameError
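
A simplified sketch of why the snippet above ends in a NameError: if the globals shipped with a function are collected only by looking up the names its code references in `func.__globals__`, anything that lives solely in the builtins is never captured. `referenced_globals` is a hypothetical helper written for this illustration and is not cloudpickle's implementation.

```python
def referenced_globals(func):
    """Map names used by func's code to the values found in func.__globals__."""
    names = func.__code__.co_names
    return {n: func.__globals__[n] for n in names if n in func.__globals__}


scale = 10


def g(values):
    # 'scale' is a module-level global; 'len' is only reachable via builtins.
    return scale * len(values)


found = referenced_globals(g)
print(sorted(found))
```

Here `scale` is picked up but `len` is not, and a name injected into the builtins by hand (like `same_var` above) would be skipped in the same way.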

@@ -1142,6 +1145,7 @@ def subimport(name):
 def dynamic_subimport(name, vars):
     mod = types.ModuleType(name)
     mod.__dict__.update(vars)
+    mod.__dict__['__builtins__'] = builtins.__dict__
Member Author

@pierreglaser pierreglaser Jan 24, 2020

according to the Python docs: "most modules have the name __builtins__ made available as part of their globals. The value of __builtins__ is normally either this module or the value of this module’s __dict__ attribute"

We may want to restore the '__builtins__' entry at unpickling time with the type it originally had (module or dict), although I have no idea whether taking this additional precaution has an impact at runtime (either on performance or simply in terms of errors).

Contributor

I am fine with always using a dict for now. If someone complains we can always try to improve that later.
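
For reference, the two forms that `__builtins__` can take according to the docs quoted above can be normalized to a dict as in the sketch below. `builtins_namespace` is a hypothetical helper used only to illustrate the module-versus-dict distinction; it is not part of this PR.

```python
import builtins
import types


def builtins_namespace(namespace):
    """Return the builtins mapping for a globals dict, whatever its form."""
    entry = namespace.get("__builtins__", builtins)
    if isinstance(entry, types.ModuleType):
        # '__builtins__' is the builtins module itself: use its __dict__.
        return vars(entry)
    return entry


as_module = {"__builtins__": builtins}
as_dict = {"__builtins__": builtins.__dict__}

same = builtins_namespace(as_module) is builtins_namespace(as_dict)
```

Both forms resolve to the same underlying dict, which is consistent with always storing the dict form when a dynamic module is rebuilt.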

@ogrisel
Contributor

ogrisel commented Jan 26, 2020

Thanks for #325 (comment). The fact that we do not currently support restoring manually injected builtins at unpickle time in the master branch of cloudpickle makes me think that this PR should not introduce a regression.

@ogrisel
Contributor

ogrisel commented Jan 26, 2020

There is a weird test failure happening in PyPy3 in a branch of the code that is meant for Python 2 backward compat.

https://travis-ci.org/cloudpipe/cloudpickle/jobs/641459853?utm_medium=notification&utm_source=github_status

_______________________ CloudPickleTest.test_namedtuple ________________________

self = <tests.cloudpickle_test.CloudPickleTest testMethod=test_namedtuple>

    def test_namedtuple(self):
        MyTuple = collections.namedtuple('MyTuple', ['a', 'b', 'c'])
        t1 = MyTuple(1, 2, 3)
        t2 = MyTuple(3, 2, 1)

        depickled_t1, depickled_MyTuple, depickled_t2 = pickle_depickle(
>           [t1, MyTuple, t2], protocol=self.protocol)

tests/cloudpickle_test.py:1168: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/cloudpickle_test.py:67: in pickle_depickle
    return pickle.loads(cloudpickle.dumps(obj, protocol=protocol))
cloudpickle/cloudpickle.py:1129: in dumps
    cp.dump(obj)
cloudpickle/cloudpickle.py:485: in dump
    return Pickler.dump(self, obj)
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:408: in dump
    self.save(obj)
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:475: in save
    f(self, obj) # Call unbound method with explicit self
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:774: in save_list
    self._batch_appends(obj)
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:798: in _batch_appends
    save(x)
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:520: in save
    self.save_reduce(obj=obj, *rv)
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:598: in save_reduce
    save(cls)
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:475: in save
    f(self, obj) # Call unbound method with explicit self
cloudpickle/cloudpickle.py:881: in save_global
    self.save_dynamic_class(obj)
cloudpickle/cloudpickle.py:690: in save_dynamic_class
    save(clsdict)
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:475: in save
    f(self, obj) # Call unbound method with explicit self
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:818: in save_dict
    self._batch_setitems(obj.items())
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:844: in _batch_setitems
    save(v)
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:475: in save
    f(self, obj) # Call unbound method with explicit self
cloudpickle/cloudpickle.py:958: in save_classmethod
    self.save_reduce(type(obj), (orig_func,), obj=obj)
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:603: in save_reduce
    save(args)
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:475: in save
    f(self, obj) # Call unbound method with explicit self
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:729: in save_tuple
    save(element)
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:475: in save
    f(self, obj) # Call unbound method with explicit self
cloudpickle/cloudpickle.py:560: in save_function
    return self.save_function_tuple(obj)
cloudpickle/cloudpickle.py:762: in save_function_tuple
    save(state)
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:475: in save
    f(self, obj) # Call unbound method with explicit self
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:818: in save_dict
    self._batch_setitems(obj.items())
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:844: in _batch_setitems
    save(v)
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:475: in save
    f(self, obj) # Call unbound method with explicit self
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:729: in save_tuple
    save(element)
/opt/python/pypy3.5-5.8.0/lib-python/3/pickle.py:475: in save
    f(self, obj) # Call unbound method with explicit self
cloudpickle/cloudpickle.py:555: in save_function
    if _is_global(obj, name=name):
cloudpickle/cloudpickle.py:197: in _is_global
    if _is_dynamic(module):

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

module = (<coverage.debug.DebugOutputFile object at 0x00007f898f0bf788>, False)

    def _is_dynamic(module):
        """
        Return True if the module is special module that cannot be imported by its
        name.
        """
        # Quick check: module that have __file__ attribute are not dynamic modules.
        if hasattr(module, '__file__'):
            return False

        if hasattr(module, '__spec__'):
            if module.__spec__ is not None:
                return False

            # In PyPy, Some built-in modules such as _codecs can have their
            # __spec__ attribute set to None despite being imported.  For such
            # modules, the ``_find_spec`` utility of the standard library is used.
            parent_name = module.__name__.rpartition('.')[0]
            if parent_name:  # pragma: no cover
                # This code handles the case where an imported package (and not
                # module) remains with __spec__ set to None. It is however untested
                # as no package in the PyPy stdlib has __spec__ set to None after
                # it is imported.
                try:
                    parent = sys.modules[parent_name]
                except KeyError:
                    msg = "parent {!r} not in sys.modules"
                    raise ImportError(msg.format(parent_name))
                else:
                    pkgpath = parent.__path__
            else:
                pkgpath = None
            return _find_spec(module.__name__, pkgpath, module) is None

        else:
            # Backward compat for Python 2
            import imp
            try:
                path = None
>               for part in module.__name__.split('.'):
E               AttributeError: 'tuple' object has no attribute '__name__'

cloudpickle/cloudpickle.py:1394: AttributeError

@pierreglaser
Member Author

This should be fixed by #326.

Contributor

@ogrisel ogrisel left a comment

LGTM but I still think we should increment the second component of the version number to highlight the fact that this bugfix is deeper than usual.

@ogrisel
Contributor

ogrisel commented Jan 26, 2020

Also please try to run the downstream tests on this PR to check that this does not introduce any regression.

@pierreglaser pierreglaser force-pushed the dynamic-module-builtins branch from 929621e to 3c46b8a Compare January 26, 2020 18:22
@ogrisel
Contributor

ogrisel commented Jan 27, 2020

All the joblib test failures seem to be caused by the fact that joblib now has a test dependency on the threadpoolctl package.

The good news is that the tests of all other downstream projects pass.

@pierreglaser pierreglaser force-pushed the dynamic-module-builtins branch from 30b3ba0 to d451f68 Compare January 29, 2020 20:18
@pierreglaser pierreglaser merged commit cab41c1 into cloudpipe:master Jan 29, 2020
@pierreglaser pierreglaser deleted the dynamic-module-builtins branch January 29, 2020 22:34
2 participants