-
Notifications
You must be signed in to change notification settings - Fork 176
FIX dont pickle __builtins__ of dynamic modules #325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX dont pickle __builtins__ of dynamic modules #325
Conversation
Codecov Report
@@ Coverage Diff @@
## master #325 +/- ##
==========================================
+ Coverage 58.12% 93.03% +34.9%
==========================================
Files 2 2
Lines 855 861 +6
Branches 175 178 +3
==========================================
+ Hits 497 801 +304
+ Misses 324 37 -287
+ Partials 34 23 -11
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to add a new test that simulates the behavior of the non-picklable builtin injected by scipy.
Lets not worry about the |
OK, the only failing builds are |
I spoke too fast, there is an issue with the new test and |
See also: #316 (comment) . We need to make sure that we will not cause breakage when pickling functions that actually rely on pybind11. |
I can add a test -- my gut feeling though is that shall a function need to use this construct, then it's serialized version will contain instructionz to import the necessary scipy modules that will take care of re-populating |
Yes I would assume that one would not define new dynamic function / classes based on pybind11 at runtime in the |
Also, let's recall that before this, serializing functions relying on |
I agree. On the other hand, but what about pickleable inserted dynamically (not at import time but when calling a function for instance) into the One could argue that inserting things in the |
A minimally invasive fix for #316 would be to continue pickling a copy of the We should probably run some benchmarks if we implement this solution. |
I see. Generally, objects in the Code snippet currently breaking at unpickling time on cloudpickle master: (pickling time) In [1]: import cloudpickle
...: var = 1
...: __builtins__.__dict__['same_var'] = var
...:
...: def f():
...: return same_var
...:
...: with open('test_pk.pkl', 'wb') as fh:
...: cloudpickle.dump(f, fh) (unpickling time) In [7]: import cloudpickle
...: with open('test_pk.pkl', 'rb') as fh:
...: f = cloudpickle.load(fh)
...: f() # raises NameError |
@@ -1142,6 +1145,7 @@ def subimport(name): | |||
def dynamic_subimport(name, vars): | |||
mod = types.ModuleType(name) | |||
mod.__dict__.update(vars) | |||
mod.__dict__['__builtins__'] = builtins.__dict__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to the Python docs: "most modules have the name __builtins__
made available as part of their globals. The value of __builtins__
is normally either this module or the value of this module’s __dict__
attribute"
We may want to set the '__builtins__'
entry to the type it was at unpickling time, although I have no idea whether taking this additional precaution this has an impact on runtime (either performance or simply errors)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with always using a dict for now. If someone complains we can always try to improve that later.
Thanks for #325 (comment). The fact that we do not currently support restoring manually injected builtins at unpickle time in the master branch of cloudpickle makes me think that this PR should not introduce a regression. |
There is a weird test failure happening in PyPy3 in a branch of the code that is meant for Python 2 backward compat.
|
This should be fixed by #326. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I still think we should increment the second component of the version number to highlight the fact that this bugfix is deeper than usual.
Also please try to run the downstream tests on this PR to check that this does not introduce any regression. |
929621e
to
3c46b8a
Compare
All the joblib test failures seems to be caused by the fact that joblib has now a test dependency on the threadpoolctl package. The good news is that the tests of all other downstream projects pass. |
30b3ba0
to
d451f68
Compare
Closes #316
save_module
saves all entries of a dynamic module__dict__
, including the__builtins__
. If they get populated with unpickleable object,save_module
will fail. This PR proposes to discard__builtins__
at pickling time and re-create a__builtins__
entry at unpickling time using thebuiltins
module. They should correspond according to the CPython docs, although it is not guaranteed.@ogrisel