-
Notifications
You must be signed in to change notification settings - Fork 176
Improve dynamic module recognition #357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve dynamic module recognition #357
Conversation
Codecov Report
@@ Coverage Diff @@
## master #357 +/- ##
==========================================
- Coverage 92.95% 92.91% -0.04%
==========================================
Files 2 2
Lines 809 805 -4
Branches 164 164
==========================================
- Hits 752 748 -4
Misses 29 29
Partials 28 28
Continue to review full report at Codecov.
|
@henryiii this should hopefully fix the bug you reported. On my machine I get: In [1]: import boost_histogram._core.hist as h
In [2]: h
Out[2]: <module 'boost_histogram._core.hist'>
In [3]: from cloudpickle import loads, dumps
In [4]: loads(dumps(h))
Out[4]: <module 'boost_histogram._core.hist'>
In [5]: assert loads(dumps(h)) is h If you can, feel free to checkout this PR locally to ensure this fixes your bug. |
I'll check this within a few hours, thank you! |
Looks good to me, import boost_histogram as bh
import cloudpickle
h = bh.Histogram(bh.axis.Regular(50, 0, 20))
h.fill([1 ,2 , 3, 4, 5])
cloudpickle.dumps(h) works on this branch! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick comment in passing:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass of comments and suggestions for improvements.
mmh, I just realised the importability via module attribute lookupassociated semantic importability using module-specific import machinery:associated semantic: This PR actually confuses the two: |
Indeed, it would be better to have the code make the distinction between the 2 notions and only check for the one needed where appropriate. |
What's the status of this? I could put in a hack in boost-histogram that would get this working, but I'd rather it be fixed properly for everyone. |
fb6d34a
to
da60d2d
Compare
rebased. |
By iterating a bit more and adding some test cases, my current opinion on this PR has a little bit changed. If #354 appeared in the first place, it is because As of now The more I think about this, the more I believe WDYT @ogrisel? FYI: I tried having |
The |
pinging downstream libraries developers: @suquark @mrocklin @HyukjinKwon @ogrisel - I would like to have your opinion before merging this: This PR proposes to simplify Does the assumption |
I personally don't have much experience here. I don't often run into cases
where people try to serialize dynamically generated modules.
…On Mon, May 25, 2020 at 7:17 AM Pierre Glaser ***@***.***> wrote:
pinging downstream libraries developers: @suquark
<https://github.com/suquark> @mrocklin <https://github.com/mrocklin>
@HyukjinKwon <https://github.com/HyukjinKwon> @ogrisel
<https://github.com/ogrisel> - I would like to have your opinion before
merging this:
This PR proposes to simplify module pickling; when cloudpickle is asked
to pickle a module, it checks if this module is importable (and not
dynamically created) by looking for it inside sys.modules, and if so,
cloudpickle pickle the said module by writing a sequence of instruction
summing up to import <module> in the pickle string.
Previously, cloudpickle module's importability check was way more
complicated than simply looking inside sys.modules. But my opinion is
that looking inside sys.modules should be simpler, and more failure-proof.
Does the assumption module in sys.modules.values() <=> module is
importable sound OK to you? Or is there any use case of yours where this
assumption break (for instance knowingly removing an importable module from
sys.modules)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#357 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTDJUMDQQ2XYXBOLJLDRTJ4YRANCNFSM4LZA5QUA>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I agree that just checking sys,modules
to known whether a module should be importable or not should be a good enough heuristic.
Code that generates non-importable dynamic modules registered in sys.modules
is probably pathological and I don't think we should try to support it.
if _is_dynamic(obj): | ||
if _is_importable(obj): | ||
return subimport, (obj.__name__,) | ||
else: | ||
obj.__dict__.pop('__builtins__', None) | ||
return dynamic_subimport, (obj.__name__, vars(obj)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could probably be named to _make_dynamic_module
as we actually do not import the module at all.
We should still keep dynamic_subimport
as an alias for _make_dynamic_module
to preserve backward compat with old pickle files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subimport
function could also be renamed to _import_module
to be more explicit. But we would also need to keep an alias named subimport
pointing to _import_module
for backward compat as well.
#377 confirms that the failure observed in the distributed tests ("OSError: [Errno 101] Network is unreachable") is not related to the changes in this PR. |
@pierreglaser I put a comment on #357 (comment) but as this name was not introduced in this PR, I am fine with not addressing it here. Could you please add an entry to the changelog before merging this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge!
This makes sense. I investigated a related issue recently where modules are imported from a source file. If you strictly follow the importlib package doc to import a source file you add the module to |
Fixes #354
To emulate package capabilities while being a single file, extension modules (for instance a
mod.so
file) can dynamically create a module object (most likely using the*package_name*.*parent_module_name*.*submodule_name*
naming convention to be compatible withimport
semantics).Internally, they will use the
PyImport_AddModule(submodule_qualified_name
), which creates a module object and adds it tosys.package
. Such a module is a dynamic module, but is also importable, which shows that, despite whatcloudpickle
's logic seems to imply, those are not mutually exclusive attributes.This PR adds support to treat these dynamic modules as importable. We probably should rename
_is_dynamic
to_is_importable
, as the latter quality is really what matters here.