-
Notifications
You must be signed in to change notification settings - Fork 176
Add pickling of dict_keys, dict_values, dict_items #384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #384 +/- ##
==========================================
+ Coverage 91.40% 91.61% +0.20%
==========================================
Files 3 3
Lines 640 656 +16
Branches 134 135 +1
==========================================
+ Hits 585 601 +16
Misses 34 34
Partials 21 21
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you commented in #380 (comment), I also think it's indeed safer not to ship the full dict when shipping keys or values as sending the rest might be unintended and could potentially cause leaking of sensitive information.
I am fine with sending those as lists directly for convenience with some inline comment why we do so.
|
||
|
||
def _make_dict_keys(obj): | ||
return dict.fromkeys(obj).keys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure it's worth rebuilding a fake dict with None
valued values. Maybe we could just ship the list of keys and values directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One reason not to leave these as just a list of keys is that dict_keys objects are set-like and support set type operations: https://docs.python.org/3/tutorial/datastructures.html#dictionaries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point.
|
||
|
||
def _make_dict_values(obj): | ||
return {i: _ for i, _ in enumerate(obj)}.values() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I am not sure of any particular special properties of dict_values objects, I think the approach taken here should mirror the approach taken with dict_keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
@ogrisel, thanks for taking the time to review and providing feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution @brl0. In retrospect I am kind of ambivalent with whether or not we should merge this PR.
It seems like a nice convenience feature for cloudpickle users. However it could also cause very hard to debug side effects, e.g. if a dict_keys object and the backing dict are both shipped in the pickle, this would result in un-tying them: a modification of the dict content would not be reflected in the dict_keys
object anymore.
Maybe it's better to fail at pickle time. Most of the time the solution is to convert the dict_keys instance into a list prior to pickling. However sometimes it cannot be easily done, for instance if the keys object is stored as a private attribute of an object of a third party library.
I would appreciate the feedback from other cloudpickle contributors and users here.
@ogrisel, thanks for the feedback, I can understand and appreciate your ambivalence in this situation. I think if there were a perfect solution to this issue it would probably already be in the standard library. I encountered this issue working with Dask Distributed and raised an issue there.
Let me know if there are any changes needed to make this PR more acceptable. |
I have a better understanding of the problem thanks to your work on this PR but I don't see any further improvements from the top of my head to make it more acceptable. |
For background this has come up a handful of times in github issues for Dask. People include I agree that it doesn't seem quite right to serialize them, given their "view" nature. At the same time, I have difficulty constructing situations where this would bite people. As an example of prior-art, we already serialize Numpy views happily and no one complains. |
That's valid point. Should, I take this as a +1 for merge of this PR? It would be great to get the opinion of some Ray developers, e.g. @suquark or @pcmoritz for instance. @brl0 I am sorry but I merged a refactoring by @pierreglaser concurrently (see #368). This PR will have to be updated accordingly before being able to merge. |
+0.5 maybe. I think that it's probably a good idea, but I haven't thought
too deeply about it. It would be good to get opinions from others.
…On Wed, Jul 1, 2020 at 1:49 AM Olivier Grisel ***@***.***> wrote:
As an example of prior-art, we already serialize Numpy views happily and
no one complains.
That's valid point.
Should, I take this as a +1 for merge of this PR?
It would be great to get the opinion of some Ray developers, e.g. @suquark
<https://github.com/suquark> or @pcmoritz <https://github.com/pcmoritz>
for instance.
@brl0 <https://github.com/brl0> I am sorry but I merged a refactoring by
@pierreglaser <https://github.com/pierreglaser> concurrently (see #368
<#368>). This PR will have
to be updated accordingly before being able to merge.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#384 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTGP4AAGQJRJCVAAFM3RZL2CZANCNFSM4OD7FGQQ>
.
|
@ogrisel, I have updated this PR to merge the changes in master. Let me know if you have any concerns about accepting that I should address. |
Thanks @brl0. I would like to give a little more time for other stakeholders to react. If nobody reacts by next week I will consider merging this PR. |
Sorry to bump this PR but it'd be great to have this merged in. I understand that it's a convenience function which might cause side effects for cloudpickle, but it seems like python's dict_keys/values are causing similarly confusing side effects because of their list like behavior. In our case, we're running into use case where we need to cache/save fairly complex python objects because they aren't really serializable. There are a few devs and data scientists on the project and it's easy to run into situations where there might be an instance variable that is indirectly set to dict_keys or the sort. And of course, it all runs fine except in pickling where list like objects are cause cloudpickle to puke out, so trying to trace where that list_like codepath comes from can be fairly non-obvious |
Ok let's merge. |
I'm glad to see this. Thanks for following up @ogrisel.
…On Tue, Aug 11, 2020 at 9:12 AM Olivier Grisel ***@***.***> wrote:
Merged #384 <#384> into
master.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#384 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTBAS3W6QKZ53QVCCNLSAFUWRANCNFSM4OD7FGQQ>
.
|
2.0.0 ===== - Python 3.5 is no longer supported. - Support for registering modules to be serialised by value. This allows code defined in local modules to be serialised and executed remotely without those local modules installed on the remote machine. ([PR #417](cloudpipe/cloudpickle#417)) - Fix a side effect altering dynamic modules at pickling time. ([PR #426](cloudpipe/cloudpickle#426)) - Support for pickling type annotations on Python 3.10 as per [PEP 563]( https://www.python.org/dev/peps/pep-0563/) ([PR #400](cloudpipe/cloudpickle#400)) - Stricter parametrized type detection heuristics in _is_parametrized_type_hint to limit false positives. ([PR #409](cloudpipe/cloudpickle#409)) - Support pickling / depickling of OrderedDict KeysView, ValuesView, and ItemsView, following similar strategy for vanilla Python dictionaries. ([PR #423](cloudpipe/cloudpickle#423)) - Suppressed a source of non-determinism when pickling dynamically defined functions and handles the deprecation of co_lnotab in Python 3.10+. ([PR #428](cloudpipe/cloudpickle#428)) 1.6.0 ===== - `cloudpickle`'s pickle.Pickler subclass (currently defined as `cloudpickle.cloudpickle_fast.CloudPickler`) can and should now be accessed as `cloudpickle.Pickler`. This is the only officially supported way of accessing it. ([issue #366](cloudpipe/cloudpickle#366)) - `cloudpickle` now supports pickling `dict_keys`, `dict_items` and `dict_values`. ([PR #384](cloudpipe/cloudpickle#384)) 1.5.0 ===== - Fix a bug causing cloudpickle to crash when pickling dynamically created, importable modules. ([issue #360](cloudpipe/cloudpickle#354)) - Add optional dependency on `pickle5` to get improved performance on Python 3.6 and 3.7. ([PR #370](cloudpipe/cloudpickle#370)) - Internal refactoring to ease the use of `pickle5` in cloudpickle for Python 3.6 and 3.7. ([PR #368](cloudpipe/cloudpickle#368)) 1.4.1 ===== - Fix incompatibilities between cloudpickle 1.4.0 and Python 3.5.0/1/2 introduced by the new support of cloudpickle for pickling typing constructs. ([issue #360](cloudpipe/cloudpickle#360)) - Restore compat with loading dynamic classes pickled with cloudpickle version 1.2.1 that would reference the `types.ClassType` attribute. ([PR #359](cloudpipe/cloudpickle#359)) 1.4.0 ===== **This version requires Python 3.5 or later** - cloudpickle can now all pickle all constructs from the ``typing`` module and the ``typing_extensions`` library in Python 3.5+ ([PR #318](cloudpipe/cloudpickle#318)) - Stop pickling the annotations of a dynamic class for Python < 3.6 (follow up on #276) ([issue #347](cloudpipe/cloudpickle#347)) - Fix a bug affecting the pickling of dynamic `TypeVar` instances on Python 3.7+, and expand the support for pickling `TypeVar` instances (dynamic or non-dynamic) to Python 3.5-3.6 ([PR #350](cloudpipe/cloudpickle#350)) - Add support for pickling dynamic classes subclassing `typing.Generic` instances on Python 3.7+ ([PR #351](cloudpipe/cloudpickle#351)) 1.3.0 ===== - Fix a bug affecting dynamic modules occuring with modified builtins ([issue #316](cloudpipe/cloudpickle#316)) - Fix a bug affecting cloudpickle when non-modules objects are added into sys.modules ([PR #326](cloudpipe/cloudpickle#326)). - Fix a regression in cloudpickle and python3.8 causing an error when trying to pickle property objects. ([PR #329](cloudpipe/cloudpickle#329)). - Fix a bug when a thread imports a module while cloudpickle iterates over the module list ([PR #322](cloudpipe/cloudpickle#322)). - Add support for out-of-band pickling (Python 3.8 and later). https://docs.python.org/3/library/pickle.html#example ([issue #308](cloudpipe/cloudpickle#308)) - Fix a side effect that would redefine `types.ClassTypes` as `type` when importing cloudpickle. ([issue #337](cloudpipe/cloudpickle#337)) - Fix a bug affecting subclasses of slotted classes. ([issue #311](cloudpipe/cloudpickle#311)) - Dont pickle the abc cache of dynamically defined classes for Python 3.6- (This was already the case for python3.7+) ([issue #302](cloudpipe/cloudpickle#302)) 1.2.2 ===== - Revert the change introduced in ([issue #276](cloudpipe/cloudpickle#276)) attempting to pickle functions annotations for Python 3.4 to 3.6. It is not possible to pickle complex typing constructs for those versions (see [issue #193]( cloudpipe/cloudpickle#193)) - Fix a bug affecting bound classmethod saving on Python 2. ([issue #288](cloudpipe/cloudpickle#288)) - Add support for pickling "getset" descriptors ([issue #290](cloudpipe/cloudpickle#290)) 1.2.1 ===== - Restore (partial) support for Python 3.4 for downstream projects that have LTS versions that would benefit from cloudpickle bug fixes. 1.2.0 ===== - Leverage the C-accelerated Pickler new subclassing API (available in Python 3.8) in cloudpickle. This allows cloudpickle to pickle Python objects up to 30 times faster. ([issue #253](cloudpipe/cloudpickle#253)) - Support pickling of classmethod and staticmethod objects in python2. arguments. ([issue #262](cloudpipe/cloudpickle#262)) - Add support to pickle type annotations for Python 3.5 and 3.6 (pickling type annotations was already supported for Python 3.7, Python 3.4 might also work but is no longer officially supported by cloudpickle) ([issue #276](cloudpipe/cloudpickle#276)) - Internal refactoring to proactively detect dynamic functions and classes when pickling them. This refactoring also yields small performance improvements when pickling dynamic classes (~10%) ([issue #273](cloudpipe/cloudpickle#273)) 1.1.1 ===== - Minor release to fix a packaging issue (Markdown formatting of the long description rendered on pypi.org). The code itself is the same as 1.1.0. 1.1.0 ===== - Support the pickling of interactively-defined functions with positional-only arguments. ([issue #266](cloudpipe/cloudpickle#266)) - Track the provenance of dynamic classes and enums so as to preseve the usual `isinstance` relationship between pickled objects and their original class defintions. ([issue #246](cloudpipe/cloudpickle#246)) 1.0.0 ===== - Fix a bug making functions with keyword-only arguments forget the default values of these arguments after being pickled. ([issue #264](cloudpipe/cloudpickle#264)) 0.8.1 ===== - Fix a bug (already present before 0.5.3 and re-introduced in 0.8.0) affecting relative import instructions inside depickled functions ([issue #254](cloudpipe/cloudpickle#254)) 0.8.0 ===== - Add support for pickling interactively defined dataclasses. ([issue #245](cloudpipe/cloudpickle#245)) - Global variables referenced by functions pickled by cloudpickle are now unpickled in a new and isolated namespace scoped by the CloudPickler instance. This restores the (previously untested) behavior of cloudpickle prior to changes done in 0.5.4 for functions defined in the `__main__` module, and 0.6.0/1 for other dynamic functions. 0.7.0 ===== - Correctly serialize dynamically defined classes that have a `__slots__` attribute. ([issue #225](cloudpipe/cloudpickle#225)) 0.6.1 ===== - Fix regression in 0.6.0 which breaks the pickling of local function defined in a module, making it impossible to access builtins. ([issue #211](cloudpipe/cloudpickle#211)) 0.6.0 ===== - Ensure that unpickling a function defined in a dynamic module several times sequentially does not reset the values of global variables. ([issue #187](cloudpipe/cloudpickle#205)) - Restrict the ability to pickle annotations to python3.7+ ([issue #193]( cloudpipe/cloudpickle#193) and [issue #196]( cloudpipe/cloudpickle#196)) - Stop using the deprecated `imp` module under Python 3. ([issue #207](cloudpipe/cloudpickle#207)) - Fixed pickling issue with singleton types `NoneType`, `type(...)` and `type(NotImplemented)` ([issue #209](cloudpipe/cloudpickle#209)) 0.5.6 ===== - Ensure that unpickling a locally defined function that accesses the global variables of a module does not reset the values of the global variables if they are already initialized. ([issue #187](cloudpipe/cloudpickle#187)) 0.5.5 ===== - Fixed inconsistent version in `cloudpickle.__version__`. 0.5.4 ===== - Fixed a pickling issue for ABC in python3.7+ ([issue #180]( cloudpipe/cloudpickle#180)). - Fixed a bug when pickling functions in `__main__` that access global variables ([issue #187]( cloudpipe/cloudpickle#187)). 0.5.3 ===== - Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types ([issue #144](cloudpipe/cloudpickle#144)). - itertools objects can also pickled ([PR #156](cloudpipe/cloudpickle#156)). - `logging.RootLogger` can be also pickled ([PR #160](cloudpipe/cloudpickle#160)). 0.5.2 ===== - Fixed a regression: `AttributeError` when loading pickles that hold a reference to a dynamically defined class from the `__main__` module. ([issue #131]( cloudpipe/cloudpickle#131)). - Make it possible to pickle classes and functions defined in faulty modules that raise an exception when trying to look-up their attributes by name. 0.5.1 ===== - Fixed `cloudpickle.__version__`. 0.5.0 ===== - Use `pickle.HIGHEST_PROTOCOL` by default. 0.4.4 ===== - `logging.RootLogger` can be also pickled ([PR #160](cloudpipe/cloudpickle#160)). 0.4.3 ===== - Fixed a regression: `AttributeError` when loading pickles that hold a reference to a dynamically defined class from the `__main__` module. ([issue #131]( cloudpipe/cloudpickle#131)). - Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types. ([issue #144](cloudpipe/cloudpickle#144)) 0.4.2 ===== - Restored compatibility with pickles from 0.4.0. - Handle the `func.__qualname__` attribute. 0.4.1 ===== - Fixed a crash when pickling dynamic classes whose `__dict__` attribute was defined as a [`property`](https://docs.python.org/3/library/functions.html#property). Most notably, this affected dynamic [namedtuples](https://docs.python.org/2/library/collections.html#namedtuple-factory-function-for-tuples-with-named-fields) in Python 2. (cloudpipe/cloudpickle#113) - Cloudpickle now preserves the `__module__` attribute of functions (cloudpipe/cloudpickle#118). - Fixed a crash when pickling modules that don't have a `__package__` attribute (cloudpipe/cloudpickle#116). 0.4.0 ===== * Fix functions with empty cells * Allow pickling Logger objects * Fix crash when pickling dynamic class cycles * Ignore "None" mdoules added to sys.modules * Support WeakSets and ABCMeta instances * Remove non-standard `__transient__` support * Catch exception from `pickle.whichmodule()` 0.3.1 ===== * Fix version information and ship a changelog 0.3.0 ===== * Import submodules accessed by pickled functions * Support recursive functions inside closures * Fix `ResourceWarnings` and `DeprecationWarnings` * Assume modules with `__file__` attribute are not dynamic 0.2.2 ===== * Support Python 3.6 * Support Tornado Coroutines * Support builtin methods
Fixes #380.