@@ -72,19 +72,24 @@ Overriding pickle's serialization mechanism for importable constructs:
An important difference between `cloudpickle` and `pickle` is that
`cloudpickle` can serialize a function or class **by value**, whereas `pickle`
- can only serialize it **by reference**, e.g. by serializing its *module
- attribute path* (such as `my_module.my_function`).
-
- By default, `cloudpickle` only uses serialization by value in cases where
- serialization by reference is usually ineffective, for example when the
- function/class to be pickled was constructed in an interactive Python session.
-
- Since `cloudpickle 1.7.0`, it is possible to extend the use of serialization by
- value to functions or classes coming from **any pure Python module**. This feature
- is useful when the said module is unavailable in the unpickling environment
- (making traditional serialization by reference ineffective). To this end,
- `cloudpickle` exposes the
- `register_pickle_by_value`/`unregister_pickle_by_value` functions:
+ can only serialize it **by reference**. Serialization by reference treats
+ functions and classes as attributes of modules, and pickles them through
+ instructions that trigger the import of their module at load time.
+ Serialization by reference is thus limited in that it assumes that the module
+ containing the function or class is importable in the unpickling
+ environment. This assumption breaks when pickling constructs defined in an
+ interactive session, a case that `cloudpickle` detects automatically and
+ handles by pickling such constructs **by value**.
+
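The by-reference mechanism described above can be observed with the standard library alone. The following is a minimal sketch, not part of `cloudpickle`: it uses `json.dumps` as an arbitrary importable function and a lambda as a stand-in for a construct that has no importable name.

```python
import json
import pickle

# By reference: the payload records the module name and the attribute name,
# not the function's code.
payload = pickle.dumps(json.dumps)
assert b"json" in payload and b"dumps" in payload

# Loading imports the module and looks the attribute up again,
# so the result is the very same function object.
restored = pickle.loads(payload)
assert restored is json.dumps

# A function without an importable name cannot be pickled by reference.
anonymous = lambda x: x + 1
try:
    pickle.dumps(anonymous)
    pickled_by_reference = True
except (pickle.PicklingError, AttributeError):
    pickled_by_reference = False
```

The last case is the kind of construct for which serialization by value is needed.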
+ Another case where the importability assumption is expected to break is when
+ developing a module in a distributed execution environment: the worker
+ processes may not have access to the said module, for example if they live on a
+ different machine than the process in which the module is being developed.
+ By itself, `cloudpickle` cannot detect such "locally importable" modules and
+ switch to serialization by value; instead, it relies on its default mode,
+ which is serialization by reference. However, since `cloudpickle 1.7.0`, one
+ can explicitly specify modules for which serialization by value should be used,
+ using the `register_pickle_by_value(module)`/`unregister_pickle_by_value(module)` API:

```python
>>> import cloudpickle
@@ -95,6 +100,10 @@ is useful when the said module is unavailable in the unpickling environment
>>> cloudpickle.dumps(my_module.my_function)  # my_function is pickled by reference
```

+ Using this API, there is no need to re-install the new version of the module on
+ all the worker nodes or to restart the workers: restarting the client Python
+ process with the new source code is enough.
+
Note that this feature is still **experimental**, and may fail in the following
situations: