Commit 1e9a48d

DOC more in-depth context description in README
1 parent e4ca37d commit 1e9a48d

1 file changed

+22
-13
lines changed

Diff for: README.md

@@ -72,19 +72,24 @@ Overriding pickle's serialization mechanism for importable constructs:
7272

7373
An important difference between `cloudpickle` and `pickle` is that
7474
`cloudpickle` can serialize a function or class **by value**, whereas `pickle`
75-
can only serialize it **by reference**, e.g. by serializing its *module
76-
attribute path* (such as `my_module.my_function`).
77-
78-
By default, `cloudpickle` only uses serialization by value in cases where
79-
serialization by reference is usually ineffective, for example when the
80-
function/class to be pickled was constructed in an interactive Python session.
81-
82-
Since `cloudpickle 1.7.0`, it is possible to extend the use of serialization by
83-
value to functions or classes coming from **any pure Python module**. This feature
84-
is useful when the said module is unavailable in the unpickling environment
85-
(making traditional serialization by reference ineffective). To this end,
86-
`cloudpickle` exposes the
87-
`register_pickle_by_value`/`unregister_pickle_by_value` functions:
75+
can only serialize it **by reference**. Serialization by reference treats
76+
functions and classes as attributes of modules, and pickles them through
77+
instructions that trigger the import of their module at load time.
78+
Serialization by reference is thus limited in that it assumes that the module
79+
containing the function or class is available/importable in the unpickling
80+
environment. This assumption breaks when pickling constructs defined in an
81+
interactive session, a case that is automatically detected by `cloudpickle`,
82+
which then pickles such constructs **by value**.
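
As a rough illustration of the by-reference behaviour described above, the following sketch uses only the standard library `pickle` and `pickletools` modules. The module name `ref_module_demo` and the function `double` are hypothetical stand-ins created in memory for the example; they are not part of `cloudpickle` or of this README.

```python
import pickle
import pickletools
import sys
import types

# Hypothetical module built in memory so the example is self-contained.
ref_module = types.ModuleType("ref_module_demo")
exec("def double(x):\n    return 2 * x\n", ref_module.__dict__)
sys.modules["ref_module_demo"] = ref_module

# The standard pickler stores only the module name and the attribute path.
payload = pickle.dumps(ref_module.double, protocol=4)
opcodes = [opcode.name for opcode, _arg, _pos in pickletools.genops(payload)]

# Simulate an unpickling environment where the module is not importable.
del sys.modules["ref_module_demo"]
try:
    pickle.loads(payload)
    load_failed = False
except ImportError:
    # Loading a by-reference pickle triggers an import of the module.
    load_failed = True
```

With protocol 4, the opcode list contains `STACK_GLOBAL`, the instruction that imports the module and looks up the attribute at load time. This is why loading fails once the module can no longer be imported.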
83+
84+
Another case where the importability assumption is expected to break is when
85+
developing a module in a distributed execution environment: the worker
86+
processes may not have access to the said module, for example if they live on a
87+
different machine than the process in which the module is being developed.
88+
By itself, `cloudpickle` cannot detect such "locally importable" modules and
89+
switch to serialization by value; instead, it relies on its default mode,
90+
which is serialization by reference. However, since `cloudpickle 1.7.0`, one
91+
can explicitly specify modules for which serialization by value should be used,
92+
using the `register_pickle_by_value(module)`/`unregister_pickle_by_value(module)` API:
8893

8994
```python
9095
>>> import cloudpickle
@@ -95,6 +100,10 @@ is useful when the said module is unavailable in the unpickling environment
95100
>>> cloudpickle.dumps(my_module.my_function) # my_function is pickled by reference
96101
```
97102

103+
Using this API, there is no need to re-install the new version of the module on
104+
all the worker nodes nor to restart the workers: restarting the client Python
105+
process with the new source code is enough.
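
The client/worker scenario above can be sketched in a single process, assuming `cloudpickle` 1.7.0 or later is installed. The module `dev_module_demo` and its function `scale` are hypothetical names built in memory for illustration, and removing the module from `sys.modules` only approximates a worker that does not have the module installed.

```python
import pickle
import sys
import types

import cloudpickle

# Hypothetical stand-in for a module under development on the client.
dev_module = types.ModuleType("dev_module_demo")
exec("def scale(x):\n    return 3 * x\n", dev_module.__dict__)
sys.modules["dev_module_demo"] = dev_module

# Client side: ask cloudpickle to ship this module's functions by value.
cloudpickle.register_pickle_by_value(dev_module)
payload = cloudpickle.dumps(dev_module.scale)
cloudpickle.unregister_pickle_by_value(dev_module)

# Approximate a worker on which the module is not importable.
del sys.modules["dev_module_demo"]

# Worker side: the function is rebuilt from the payload, without an import.
restored = pickle.loads(payload)
result = restored(14)
```

In this sketch the worker side still needs `cloudpickle` itself to be importable, since the payload refers to its reconstruction helpers; only the module under development is absent.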
106+
98107
Note that this feature is still **experimental**, and may fail in the following
99108
situations:
100109
