Commit ea9e8f3

[SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
### What changes were proposed in this pull request?

This PR aims to upgrade PySpark's embedded cloudpickle to the latest cloudpickle v1.5.0 (see https://github.com/cloudpipe/cloudpickle/blob/v1.5.0/cloudpickle/cloudpickle.py).

### Why are the changes needed?

There are many bug fixes. For example, the bug described in the JIRA:

dill unpickling fails because they define `types.ClassType`, which is undefined in dill. This results in the following error:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 279, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 317, in loads
    return load(file, ignore)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 305, in load
    obj = pik.load()
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 577, in _load_type
    return _reverse_typemap[name]
KeyError: 'ClassType'
```

See also cloudpipe/cloudpickle#82. This was fixed in cloudpickle 1.3.0+ (cloudpipe/cloudpickle#337), but PySpark's cloudpickle.py doesn't have this change yet.

More notably, it now supports the C pickle implementation with Python 3.8, which hugely improves performance. This is already adopted in other projects such as Ray.

### Does this PR introduce _any_ user-facing change?

Yes, the bug fixes described above. Internally, users could also leverage the fast cloudpickle backed by C pickle.

### How was this patch tested?

Jenkins will test it out.

Closes #29114 from HyukjinKwon/SPARK-32094.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
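As a rough illustration of what "backed by C pickle" can mean in practice: Python 3.8 added the `Pickler.reducer_override` hook, which lets a subclass of the standard `pickle.Pickler` customise how individual objects are reduced. The sketch below uses only the standard library; the names `ByValuePickler`, `dumps_by_value`, `make_local_class`, and the `pickle_by_value` / `payload` attributes are hypothetical, and it is an assumption (not stated in this commit message) that cloudpickle's fast path is built on this hook.

```python
import io
import pickle


class ByValuePickler(pickle.Pickler):
    """Toy pickler (hypothetical): ships flagged classes by value, not by reference."""

    def reducer_override(self, obj):
        # Only intercept classes that opt in via a marker attribute.
        if isinstance(obj, type) and getattr(obj, "pickle_by_value", False):
            # Rebuild the class on load with type(name, bases, namespace).
            return type, (obj.__name__, obj.__bases__, {"payload": obj.payload})
        # Everything else falls back to the default reduction.
        return NotImplemented


def dumps_by_value(obj):
    """Serialise obj with the toy pickler and return the bytes."""
    buf = io.BytesIO()
    ByValuePickler(buf).dump(obj)
    return buf.getvalue()


def make_local_class():
    """Return a class defined inside a function, so it cannot be found by import path."""
    class Local:
        pickle_by_value = True
        payload = 42
    return Local


restored = pickle.loads(dumps_by_value(make_local_class()))
print(restored.__name__, restored.payload)
```

Only the `payload` attribute is carried over here; a real by-value pickler has to handle methods, closures, and globals, which this sketch deliberately leaves out.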
1 parent 9747e8f commit ea9e8f3

9 files changed, +1602 -1366 lines changed

Diff for: LICENSE

+1 -1

@@ -229,7 +229,7 @@ BSD 3-Clause
 ------------
 
 python/lib/py4j-*-src.zip
-python/pyspark/cloudpickle.py
+python/pyspark/cloudpickle/*.py
 python/pyspark/join.py
 core/src/main/resources/org/apache/spark/ui/static/d3.min.js

Diff for: dev/.rat-excludes

+2 -2

@@ -47,8 +47,8 @@ jsonFormatter.min.js
 .*json
 .*data
 .*log
-pyspark-coverage-site/
-cloudpickle.py
+pyspark-coverage-site/*
+cloudpickle/*
 heapq3.py
 join.py
 SparkExprTyper.scala

Diff for: dev/tox.ini

+1 -1

@@ -16,4 +16,4 @@
 [pycodestyle]
 ignore=E226,E241,E305,E402,E722,E731,E741,W503,W504
 max-line-length=100
-exclude=cloudpickle.py,heapq3.py,shared.py,python/docs/conf.py,work/*/*.py,python/.eggs/*,dist/*,.git/*
+exclude=python/pyspark/cloudpickle/*.py,heapq3.py,shared.py,python/docs/conf.py,work/*/*.py,python/.eggs/*,dist/*,.git/*
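
The exclude entry changes from a single file name to a glob over a directory, which is consistent with cloudpickle moving from one module to a package. A minimal sketch of the difference, using `fnmatch` as a stand-in for the linter's matching (pycodestyle's actual rules may differ) and an assumed list of files inside the package:

```python
from fnmatch import fnmatchcase
from posixpath import basename

OLD_PATTERN = "cloudpickle.py"
NEW_PATTERN = "python/pyspark/cloudpickle/*.py"

# Assumed paths for illustration; the package's real contents are not shown in this diff.
PATHS = [
    "python/pyspark/cloudpickle/__init__.py",
    "python/pyspark/cloudpickle/cloudpickle.py",
    "python/pyspark/cloudpickle/cloudpickle_fast.py",
    "python/pyspark/join.py",
]


def excluded_by_old(path):
    """Old entry: compare the bare file name against the pattern."""
    return fnmatchcase(basename(path), OLD_PATTERN)


def excluded_by_new(path):
    """New entry: compare the full relative path against the glob."""
    return fnmatchcase(path, NEW_PATTERN)


for path in PATHS:
    print(path, excluded_by_old(path), excluded_by_new(path))
```

Under these assumptions the old entry would skip only the one module named `cloudpickle.py`, while the new glob covers every `.py` file in the package directory and leaves `join.py` alone.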
