cythonized pydantic objects in main cannot be pickled #408

marco-neumann-by · 2021-01-21T08:34:59Z

Abstract

The following code snipped fails with cloudpickle but works with stock pickle if pydantic is cythonized (either via a platform-specific wheel or by having cython installed when calling setup.py):

# bug.py
import cloudpickle
import pydantic
import pickle

class Bar(pydantic.BaseModel):
    a: int

pickle.loads(pickle.dumps(Bar(a=1))) # This works well
cloudpickle.loads(cloudpickle.dumps(Bar(a=1))) # This fails with the error below

When using the file via main:

$ python bug.py

The error message is:

_pickle.PicklingError: Can't pickle <cyfunction int_validator at 0x7fc6808f1040>: attribute lookup lambda12 on pydantic.validators failed

Note that the issue does NOT appear when a non-cythonized pydantic version is used.

Also note that the issue does NOT appear when the file is not __main__, for example:

$ python -c "import bug"

Environment

Linux x64
Python 3.8.6
cloudpickle 1.6.0
pydantic 1.7.3 w/ cython enabled

Technical Background

In contrast to pickle, cloudpickle pickles the actual class when it resides in __main__, see the following note in the README:

Among other things, cloudpickle supports pickling for lambda functions
along with functions and classes defined interactively in the
__main__ module (for instance in a script, a shell or a Jupyter notebook).

I THINK that might be the reason why this happens. What's somewhat weird is that the object in question is pydantic.validators.int_validator which CAN actually be pickled:

from pydantic.validators import int_validator
import cloudpickle
import pickle

# both work:
pickle.dumps(int_validator)
cloudpickle.dumps(int_validator)

References

This was first reported in #403 here.

The text was updated successfully, but these errors were encountered:

ogrisel · 2021-03-23T08:34:30Z

Could you please edit the bug report to include the full traceback?

ogrisel · 2021-03-23T08:36:21Z

Also is this problem happening with the current master branch of cloudpickle?

ogrisel · 2021-03-23T09:08:56Z

I believe this was fixed by #409 as I cannot reproduce anymore. We still need to release though.

lukasmasuch · 2021-04-24T15:37:28Z

I still get the same error using the cloudpickle version from master in Python 3.8.5:

The fix from #409 only seems to target Python version < 3.7.

kylebarron · 2021-10-04T18:44:03Z

Edited to use cloudpickle from master

This issue should be reopened.

The difference between environments and likely why @ogrisel was unable to reproduce this is because pydantic can be installed with or without Cython support. The Cython version of Pydantic is unsurprisingly significantly faster than the pure-Python version and is also the default install (at least for platforms for which wheels exist).

Here are two examples using virtualenv that should be reproducible, using the same script as @marco-neumann-by defined initially:

# example.py
import cloudpickle
import pydantic
import pickle

class Bar(pydantic.BaseModel):
    a: int

pickle.loads(pickle.dumps(Bar(a=1))) # This works well
cloudpickle.loads(cloudpickle.dumps(Bar(a=1))) # This fails with the error below

Non-cython Pydantic

Note that the --no-binary pydantic tells pip to install without any Cython files.

virtualenv .venv
source ./.venv/bin/activate
pip install git+https://github.com/cloudpipe/cloudpickle pydantic --no-binary pydantic

Here you can tell that there are no cython files:

> ls ./.venv/lib/python3.8/site-packages/pydantic/
__init__.py           datetime_parse.py     json.py               tools.py
__pycache__           decorator.py          main.py               types.py
_hypothesis_plugin.py env_settings.py       mypy.py               typing.py
annotated_types.py    error_wrappers.py     networks.py           utils.py
class_validators.py   errors.py             parse.py              validators.py
color.py              fields.py             py.typed              version.py
dataclasses.py        generics.py           schema.py

And the example passes without issue

> python example.py
> echo $?
0

Cython-based Pydantic

Now we install pydantic without use of --no-binary pydantic.

deactivate
rm -rf .venv
virtualenv .venv
source ./.venv/bin/activate
pip install git+https://github.com/cloudpipe/cloudpickle pydantic

Now you can see that there are built C libraries included with Pydantic:

> ls ./.venv/lib/python3.8/site-packages/pydantic/
__init__.cpython-38-darwin.so           json.cpython-38-darwin.so
__init__.py                             json.py
__pycache__                             main.cpython-38-darwin.so
_hypothesis_plugin.cpython-38-darwin.so main.py
_hypothesis_plugin.py                   mypy.cpython-38-darwin.so
annotated_types.cpython-38-darwin.so    mypy.py
annotated_types.py                      networks.cpython-38-darwin.so
class_validators.cpython-38-darwin.so   networks.py
class_validators.py                     parse.cpython-38-darwin.so
color.cpython-38-darwin.so              parse.py
color.py                                py.typed
dataclasses.cpython-38-darwin.so        schema.cpython-38-darwin.so
dataclasses.py                          schema.py
datetime_parse.cpython-38-darwin.so     tools.cpython-38-darwin.so
datetime_parse.py                       tools.py
decorator.cpython-38-darwin.so          types.cpython-38-darwin.so
decorator.py                            types.py
env_settings.cpython-38-darwin.so       typing.cpython-38-darwin.so
env_settings.py                         typing.py
error_wrappers.cpython-38-darwin.so     utils.cpython-38-darwin.so
error_wrappers.py                       utils.py
errors.cpython-38-darwin.so             validators.cpython-38-darwin.so
errors.py                               validators.py
fields.cpython-38-darwin.so             version.cpython-38-darwin.so
fields.py                               version.py
generics.py

And running our example again, we can see that it fails:

> python example.py
Traceback (most recent call last):
  File "example.py", line 9, in <module>
    cloudpickle.loads(cloudpickle.dumps(Bar(a=1))) # This fails with the error below
  File "/Users/kbarron/tmp/.venv/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/Users/kbarron/tmp/.venv/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 602, in dump
    return Pickler.dump(self, obj)
_pickle.PicklingError: Can't pickle <cyfunction int_validator at 0x101cf62b0>: attribute lookup lambda12 on pydantic.validators failed

kylebarron · 2021-10-04T18:52:48Z

Also note that the issue does NOT appear when the file is not __main__, for example:

I can also reproduce this, however:

# example.py
import cloudpickle
import pickle
from models import Bar

pickle.loads(pickle.dumps(Bar(a=1))) # This works well
cloudpickle.loads(cloudpickle.dumps(Bar(a=1))) # This fails with the error below

# models.py
import pydantic

class Bar(pydantic.BaseModel):
    a: int

This works fine, so a quick workaround is to always define Pydantic models in a separate file.

ericman93 · 2021-11-09T10:10:58Z

I'm still having this issue in cloudpickle 2.0.0
it is only working with non-cython Pydantic
And my Pydantic models declared in a separated file

crclark · 2022-04-13T16:16:32Z

@ogrisel I am also still seeing this issue in 2.0.0. The workaround in #408 (comment) works for me, but I believe this issue should be reopened.

rjurney · 2022-08-19T21:30:10Z

I have this issue with pydantic and pyspark.

../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/sql/pandas/map_ops.py:91: in mapInPandas
    udf_column = udf(*[self[col] for col in self.columns])
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/sql/udf.py:276: in wrapper
    return self(*args)
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/sql/udf.py:249: in __call__
    judf = self._judf
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/sql/udf.py:215: in _judf
    self._judf_placeholder = self._create_judf(self.func)
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/sql/udf.py:224: in _create_judf
    wrapped_func = _wrap_function(sc, func, self.returnType)
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/sql/udf.py:50: in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/rdd.py:3345: in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/serializers.py:458: in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/cloudpickle/cloudpickle_fast.py:73: in dumps
    cp.dump(obj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pyspark.cloudpickle.cloudpickle_fast.CloudPickler object at 0x7ff5f0410700>
obj = (<function test_graphlet_etl.<locals>.horror_to_movie at 0x7ff5d0e81480>, StructType([StructField('entity_id', StringT...ld('length', LongType(), False), StructField('gross', LongType(), False), StructField('rating', StringType(), False)]))

    def dump(self, obj):
        try:
>           return Pickler.dump(self, obj)
E           _pickle.PicklingError: Can't pickle <cyfunction str_validator at 0x7ff5b0461220>: it's not the same object as pydantic.validators.str_validator

../../opt/anaconda3/envs/graphlet/lib/python3.10/site-packages/pyspark/cloudpickle/cloudpickle_fast.py:602: PicklingError

brettc · 2022-11-04T20:25:25Z

I've just been bitten by this. @ogrisel, can we reopen this issue? The workaround is not an option if you are defining your objects inside a jupyter notebook.

simon-mo · 2022-11-04T20:34:37Z

@brettc as a workaround, you can define custom serializers to pack and unpack pydantic objects. This might help your use case.

https://github.com/ray-project/ray/blob/eed90495cedad0dc2fb6ea6d430df61e4eac24f4/python/ray/util/serialization_addons.py#L10-L35

brettc · 2022-11-04T22:11:24Z

@simon-mo thanks for the tip -- this looks very promising! The error occurs for me when I'm using dask, so I guess you had the same issues in ray. (BTW, ray is amazing. I chose dask for this job because ray seemed like overkill).

zero1zero · 2023-02-10T01:00:55Z

I'm still struggling to find a workaround for this issue. My code is not directly defining any pydantic types (although it is used by dependent libraries).

Is there a version upgrade/downgrade that might be the cause? Unclear on where the actual issue is occuring. In my case it looks to be in the chain of uvicorn and kserve:

Traceback (most recent call last):
  File "/.asdf/installs/python/3.9.11/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/.asdf/installs/python/3.9.11/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/Library/Caches/pypoetry/virtualenvs/truss-FUoNelHr-py3.9/lib/python3.9/site-packages/kserve/model_server.py", line 275, in servers_task
    await asyncio.gather(*servers)
  File "/Library/Caches/pypoetry/virtualenvs/truss-FUoNelHr-py3.9/lib/python3.9/site-packages/kserve/model_server.py", line 269, in serve
    server.start()
  File "/.asdf/installs/python/3.9.11/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/.asdf/installs/python/3.9.11/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/.asdf/installs/python/3.9.11/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/.asdf/installs/python/3.9.11/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/.asdf/installs/python/3.9.11/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/.asdf/installs/python/3.9.11/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/.asdf/installs/python/3.9.11/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <cyfunction str_validator at 0x16b57c790>: it's not the same object as pydantic.validators.str_validator

dumitrescustefan · 2023-06-19T16:56:30Z

This still happens. I have to define pydantic models in another file, otherwise I get this error. Even in a simple file where I define a pydantic param class and a Ray actor with a single method, this happens. Using the latest ray, pydantic, etc.

lesteve · 2023-12-05T15:43:41Z

I agree this issue still exists and I believe it is actually fixed in pydantic 2.5 (see issue and PR) if you run your script with Python. An issue still exists inside Jupyter/IPython pydantic/pydantic#8232.

If you get a similar error like the one below, it likely means your are using pydantic<2 and I would say this is not super likely to get fixed in pydantic (see https://docs.pydantic.dev/latest/version-policy/#pydantic-v1):

_pickle.PicklingError: Can't pickle <cyfunction int_validator at 0x7f5cb91e01e0>: it's not the same object as pydantic.validators.int_validator

In this case, the simplest work-around seems to define your pydantic model in a separate file as noted in #408 (comment)

rjurney · 2023-12-14T00:48:06Z

Can someone remind me of what it means if this is fixed? I think it means Spark can serialize numpy arrays?

marco-neumann-by mentioned this issue Jan 21, 2021

pydantic BaseModel cannot be pickled w/ cloudpickle >= 1.3 #403

Closed

marco-neumann-by changed the title ~~cythonized pydantic objects cannot be pickled~~ cythonized pydantic objects in __main__ cannot be pickled Feb 26, 2021

suquark mentioned this issue Mar 16, 2021

Update cloudpickle to commit 6e0f571 ray-project/ray#14693

Merged

6 tasks

ogrisel closed this as completed Mar 23, 2021

richardliaw mentioned this issue Jan 25, 2023

[Serialization] Cannot serialize Frozen Pydantic Model ray-project/ray#29826

Closed

jas-ho mentioned this issue Jul 29, 2023

POC hydra-intergration [do not merge] acsresearch/interlab#30

Open

shrekris-anyscale mentioned this issue Sep 26, 2023

[Serve] Make Ray Serve compatible with Pydantic 1.10.x ray-project/ray#39864

Closed

lesteve mentioned this issue Dec 5, 2023

PicklingError when using some type-annotated function joblib/joblib#1528

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cythonized pydantic objects in main cannot be pickled #408

cythonized pydantic objects in main cannot be pickled #408

marco-neumann-by commented Jan 21, 2021 •

edited

Loading

ogrisel commented Mar 23, 2021

ogrisel commented Mar 23, 2021

ogrisel commented Mar 23, 2021

lukasmasuch commented Apr 24, 2021

kylebarron commented Oct 4, 2021 •

edited

Loading

kylebarron commented Oct 4, 2021

ericman93 commented Nov 9, 2021

crclark commented Apr 13, 2022

rjurney commented Aug 19, 2022

brettc commented Nov 4, 2022

simon-mo commented Nov 4, 2022

brettc commented Nov 4, 2022

zero1zero commented Feb 10, 2023

dumitrescustefan commented Jun 19, 2023

lesteve commented Dec 5, 2023 •

edited

Loading

rjurney commented Dec 14, 2023

cythonized pydantic objects in __main__ cannot be pickled #408

cythonized pydantic objects in __main__ cannot be pickled #408

Comments

marco-neumann-by commented Jan 21, 2021 • edited Loading

Abstract

Environment

Technical Background

References

ogrisel commented Mar 23, 2021

ogrisel commented Mar 23, 2021

ogrisel commented Mar 23, 2021

lukasmasuch commented Apr 24, 2021

kylebarron commented Oct 4, 2021 • edited Loading

Non-cython Pydantic

Cython-based Pydantic

kylebarron commented Oct 4, 2021

ericman93 commented Nov 9, 2021

crclark commented Apr 13, 2022

rjurney commented Aug 19, 2022

brettc commented Nov 4, 2022

simon-mo commented Nov 4, 2022

brettc commented Nov 4, 2022

zero1zero commented Feb 10, 2023

dumitrescustefan commented Jun 19, 2023

lesteve commented Dec 5, 2023 • edited Loading

rjurney commented Dec 14, 2023

cythonized pydantic objects in main cannot be pickled #408

cythonized pydantic objects in main cannot be pickled #408

marco-neumann-by commented Jan 21, 2021 •

edited

Loading

kylebarron commented Oct 4, 2021 •

edited

Loading

lesteve commented Dec 5, 2023 •

edited

Loading