-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Upgrade cloudpickle to > 2.0.0 #6699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
cc @chensun |
@feizerl, it's fine to merge the PR, but I think in general pickling isn't a good idea, the result of pickling is tied to a specific Python version. And we're not offering the pickling option in sdk v2. |
@chensun I agree with you that pickling in general is not a good idea. The advantage of pickling in the context of pipelines is that it ships the module in which the function is defined as compared to not pickling option which only ships the code for the function. This means that if have the following: def foo():
...
def bar():
foo() and you change But like I said, overall I agree that pickling is not a great format since it changes very frequently. |
Maybe I missed something in your use case, but with function-based component, you never need to build an image. By default, the container image is The visual difference in the container spec is like below:
|
@chensun For simple functions, the approach is equivalent. (simple = self contained) However, we have a complex codebase split across different modules. Researchers can (and do) change code in different modules such that the function code remains same, but the modules it is using are different now. Kubeflow (and fwiw python) cannot recursively go to all modules that function is using and copy their code as well. What |
update I didn't realize that the latest Hello KFP team, This PR is listed as included in 1.8.0 according to the changelog, but I'm getting some package conflicts between kfp 1.8.1 and other packages that depend on cloudpickle>=2. It seems that kfp 1.8.1 still depends on $ pip download --no-deps kfp==1.8.1
Collecting kfp==1.8.1
Using cached kfp-1.8.1.tar.gz (248 kB)
Preparing metadata (setup.py) ... done
Saved ./kfp-1.8.1.tar.gz
Successfully downloaded kfp
$ tar xzf kfp-1.8.1.tar.gz
$ grep cloudpickle kfp-1.8.1/setup.py
'cloudpickle>=1.3.0,<2', Here's what I originally ran that led to this problem: $ python -m venv pickledemo
$ source pickledemo/bin/activate
$ pip install -U pip
$ pip install kfp==1.8.1 apache-beam==2.38.0
[...]
ERROR: Cannot install apache-beam==2.38.0 and kfp==1.8.1 because these package versions have conflicting dependencies.
The conflict is caused by:
kfp 1.8.1 depends on cloudpickle<2 and >=1.3.0
apache-beam 2.38.0 depends on cloudpickle<3 and >=2.0.0
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts Could you please advise? Thanks! |
Hello,
One of the features of pipelines is Step Caching (https://www.kubeflow.org/docs/components/pipelines/caching/) to avoid running the costly computations again and again.
The key for caching is:
When using the option
use_code_pickling
frompipelines/sdk/python/kfp/components/_python_op.py
Line 516 in 74c7773
the pickle of the function gets embedded in the
ContainerSpec
(and hence becomes part of the key).So far, all good.
However, the pickle is generated with
cloudpickle
which leads to non deterministic pickles every time you run the pipeline. As you can imagine, this makes caching feature useless because it will invalidate the cache every time it is run.This non determinism was removed from
cloudpickle
with the following commit:cloudpipe/cloudpickle#428 and released as part of
2.0.0
release:https://github.com/cloudpipe/cloudpickle/releases/tag/v2.0.0
Currently,
kfp
has bounded cloudpickle to less than v2.0.0 here:pipelines/sdk/python/setup.py
Line 37 in 74c7773
Would it be possible to make a new
kfp
release with upgraded cloudpickle? Without this cloudpickle version, step caching is currently impossible to use (or at the mercy of dictionary insertion order of cloudpickle).Thanks!
The text was updated successfully, but these errors were encountered: