pickle function call uses kwargs added in python 3.8 #3851
Comments
Oof, that sounds frustrating. My apologies.
Ideally this would be something that you were trying to do with Dask. Could you give us an example of something you would normally try that fails? I ask because our normal test suite passes fine on Python 3.6 and somehow failed to trigger this failure. It would be useful to know what was going on and to have a test for it. cc @jakirkham because of the pickle connection. |
Unfortunately it is not very easy to show an example. It seems that my job usually doesn't fill the buffer and executes normally. In my use case it is rather an edge case when Dask uses the buffers=buffers keyword.
No worries anymore :) Thx for your empathy. |
Can you verify that all of your machines have the same version of Python by running client.get_versions(check=True)?
|
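For reference, a minimal sketch of that check; the scheduler address is an assumption, and with check=True a version mismatch raises an error rather than passing silently:

```python
from dask.distributed import Client

client = Client("localhost:8786")  # hypothetical scheduler address

# Collect Python and package versions from the client, scheduler, and workers;
# check=True raises if required packages or Python versions disagree.
versions = client.get_versions(check=True)
print(versions)
```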
Yes, all have the same versions, as they run from the same Docker image. I ended up with the following workaround; it seems to help:

```python
import dask
from dask.distributed import Client
from distributed.diagnostics.plugin import WorkerPlugin


class Pickle5Hack(WorkerPlugin):
    def setup(self, worker: dask.distributed.Worker):
        import sys
        import pickle5
        # Swap the standard-library pickle for the pickle5 backport on this worker
        sys.modules['pickle'] = pickle5


client = Client('localhost:8786')
client.register_worker_plugin(Pickle5Hack())
```

Thx for the great project and great documentation. |
I'm glad to see that you were able to find a solution to help work around things. Let's wait a bit to see if @jakirkham has some thoughts. If memory serves he's responsible for the pickle5 changes and may know more. |
Yeah, while I believe there could be an issue, Jim and I haven't been able to find anything and we still haven't identified a reproducer, which makes it hard to do anything helpful here. Would you be able to come up with a reproducer for the behavior @michaelnarodovitch? |
Independently we would like to get |
+1 on getting a reproducer if possible. If we're unable to find out what's going on, then maybe we revert the change, or apply it only for appropriate versions of Python? |
FWIW someone reported an issue with this the other day and it turned out to be some other usage error ( #3843 ). So it may not be this change per se, but instead some other upstream error that is not being handled gracefully. |
Mh, I see that this is a tricky one. I was able to reproduce it on my cluster. The job forced disk spills with subsequent shuffles, which collect ~4 GB pandas dataframes on 7 GB memory nodes (to make it reproduce pretty fast, I reduced the memory of the nodes). The following stack trace might provide more insight.
Dask: 2.17.2
Thank you so much for following up on this. |
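The reporter's code isn't shared, but a hypothetical sketch of the kind of job described (a multi-GB dataframe shuffle on memory-constrained workers, forcing disk spills) might look like the following; the scheduler address, data sizes, and synthetic data are illustrative assumptions, not the actual workload:

```python
from dask.datasets import timeseries
from dask.distributed import Client

client = Client("localhost:8786")  # hypothetical scheduler address

# A few GB of synthetic 1-second records spread over daily partitions
df = timeseries(start="2000-01-01", end="2002-12-31", freq="1s", partition_freq="1d")

# set_index triggers a full shuffle; on ~7 GB workers the intermediate
# partitions exceed memory and Dask spills them to disk
shuffled = df.set_index("id").persist()
```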
Can you please share the code? Otherwise I'm afraid we won't be able to reproduce it. |
So I've dug into this a bit and this is what I have come up with. Sharing the reproducer for now, though I haven't debugged it at all yet. This passes on 2.16.0 and fails on 2.17.0 using Python 3.7.

```python
from distributed.protocol import deserialize_bytes, serialize_bytes

b = 2**27 * b"a"
deserialize_bytes(serialize_bytes(b, serializers=["pickle"]))
```
|
Heh, on the bright side we already solved this before (#3639). 😄 We were just missing the test, which we now have. Will clean that up. |
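A minimal sketch of what such a regression test might look like (the test name is hypothetical; it simply round-trips a payload large enough to be split across multiple frames):

```python
from distributed.protocol import deserialize_bytes, serialize_bytes


def test_large_pickle_roundtrip():
    # 128 MiB payload, large enough to exercise the multi-frame / buffers path
    data = 2**27 * b"a"
    assert deserialize_bytes(serialize_bytes(data, serializers=["pickle"])) == data
```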
This is ready for testing/review 🙂 |
Oh, great that you found something :) I didn't really understand the failure mode in my use case, and the code of my use case was not something I could share directly. |
Upgrading to the branch in #3639, I can confirm that I no longer have this problem! |
Pulled via |
Great, thanks for the feedback! 😄 |
The fix is in |
Thank you! It works as expected now. |
What happened:
With Python 3.6.9, my job got stuck at 998/1000 and left the following log message on the worker. At this point the job didn't continue for >10 minutes; I had to stop it manually.
What you expected to happen:
My job finishes
Minimal Complete Verifiable Example:
On Python 3.6, run
Anything else we need to know?:
There is a pickle5 backport for Python 3.6 at https://pypi.org/project/pickle5/.
I am trying to work around that problem as follows. Thx @samaust for the remark in #3843.
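For context, the pickle5 backport exposes the protocol-5 API (including the buffers= keyword on loads()) that stdlib pickle only gains in Python 3.8; a quick hedged check that the backport is installed and usable might look like:

```python
import pickle5

# The backport supports protocol 5 and the buffers= keyword on loads(),
# which the standard-library pickle only accepts from Python 3.8 onward.
payload = pickle5.dumps(b"example", protocol=5)
print(pickle5.loads(payload, buffers=[]))
```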
Environment: