Make system UUID determination atomic #317
I think this should work. You mentioned in #316 that IP addresses can be the same between workers, but isn't the IP how Distributed knows how to connect to each worker, so shouldn't they be distinct? |
I'd even consider that a feature, since in the future we might want to be able to bind disparate Julia clusters together, so having the UUID would let us find matching nodes across clusters.
MAC address would be a solid approach. Do we have an easy way to access that via libuv? If not, we might need to settle on IP addresses (although the approach I have here will generally work fine, since you don't typically share Also, I don't want to be in the business of parsing script outputs, so using a library is the only reasonable way forward. |
I might be wrong here, but I think the port number is used as well, so it is the IP:portnr tuple that must be distinct. One case where IPs are not distinct is a cluster where a single machine runs multiple workers. IIRC I even had to replace the default port randomization, as the birthday problem gives large machines too high a chance of failing due to port collisions. |
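To put a rough number on the birthday-problem point, here is a minimal sketch. It assumes each worker picks its port independently and uniformly at random from a fixed pool. The function name and the pool model are illustrative assumptions, not how Distributed actually assigns ports.

```python
def port_collision_probability(n_workers, n_ports):
    """Probability that at least two workers draw the same port.

    Assumes every worker draws independently and uniformly from
    n_ports candidate ports (a hypothetical model for illustration).
    """
    if n_workers > n_ports:
        # Pigeonhole principle: a collision is certain.
        return 1.0
    p_all_distinct = 1.0
    for i in range(n_workers):
        # The i-th worker must avoid the i ports already taken.
        p_all_distinct *= (n_ports - i) / n_ports
    return 1.0 - p_all_distinct
```

For example, `port_collision_probability(64, 1000)` estimates the chance that at least two of 64 workers on one machine draw the same port from a hypothetical pool of 1000. The probability grows roughly with the square of the worker count, which is why large machines are hit hardest.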
Isn't the point of this field to have a value that's unique to the physical
machine, not the worker process?
On Tue, Dec 14, 2021 at 3:42 PM DrChainsaw wrote:
but isn't the IP how Distributed knows how to connect to each worker, so
shouldn't they be distinct?
I might be wrong here, but I think that port number is used as well so
that the IP:portnr tuple must be distinct. One case when IPs are not
distinct are clusters where a single machine can have multiple workers.
Iirc I even had to replace the default port randomization as the birthday
problem makes it so that large machines have a too large chance to fail due
to port collisions.
|
Sorry, I might have misunderstood what this is about. It seemed to me that the issue was that multiple workers were trying to use the same resource, and that having them use different resources would be desirable. It then seemed to me that using the IP to give them different resources would not help in this case. Please ignore me if what I wrote makes no sense :) |
Yeah, we need a UUID independent of IP since we might have many NICs, and the goal is to identify the machine so that you can figure out which processes are running on the same machine as you and accelerate communication. I am still not sure that Dagger needs to worry about this; I think we should instead use a network library like UCX that will do this for us. |
Good point, we'd have that problem with MACs as well. The approach used in this PR is probably fine for now. |
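For readers following along, this is one common way to make a per-machine UUID file atomic when several workers on the same machine start at once. It is a sketch only, written in Python for illustration. It is not the code in this PR, and the function name and file path are hypothetical. Each process writes a candidate UUID to a private temp file and then hard-links it to the shared path. The link step fails if the path already exists, so exactly one candidate wins and no reader ever sees a half-written file.

```python
import os
import tempfile
import uuid


def system_uuid(uuid_path):
    """Return the UUID stored at uuid_path, creating it atomically if missing."""
    try:
        with open(uuid_path) as f:
            return uuid.UUID(f.read().strip())
    except FileNotFoundError:
        pass  # Not created yet; try to create it below.

    # Write a candidate UUID to a private temp file in the same directory,
    # so the hard link below stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(uuid_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(uuid.uuid4()))
        try:
            # os.link raises FileExistsError if uuid_path already exists,
            # so only the first process publishes its candidate.
            os.link(tmp_path, uuid_path)
        except FileExistsError:
            pass  # Another process won the race; use its UUID.
    finally:
        os.unlink(tmp_path)

    # Every process reads back whichever candidate was published.
    with open(uuid_path) as f:
        return uuid.UUID(f.read().strip())
```

This assumes the target directory supports hard links and is local to the machine. A directory shared between machines would make distinct machines report the same UUID.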
Thanks all for the reviews!
Even if UCX can do optimized transfers via IPC, the scheduler still wants to know when those transfer types will be used, so that (in the future, at least) it can guesstimate their costs during scheduling. |
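As a rough illustration of how a scheduler could use the per-machine UUID for this, here is a minimal sketch. It assumes the scheduler has already collected a mapping from worker ID to machine UUID. The function names, the mapping, and the three transfer labels are hypothetical and not part of Dagger's API.

```python
from collections import defaultdict


def group_by_machine(worker_uuids):
    """Map each machine UUID to the sorted list of worker IDs reporting it."""
    groups = defaultdict(list)
    for worker_id, machine_uuid in worker_uuids.items():
        groups[machine_uuid].append(worker_id)
    return {machine: sorted(workers) for machine, workers in groups.items()}


def transfer_kind(src, dst, worker_uuids):
    """Classify a transfer as 'local', 'same-machine', or 'network'."""
    if src == dst:
        return "local"  # Same worker process; no transfer needed.
    if worker_uuids[src] == worker_uuids[dst]:
        return "same-machine"  # Candidates for IPC-style transfers.
    return "network"
```

A cost model could then assign each label its own estimated cost, whichever library performs the transfer itself.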
@mattwigway please review if you get the chance
Fixes #316