Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize remotecall pattern used in distributed loading #21745

Merged
merged 2 commits into from
May 11, 2017

Conversation

amitmurthy
Copy link
Contributor

@amitmurthy amitmurthy commented May 8, 2017

Came across this possible optimization while I was looking at the messages sent between workers in #21718

Currently the fetch/wait calls are 1) executed serially and 2) are an extra network call as compared with using a @sync for p in pids; @async remotecall_fetch.... pattern.

Have replaced the @spawnat followed by fetch/wait calls with an async remotecall_fetch

@amitmurthy amitmurthy requested a review from vtjnash May 8, 2017 17:15
base/loading.jl Outdated
for (id, ref) in refs
m = fetch(ref)

results = Any[]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sizehint!(Vector{Tuple{Int,Any}}(), nprocs())?

Looks like others is not required, should be ok to just skip myid() before @async block. One allocation less :)

@vtjnash
Copy link
Member

vtjnash commented May 8, 2017

I'm not sure I follow. Why is @sync for p in pids; @async remotecall_fetch.... more efficient than fetch.([@async remotecall.... for p in pids])? Is the idea that the former runs the task management locally, whereas the latter runs it distributed?

@amitmurthy
Copy link
Contributor Author

for pids (2,3,4,5) :
@sync for p in pids; @async remotecall_fetch.... results in

Message remotecall_fetch from 1 to 2
Message remotecall_fetch from 1 to 3
Message remotecall_fetch from 1 to 4
Message remotecall_fetch from 1 to 5

# responses can come back in any order

Message remotecall_fetch_response from 3 to 1
Message remotecall_fetch_response from 2 to 1
Message remotecall_fetch_response from 4 to 1
Message remotecall_fetch_response from 5 to 1

fetch.([@async remotecall.... for p in pids]) results in

Message remotecall from 1 to 2
Message remotecall from 1 to 3
Message remotecall from 1 to 4
Message remotecall from 1 to 5

# The above calls return a future that are then "fetched" serially

Message fetch from 1 to 2
Message fetch_response from 2 to 1
Message fetch from 1 to 3
Message fetch_response from 3 to 1
Message fetch from 1 to 4
Message fetch_response from 4 to 1
Message fetch from 1 to 5
Message fetch_response from 5 to 1

In the first case, the results of the remotecall_fetch are pushed asynchronously from the worker nodes. In the second case they are fetched serially.

@kshyatt kshyatt added the parallelism Parallel or distributed computation label May 8, 2017
@amitmurthy
Copy link
Contributor Author

Will merge after a green CI

@tkelman
Copy link
Contributor

tkelman commented May 9, 2017

why the rush? it's been open less than a day, give people a chance to see it

@amitmurthy
Copy link
Contributor Author

Sure, no issues.

for r in refs; wait(r); end
@sync begin
for p in filter(x -> x != 1, procs())
@async remotecall_fetch(p) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use remotecall_wait, since the old version called wait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to use a remotecall_fetch (after ensuring that the remote execution returns nothing) when we we do not need the result of the remote computation. Will avoid unnecessary storage and and clean-up of the result on the remote node.

@amitmurthy
Copy link
Contributor Author

Will merge in a short while.

@amitmurthy amitmurthy merged commit 9694de7 into master May 11, 2017
@amitmurthy amitmurthy deleted the amitm/optloading branch May 11, 2017 11:34
tkelman pushed a commit that referenced this pull request May 14, 2017
* Optimize remotecall pattern used in distributed loading
(cherry picked from commit 9694de7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parallelism Parallel or distributed computation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants