You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
If the pid trying to create a DArray has to do enough work, we see a slow down. The work interferes with the asynchronous dispatch of work to the other pids.
using Distributed
addprocs(2)
using DistributedArrays
# a lot of work...@everywherefunctioninit(I)
rr=rand(map(length,I)...)
ss=0.for _ in1:10
ss+=sum(rr.^2)
endexp.(ss*exp.(rr).^3)
end
n=2000@btime d1=DArray(init,(n,n),workers()[1:2])
@btime d1=DArray(init,(n,n),procs()[1:2])
@btime d1=DArray(init,(n,n),procs()[2:-1:1]);
126.553 ms (290 allocations:22.45 KiB)
156.219 ms (253 allocations:213.64 MiB)
112.563 ms (249 allocations:213.64 MiB)
Note that in the last experiment, the same pids are included, but the pid==1 is last and the slowdown disappears (it dispatches the work to others before doing its own). Which suggests the easy fix:
@everywhere@eval DistributedArrays function DistributedArrays.DArray(id, init, dims, pids, idxs, cuts)
localtypes =Vector{DataType}(undef,length(pids))
pids=copy(pids)
ind=findfirst(isequal(myid()),pids)
if ind !=nothing
pids[end],pids[ind]=pids[ind],pids[end]
end@syncbeginfor i =1:length(pids)
@asyncbeginlocal typA
ifisa(init, Function)
typA =remotecall_fetch(construct_localparts, pids[i], init, id, dims, pids, idxs, cuts)
else# constructing from an array of remote refs.
typA =remotecall_fetch(construct_localparts, pids[i], init[i], id, dims, pids, idxs, cuts)
end
localtypes[i] = typA
endendendiflength(unique(localtypes)) !=1@syncfor p in pids
@asyncremotecall_fetch(release_localpart, p, id)
endthrow(ErrorException("Constructed localparts have different `eltype`: $(localtypes)"))
end
A =first(localtypes)
ifmyid() in pids
d = registry[id]
d =isa(d, WeakRef) ? d.value : d
else
T =eltype(A)
N =length(dims)
d =DArray{T,N,A}(id, dims, pids, idxs, cuts, empty_localpart(T,N,A))
end
d
end
And after the fix:
@btime d1=DArray(init,(n,n),workers()[1:2])
@btime d1=DArray(init,(n,n),procs()[1:2])
@btime d1=DArray(init,(n,n),procs()[2:-1:1]);
79.150 ms (296 allocations:22.64 KiB)
111.481 ms (258 allocations:213.64 MiB)
111.870 ms (258 allocations:213.64 MiB)
Will submit a PR with the fix promptly :)
Cheers!
The text was updated successfully, but these errors were encountered:
raminammour
added a commit
to raminammour/DistributedArrays.jl
that referenced
this issue
Jun 6, 2019
Hello,
If the
pid
trying to create aDArray
has to do enough work, we see a slow down. The work interferes with the asynchronous dispatch of work to the otherpids
.Note that in the last experiment, the same
pids
are included, but thepid==1
is last and the slowdown disappears (it dispatches the work to others before doing its own). Which suggests the easy fix:And after the fix:
Will submit a PR with the fix promptly :)
Cheers!
The text was updated successfully, but these errors were encountered: