# Datadeps (Data Dependencies)

For many programs, the rule that tasks cannot write to their arguments feels
overly restrictive and makes certain kinds of programs (such as in-place
linear algebra) hard to express efficiently in Dagger. Thankfully, there is a
solution: `spawn_datadeps`. This function constructs a "datadeps region",
within which tasks are allowed to write to their arguments, with parallelism
controlled by dependencies that are specified through argument annotations.
Let's look at a simple example to make things concrete:

```julia
A = rand(1000)
B = rand(1000)
C = zeros(1000)
add!(X, Y) = X .+= Y
Dagger.spawn_datadeps() do
    Dagger.@spawn add!(InOut(B), In(A))
    Dagger.@spawn copyto!(Out(C), In(B))
end
```

In this example, we have two Dagger tasks being launched: one adding `A` into
`B`, and the other copying `B` into `C`. The `add!` task specifies that `A`
is only read from (`In`, for "input") and that `B` is both read from and
written to (`InOut`, for "input and output"). The `copyto!` task, similarly,
specifies that `B` is only read from, and that `C` is only written to (`Out`,
for "output").

Without `spawn_datadeps` and `In`, `Out`, and `InOut`, the result of these
tasks would be undefined; the two tasks could execute in parallel, or the
`copyto!` could occur before the `add!`, resulting in all kinds of mayhem.
However, `spawn_datadeps` changes things: because we have told Dagger how our
tasks access their arguments, Dagger can control the parallelism and ordering,
ensuring that `add!` executes and finishes before `copyto!` begins, so that
`copyto!` "sees" the changes to `B` before it runs.

There is another important aspect of `spawn_datadeps` that makes the above
code work: if all of the `Dagger.@spawn` macros are removed, along with the
dependency specifiers, the program would still produce the same results,
without using Dagger. In other words, the parallel (Dagger) version of the
program produces identical results to the serial (non-Dagger) version. This
is similar to using Dagger with purely functional tasks and without
`spawn_datadeps` - removing `Dagger.@spawn` still results in a correct
(sequential, and possibly slower) version of the program. In short,
`spawn_datadeps` ensures that Dagger respects the ordering and dependencies
of a program, while still providing parallelism where possible.
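
For reference, here is the serial program you get by stripping the
`Dagger.@spawn` calls and the `In`/`Out`/`InOut` annotations from the first
example; it computes the same results, just without any parallelism:

```julia
# Serial equivalent of the first example: same results, no Dagger
add!(B, A)
copyto!(C, B)
```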

But where is the parallelism? The first example doesn't actually have any
parallelism to exploit! Let's take a look at another example to see the
datadeps model truly shine:

```julia
# Tree reduction of multiple arrays into the first array
function tree_reduce!(op::Base.Callable, As::Vector{<:Array})
    Dagger.spawn_datadeps() do
        # Stack of sub-vectors of arrays still waiting to be reduced
        to_reduce = Vector[]
        push!(to_reduce, As)
        while !isempty(to_reduce)
            As = pop!(to_reduce)
            n = length(As)
            if n == 2
                # Elementwise-reduce As[2] into As[1] (reads As[2], writes As[1])
                Dagger.@spawn Base.mapreducedim!(identity, op, InOut(As[1]), In(As[2]))
            elseif n > 2
                # Combine the two halves' results last...
                push!(to_reduce, [As[1], As[div(n,2)+1]])
                # ...after each half has been reduced into its first element
                push!(to_reduce, As[1:div(n,2)])
                push!(to_reduce, As[div(n,2)+1:end])
            end
        end
    end
    return As[1]
end

As = [rand(1000) for _ in 1:1000]
Bs = copy.(As)
tree_reduce!(+, As)
@assert isapprox(As[1], reduce((x,y)->x .+ y, Bs))
```

In the above implementation of `tree_reduce!` (which is designed to perform an
elementwise reduction across a vector of arrays), we have a tree reduction
operation where pairs of arrays are reduced, starting with neighboring pairs,
and then pairs of reduction results are reduced, and so on, until the final
result lands in `As[1]`. Applying Dagger to this algorithm is simple - only
the single `Base.mapreducedim!` call is passed to Dagger - yet thanks to the
data dependencies and the algorithm's structure, there is plenty of
parallelism to exploit across the reductions at each "level" of the reduction
tree. Specifically, any two `Dagger.@spawn` calls which access completely
different pairs of arrays can execute in parallel, while any call which has an
`In` on an array will wait for any previous call which has an `InOut` on that
same array.
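
To make this concrete, here is a small hand-written sketch (using hypothetical
arrays `W`, `X`, `Y`, and `Z`, not taken from the example above) of how these
dependency rules play out within a datadeps region:

```julia
W, X, Y, Z = [rand(1000) for _ in 1:4]
Dagger.spawn_datadeps() do
    # These two tasks access disjoint arrays, so they may run in parallel
    Dagger.@spawn Base.mapreducedim!(identity, +, InOut(W), In(X))
    Dagger.@spawn Base.mapreducedim!(identity, +, InOut(Y), In(Z))
    # This task must wait for both of the above: it writes `W` again, and
    # its `In(Y)` follows the earlier `InOut(Y)`
    Dagger.@spawn Base.mapreducedim!(identity, +, InOut(W), In(Y))
end
```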

Additionally, we can notice a powerful feature of this model - if the
`Dagger.@spawn` macro is removed (along with the argument annotations), the
code remains correct, but simply runs sequentially. This means that the
structure of the program doesn't have to change in order to use Dagger for
parallelization, which can make applying Dagger to existing algorithms quite
effortless.
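
For instance, in a purely sequential version of `tree_reduce!`, the only line
that needs to change is the reduction step, which becomes a direct in-place
call:

```julia
# The reduction step without Dagger: no task, no annotations
Base.mapreducedim!(identity, op, As[1], As[2])
```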