From 8736041be9008b33722d44dab352701216ec31fe Mon Sep 17 00:00:00 2001 From: JamesWrigley Date: Fri, 29 Mar 2024 15:39:20 +0100 Subject: [PATCH 1/2] Add options docstrings to the docs This mostly just adds the ThunkOptions and SchedulerOptions docstrings to the docs, along with some other minor improvements: - Enabled showing docstrings for Chunk and Shard - Added more links to task-spawning.md - Fixed the formatting of the argument lists in ThunkOptions and SchedulerOptions so they're displayed properly. --- docs/src/api-dagger/types.md | 4 +- docs/src/task-spawning.md | 36 ++++++++------ src/eager_thunk.jl | 7 +++ src/sch/Sch.jl | 96 ++++++++++++++++++------------------ 4 files changed, 78 insertions(+), 65 deletions(-) diff --git a/docs/src/api-dagger/types.md b/docs/src/api-dagger/types.md index f3db74332..7fd8d30fe 100644 --- a/docs/src/api-dagger/types.md +++ b/docs/src/api-dagger/types.md @@ -14,14 +14,14 @@ EagerThunk ``` ## Task Options Types -``` +```@docs Options Sch.ThunkOptions Sch.SchedulerOptions ``` ## Data Management Types -``` +```@docs Chunk Shard ``` diff --git a/docs/src/task-spawning.md b/docs/src/task-spawning.md index be44e0a48..c55f7056f 100644 --- a/docs/src/task-spawning.md +++ b/docs/src/task-spawning.md @@ -1,3 +1,7 @@ +```@meta +CurrentModule = Dagger +``` + # Task Spawning The main entrypoint to Dagger is `@spawn`: @@ -8,8 +12,8 @@ or `spawn` if it's more convenient: `Dagger.spawn(f, Dagger.Options(options), args...; kwargs...)` -When called, it creates an `EagerThunk` (also known as a "thunk" or "task") -object representing a call to function `f` with the arguments `args` and +When called, it creates an [`EagerThunk`](@ref) (also known as a "thunk" or +"task") object representing a call to function `f` with the arguments `args` and keyword arguments `kwargs`. 
If it is called with other thunks as args/kwargs, such as in `Dagger.@spawn f(Dagger.@spawn g())`, then, in this example, the function `f` gets passed the results of executing `g()`, once that result is @@ -18,21 +22,23 @@ waits on `g()` to complete before executing. An important observation to make is that, for each argument to `@spawn`/`spawn`, if the argument is the result of another `@spawn`/`spawn` -call (thus it's an `EagerThunk`), the argument will be computed first, and then +call (thus it's an [`EagerThunk`](@ref)), the argument will be computed first, and then its result will be passed into the function receiving the argument. If the -argument is *not* an `EagerThunk` (instead, some other type of Julia object), +argument is *not* an [`EagerThunk`](@ref) (instead, some other type of Julia object), it'll be passed as-is to the function `f` (with some exceptions). ## Options -The `Options` struct in the second argument position is optional; if provided, -it is passed to the scheduler to control its behavior. `Options` contains a -`NamedTuple` of option key-value pairs, which can be any of: -- Any field in `Dagger.Sch.ThunkOptions` (see [Scheduler and Thunk options](@ref)) -- `meta::Bool` -- Pass the input `Chunk` objects themselves to `f` and not the value contained in them +The [`Options`](@ref Dagger.Options) struct in the second argument position is +optional; if provided, it is passed to the scheduler to control its +behavior. [`Options`](@ref Dagger.Options) contains a `NamedTuple` of option +key-value pairs, which can be any of: +- Any field in [`Sch.ThunkOptions`](@ref) (see [Scheduler and Thunk options](@ref)) +- `meta::Bool` -- Pass the input [`Chunk`](@ref) objects themselves to `f` and + not the value contained in them. 
There are also some extra optionss that can be passed, although they're considered advanced options to be used only by developers or library authors: -- `get_result::Bool` -- return the actual result to the scheduler instead of `Chunk` objects. Used when `f` explicitly constructs a Chunk or when return value is small (e.g. in case of reduce) +- `get_result::Bool` -- return the actual result to the scheduler instead of [`Chunk`](@ref) objects. Used when `f` explicitly constructs a [`Chunk`](@ref) or when return value is small (e.g. in case of reduce) - `persist::Bool` -- the result of this Thunk should not be released after it becomes unused in the DAG - `cache::Bool` -- cache the result of this Thunk such that if the thunk is evaluated again, one can just reuse the cached value. If it’s been removed from cache, recompute the value. @@ -68,9 +74,9 @@ The final result (from `fetch(s)`) is the obvious consequence of the operation: ### Eager Execution Dagger's `@spawn` macro works similarly to `@async` and `Threads.@spawn`: when -called, it wraps the function call specified by the user in an `EagerThunk` -object, and immediately places it onto a running scheduler, to be executed once -its dependencies are fulfilled. +called, it wraps the function call specified by the user in an +[`EagerThunk`](@ref) object, and immediately places it onto a running scheduler, +to be executed once its dependencies are fulfilled. ```julia x = rand(400,400) @@ -181,8 +187,8 @@ Note that, as a legacy API, usage of the lazy API is generally discouraged for m While Dagger generally "just works", sometimes one needs to exert some more fine-grained control over how the scheduler allocates work. There are two parallel mechanisms to achieve this: Scheduler options (from -`Dagger.Sch.SchedulerOptions`) and Thunk options (from -`Dagger.Sch.ThunkOptions`). These two options structs contain many shared +[`Sch.SchedulerOptions`](@ref)) and Thunk options (from +[`Sch.ThunkOptions`](@ref)). 
These two options structs contain many shared options, with the difference being that Scheduler options operate globally across an entire DAG, and Thunk options operate on a thunk-by-thunk basis. diff --git a/src/eager_thunk.jl b/src/eager_thunk.jl index 8a021737e..131dc9873 100644 --- a/src/eager_thunk.jl +++ b/src/eager_thunk.jl @@ -23,6 +23,13 @@ function Base.fetch(t::ThunkFuture; proc=OSProc(), raw=false) end Base.put!(t::ThunkFuture, x; error=false) = put!(t.future, (error, x)) +""" + Options(::NamedTuple) + Options(; kwargs...) + +Options for thunks and the scheduler. See [Task Spawning](@ref) for more +information. +""" struct Options options::NamedTuple end diff --git a/src/sch/Sch.jl b/src/sch/Sch.jl index 5303ed918..10dc16691 100644 --- a/src/sch/Sch.jl +++ b/src/sch/Sch.jl @@ -153,23 +153,23 @@ Stores DAG-global options to be passed to the Dagger.Sch scheduler. # Arguments - `single::Int=0`: (Deprecated) Force all work onto worker with specified id. -`0` disables this option. + `0` disables this option. - `proclist=nothing`: (Deprecated) Force scheduler to use one or more -processors that are instances/subtypes of a contained type. Alternatively, a -function can be supplied, and the function will be called with a processor as -the sole argument and should return a `Bool` result to indicate whether or not -to use the given processor. `nothing` enables all default processors. + processors that are instances/subtypes of a contained type. Alternatively, a + function can be supplied, and the function will be called with a processor as + the sole argument and should return a `Bool` result to indicate whether or not + to use the given processor. `nothing` enables all default processors. - `allow_errors::Bool=true`: Allow thunks to error without affecting -non-dependent thunks. + non-dependent thunks. 
- `checkpoint=nothing`: If not `nothing`, uses the provided function to save -the final result of the current scheduler invocation to persistent storage, for -later retrieval by `restore`. + the final result of the current scheduler invocation to persistent storage, for + later retrieval by `restore`. - `restore=nothing`: If not `nothing`, uses the provided function to return the -(cached) final result of the current scheduler invocation, were it to execute. -If this returns a `Chunk`, all thunks will be skipped, and the `Chunk` will be -returned. If `nothing` is returned, restoring is skipped, and the scheduler -will execute as usual. If this function throws an error, restoring will be -skipped, and the error will be displayed. + (cached) final result of the current scheduler invocation, were it to execute. + If this returns a `Chunk`, all thunks will be skipped, and the `Chunk` will be + returned. If `nothing` is returned, restoring is skipped, and the scheduler + will execute as usual. If this function throws an error, restoring will be + skipped, and the error will be displayed. """ Base.@kwdef struct SchedulerOptions single::Union{Int,Nothing} = nothing @@ -186,52 +186,52 @@ Stores Thunk-local options to be passed to the Dagger.Sch scheduler. # Arguments - `single::Int=0`: (Deprecated) Force thunk onto worker with specified id. `0` -disables this option. + disables this option. - `proclist=nothing`: (Deprecated) Force thunk to use one or more processors -that are instances/subtypes of a contained type. Alternatively, a function can -be supplied, and the function will be called with a processor as the sole -argument and should return a `Bool` result to indicate whether or not to use -the given processor. `nothing` enables all default processors. + that are instances/subtypes of a contained type. 
Alternatively, a function can + be supplied, and the function will be called with a processor as the sole + argument and should return a `Bool` result to indicate whether or not to use + the given processor. `nothing` enables all default processors. - `time_util::Dict{Type,Any}`: Indicates the maximum expected time utilization -for this thunk. Each keypair maps a processor type to the utilization, where -the value can be a real (approximately the number of nanoseconds taken), or -`MaxUtilization()` (utilizes all processors of this type). By default, the -scheduler assumes that this thunk only uses one processor. + for this thunk. Each keypair maps a processor type to the utilization, where + the value can be a real (approximately the number of nanoseconds taken), or + `MaxUtilization()` (utilizes all processors of this type). By default, the + scheduler assumes that this thunk only uses one processor. - `alloc_util::Dict{Type,UInt64}`: Indicates the maximum expected memory -utilization for this thunk. Each keypair maps a processor type to the -utilization, where the value is an integer representing approximately the -maximum number of bytes allocated at any one time. + utilization for this thunk. Each keypair maps a processor type to the + utilization, where the value is an integer representing approximately the + maximum number of bytes allocated at any one time. - `occupancy::Dict{Type,Real}`: Indicates the maximum expected processor -occupancy for this thunk. Each keypair maps a processor type to the -utilization, where the value can be a real between 0 and 1 (the occupancy -ratio, where 1 is full occupancy). By default, the scheduler assumes that this -thunk has full occupancy. + occupancy for this thunk. Each keypair maps a processor type to the + utilization, where the value can be a real between 0 and 1 (the occupancy + ratio, where 1 is full occupancy). By default, the scheduler assumes that this + thunk has full occupancy. 
- `allow_errors::Bool=true`: Allow this thunk to error without affecting -non-dependent thunks. + non-dependent thunks. - `checkpoint=nothing`: If not `nothing`, uses the provided function to save -the result of the thunk to persistent storage, for later retrieval by -`restore`. + the result of the thunk to persistent storage, for later retrieval by + `restore`. - `restore=nothing`: If not `nothing`, uses the provided function to return the -(cached) result of this thunk, were it to execute. If this returns a `Chunk`, -this thunk will be skipped, and its result will be set to the `Chunk`. If -`nothing` is returned, restoring is skipped, and the thunk will execute as -usual. If this function throws an error, restoring will be skipped, and the -error will be displayed. + (cached) result of this thunk, were it to execute. If this returns a `Chunk`, + this thunk will be skipped, and its result will be set to the `Chunk`. If + `nothing` is returned, restoring is skipped, and the thunk will execute as + usual. If this function throws an error, restoring will be skipped, and the + error will be displayed. - `storage::Union{Chunk,Nothing}=nothing`: If not `nothing`, references a -`MemPool.StorageDevice` which will be passed to `MemPool.poolset` internally -when constructing `Chunk`s (such as when constructing the return value). The -device must support `MemPool.CPURAMResource`. When `nothing`, uses -`MemPool.GLOBAL_DEVICE[]`. + `MemPool.StorageDevice` which will be passed to `MemPool.poolset` internally + when constructing `Chunk`s (such as when constructing the return value). The + device must support `MemPool.CPURAMResource`. When `nothing`, uses + `MemPool.GLOBAL_DEVICE[]`. - `storage_root_tag::Any=nothing`: If not `nothing`, -specifies the MemPool storage leaf tag to associate with the thunk's result. -This tag can be used by MemPool's storage devices to manipulate their behavior, -such as the file name used to store data on disk." 
+ specifies the MemPool storage root tag to associate with the thunk's result. + This tag can be used by MemPool's storage devices to manipulate their behavior, + such as the file name used to store data on disk. - `storage_leaf_tag::MemPool.Tag,Nothing}=nothing`: If not `nothing`, -specifies the MemPool storage leaf tag to associate with the thunk's result. -This tag can be used by MemPool's storage devices to manipulate their behavior, -such as the file name used to store data on disk." + specifies the MemPool storage leaf tag to associate with the thunk's result. + This tag can be used by MemPool's storage devices to manipulate their behavior, + such as the file name used to store data on disk. - `storage_retain::Bool=false`: The value of `retain` to pass to -`MemPool.poolset` when constructing the result `Chunk`. + `MemPool.poolset` when constructing the result `Chunk`. """ Base.@kwdef struct ThunkOptions single::Union{Int,Nothing} = nothing From 7f6560373d7d8c96add907d292e2a3407ba715bb Mon Sep 17 00:00:00 2001 From: JamesWrigley Date: Fri, 29 Mar 2024 16:12:16 +0100 Subject: [PATCH 2/2] Fix typos --- docs/src/scheduler-internals.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/src/scheduler-internals.md b/docs/src/scheduler-internals.md index 627cac758..52108a1ff 100644 --- a/docs/src/scheduler-internals.md +++ b/docs/src/scheduler-internals.md @@ -69,7 +69,7 @@ execution (called "firing"). Once all tasks are either waiting or running, the scheduler may sleep until actions need to be performed When fired tasks have completed executing, an entry will exist in the inbound -queue signaling the task's result and other metadata. At this point, the most +queue signalling the task's result and other metadata. At this point, the most recently-queued task is removed from the queue, "finished", and placed in the "finished" state. 
Finishing usually unlocks downstream tasks from the waiting state and allows them to transition to the ready state. @@ -117,7 +117,7 @@ outdated, or when its estimates about the task's behavior are inaccurate. To minimize the possibility of workload imbalance, the worker schedulers' processors will attempt to steal tasks from each other when they are under-occupied. Tasks will only be stolen if the task's [scope](scopes.md) is -compatibl with the processor attempting the steal, so tasks with wider scopes +compatible with the processor attempting the steal, so tasks with wider scopes have better balancing potential. ## Core: Finishing