Skip to content

Commit d348b90

Browse files
committed
logging: Make IDs consistent, add helpers and visualization utils
Makes the :id field consistent (NamedTuple or nothing) Adds enable/disable logging helpers and log fetch helper Adds MultiEventLog events to collect task dependency info Adds utility for creating digraph from task dependency log info Adds interface for IO-based and display-based DAG rendering Add new GraphViz-based DAG renderer Add new Plots-based Gantt chart renderer Remove Context's log_file infrastructure Switches GraphViz rendering to use Pkg extensions
1 parent 49d85b5 commit d348b90

24 files changed

+1027
-490
lines changed

Diff for: Project.toml

+13-4
Original file line numberDiff line numberDiff line change
@@ -37,10 +37,19 @@ StatsBase = "0.28, 0.29, 0.30, 0.31, 0.32, 0.33, 0.34"
3737
TimespanLogging = "0.1"
3838
julia = "1.8"
3939

40+
[extensions]
41+
GraphVizSimpleExt = "Colors"
42+
GraphVizExt = "GraphViz"
43+
PlotsExt = ["DataFrames", "Plots"]
44+
4045
[extras]
4146
Colors = "5ae59095-9a9b-59fe-a467-6f913c188581"
42-
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
43-
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
47+
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
48+
GraphViz = "f526b714-d49f-11e8-06ff-31ed36ee7ee0"
49+
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
4450

45-
[targets]
46-
test = ["Colors", "Pkg", "Test"]
51+
[weakdeps]
52+
Colors = "5ae59095-9a9b-59fe-a467-6f913c188581"
53+
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
54+
GraphViz = "f526b714-d49f-11e8-06ff-31ed36ee7ee0"
55+
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"

Diff for: benchmarks/benchmark.jl

-8
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,6 @@ Environment Variables:
1818
- BENCHMARK_VISUALIZE - Whether to run the `visualize.jl` script on the output results. May be any value that can parse as a `Bool`.
1919
- BENCHMARK_RENDER - Which rendering mode to use. May be "live" to use the old (and soon to be removed) web renderer, "webdash" to use the DaggerWebDash renderer, or "offline" to use the old (and soon to be removed) offline renderer. The default of "" disables rendering.
2020
- BENCHMARK_LIVE_PORT - Which port to use for web rendering. Defaults to port 8000.
21-
- BENCHMARK_GRAPH - Whether to use dotviz graph rendering. Only useable if using "live" or "offline" rendering methods. Defaults to off, and may be any value that can parse as a `Bool`.
2221
- BENCHMARK_PROFILE - Whether to enable real-time profiling. Defaults to off, and may be any value with parses as a `Bool`. Currently experimental and very, very slow.
2322
- BENCHMARK_SAVE_LOGS - Whether to save logs collected at runtime to the output file. Defaults to off, and may be any value that can parse as a `Bool`.
2423
@@ -123,10 +122,6 @@ end
123122
const RENDERS = Dict{Int,Dict}()
124123
const live_port = parse(Int, get(ENV, "BENCHMARK_LIVE_PORT", "8000"))
125124

126-
const graph = parse(Bool, get(ENV, "BENCHMARK_GRAPH", "0"))
127-
if graph && render == "webdash"
128-
@warn "BENCHMARK_GRAPH=1 is not compatible with BENCHMARK_RENDER=webdash; disabling graphing"
129-
end
130125
const profile = parse(Bool, get(ENV, "BENCHMARK_PROFILE", "0"))
131126
const savelogs = if parse(Bool, get(ENV, "BENCHMARK_SAVE_LOGS", "0"))
132127
if render == "live " || render == "offline"
@@ -155,9 +150,6 @@ function main()
155150
opts = (;profile=profile)
156151
if render == "live"
157152
opts = merge(opts, (;log_sink=LocalEventLog()))
158-
if graph
159-
opts = merge(opts, (;log_file=output_prefix*".dot"))
160-
end
161153
elseif render == "webdash" || savelogs
162154
ml = TimespanLogging.MultiEventLog()
163155
ml[:core] = TimespanLogging.Events.CoreMetrics()

Diff for: docs/make.jl

+5-2
Original file line numberDiff line numberDiff line change
@@ -25,9 +25,12 @@ makedocs(;
2525
"Task Queues" => "task-queues.md",
2626
"Datadeps" => "datadeps.md",
2727
"Option Propagation" => "propagation.md",
28-
"Logging and Graphing" => "logging.md",
28+
"Logging and Visualization" => [
29+
"Logging: Basics" => "logging.md",
30+
"Logging: Visualization" => "logging-visualization.md",
31+
"Logging: Advanced" => "logging-advanced.md",
32+
],
2933
"Checkpointing" => "checkpointing.md",
30-
"Scheduler Visualization" => "scheduler-visualization.md",
3134
"Benchmarking" => "benchmarking.md",
3235
"Dynamic Scheduler Control" => "dynamic.md",
3336
"Scheduler Internals" => "scheduler-internals.md",

Diff for: docs/src/logging-advanced.md

+56
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
# Logging: Advanced Details
2+
3+
## MultiEventLog
4+
5+
The `MultiEventLog` is intended to be configurable to exclude unnecessary
6+
information, and to include any built-in or user-defined metrics. It stores a
7+
set of "sub-log" streams internally, appending a single element to each of them
8+
when an event is generated. This element can be called a "sub-event" (to
9+
distinguish it from the higher-level "event" that Dagger creates), and is
10+
created by a "consumer". A consumer is a function or callable struct that, when
11+
called with the `Event` object generated by TimespanLogging, returns a sub-event
12+
characterizing whatever information the consumer represents. For example, the
13+
`Dagger.Events.BytesAllocd` consumer calculates the total bytes allocated and
14+
live at any given time within Dagger, and returns the current value when
15+
called. Let's construct one:
16+
17+
```julia
18+
ctx = Dagger.Sch.eager_context()
19+
ml = TimespanLogging.MultiEventLog()
20+
21+
# Add the BytesAllocd consumer to the log as `:bytes`
22+
ml[:bytes] = Dagger.Events.BytesAllocd()
23+
24+
ctx.log_sink = ml
25+
```
26+
27+
As we can see above, each consumer gets a unique name as a `Symbol` that
28+
identifies it. Now that the log sink is attached with a consumer, we can
29+
execute some Dagger tasks, and then collect the sub-events generated by
30+
`BytesAllocd`:
31+
32+
```julia
33+
# Allocates memory for their results
34+
t1 = Dagger.@spawn 3*4
35+
fetch(Dagger.@spawn 1+t1)
36+
log = Dagger.fetch_logs!(ctx)[1] # Get the logs for worker 1
37+
@show log[:bytes]
38+
```
39+
40+
!!! note
41+
`TimespanLogging.get_logs!` clears out the event logs, so that old events
42+
don't mix with new ones from future DAGs.
43+
44+
You'll then see that some number of bytes are allocated and then freed during
45+
the process of executing and completing those tasks.
46+
47+
There are a variety of other consumers built-in to TimespanLogging and Dagger,
48+
under the `TimespanLogging.Events` and `Dagger.Events` modules, respectively;
49+
see [Dagger Types](@ref) and [TimespanLogging Types](@ref) for details.
50+
51+
The `MultiEventLog` also has a mechanism to call a set of functions, called
52+
"aggregators", after all consumers have been executed, and are passed the full
53+
set of log streams as a `Dict{Symbol,Vector{Any}}`. The only one currently
54+
shipped with TimespanLogging directly is the `LogWindow`, and DaggerWebDash.jl
55+
has the `TableStorage` which integrates with it; see
56+
[DaggerWebDash Types](@ref) for details.

Diff for: docs/src/scheduler-visualization.md renamed to docs/src/logging-visualization.md

+37-7
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,42 @@
1-
# Scheduler Visualization with DaggerWebDash
1+
# Logs Visualization
2+
3+
To make Dagger's logging facilities useful without having to write custom code,
4+
Dagger has built-in and easily accessible logs visualization capabilities.
5+
Currently, there are two general mechanisms to visualize logs:
6+
`show_logs`/`render_logs`, and `MultiEventLog` consumers.
7+
8+
The former (`show_logs`/`render_logs`) renders a logs `Dict` (acquired from
9+
`fetch_logs!`) either to an `IO` (via `show_logs`) or by returning a renderable
10+
object (via `render_logs`). This system is designed for rendering a single
11+
snapshot of logs into one or a few renderable objects, and is easily extensible
12+
by libraries or directly by the user, using multiple dispatch on
13+
`show_logs(io::IO, logs::Dict, ::Val{mode})` and
14+
`render_logs(logs::Dict, ::Val{mode})`, where `mode` is a unique `Symbol`
15+
identifying the rendering mode to use. From the user's perspective, `show_logs`
16+
and `render_logs` take not a `Val` but a raw `Symbol`, which will be internally
17+
converted to a `Val` for dispatch purposes
18+
(i.e. `render_logs(logs::Dict, :myrenderer)` ->
19+
`render_logs(logs, Val{:myrenderer}())`).
20+
21+
Built-in rendering support exists for:
22+
- `render_logs(logs, :graphviz)` to generate a graph diagram of executed tasks and their dependencies
23+
- `render_logs(logs, :plots_gantt)` to generate a Gantt chart of task execution across all processors
24+
25+
The latter (`MultiEventLog`) allows for continuously rendering logs as they're
26+
generated, permitting real-time visualization of Dagger's operations. This
27+
logic is utilized in `DaggerWebDash`, which provides a web-based dashboard for
28+
visualizing Dagger's operations as a real-time Gantt chart and set of plots for
29+
various system metrics (CPU usage, memory usage, worker utilization, etc.).
30+
31+
## Visualization with DaggerWebDash
232

333
When working with Dagger, especially when working with its scheduler, it can be
4-
helpful to visualize what Dagger is doing internally. To assist with this, a
5-
web dashboard is available in the DaggerWebDash.jl package. This web dashboard
6-
uses a web server running within each Dagger worker, along with event logging
7-
information, to expose details about the scheduler. Information like worker and
8-
processor saturation, memory allocations, profiling traces, and much more are
9-
available in easy-to-interpret plots.
34+
helpful to visualize what Dagger is doing internally in near-real-time. To
35+
assist with this, a web dashboard is available in the DaggerWebDash.jl package.
36+
This web dashboard uses a web server running within each Dagger worker, along
37+
with event logging information, to expose details about the scheduler.
38+
Information like worker and processor saturation, memory allocations, profiling
39+
traces, and much more are available in easy-to-interpret plots.
1040

1141
Using the dashboard is relatively simple and straightforward; if you run
1242
Dagger's benchmarking script, it's enabled for you automatically if the

Diff for: docs/src/logging.md

+28-123
Original file line numberDiff line numberDiff line change
@@ -5,126 +5,31 @@ actions it does, such as moving data between workers or executing thunks, and
55
tracks how much time and memory allocations these operations consume, among
66
other things. It does it through the `TimespanLogging.jl` package (which used
77
to be directly integrated into Dagger). Saving this information somewhere
8-
accessible is disabled by default, but it's quite easy to turn it on, by
9-
setting a "log sink" in the `Context` being used, as `ctx.log_sink`. A variety
10-
of log sinks are built-in to TimespanLogging; the `NoOpLog` is the default log
11-
sink when one isn't explicitly specified, and disables logging entirely (to
12-
minimize overhead). There are currently two other log sinks of interest; the
13-
first and newer of the two is the `MultiEventLog`, which generates multiple
14-
independent log streams, one per "consumer" (details in the next section). The
15-
second and older sink is the `LocalEventLog`, which is explained later in this
16-
document. Most users are recommended to use the `MultiEventLog` since it's far
17-
more flexible and extensible, and is more performant in general.
18-
19-
## MultiEventLog
20-
21-
The `MultiEventLog` is intended to be configurable to exclude unnecessary
22-
information, and to include any built-in or user-defined metrics. It stores a
23-
set of "sub-log" streams internally, appending a single element to each of them
24-
when an event is generated. This element can be called a "sub-event" (to
25-
distinguish it from the higher-level "event" that Dagger creates), and is
26-
created by a "consumer". A consumer is a function or callable struct that, when
27-
called with the `Event` object generated by TimespanLogging, returns a sub-event
28-
characterizing whatever information the consumer represents. For example, the
29-
`Dagger.Events.BytesAllocd` consumer calculates the total bytes allocated and
30-
live at any given time within Dagger, and returns the current value when
31-
called. Let's construct one:
32-
33-
```julia
34-
ctx = Context()
35-
ml = TimespanLogging.MultiEventLog()
36-
37-
# Add the BytesAllocd consumer to the log as `:bytes`
38-
ml[:bytes] = Dagger.Events.BytesAllocd()
39-
40-
ctx.log_sink = ml
41-
```
42-
43-
As we can see above, each consumer gets a unique name as a `Symbol` that
44-
identifies it. Now that the log sink is attached with a consumer, we can
45-
execute some Dagger tasks, and then collect the sub-events generated by
46-
`BytesAllocd`:
47-
48-
```julia
49-
# Using the lazy API, for explanatory purposes
50-
collect(ctx, delayed(+)(1, delayed(*)(3, 4))) # Allocates 8 bytes
51-
log = TimspanLogging.get_logs!(ctx)[1] # Get the logs for worker 1
52-
@show log[:bytes]
53-
```
54-
55-
You'll then see that 8 bytes are allocated and then freed during the process of
56-
executing and completing those tasks.
57-
58-
Note that the `MultiEventLog` can also be used perfectly well when using
59-
Dagger's eager API:
60-
61-
```julia
62-
ctx = Dagger.Sch.eager_context()
63-
ctx.log_sink = ml
64-
65-
a = Dagger.@spawn 3*4
66-
Dagger.@spawn 1+a
67-
```
68-
69-
There are a variety of other consumers built-in to TimespanLogging and Dagger,
70-
under the `TimespanLogging.Events` and `Dagger.Events` modules, respectively;
71-
see [Dagger Types](@ref) and [TimespanLogging Types](@ref) for details.
72-
73-
The `MultiEventLog` also has a mechanism to call a set of functions, called
74-
"aggregators", after all consumers have been executed, and are passed the full
75-
set of log streams as a `Dict{Symbol,Vector{Any}}`. The only one currently
76-
shipped with TimespanLogging directly is the `LogWindow`, and DaggerWebDash.jl
77-
has the `TableStorage` which integrates with it; see
78-
[DaggerWebDash Types](@ref) for details.
79-
80-
## LocalEventLog
81-
82-
The `LocalEventLog` is generally only useful when you want combined events
83-
(event start and finish combined as a single unit), and only care about a few
84-
simple built-in generated events. Let's attach one to our context:
85-
86-
```julia
87-
ctx = Context()
88-
log = TimespanLogging.LocalEventLog()
89-
ctx.log_sink = log
90-
```
91-
92-
Now anytime `ctx` is used as the context for a scheduler, the scheduler will
93-
log events into `log`.
94-
95-
Once sufficient data has been accumulated into a `LocalEventLog`, it can be
96-
gathered to a single host via `TimespanLogging.get_logs!(log)`. The result is a
97-
`Vector` of `TimespanLogging.Timespan` objects, which describe some metadata
98-
about an operation that occured and the scheduler logged. These events may be
99-
introspected directly, or may also be rendered to a DOT-format string:
100-
101-
```julia
102-
logs = TimespanLogging.get_logs!(log)
103-
str = Dagger.show_plan(logs)
104-
```
105-
106-
`Dagger.show_plan` can also be called as `Dagger.show_plan(io::IO, logs)` to
107-
write the graph to a file or other `IO` object. The string generated by this
108-
function may be passed to an external tool like `Graphviz` for rendering. Note
109-
that this method doesn't display input arguments to the DAG (non-`Thunk`s);
110-
you can call `Dagger.show_plan(logs, thunk)`, where `thunk` is the output
111-
`Thunk` of the DAG, to render argument nodes.
112-
113-
!!! note
114-
`TimespanLogging.get_logs!` clears out the event logs, so that old events
115-
don't mix with new ones from future DAGs.
116-
117-
As a convenience, it's possible to set `ctx.log_file` to the path to an output
118-
file, and then calls to `compute(ctx, ...)`/`collect(ctx, ...)` will
119-
automatically write the graph in DOT format to that path. There is also a
120-
benefit to this approach over manual calls to `get_logs!` and `show_plan`: DAGs
121-
which aren't `Thunk`s (such as operations on the `Dagger.DArray`) will be
122-
properly rendered with input arguments (which normally aren't rendered because
123-
a `Thunk` is dynamically generated from such operations by Dagger before
124-
scheduling).
125-
126-
## FilterLog
127-
128-
The `FilterLog` exists to allow writing events to a user-defined location (such
129-
as a database, file, or network socket). It is not currently tested or
130-
documented.
8+
accessible is disabled by default, but it's quite easy to turn it on, through
9+
two mechanisms.
10+
11+
The first is `Dagger.enable_logging!`, which provides an easy-to-use interface
12+
to both enable and configure logging. The defaults are usually sufficient for
13+
most users, but can be tweaked with keyword arguments.
14+
15+
The second is done by setting a "log sink" in the Dagger `Context` being used,
16+
as `ctx.log_sink`. These log sinks drive how Dagger's logging behaves, and are
17+
configurable by the user, without the need to tweak any of Dagger's internal
18+
code.
19+
20+
A variety of log sinks are built-in to TimespanLogging; the `NoOpLog` is the
21+
default log sink when one isn't explicitly specified, and disables logging
22+
entirely (to minimize overhead). There are currently two other log sinks of
23+
interest; the first and newer of the two is the `MultiEventLog`, which
24+
generates multiple independent log streams, one per "consumer" (details in the
25+
next section). This is the log sink that `enable_logging!` uses, as it's easily
26+
the most flexible. The second and older sink is the `LocalEventLog`, which is
27+
explained later in this document. Most users are recommended to use the
28+
`MultiEventLog` (ideally via `enable_logging!`) since it's far more flexible
29+
and extensible, and is more performant in general.
30+
31+
Log sinks are explained in detail in [Logging: Advanced](@ref); however, if
32+
using `enable_logging!`, everything is already configured for you. Then, all
33+
you need to do is call `Dagger.fetch_logs!()` to get the logs for all workers
34+
as a `Dict`. A variety of tools can operate on these logs, including
35+
visualization through `show_logs` and `render_logs`.

0 commit comments

Comments
 (0)