Commit 63f0782

Julian Wollersberger committed

Document the usage of cargo-llvm-lines and -Ztimings.

1 parent 6159dde commit 63f0782

File tree: 1 file changed, +80 -1 lines changed

src/profiling.md (+80 -1)
@@ -12,5 +12,84 @@ Depending on what you're trying to measure, there are several different approaches

    See [their docs](https://github.com/rust-lang/measureme/blob/master/summarize/Readme.md) for more information.

- If you want function level performance data or even just more details than the above approaches:
  - Consider using a native code profiler such as [perf](profiling/with_perf.html)
    or [tracy](https://github.com/nagisa/rust_tracy_client) for a nanosecond-precision,
    full-featured graphical interface.

- If you want a nice visual representation of the compile times of your crate graph,
  you can use [cargo's `-Ztimings` flag](https://doc.rust-lang.org/cargo/reference/unstable.html#timings),
  e.g. `cargo -Ztimings build`.
  You can use this flag on the compiler itself with `CARGOFLAGS="-Ztimings" ./x.py build`.
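
  For example, on an ordinary crate (a minimal sketch: `-Ztimings` is unstable and needs a
  nightly cargo, and the name of the HTML report it writes may change):

  ```
  # Build with the timing visualization enabled.
  cargo +nightly -Ztimings build

  # Cargo writes an HTML report (cargo-timing.html) into the current
  # directory; open it in a browser to inspect the crate graph.
  ```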

## Optimizing rustc's self-compile-times with cargo-llvm-lines

Using [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) you can count the
number of lines of LLVM IR across all instantiations of a generic function.
Since most of the time spent compiling rustc is spent in LLVM, the idea is that by
reducing the amount of code passed to LLVM, compiling rustc gets faster.

Example usage:
```
cargo install cargo-llvm-lines
# On a normal crate you could now run `cargo llvm-lines`, but x.py isn't normal :P

# Do a clean before every run, to not mix in the results from previous runs.
./x.py clean
RUSTFLAGS="--emit=llvm-ir" ./x.py build --stage 0 compiler/rustc

# Single crate, e.g. rustc_middle
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/rustc_middle*.ll > llvm-lines-middle.txt
# Whole compiler at once
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines.txt
```

Example output:
```
   Lines          Copies         Function name
   -----          ------         -------------
11802479  (100%)  52848  (100%)  (TOTAL)
 1663902 (14.1%)    400  (0.8%)  rustc_query_system::query::plumbing::get_query_impl::{{closure}}
  683526  (5.8%)  10579 (20.0%)  core::ptr::drop_in_place
  568523  (4.8%)    528  (1.0%)  rustc_query_system::query::plumbing::get_query_impl
  472715  (4.0%)   1134  (2.1%)  hashbrown::raw::RawTable<T>::reserve_rehash
  306782  (2.6%)   1320  (2.5%)  rustc_middle::ty::query::plumbing::<impl rustc_query_system::query::QueryContext for rustc_middle::ty::context::TyCtxt>::start_query::{{closure}}::{{closure}}::{{closure}}
  212800  (1.8%)    514  (1.0%)  rustc_query_system::dep_graph::graph::DepGraph<K>::with_task_impl
  194813  (1.7%)    124  (0.2%)  rustc_query_system::query::plumbing::force_query_impl
  158488  (1.3%)      1  (0.0%)  rustc_middle::ty::query::<impl rustc_middle::ty::context::TyCtxt>::alloc_self_profile_query_strings
  119768  (1.0%)    418  (0.8%)  core::ops::function::FnOnce::call_once
  119644  (1.0%)      1  (0.0%)  rustc_target::spec::load_specific
  104153  (0.9%)      7  (0.0%)  rustc_middle::ty::context::_DERIVE_rustc_serialize_Decodable_D_FOR_TypeckResults::<impl rustc_serialize::serialize::Decodable<__D> for rustc_middle::ty::context::TypeckResults>::decode::{{closure}}
   81173  (0.7%)      1  (0.0%)  rustc_middle::ty::query::stats::query_stats
   80306  (0.7%)   2029  (3.8%)  core::ops::function::FnOnce::call_once{{vtable.shim}}
   78019  (0.7%)   1611  (3.0%)  stacker::grow::{{closure}}
   69720  (0.6%)   3286  (6.2%)  <&T as core::fmt::Debug>::fmt
   56327  (0.5%)    186  (0.4%)  rustc_query_system::query::plumbing::incremental_verify_ich
   49714  (0.4%)     14  (0.0%)  rustc_mir::dataflow::framework::graphviz::BlockFormatter<A>::write_node_label
```
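
When you want to check whether a change actually shrinks the amount of IR handed to LLVM,
it helps to keep the previous run's output around and compare the two files.
A minimal sketch, reusing the file names from the commands above (the `head` cut-off is arbitrary):
```
# Stash the baseline before making your change.
mv llvm-lines.txt llvm-lines-before.txt

# ... apply your change, then clean, rebuild and re-run cargo llvm-lines ...

# Compare the biggest offenders before and after.
diff <(head -n 30 llvm-lines-before.txt) <(head -n 30 llvm-lines.txt)
```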

Since this doesn't seem to work with incremental compilation or `x.py check`,
you will be compiling rustc _a lot_.
I recommend changing a few settings in `config.toml` to make it bearable:
```
[rust]
# A debug build takes _a fourth_ as long on my machine,
# but compiling more than stage0 rustc becomes unbearably slow.
optimize = false

# We can't use incremental anyway, so we disable it for a little speed boost.
incremental = false
# We won't be running it, so no point in compiling debug checks.
debug = false

# Caution: This changes the output of llvm-lines.
# Using a single codegen unit gives more accurate output, but is slower to compile.
# Changing it to the number of cores on my machine increased the output
# from 3.5GB to 4.1GB and decreased compile times from 5½ min to 4 min.
codegen-units = 1
#codegen-units = 0 # num_cpus
```

What I'm still not sure about is whether inlining during MIR optimizations affects llvm-lines.
The output with `-Zmir-opt-level=0` and `-Zmir-opt-level=1` is the same,
but it feels like some functions that show up at the top should be too small
to have such a high impact. Inlining should only happen in LLVM, though.
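
One way to double-check that yourself is to produce the IR at both MIR optimization levels
and diff the resulting llvm-lines output. This is only a sketch, reusing the commands from
above (the extra flag is passed through `RUSTFLAGS`, and the output file names are arbitrary):
```
./x.py clean
RUSTFLAGS="--emit=llvm-ir -Zmir-opt-level=0" ./x.py build --stage 0 compiler/rustc
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines-mir0.txt

./x.py clean
RUSTFLAGS="--emit=llvm-ir -Zmir-opt-level=1" ./x.py build --stage 0 compiler/rustc
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines-mir1.txt

# If MIR inlining mattered here, the two reports would differ.
diff llvm-lines-mir0.txt llvm-lines-mir1.txt
```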
