@@ -12,5 +12,84 @@ Depending on what you're trying to measure, there are several different approach
See [their docs](https://github.com/rust-lang/measureme/blob/master/summarize/Readme.md) for more information.

- If you want function level performance data or even just more details than the above approaches:
-   - Consider using a native code profiler such as [perf](profiling/with_perf.html).
+   - Consider using a native code profiler such as [perf](profiling/with_perf.html) (see the sketch after this list)
+     - or [tracy](https://github.com/nagisa/rust_tracy_client) for a nanosecond-precision,
+       full-featured graphical interface.

+ - If you want a nice visual representation of the compile times of your crate graph,
+   you can use [cargo's `-Ztimings` flag](https://doc.rust-lang.org/cargo/reference/unstable.html#timings),
+   e.g. `cargo -Ztimings build`.
+   You can use this flag on the compiler itself with `CARGOFLAGS="-Ztimings" ./x.py build`.
+
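+ As a rough sketch of what the perf route looks like (this assumes a Linux machine with `perf`
+ installed, a locally built compiler linked as a rustup toolchain named `stage1`, and a
+ placeholder `some_crate.rs` to compile; see the [perf chapter](profiling/with_perf.html) for the full setup):
+ ```
+ # Record a profile of the locally built rustc compiling a test file.
+ # -F 99 samples ~99 times per second; DWARF call graphs give usable stacks.
+ perf record -F 99 --call-graph dwarf rustc +stage1 -O some_crate.rs
+ # Browse the recorded samples, hottest functions first.
+ perf report
+ ```
+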
+ ## Optimizing rustc's self-compile-times with cargo-llvm-lines
+
+ Using [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) you can count the
+ number of lines of LLVM IR across all instantiations of a generic function.
+ Since most of the time compiling rustc is spent in LLVM, the idea is that by
+ reducing the amount of code passed to LLVM, compiling rustc gets faster.
+
+ Example usage:
+ ```
+ cargo install cargo-llvm-lines
+ # On a normal crate you could now run `cargo llvm-lines`, but x.py isn't normal :P
+
+ # Do a clean before every run, to not mix in the results from previous runs.
+ ./x.py clean
+ RUSTFLAGS="--emit=llvm-ir" ./x.py build --stage 0 compiler/rustc
+
+ # Single crate, e.g. rustc_middle
+ cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/rustc_middle* > llvm-lines-middle.txt
+ # Whole compiler at once
+ cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines.txt
+ ```
+
+ Example output:
+ ```
+ Lines Copies Function name
+ ----- ------ -------------
+ 11802479 (100%) 52848 (100%) (TOTAL)
+ 1663902 (14.1%) 400 (0.8%) rustc_query_system::query::plumbing::get_query_impl::{{closure}}
+ 683526 (5.8%) 10579 (20.0%) core::ptr::drop_in_place
+ 568523 (4.8%) 528 (1.0%) rustc_query_system::query::plumbing::get_query_impl
+ 472715 (4.0%) 1134 (2.1%) hashbrown::raw::RawTable<T>::reserve_rehash
+ 306782 (2.6%) 1320 (2.5%) rustc_middle::ty::query::plumbing::<impl rustc_query_system::query::QueryContext for rustc_middle::ty::context::TyCtxt>::start_query::{{closure}}::{{closure}}::{{closure}}
+ 212800 (1.8%) 514 (1.0%) rustc_query_system::dep_graph::graph::DepGraph<K>::with_task_impl
+ 194813 (1.7%) 124 (0.2%) rustc_query_system::query::plumbing::force_query_impl
+ 158488 (1.3%) 1 (0.0%) rustc_middle::ty::query::<impl rustc_middle::ty::context::TyCtxt>::alloc_self_profile_query_strings
+ 119768 (1.0%) 418 (0.8%) core::ops::function::FnOnce::call_once
+ 119644 (1.0%) 1 (0.0%) rustc_target::spec::load_specific
+ 104153 (0.9%) 7 (0.0%) rustc_middle::ty::context::_DERIVE_rustc_serialize_Decodable_D_FOR_TypeckResults::<impl rustc_serialize::serialize::Decodable<__D> for rustc_middle::ty::context::TypeckResults>::decode::{{closure}}
+ 81173 (0.7%) 1 (0.0%) rustc_middle::ty::query::stats::query_stats
+ 80306 (0.7%) 2029 (3.8%) core::ops::function::FnOnce::call_once{{vtable.shim}}
+ 78019 (0.7%) 1611 (3.0%) stacker::grow::{{closure}}
+ 69720 (0.6%) 3286 (6.2%) <&T as core::fmt::Debug>::fmt
+ 56327 (0.5%) 186 (0.4%) rustc_query_system::query::plumbing::incremental_verify_ich
+ 49714 (0.4%) 14 (0.0%) rustc_mir::dataflow::framework::graphviz::BlockFormatter<A>::write_node_label
+ ```
+
+ Since this doesn't seem to work with incremental compilation or `x.py check`,
+ you will be compiling rustc _a lot_.
+ I recommend changing a few settings in `config.toml` to make it bearable:
+ ```
+ [rust]
+ # A debug build takes _a fourth_ as long on my machine,
+ # but compiling more than stage0 rustc becomes unbearably slow.
+ optimize = false
+
+ # We can't use incremental anyway, so we disable it for a little speed boost.
+ incremental = false
+ # We won't be running it, so no point in compiling debug checks.
+ debug = false
+
+ # Caution: This changes the output of llvm-lines.
+ # Using a single codegen unit gives more accurate output, but is slower to compile.
+ # Changing it to the number of cores on my machine increased the output
+ # from 3.5GB to 4.1GB and decreased compile times from 5½ min to 4 min.
+ codegen-units = 1
+ #codegen-units = 0 # num_cpus
+ ```
+
+ What I'm still not sure about is whether inlining in MIR optimizations affects llvm-lines.
+ The output with `-Zmir-opt-level=0` and `-Zmir-opt-level=1` is the same,
+ but it feels like some functions that show up at the top should be too small
+ to have such a high impact. Inlining should only happen in LLVM, though.
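+
+ A hedged sketch of how that comparison could be run, reusing the build steps from above and
+ assuming the flag reaches rustc through `RUSTFLAGS` (the output file names here are made up):
+ ```
+ # Rebuild with MIR optimizations disabled and collect llvm-lines again.
+ ./x.py clean
+ RUSTFLAGS="--emit=llvm-ir -Zmir-opt-level=0" ./x.py build --stage 0 compiler/rustc
+ cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines-mir-opt-0.txt
+
+ # Repeat with -Zmir-opt-level=1, then compare the two reports.
+ diff llvm-lines-mir-opt-0.txt llvm-lines-mir-opt-1.txt
+ ```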