@@ -12,5 +12,80 @@ Depending on what you're trying to measure, there are several different approach
See [their docs](https://github.com/rust-lang/measureme/blob/master/summarize/Readme.md) for more information.

- If you want function level performance data or even just more details than the above approaches:
- - Consider using a native code profiler such as [perf](profiling/with_perf.html).
+ - Consider using a native code profiler such as [perf](profiling/with_perf.html)
+ - or [tracy](https://github.com/nagisa/rust_tracy_client) for a nanosecond-precision, full-featured graphical interface.

+ - If you want a nice visual representation of the compile times of your crate graph,
+ you can use [cargo's `-Ztimings` flag](https://doc.rust-lang.org/cargo/reference/unstable.html#timings), e.g. `cargo -Ztimings build`.
+ You can use this flag on the compiler itself with `CARGOFLAGS="-Ztimings" ./x.py build`.
+
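+ A sketch of how you might view the resulting report, assuming cargo writes its
+ `cargo-timing.html` file into the target directory (for the rustc build that is
+ somewhere under `build/`; the exact path can vary):
+ ```
+ CARGOFLAGS="-Ztimings" ./x.py build
+ # Locate the generated report and open it in a browser:
+ find build -name 'cargo-timing*.html'
+ ```
+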
+ ## Optimizing rustc's self-compile-times with cargo-llvm-lines
+
+ Using [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) you can count the
+ number of lines of LLVM IR across all instantiations of a generic function.
+ Since most of the time compiling rustc is spent in LLVM, the idea is that by
+ reducing the amount of code passed to LLVM, compiling rustc gets faster.
+
+ Example usage:
+ ```
+ cargo install cargo-llvm-lines
+ # On a normal crate you could now run `cargo llvm-lines`, but x.py isn't normal :P
+
+ # Do a clean before every run, to not mix in the results from previous runs.
+ ./x.py clean
+ RUSTFLAGS="--emit=llvm-ir" ./x.py build --stage 0 compiler/rustc
+
+ # Single crate, e.g. rustc_middle
+ cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/rustc_middle* > llvm-lines-middle.txt
+ # Whole compiler at once
+ cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines.txt
+ ```
+
+ Example output:
+ ```
+ Lines Copies Function name
+ ----- ------ -------------
+ 11802479 (100%) 52848 (100%) (TOTAL)
+ 1663902 (14.1%) 400 (0.8%) rustc_query_system::query::plumbing::get_query_impl::{{closure}}
+ 683526 (5.8%) 10579 (20.0%) core::ptr::drop_in_place
+ 568523 (4.8%) 528 (1.0%) rustc_query_system::query::plumbing::get_query_impl
+ 472715 (4.0%) 1134 (2.1%) hashbrown::raw::RawTable<T>::reserve_rehash
+ 306782 (2.6%) 1320 (2.5%) rustc_middle::ty::query::plumbing::<impl rustc_query_system::query::QueryContext for rustc_middle::ty::context::TyCtxt>::start_query::{{closure}}::{{closure}}::{{closure}}
+ 212800 (1.8%) 514 (1.0%) rustc_query_system::dep_graph::graph::DepGraph<K>::with_task_impl
+ 194813 (1.7%) 124 (0.2%) rustc_query_system::query::plumbing::force_query_impl
+ 158488 (1.3%) 1 (0.0%) rustc_middle::ty::query::<impl rustc_middle::ty::context::TyCtxt>::alloc_self_profile_query_strings
+ 119768 (1.0%) 418 (0.8%) core::ops::function::FnOnce::call_once
+ 119644 (1.0%) 1 (0.0%) rustc_target::spec::load_specific
+ 104153 (0.9%) 7 (0.0%) rustc_middle::ty::context::_DERIVE_rustc_serialize_Decodable_D_FOR_TypeckResults::<impl rustc_serialize::serialize::Decodable<__D> for rustc_middle::ty::context::TypeckResults>::decode::{{closure}}
+ 81173 (0.7%) 1 (0.0%) rustc_middle::ty::query::stats::query_stats
+ 80306 (0.7%) 2029 (3.8%) core::ops::function::FnOnce::call_once{{vtable.shim}}
+ 78019 (0.7%) 1611 (3.0%) stacker::grow::{{closure}}
+ 69720 (0.6%) 3286 (6.2%) <&T as core::fmt::Debug>::fmt
+ 56327 (0.5%) 186 (0.4%) rustc_query_system::query::plumbing::incremental_verify_ich
+ 49714 (0.4%) 14 (0.0%) rustc_mir::dataflow::framework::graphviz::BlockFormatter<A>::write_node_label
+ ```
+
+ Since this doesn't seem to work with incremental compilation or `x.py check`, you will be compiling rustc _a lot_.
+ I recommend changing a few settings in `config.toml` to make it bearable:
+ ```
+ [rust]
+ # A debug build takes _a fourth_ as long on my machine,
+ # but compiling more than stage0 rustc becomes unbearably slow.
+ optimize = false
+
+ # We can't use incremental anyway, so we disable it for a little speed boost.
+ incremental = false
+ # We won't be running it, so no point in compiling debug checks.
+ debug = false
+
+ # Caution: This changes the output of llvm-lines.
+ # Using a single codegen unit gives more accurate output, but is slower to compile.
+ # Changing it to the number of cores on my machine increased the output from 3.5GB to 4.1GB and decreased compile times from 5½ min to 4 min.
+ codegen-units = 1
+ #codegen-units = 0 # num_cpus
+ ```
+
+ What I'm still not sure about is whether inlining during MIR optimizations affects llvm-lines.
+ The output with `-Zmir-opt-level=0` and `-Zmir-opt-level=1` is the same,
+ but it feels like some functions that show up at the top should be too small
+ to have such a high impact. Inlining should only happen in LLVM, though.
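+
+ For reference, a sketch of how such a comparison can be reproduced, reusing only the
+ commands from above (the paths assume the same x86_64-unknown-linux-gnu layout):
+ ```
+ # Generate a report with MIR optimizations disabled.
+ ./x.py clean
+ RUSTFLAGS="--emit=llvm-ir -Zmir-opt-level=0" ./x.py build --stage 0 compiler/rustc
+ cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines-mir0.txt
+
+ # Generate a report with the default MIR opt level.
+ ./x.py clean
+ RUSTFLAGS="--emit=llvm-ir -Zmir-opt-level=1" ./x.py build --stage 0 compiler/rustc
+ cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines-mir1.txt
+
+ # Compare the two reports.
+ diff llvm-lines-mir0.txt llvm-lines-mir1.txt
+ ```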