| 1 | +# Profile-Guided Optimization |
| 2 | + |
| 3 | +`rustc` supports profile-guided optimization (PGO). |
| 4 | +This chapter describes what PGO is, what it is good for, and how it can be used. |
| 5 | + |
| 6 | +## What Is Profile-Guided Optimization? |
| 7 | + |
| 8 | +The basic concept of PGO is to collect data about the typical execution of |
| 9 | +a program (e.g. which branches it is likely to take) and then use this data |
| 10 | +to inform optimizations such as inlining, machine-code layout, |
| 11 | +register allocation, etc. |
| 12 | + |
| 13 | +There are different ways of collecting data about a program's execution. |
| 14 | +One is to run the program inside a profiler (such as `perf`) and another |
| 15 | +is to create an instrumented binary, that is, a binary that has data |
| 16 | +collection built into it, and run that. |
| 17 | +The latter usually provides more accurate data and it is also what is |
| 18 | +supported by `rustc`. |
| 19 | + |
| 20 | +## Usage |
| 21 | + |
| 22 | +Generating a PGO-optimized program involves following a workflow with four steps: |
| 23 | + |
| 24 | +1. Compile the program with instrumentation enabled |
| 25 | + (e.g. `rustc -Cprofile-generate=/tmp/pgo-data main.rs`) |
| 26 | +2. Run the instrumented program (e.g. `./main`), which generates a |
| 27 | + `default_<id>.profraw` file |
| 28 | +3. Convert the `.profraw` file into a `.profdata` file using |
| 29 | + LLVM's `llvm-profdata` tool |
| 30 | +4. Compile the program again, this time making use of the profiling data |
| 31 | + (for example `rustc -Cprofile-use=merged.profdata main.rs`) |
| 32 | + |
| 33 | +An instrumented program will create one or more `.profraw` files, one for each |
| 34 | +instrumented binary. For example, an instrumented executable that loads two instrumented |
| 35 | +dynamic libraries at runtime will generate three `.profraw` files. Running an |
| 36 | +instrumented binary multiple times, on the other hand, will re-use the |
| 37 | +respective `.profraw` files, updating them in place. |
| 38 | + |
| 39 | +These `.profraw` files have to be post-processed before they can be fed back |
| 40 | +into the compiler. This is done by the `llvm-profdata` tool. This tool |
| 41 | +is most easily installed via |
| 42 | + |
| 43 | +```bash |
| 44 | +rustup component add llvm-tools-preview |
| 45 | +``` |
| 46 | + |
| 47 | +Note that installing the `llvm-tools-preview` component won't add |
| 48 | +`llvm-profdata` to the `PATH`. Rather, the tool can be found in: |
| 49 | + |
| 50 | +```bash |
| 51 | +~/.rustup/toolchains/<toolchain>/lib/rustlib/<target-triple>/bin/ |
| 52 | +``` |
| 53 | + |
| 54 | +Alternatively, an `llvm-profdata` coming with a recent LLVM or Clang |
| 55 | +version usually works too. |
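Because the tool is not on the `PATH`, it can help to compute its location instead of typing it out. The following is a minimal sketch, assuming a rustup-managed toolchain whose sysroot follows the directory layout shown above. `profdata_path` is a hypothetical helper name, and the host triple is taken from the `host:` line printed by `rustc -vV`.

```shell
# Hypothetical helper: build the expected llvm-profdata path from a
# toolchain sysroot ($1) and a target triple ($2).
profdata_path() {
    printf '%s/lib/rustlib/%s/bin/llvm-profdata\n' "$1" "$2"
}

# Only query rustc when it is actually installed.
if command -v rustc >/dev/null 2>&1; then
    sysroot=$(rustc --print sysroot)
    host=$(rustc -vV | sed -n 's/^host: //p')
    # Prints the location where the llvm-tools-preview component is
    # expected to place the tool (assumption: default rustup layout).
    profdata_path "$sysroot" "$host"
fi
```

The printed path can be used directly in place of a bare `llvm-profdata` in the commands of this chapter.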
| 56 | + |
| 57 | +The `llvm-profdata` tool merges multiple `.profraw` files into a single |
| 58 | +`.profdata` file that can then be fed back into the compiler via |
| 59 | +`-Cprofile-use`: |
| 60 | + |
| 61 | +```bash |
| 62 | +# STEP 1: Compile the binary with instrumentation |
| 63 | +rustc -Cprofile-generate=/tmp/pgo-data -O ./main.rs |
| 64 | + |
| 65 | +# STEP 2: Run the binary a few times, maybe with common sets of args. |
| 66 | +# Each run will create or update `.profraw` files in /tmp/pgo-data |
| 67 | +./main mydata1.csv |
| 68 | +./main mydata2.csv |
| 69 | +./main mydata3.csv |
| 70 | + |
| 71 | +# STEP 3: Merge and post-process all the `.profraw` files in /tmp/pgo-data |
| 72 | +llvm-profdata merge -o ./merged.profdata /tmp/pgo-data |
| 73 | + |
| 74 | +# STEP 4: Use the merged `.profdata` file during optimization. Apart from the |
| 75 | +# PGO flag itself, all `rustc` flags have to be the same as in STEP 1. |
| 76 | +rustc -Cprofile-use=./merged.profdata -O ./main.rs |
| 77 | +``` |
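One easy mistake in the workflow above is to run the merge step before any profile data has been written. The following is a small sketch of a guard for that case, assuming a POSIX shell and the `/tmp/pgo-data` directory from the example. `count_profraw` and the `PGO_DIR` variable are hypothetical names, not part of `rustc` or LLVM.

```shell
# Hypothetical helper: count the `.profraw` files directly inside a directory.
# Prints 0 when the directory does not exist.
count_profraw() {
    if [ -d "$1" ]; then
        find "$1" -maxdepth 1 -type f -name '*.profraw' | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Assumed location, matching the example above; override via PGO_DIR.
pgo_dir=${PGO_DIR:-/tmp/pgo-data}

if [ "$(count_profraw "$pgo_dir")" -eq 0 ]; then
    echo "no .profraw files in $pgo_dir; run the instrumented binary first" >&2
else
    echo "found $(count_profraw "$pgo_dir") .profraw file(s) in $pgo_dir"
fi
```

Running this between STEP 2 and STEP 3 gives an early, readable message when the directory is still empty.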
| 78 | + |
| 79 | +### A Complete Cargo Workflow |
| 80 | + |
| 81 | +Using this feature with Cargo works very similarly to using it with `rustc` |
| 82 | +directly. Again, we generate an instrumented binary, run it to produce data, |
| 83 | +merge the data, and feed it back into the compiler. Some things of note: |
| 84 | + |
| 85 | +- We use the `RUSTFLAGS` environment variable in order to pass the PGO compiler |
| 86 | + flags to the compilation of all crates in the program. |
| 87 | + |
| 88 | +- We pass the `--target` flag to Cargo, which prevents the `RUSTFLAGS` |
| 89 | + arguments from being passed to Cargo build scripts. We don't want the build |
| 90 | + scripts to generate a bunch of `.profraw` files. |
| 91 | + |
| 92 | +- We pass `--release` to Cargo because that's where PGO makes the most sense. |
| 93 | + In theory, PGO can also be done on debug builds but there is little reason |
| 94 | + to do so. |
| 95 | + |
| 96 | +- It is recommended to use *absolute paths* for the argument of |
| 97 | + `-Cprofile-generate` and `-Cprofile-use`. Cargo can invoke `rustc` with |
| 98 | + varying working directories, so a relative path may leave `rustc` unable to |
| 99 | + find the supplied `.profdata` file. With absolute paths this is not an issue. |
| 100 | + |
| 101 | +- It is good practice to make sure that there is no left-over profiling data |
| 102 | + from previous compilation sessions. Just deleting the directory is a simple |
| 103 | + way of doing so (see `STEP 0` below). |
| 104 | + |
| 105 | +This is what the entire workflow looks like: |
| 106 | + |
| 107 | +```bash |
| 108 | +# STEP 0: Make sure there is no left-over profiling data from previous runs |
| 109 | +rm -rf /tmp/pgo-data |
| 110 | + |
| 111 | +# STEP 1: Build the instrumented binaries |
| 112 | +RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" \ |
| 113 | + cargo build --release --target=x86_64-unknown-linux-gnu |
| 114 | + |
| 115 | +# STEP 2: Run the instrumented binaries with some typical data |
| 116 | +./target/x86_64-unknown-linux-gnu/release/myprogram mydata1.csv |
| 117 | +./target/x86_64-unknown-linux-gnu/release/myprogram mydata2.csv |
| 118 | +./target/x86_64-unknown-linux-gnu/release/myprogram mydata3.csv |
| 119 | + |
| 120 | +# STEP 3: Merge the `.profraw` files into a `.profdata` file |
| 121 | +llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data |
| 122 | + |
| 123 | +# STEP 4: Use the `.profdata` file for guiding optimizations |
| 124 | +RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" \ |
| 125 | + cargo build --release --target=x86_64-unknown-linux-gnu |
| 126 | +``` |
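The recommendation to use absolute paths can be scripted when the profile location is only known relative to the current directory. The following is a minimal sketch, assuming a POSIX shell and that the parent directory of the path already exists. `abs_path` is a hypothetical helper, and `./merged.profdata` is only an illustrative location.

```shell
# Hypothetical helper: print an absolute version of a path whose parent
# directory already exists. Absolute paths are passed through unchanged.
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "$(cd "$(dirname "$1")" && pwd)" "$(basename "$1")" ;;
    esac
}

# Assumed relative location of the merged profile; adjust as needed.
profdata=$(abs_path ./merged.profdata)

# Show the RUSTFLAGS value that would be used for the final build.
echo "RUSTFLAGS=\"-Cprofile-use=$profdata\""
```

Resolving the path once, before invoking Cargo, keeps the value stable regardless of the working directory in which `rustc` is later started.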
| 127 | + |
| 128 | +## Further Reading |
| 129 | + |
| 130 | +`rustc`'s PGO support relies entirely on LLVM's implementation of the feature |
| 131 | +and is equivalent to what Clang offers via the `-fprofile-generate` / |
| 132 | +`-fprofile-use` flags. The [Profile Guided Optimization][clang-pgo] section |
| 133 | +in Clang's documentation is therefore an interesting read for anyone who wants |
| 134 | +to use PGO with Rust. |
| 135 | + |
| 136 | +[clang-pgo]: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization |