Explain stages in terms of the compiler currently running (take N+1) #857

Merged: 5 commits, Oct 4, 2020
241 changes: 127 additions & 114 deletions src/building/bootstrapping.md
@@ -14,7 +14,7 @@ It must have been written in a different language. In Rust's case it was
only way to build a modern version of rustc is a slightly less modern
version.

This is exactly how `x.py` works: it downloads the current `beta` release of
This is exactly how `x.py` works: it downloads the current beta release of
rustc, then uses it to compile the new compiler.

## Stages of bootstrapping
@@ -71,6 +71,8 @@ These defaults are as follows:

You can always override the stage by passing `--stage N` explicitly.

For more information about stages, [see below](#understanding-stages-of-bootstrap).

## Complications of bootstrapping

Since the build system uses the current beta compiler to build the stage-1
@@ -122,50 +124,135 @@ contribution [here][bootstrap-build].

## Understanding stages of bootstrap

This is a detailed look into the separate bootstrap stages. When running
`x.py` you will see output such as:

```txt
Building stage0 std artifacts
Copying stage0 std from stage0
Building stage0 compiler artifacts
Copying stage0 rustc from stage0
Building LLVM for x86_64-apple-darwin
Building stage0 codegen artifacts
Assembling stage1 compiler
Building stage1 std artifacts
Copying stage1 std from stage1
Building stage1 compiler artifacts
Copying stage1 rustc from stage1
Building stage1 codegen artifacts
Assembling stage2 compiler
Uplifting stage1 std
Copying stage2 std from stage1
Generating unstable book md files
Building stage0 tool unstable-book-gen
Building stage0 tool rustbook
Documenting standalone
Building rustdoc for stage2
Documenting book redirect pages
Documenting stage2 std
Building rustdoc for stage1
Documenting stage2 whitelisted compiler
Documenting stage2 compiler
Documenting stage2 rustdoc
Documenting error index
Uplifting stage1 rustc
Copying stage2 rustc from stage1
Building stage2 tool error_index_generator
```
### Overview

This is a detailed look into the separate bootstrap stages.

The convention `x.py` uses is that:
- A `--stage N` flag means to run the stage N compiler (`stageN/rustc`).
- A "stage N artifact" is a build artifact that is _produced_ by the stage N compiler.
- The "stage (N+1) compiler" is assembled from "stage N artifacts". This
process is called _uplifting_.
Comment on lines +134 to +135
Member

What do you mean when you say "assembled"?

Member Author

@jyn514 jyn514 Oct 1, 2020

I am not actually sure ... @Mark-Simulacrum what does Assemble actually do? It can't just be copying files because I tried running the binary in stage0-rustc and it gave an error about shared objects:

build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/rustc-main: error while loading shared libraries: librustc_driver-3b0ece280f85df92.so: cannot open shared object file: No such file or directory


#### Build artifacts

Anything you can build with `x.py` is a _build artifact_.
Build artifacts include, but are not limited to:

- binaries, like `stage0-rustc/rustc-main`
- shared objects, like `stage0-sysroot/rustlib/libstd-6fae108520cf72fe.so`
- [rlib] files, like `stage0-sysroot/rustlib/libstd-6fae108520cf72fe.rlib`
- HTML files generated by rustdoc, like `doc/std`

[rlib]: ../serialization.md

#### Examples

- `x.py build --stage 0` means to build with the beta `rustc`.
- `x.py doc --stage 0` means to document using the beta `rustdoc`.
- `x.py test --stage 0 library/std` means to run tests on the standard library
without building `rustc` from source ('build with stage 0, then test the
artifacts'). If you're working on the standard library, this is normally the
test command you want.
- `x.py test src/test/ui` means to build the stage 1 compiler and run
`compiletest` on it. If you're working on the compiler, this is normally the
test command you want.

#### Examples of what *not* to do

- `x.py test --stage 0 src/test/ui` is not meaningful: it runs tests on the
_beta_ compiler and doesn't build `rustc` from source. Use `test src/test/ui`
instead, which builds stage 1 from source.
- `x.py test --stage 0 compiler/rustc` builds the compiler but runs no tests:
it's running `cargo test -p rustc`, but cargo doesn't understand Rust's
tests. You shouldn't need to use this; use `test` instead (without arguments).
- `x.py build --stage 0 compiler/rustc` builds the compiler, but does not make
it usable: the build artifacts are not assembled into the final compiler
([#73519]). Use `x.py build library/std` instead, which puts the compiler in
`stage1/rustc`.

[#73519]: https://github.com/rust-lang/rust/issues/73519

Member

@camelid camelid Oct 1, 2020

What about having a table like this? (Correct me if I got things mixed up; it's confusing :) )

I think this will help clear things up for people (me).

| Stage | Flag for building this stage | Flag for running this stage | Output directory | What it's used for |
| --- | --- | --- | --- | --- |
| stage 0 | N/A | --stage 0 | N/A | This is a recent beta version of rustc, equivalent to rustc +beta ... |
| stage 1 | --stage 1 | --stage 1 | build/<toolchain>/stage1 | Usually you want to build this when working on Rust |
| stage 2 | --stage 2 | --stage 2 | build/<toolchain>/stage2 | This is what's released to rustup; you probably don't want to build this when working on Rust |

Suggested change
### Stages
| Stage | Flag for _building_ this stage | Flag for _running_ this stage | Output directory | What it's used for |
| --- | --- | --- | --- | --- |
| stage 0 | N/A | `--stage 0` | N/A | This is a recent beta version of rustc, equivalent to `rustc +beta ...` |
| stage 1 | `--stage 1` | `--stage 1` | `build/<toolchain>/stage1` | Usually you want to build this when working on Rust |
| stage 2 | `--stage 2` | `--stage 2` | `build/<toolchain>/stage2` | This is what's released to rustup; you probably don't want to build this when working on Rust |

Member Author

Heh, this is basically #843. Let's hold off on that for now.

Member

Alright, maybe I'll open a follow-up PR then :)

Member Author

Also it's not entirely correct: `test --stage 1 src/test/ui` runs the stage1 compiler but `test --stage library/std` tests the stage1 artifacts.

Member

I'm not sure what you mean... I guess that's the point :)

Member Author

I can add a link to [image] but you said yourself it confused you more than it helps. I don't know any better way to say it; any view that doesn't have different sections for the compiler and standard library is going to be incomplete and misleading.

Contributor

@Julian-Wollersberger Julian-Wollersberger Oct 2, 2020

I find this table with the arrows is really helpful.
Could you add it to the ### Overview section?

Member Author

There were a lot of people in #843 that said it confused more than it helps. I don't have the energy to push this through to an official document, I'm happy to host it on my blog or something.

Member Author

Or at least, if this is added I would like it to not be in this PR. This PR is mostly uncontroversial changes and gets rid of the wall of text and I don't want to lose those in a debate over what the model of bootstrapping should be.

Contributor

Oh, I haven't read #843. Nevermind then.

### Building vs. Running


A deeper look into `x.py`'s phases can be seen here:
Note that `build --stage N compiler/rustc` **does not** build the stage N compiler:
instead it builds the stage _N+1_ compiler _using_ the stage N compiler.

In short, _stage 0 uses the stage0 compiler to create stage0 artifacts which
will later be uplifted to be the stage1 compiler_.

In each stage, two major steps are performed:

1. `std` is compiled by the stage N compiler.
2. That `std` is linked to programs built by the stage N compiler, including
the stage N artifacts (stage (N+1) compiler).

This is somewhat intuitive if one thinks of the stage N artifacts as "just"
another program we are building with the stage N compiler:
`build --stage N compiler/rustc` is linking the stage N artifacts to the `std`
built by the stage N compiler.
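
The two steps above can be sketched as a small model. This is only an illustration of the naming convention, not how `x.py` is implemented; `plan_build` and the step strings are hypothetical names chosen for this sketch.

```python
def plan_build(top_stage: int) -> list:
    """Model the steps implied by `build --stage top_stage compiler/rustc`.

    Hypothetical helper: it only illustrates the convention
    stage N compiler -> stage N artifacts -> stage N+1 compiler.
    """
    steps = []
    for n in range(top_stage + 1):
        # Step 1: `std` is compiled by the stage N compiler.
        steps.append(f"stage{n} compiler builds stage{n} std")
        # Step 2: the stage N compiler builds rustc and links it to that std.
        steps.append(f"stage{n} compiler builds stage{n} rustc artifacts")
        # Uplifting: the stage N artifacts become the stage N+1 compiler.
        steps.append(f"assemble stage{n + 1} compiler from stage{n} artifacts")
    return steps


for step in plan_build(1):
    print(step)
```

Under this model, `--stage 1` ends by assembling the stage 2 compiler, which matches the note that `build --stage N compiler/rustc` builds the stage _N+1_ compiler.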

Here is a chart of a full build using `x.py`:

<img alt="A diagram of the rustc compilation phases" src="../img/rustc_stages.svg" class="center" />

Keep in mind this diagram is a simplification: for example, `rustdoc` can be
built at different stages, and the process is a bit different when passing
flags such as `--keep-stage`, or if there are non-host targets.

The stage 2 compiler is what is shipped to end-users.

### Stages and `std`

Note that there are two `std` libraries in play here:
1. The library _linked_ to `stageN/rustc`, which was built by stage N-1 (stage N-1 `std`)
2. The library _used to compile programs_ with `stageN/rustc`, which was
built by stage N (stage N `std`).

Stage N `std` is pretty much necessary for any useful work with the stage N compiler.
Without it, you can only compile programs with `#![no_core]` -- not terribly useful!

These need to be different because they aren't necessarily ABI-compatible:
there could be new layout optimizations, changes to MIR, or other changes
to Rust metadata on nightly that aren't present in beta.
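
As a rough illustration of the two `std` libraries, the relationship can be written as two tiny functions. The function names are hypothetical, and rejecting stage 0 is an assumption of this sketch (the beta compiler is downloaded rather than built).

```python
def std_linked_into_compiler(n: int) -> int:
    """Stage of the std that `stageN/rustc` itself is linked against."""
    if n < 1:
        # Assumption of this sketch: stage0 is the downloaded beta compiler,
        # so there is no "stage -1" std to talk about.
        raise ValueError("stage0 rustc is downloaded, not built from source")
    return n - 1


def std_used_for_programs(n: int) -> int:
    """Stage of the std that `stageN/rustc` links into programs it compiles."""
    return n


for n in (1, 2):
    print(f"stage{n}/rustc: linked to stage{std_linked_into_compiler(n)} std, "
          f"compiles programs against stage{std_used_for_programs(n)} std")
```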

This is also where `--keep-stage 1 library/std` comes into play. Since most
changes to the compiler don't actually change the ABI, once you've produced a
`std` in stage 1, you can probably just reuse it with a different compiler.
If the ABI hasn't changed, you're good to go, no need to spend time
recompiling that `std`.
`--keep-stage` simply assumes the previous compile is fine and copies those
artifacts into the appropriate place, skipping the cargo invocation.
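
The idea behind `--keep-stage` can be modelled as a cache that is trusted without checking. This is a sketch of the concept only, not the bootstrap code; `build_std`, `compile_std` and the cache dictionary are hypothetical.

```python
def build_std(stage: int, keep_stages: set, cache: dict, compile_std) -> str:
    """Return std artifacts for `stage`, reusing an earlier build if kept.

    `compile_std` stands in for the cargo invocation.
    """
    if stage in keep_stages and stage in cache:
        # Assume the previous build is still ABI-compatible and reuse it.
        return cache[stage]
    cache[stage] = compile_std(stage)
    return cache[stage]


cargo_calls = []


def fake_cargo(stage: int) -> str:
    # Record each (pretend) cargo invocation so we can count them.
    cargo_calls.append(stage)
    return f"stage{stage}-std (build #{len(cargo_calls)})"


cache = {}
first = build_std(1, set(), cache, fake_cargo)   # compiles std
second = build_std(1, {1}, cache, fake_cargo)    # reuses the earlier std
print(first == second, len(cargo_calls))
```

Note that nothing here checks whether the ABI actually changed; as with the real flag, reusing the artifacts is only correct if it did not.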

### Cross-compiling

Building stage2 `std` is different depending on whether you are cross-compiling or not
(see in the table how stage2 only builds non-host `std` targets).
This is because `x.py` uses a trick: if `HOST` and `TARGET` are the same,
it will reuse stage1 `std` for stage2! This is sound because stage1 `std`
was compiled with the stage1 compiler, i.e. a compiler using the source code
you currently have checked out. So it should be identical (and therefore ABI-compatible)
to the `std` that `stage2/rustc` would compile.

However, when cross-compiling, stage1 `std` will only run on the host.
So the stage2 compiler has to recompile `std` for the target.
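
The host/target distinction can be sketched as follows. `stage2_std_plan` is a hypothetical helper and the target triples are only example values.

```python
def stage2_std_plan(host: str, targets: list) -> dict:
    """Decide, per target, where the stage2 `std` comes from."""
    plan = {}
    for target in targets:
        if target == host:
            # Same triple: stage1 std was already built from the current
            # source, so it can be reused (uplifted) for stage2.
            plan[target] = "reuse stage1 std"
        else:
            # Cross-compiling: stage1 std only runs on the host, so std must
            # be recompiled for this target by the stage2 compiler.
            plan[target] = "build std with stage2 compiler"
    return plan


host = "x86_64-unknown-linux-gnu"
print(stage2_std_plan(host, [host, "aarch64-unknown-linux-gnu"]))
```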

### Why does only libstd use `cfg(bootstrap)`?

The `rustc` generated by the stage0 compiler is linked to the freshly-built
`std`, which means that for the most part only `std` needs to be cfg-gated,
so that `rustc` can use features added to std immediately after their addition,
without need for them to get into the downloaded beta.

Note this is different from any other Rust program: stage1 `rustc`
is built by the _beta_ compiler, but using the _master_ version of libstd!

The only time `rustc` uses `cfg(bootstrap)` is when it adds internal lints
that use diagnostic items. This happens very rarely.

### Directories and artifacts generated by x.py

The following tables indicate the outputs of various stage actions:

| Stage 0 Action | Output |
@@ -178,7 +265,7 @@ The following tables indicate the outputs of various stage actions:
| copy `stage0-rustc (except executable)` | `build/HOST/stage0-sysroot/lib/rustlib/HOST` |
| build `llvm` | `build/HOST/llvm` |
| `stage0` builds `codegen` with `stage0-sysroot` | `build/HOST/stage0-codegen/HOST` |
| `stage0` builds `rustdoc` with `stage0-sysroot` | `build/HOST/stage0-tools/HOST` |
| `stage0` builds `rustdoc`, `clippy`, `miri`, with `stage0-sysroot` | `build/HOST/stage0-tools/HOST` |

`--stage=0` stops here.

@@ -201,93 +288,19 @@ The following tables indicate the outputs of various stage actions:
| copy (uplift) `stage1-sysroot` | `build/HOST/stage2/lib` and `build/HOST/stage2/lib/rustlib/HOST` |
| `stage2` builds `test`/`std` (not HOST targets) | `build/HOST/stage2-std/TARGET` |
| copy `stage2-std` (not HOST targets) | `build/HOST/stage2/lib/rustlib/TARGET` |
| `stage2` builds `rustdoc` | `build/HOST/stage2-tools/HOST` |
| `stage2` builds `rustdoc`, `clippy`, `miri` | `build/HOST/stage2-tools/HOST` |
| copy `rustdoc` | `build/HOST/stage2/bin` |

`--stage=2` stops here.

Note that the convention `x.py` uses is that:
- A "stage N artifact" is an artifact that is _produced_ by the stage N compiler.
- The "stage (N+1) compiler" is assembled from "stage N artifacts".
- A `--stage N` flag means build _with_ stage N.

In short, _stage 0 uses the stage0 compiler to create stage0 artifacts which
will later be uplifted to stage1_.

Every time any of the main artifacts (`std` and `rustc`) are compiled, two
steps are performed.
When `std` is compiled by a stage N compiler, that `std` will be linked to
programs built by the stage N compiler (including `rustc` built later
on). It will also be used by the stage (N+1) compiler to link against itself.
This is somewhat intuitive if one thinks of the stage (N+1) compiler as "just"
another program we are building with the stage N compiler. In some ways, `rustc`
(the binary, not the `rustbuild` step) could be thought of as one of the few
`no_core` binaries out there.

So "stage0 std artifacts" are in fact the output of the downloaded stage0
compiler, and are going to be used for anything built by the stage0 compiler:
e.g. `rustc` artifacts. When it announces that it is "building stage1
std artifacts" it has moved on to the next bootstrapping phase. This pattern
continues in latter stages.

Also note that building host `std` and target `std` are different based on the
stage (e.g. see in the table how stage2 only builds non-host `std` targets.
This is because during stage2, the host `std` is uplifted from the "stage 1"
`std` -- specifically, when "Building stage 1 artifacts" is announced, it is
later copied into stage2 as well (both the compiler's `libdir` and the
`sysroot`).

This `std` is pretty much necessary for any useful work with the compiler.
Specifically, it's used as the `std` for programs compiled by the newly compiled
compiler (so when you compile `fn main() { }` it is linked to the last `std`
compiled with `x.py build library/std`).

The `rustc` generated by the stage0 compiler is linked to the freshly-built
`std`, which means that for the most part only `std` needs to be cfg-gated,
so that `rustc` can use featured added to std immediately after their addition,
without need for them to get into the downloaded beta. The `std` built by the
`stage1/bin/rustc` compiler, also known as "stage1 std artifacts", is not
necessarily ABI-compatible with that compiler.
That is, the `rustc` binary most likely could not use this `std` itself.
It is however ABI-compatible with any programs that the `stage1/bin/rustc`
binary builds (including itself), so in that sense they're paired.

This is also where `--keep-stage 1 library/std` comes into play. Since most
changes to the compiler don't actually change the ABI, once you've produced a
`std` in stage 1, you can probably just reuse it with a different compiler.
If the ABI hasn't changed, you're good to go, no need to spend the time
recompiling that `std`.
`--keep-stage` simply assumes the previous compile is fine and copies those
artifacts into the appropriate place, skipping the cargo invocation.

The reason we first build `std`, then `rustc`, is largely just
because we want to minimize `cfg(stage0)` in the code for `rustc`.
Currently `rustc` is always linked against a "new" `std` so it doesn't
ever need to be concerned with differences in std; it can assume that the std is
as fresh as possible.

The reason we need to build it twice is because of ABI compatibility.
The beta compiler has it's own ABI, and then the `stage1/bin/rustc` compiler
will produce programs/libraries with the new ABI.
We used to build three times, but because we assume that the ABI is constant
within a codebase, we presume that the libraries produced by the "stage2"
compiler (produced by the `stage1/bin/rustc` compiler) is ABI-compatible with
the `stage1/bin/rustc` compiler's produced libraries.
What this means is that we can skip that final compilation -- and simply use the
same libraries as the `stage2/bin/rustc` compiler uses itself for programs it
links against.

This `stage2/bin/rustc` compiler is shipped to end-users, along with the
`stage 1 {std,rustc}` artifacts.

## Passing stage-specific flags to `rustc`

`x.py` allows you to pass stage-specific flags to `rustc` when bootstrapping.
The `RUSTFLAGS_STAGE_0`, `RUSTFLAGS_STAGE_1` and `RUSTFLAGS_STAGE_2`
environment variables pass the given flags when building stage 0, 1, and 2
artifacts respectively.

Additionally, the `RUSTFLAGS_STAGE_NOT_0` variable, as its name suggests, pass
Additionally, the `RUSTFLAGS_STAGE_NOT_0` variable, as its name suggests, passes
the given arguments if the stage is not 0.
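
One way to picture how these variables select flags is the sketch below. It is not the bootstrap implementation: `stage_rustflags` is a hypothetical function, and both the whitespace splitting and the order in which the two variables are combined are assumptions.

```python
import os


def stage_rustflags(stage: int, env=None) -> list:
    """Collect the stage-specific flags that would apply to `stage`."""
    env = os.environ if env is None else env
    # Flags for exactly this stage, e.g. RUSTFLAGS_STAGE_1.
    flags = env.get(f"RUSTFLAGS_STAGE_{stage}", "").split()
    if stage != 0:
        # RUSTFLAGS_STAGE_NOT_0 applies to every stage except stage 0.
        flags += env.get("RUSTFLAGS_STAGE_NOT_0", "").split()
    return flags


example_env = {
    "RUSTFLAGS_STAGE_1": "-C debuginfo=1",
    "RUSTFLAGS_STAGE_NOT_0": "-C opt-level=2",
}
for stage in (0, 1, 2):
    print(stage, stage_rustflags(stage, example_env))
```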

## Environment Variables