compiletest: Re-land using the new non-libtest executor by default #140288

Merged
merged 2 commits into from
Apr 27, 2025

Zalathar
Contributor

This PR re-lands #139998, which had the misfortune of triggering download-rustc in its CI jobs, so we didn't get proper test metrics for comparison with the old implementation. That PR was therefore reverted in #140233, with the intention of re-landing it alongside a dummy compiler change to inhibit download-rustc.


Original PR description for #139998:

The new executor was implemented in #139660, but required a manual opt-in. This PR activates the new executor by default, but leaves the old libtest-based executor in place (temporarily) to make reverting easier if something unexpectedly goes horribly wrong.

Currently the new executor can be explicitly disabled by passing the -N flag to compiletest (e.g. ./x test ui -- -N), but eventually that flag will be removed, alongside the removal of the libtest dependency. The flag is mostly there to make manual comparative testing easier if something does go wrong.
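For manual comparative testing, the two invocations differ only in the trailing `-- -N`. The following is a minimal sketch of building both command lines; the helper name `compiletest_command` is hypothetical and exists only for illustration, while `./x test ui` and the `-N` flag are taken from the description above.

```python
def compiletest_command(suite: str, use_old_executor: bool = False) -> list[str]:
    """Build the argv for running a compiletest suite through ./x (hypothetical helper)."""
    argv = ["./x", "test", suite]
    if use_old_executor:
        # Per the PR description, passing -N after `--` disables the new executor.
        argv += ["--", "-N"]
    return argv


# Print both variants so they can be run back to back and compared.
print(" ".join(compiletest_command("ui")))
print(" ".join(compiletest_command("ui", use_old_executor=True)))
```

The sketch only assembles the commands; it does not run them, since a real run needs a rust-lang/rust checkout.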

As before, there should be no user-visible difference between the old executor and the new executor.


r? jieyouxu

(Re-landing rust-lang#139998, with a compiler change to inhibit download-rustc.)

Currently the new executor can be explicitly disabled by passing the `-N` flag
to compiletest (e.g. `./x test ui -- -N`), but eventually that flag will be
removed, alongside the removal of the libtest dependency.
@rustbot rustbot added A-compiletest Area: The compiletest test runner A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 25, 2025
@rustbot
Collaborator

rustbot commented Apr 25, 2025

Some changes occurred in src/tools/compiletest

cc @jieyouxu

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@Zalathar
Contributor Author

@bors rollup=never

Member

@jieyouxu jieyouxu left a comment

Indeed, thanks

@jieyouxu
Member

@bors r+

@bors
Collaborator

bors commented Apr 25, 2025

📌 Commit 1670de4 has been approved by jieyouxu

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 25, 2025
@bors
Collaborator

bors commented Apr 26, 2025

⌛ Testing commit 1670de4 with merge 43e62a7...

@bors
Collaborator

bors commented Apr 27, 2025

☀️ Test successful - checks-actions
Approved by: jieyouxu
Pushing 43e62a7 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 27, 2025
@bors bors merged commit 43e62a7 into rust-lang:master Apr 27, 2025
7 checks passed
@rustbot rustbot added this to the 1.88.0 milestone Apr 27, 2025

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 10fa3c4 (parent) -> 43e62a7 (this PR)

Test differences

No test diffs found

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 43e62a789c772642f79086f2cceef171cff30e63 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. dist-aarch64-linux: 5215.3s -> 8154.4s (56.4%)
  2. dist-x86_64-apple: 7512.3s -> 9320.8s (24.1%)
  3. x86_64-apple-2: 5502.6s -> 4297.1s (-21.9%)
  4. dist-apple-various: 7821.8s -> 6120.8s (-21.7%)
  5. x86_64-gnu-stable: 7498.7s -> 6693.7s (-10.7%)
  6. aarch64-apple: 4081.6s -> 3747.4s (-8.2%)
  7. dist-i586-gnu-i586-i686-musl: 5211.7s -> 4936.9s (-5.3%)
  8. i686-gnu-2: 6633.6s -> 6287.8s (-5.2%)
  9. dist-armhf-linux: 5315.4s -> 5044.6s (-5.1%)
  10. x86_64-apple-1: 7218.4s -> 6861.6s (-4.9%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.
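The percentages in the job duration list are consistent with a plain relative change, (after - before) / before. The sketch below recomputes the first three entries under that assumption; the function name `percent_change` is illustrative and not part of the CI tooling.

```python
def percent_change(before_s: float, after_s: float) -> float:
    """Relative change from before_s to after_s, in percent."""
    return (after_s - before_s) / before_s * 100.0


# Durations in seconds, copied from the first three entries of the report.
jobs = {
    "dist-aarch64-linux": (5215.3, 8154.4),
    "dist-x86_64-apple": (7512.3, 9320.8),
    "x86_64-apple-2": (5502.6, 4297.1),
}

for name, (before, after) in jobs.items():
    change = percent_change(before, after)
    print(f"{name}: {before}s -> {after}s ({change:.1f}%)")
```

As the note above says, swings of this size can come from runner noise alone, so the numbers are not evidence of a regression by themselves.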

@Zalathar Zalathar deleted the new-executor branch April 27, 2025 01:57
@rust-timer
Collaborator

Finished benchmarking commit (43e62a7): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.5% | [-0.5%, -0.5%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

Results (primary -0.4%, secondary -2.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.5% | [0.4%, 0.6%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.1% | [-2.1%, -2.1%] | 1 |
| Improvements ✅ (secondary) | -2.5% | [-2.5%, -2.5%] | 1 |
| All ❌✅ (primary) | -0.4% | [-2.1%, 0.6%] | 3 |
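The "All (primary)" figure for Max RSS can be approximately reproduced from the primary regression and improvement rows, assuming it is an arithmetic mean over all three primary benchmarks. That is an assumption about how the report is computed, not something the comment states, and the inputs are already rounded, so the sketch below is only a sanity check. The name `combined_mean` is illustrative.

```python
def combined_mean(groups: list[tuple[float, int]]) -> float:
    """Count-weighted mean of (group_mean_percent, count) pairs."""
    total = sum(mean * count for mean, count in groups)
    n = sum(count for _, count in groups)
    return total / n


# Primary rows from the Max RSS results: 2 regressions averaging 0.5%,
# 1 improvement at -2.1%.
primary = [(0.5, 2), (-2.1, 1)]
print(f"All (primary), recomputed: {combined_mean(primary):.1f}%")
```

The recomputed value rounds to the -0.4% shown in the report, which matches the assumed averaging.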

Cycles

Results (primary 0.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.5% | [0.5%, 0.5%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.5% | [0.5%, 0.5%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 763.667s -> 764.554s (0.12%)
Artifact size: 365.13 MiB -> 365.12 MiB (-0.00%)

Labels
A-compiletest Area: The compiletest test runner A-testsuite Area: The testsuite used to check the correctness of rustc merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

5 participants