[experimental] Use `#[thread_local]` instead of `thread_local!` in rustc_middle #106270

jyn514 · 2022-12-29T20:54:27Z

This avoid monomorphizing LocalKey::try_with 5 times for each query. It has the downside that destructors for the thread-local are never run; but we don't depend on the destructors for ImplicitContext itself anywhere.

Helps with #65031 (comment)

r? @jyn514

jyn514 · 2022-12-29T20:54:34Z

@bors try @rust-timer queue

bors · 2022-12-29T20:54:43Z

⌛ Trying commit afd4524ac844d1f2dc3ed87a61d25021897f2993 with merge 389dafa688a3ad30c1a062340c53cdf031a3f1e1...

Noratrieb · 2022-12-29T21:40:51Z

compiler/rustc_middle/src/ty/context.rs

-        static TLV: Cell<usize> = const { Cell::new(0) };
-    }
+    #[thread_local]
+    static TLV: Cell<usize> = const { Cell::new(0) };


Why do we have an inline_const here? Couldn't that just call Cell::new directly?

bjorn3 · 2022-12-29T21:41:50Z

This breaks on systems without #[thread_local] support like I believe MinGW.

Destructors not running doesn't matter for TLV as Cell<usize> doesn't have one anyway.

jyn514 · 2022-12-29T21:46:20Z

This breaks on systems without #[thread_local] support like I believe MinGW.

How can I tell which systems aren't supported by thread_local?

thomcc · 2022-12-29T22:09:16Z

This breaks on systems without #[thread_local] support like I believe MinGW.

How can I tell which systems aren't supported by thread_local?

#[cfg(target_thread_local)], for the most part.

bors · 2022-12-29T23:42:30Z

☀️ Try build successful - checks-actions
Build commit: 389dafa688a3ad30c1a062340c53cdf031a3f1e1 (389dafa688a3ad30c1a062340c53cdf031a3f1e1)

rust-timer · 2022-12-30T01:00:48Z

Finished benchmarking commit (389dafa688a3ad30c1a062340c53cdf031a3f1e1): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.4%	[1.4%, 1.4%]	1
Improvements ✅ (primary)	-0.4%	[-0.5%, -0.3%]	4
Improvements ✅ (secondary)	-0.9%	[-1.4%, -0.3%]	2
All ❌✅ (primary)	-0.4%	[-0.5%, -0.3%]	4

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.8%	[2.1%, 3.4%]	5
Regressions ❌ (secondary)	2.2%	[0.7%, 3.7%]	49
Improvements ✅ (primary)	-0.9%	[-0.9%, -0.9%]	1
Improvements ✅ (secondary)	-1.7%	[-2.3%, -1.4%]	3
All ❌✅ (primary)	2.1%	[-0.9%, 3.4%]	6

Cycles

This benchmark run did not return any relevant results for this metric.

jyn514 · 2022-12-30T21:31:17Z

Comparison of -Z dump-mono-stats before and after:

$ jq -c 'sort_by(.total_estimate, .name) | .[]' rustc_query_impl.mono_items-thread_local_attr.json | diff - <(jq -c 'sort_by(.total_estimate, .name) | .[]' rustc_query_impl.mono_items-thread_local_macro.json)
1893a1894
> {"name":"rustc_middle::ty::tls::get_tlv::{closure#0}","instantiation_count":1,"size_estimate":5,"total_estimate":5}
2997a2999
> {"name":"std::thread::__FastLocalKeyInner::<T>::register_dtor","instantiation_count":1,"size_estimate":10,"total_estimate":10}
9137a9140
> {"name":"rustc_middle::ty::tls::TLV::__getit","instantiation_count":3,"size_estimate":75,"total_estimate":225}
9175d9177
< {"name":"std::mem::needs_drop","instantiation_count":125,"size_estimate":2,"total_estimate":250}
9182a9185
> {"name":"std::mem::needs_drop","instantiation_count":128,"size_estimate":2,"total_estimate":256}
9267d9269
< {"name":"std::thread::LocalKey::<T>::with","instantiation_count":19,"size_estimate":17,"total_estimate":323}
9384c9386
< {"name":"std::mem::drop","instantiation_count":257,"size_estimate":2,"total_estimate":514}
---
> {"name":"std::mem::drop","instantiation_count":256,"size_estimate":2,"total_estimate":512}
9386d9387
< {"name":"<std::result::Result<T, F> as std::ops::FromResidual<std::result::Result<std::convert::Infallible, E>>>::from_residual","instantiation_count":40,"size_estimate":13,"total_estimate":520}
9395a9397
> {"name":"<std::result::Result<T, F> as std::ops::FromResidual<std::result::Result<std::convert::Infallible, E>>>::from_residual","instantiation_count":41,"size_estimate":13,"total_estimate":533}
9828d9829
< {"name":"std::thread::LocalKey::<T>::try_with","instantiation_count":19,"size_estimate":58,"total_estimate":1102}
9880a9882
> {"name":"std::cell::Cell::<T>::get","instantiation_count":192,"size_estimate":8,"total_estimate":1536}
9883d9884
< {"name":"std::cell::Cell::<T>::get","instantiation_count":194,"size_estimate":8,"total_estimate":1552}
9989,9990c9990,9991
< {"name":"std::mem::MaybeUninit::<T>::uninit","instantiation_count":1454,"size_estimate":2,"total_estimate":2908}
< {"name":"std::mem::ManuallyDrop::<T>::into_inner","instantiation_count":1456,"size_estimate":2,"total_estimate":2912}
---
> {"name":"std::mem::MaybeUninit::<T>::uninit","instantiation_count":1453,"size_estimate":2,"total_estimate":2906}
> {"name":"std::mem::ManuallyDrop::<T>::into_inner","instantiation_count":1455,"size_estimate":2,"total_estimate":2910}
9998a10000
> {"name":"std::cell::Cell::<T>::set","instantiation_count":192,"size_estimate":16,"total_estimate":3072}
10001d10002
< {"name":"std::cell::Cell::<T>::set","instantiation_count":193,"size_estimate":16,"total_estimate":3088}
10059a10061
> {"name":"std::cell::Cell::<T>::replace","instantiation_count":192,"size_estimate":25,"total_estimate":4800}
10062d10063
< {"name":"std::cell::Cell::<T>::replace","instantiation_count":193,"size_estimate":25,"total_estimate":4825}
10137c10138
< {"name":"std::cell::UnsafeCell::<T>::get","instantiation_count":1257,"size_estimate":8,"total_estimate":10056}
---
> {"name":"std::cell::UnsafeCell::<T>::get","instantiation_count":1255,"size_estimate":8,"total_estimate":10040}
10151c10152
< {"name":"std::mem::MaybeUninit::<T>::assume_init","instantiation_count":1456,"size_estimate":8,"total_estimate":11648}
---
> {"name":"std::mem::MaybeUninit::<T>::assume_init","instantiation_count":1455,"size_estimate":8,"total_estimate":11640}
10157c10158
< {"name":"std::mem::MaybeUninit::<T>::as_mut_ptr","instantiation_count":1544,"size_estimate":8,"total_estimate":12352}
---
> {"name":"std::mem::MaybeUninit::<T>::as_mut_ptr","instantiation_count":1543,"size_estimate":8,"total_estimate":12344}
10172a10174,10175
> {"name":"rustc_middle::ty::tls::set_tlv::{closure#0}::{closure#0}","instantiation_count":1597,"size_estimate":9,"total_estimate":14373}
> {"name":"rustc_middle::ty::tls::set_tlv::{closure#1}","instantiation_count":1597,"size_estimate":9,"total_estimate":14373}
10181,10182c10184
< {"name":"rustc_middle::ty::tls::set_tlv::{closure#0}","instantiation_count":1597,"size_estimate":11,"total_estimate":17567}
< {"name":"std::ptr::read::runtime","instantiation_count":1426,"size_estimate":13,"total_estimate":18538}
---
> {"name":"std::ptr::read::runtime","instantiation_count":1425,"size_estimate":13,"total_estimate":18525}
10184c10186
< {"name":"std::result::Result::<T, E>::expect","instantiation_count":968,"size_estimate":20,"total_estimate":19360}
---
> {"name":"std::result::Result::<T, E>::expect","instantiation_count":969,"size_estimate":20,"total_estimate":19380}
10185a10188
> {"name":"rustc_middle::ty::tls::set_tlv::{closure#0}","instantiation_count":1597,"size_estimate":13,"total_estimate":20761}
10189c10192
< {"name":"std::mem::replace","instantiation_count":1140,"size_estimate":20,"total_estimate":22800}
---
> {"name":"std::mem::replace","instantiation_count":1139,"size_estimate":20,"total_estimate":22780}
10199c10202
< {"name":"std::ptr::write::runtime","instantiation_count":1819,"size_estimate":16,"total_estimate":29104}
---
> {"name":"std::ptr::write::runtime","instantiation_count":1818,"size_estimate":16,"total_estimate":29088}
10212,10213c10215,10216
< {"name":"std::ptr::write","instantiation_count":1790,"size_estimate":21,"total_estimate":37590}
< {"name":"std::ptr::read","instantiation_count":1426,"size_estimate":28,"total_estimate":39928}
---
> {"name":"std::ptr::write","instantiation_count":1789,"size_estimate":21,"total_estimate":37569}
> {"name":"std::ptr::read","instantiation_count":1425,"size_estimate":28,"total_estimate":39900}
10227a10231
> {"name":"std::thread::LocalKey::<T>::with","instantiation_count":3214,"size_estimate":17,"total_estimate":54638}
10229c10233
< {"name":"rustc_middle::ty::tls::set_tlv","instantiation_count":1597,"size_estimate":37,"total_estimate":59089}
---
> {"name":"rustc_middle::ty::tls::set_tlv","instantiation_count":1597,"size_estimate":39,"total_estimate":62283}
10240a10245
> {"name":"std::thread::LocalKey::<T>::try_with","instantiation_count":3214,"size_estimate":58,"total_estimate":186412}

In particular, notice that the LocalKey::try_with total_estimate has gone from 186412 -> 1102, and LocalKey::with has gone from 54638 -> 323. tls::set_tlv has also gone from 62283 -> 59089, and while there are some other changes none of them look too significant.

Mark-Simulacrum · 2022-12-30T21:33:45Z

Do you have a sense of what percentage that is of the total sum? It looks like wall time is estimated to have gone down by about 3% -- I wonder how good our estimate is. (Note that the wall time is a little noisy, so it might actually be less or more than 3%. It's also true that we have a -j 2 build for this so there's some parallelism involved).

jyn514 · 2022-12-30T21:38:48Z

$ jq 'map(.total_estimate) | add' rustc_query_impl.mono_items-thread_local_macro.json | head
5153413

thread_local_macro is before this PR (29d76cc). dump-mono-stats reports a total reduction of (186412-1102) + (54638 - 323) + (62283 - 59089) = 242819. So 242819 / 5153413 is a 4.7% reduction; and I think the overcounting is only because I've cherry-picked the LocalKey examples, let me try doing it for everything in the file.

jyn514 · 2022-12-30T21:41:44Z

ah no, apparently I was undercounting.

$ jq 'map(.total_estimate) | add' rustc_query_impl.mono_items-thread_local_macro.json
5153413
$ jq 'map(.total_estimate) | add' rustc_query_impl.mono_items-thread_local_attr.json
4878568
$ python
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
>>> 1 - 4878568/5153413
0.05333261665618494

So according to dump-mono-stats this is a 5.3% reduction in code that needs to be sent in LLVM :) it says "estimate" so that might not be 100% accurate but it seems roughly comparable to what perf.rlo says.

jyn514 · 2022-12-30T21:54:03Z

@bjorn3 hmm, x check --host x86_64-pc-windows-gnu compiler works ok for me locally - how could I verify whether this breaks mingw? Does it only break at runtime?

lqd · 2022-12-30T22:12:01Z

The size estimate is a rough approximation here, it's still basically the number of MIR statements (here cargo-llvm-lines stats should be more realistic wrt LLVM compile times).

It's still useful of course, but improving the estimates to better reflect the backends' actual compilation-to-native-code work (rather than, say, the lowering to their own IR) is tracked in issues like #69382 and others.

thomcc · 2022-12-30T22:13:22Z

Does it only break at runtime?

Yes, but not deterministically.

mati865 · 2022-12-30T22:24:02Z

@bjorn3 hmm, x check --host x86_64-pc-windows-gnu compiler works ok for me locally - how could I verify whether this breaks mingw? Does it only break at runtime?

You need to build the compiler that will use TLS so at the very least use build instead of check.

This avoid monomorphizing `LocalKey::try_with` 5 times for each query. It has the downside that destructors for the thread-local are never run; but we don't depend on the destructors for ImplicitContext itself anywhere.

jyn514 · 2023-01-02T23:08:48Z

#106311 has a much better perf improvement, and also avoids needing multiple versions of the code for different platforms.

rustbot assigned jyn514 Dec 29, 2022

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 29, 2022

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 29, 2022

Noratrieb reviewed Dec 29, 2022

View reviewed changes

This comment has been minimized.

Sign in to view

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Dec 30, 2022

jyn514 force-pushed the faster-thread-local branch from afd4524 to 2a845e8 Compare December 30, 2022 22:40

This comment has been minimized.

Sign in to view

Noratrieb mentioned this pull request Dec 30, 2022

Don't monomorphize ty::tls closures in rustc_query_impl #106311

Closed

Use #[thread_local] instead of thread_local!

0aa4956

This avoid monomorphizing `LocalKey::try_with` 5 times for each query. It has the downside that destructors for the thread-local are never run; but we don't depend on the destructors for ImplicitContext itself anywhere.

jyn514 force-pushed the faster-thread-local branch from 2a845e8 to 0aa4956 Compare December 30, 2022 23:02

jyn514 closed this Jan 2, 2023

jyn514 deleted the faster-thread-local branch January 5, 2023 19:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[experimental] Use `#[thread_local]` instead of `thread_local!` in rustc_middle #106270

[experimental] Use `#[thread_local]` instead of `thread_local!` in rustc_middle #106270

jyn514 commented Dec 29, 2022

jyn514 commented Dec 29, 2022

This comment has been minimized.

bors commented Dec 29, 2022

Noratrieb Dec 29, 2022

bjorn3 commented Dec 29, 2022

jyn514 commented Dec 29, 2022

thomcc commented Dec 29, 2022

bors commented Dec 29, 2022

This comment has been minimized.

rust-timer commented Dec 30, 2022

jyn514 commented Dec 30, 2022 •

edited

Loading

Mark-Simulacrum commented Dec 30, 2022

jyn514 commented Dec 30, 2022

jyn514 commented Dec 30, 2022 •

edited

Loading

jyn514 commented Dec 30, 2022

lqd commented Dec 30, 2022 •

edited

Loading

thomcc commented Dec 30, 2022

mati865 commented Dec 30, 2022

This comment has been minimized.

jyn514 commented Jan 2, 2023

[experimental] Use #[thread_local] instead of thread_local! in rustc_middle #106270

[experimental] Use #[thread_local] instead of thread_local! in rustc_middle #106270

Conversation

jyn514 commented Dec 29, 2022

jyn514 commented Dec 29, 2022

This comment has been minimized.

bors commented Dec 29, 2022

Noratrieb Dec 29, 2022

Choose a reason for hiding this comment

bjorn3 commented Dec 29, 2022

jyn514 commented Dec 29, 2022

thomcc commented Dec 29, 2022

bors commented Dec 29, 2022

This comment has been minimized.

rust-timer commented Dec 30, 2022

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Instruction count

Max RSS (memory usage)

Cycles

jyn514 commented Dec 30, 2022 • edited Loading

Mark-Simulacrum commented Dec 30, 2022

jyn514 commented Dec 30, 2022

jyn514 commented Dec 30, 2022 • edited Loading

jyn514 commented Dec 30, 2022

lqd commented Dec 30, 2022 • edited Loading

thomcc commented Dec 30, 2022

mati865 commented Dec 30, 2022

This comment has been minimized.

jyn514 commented Jan 2, 2023

[experimental] Use `#[thread_local]` instead of `thread_local!` in rustc_middle #106270

[experimental] Use `#[thread_local]` instead of `thread_local!` in rustc_middle #106270

jyn514 commented Dec 30, 2022 •

edited

Loading

jyn514 commented Dec 30, 2022 •

edited

Loading

lqd commented Dec 30, 2022 •

edited

Loading