Profile queries #43345
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
src/librustc_driver/profile/trace.rs (outdated)

```rust
depth,
t.extent.len(),
/* Heuristic for 'important' CSS class: */
if t.extent.len() > 5 || percent >= 1.0 as f64 {
```
Why are these `as f64` casts needed? `1.0` should be `f64` by default, or you could force the literal with `1.0_f64`. (Similarly `100.0_f64` several lines above.)
src/librustc_driver/profile/trace.rs (outdated)

```rust
data.push((cons.clone(), qm.count.clone(), qm.duration.clone()));
};
data.sort_by(|&(_,_,d1),&(_,_,d2)|
    if d1 > d2 { Ordering::Less } else { Ordering::Greater } );
```
`data.sort_by_key(|entry| std::cmp::Reverse(entry.2))`
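A minimal sketch of the suggested rewrite, with placeholder data standing in for the PR's `(query, count, duration)` tuples:

```rust
use std::cmp::Reverse;
use std::time::Duration;

fn main() {
    // Placeholder entries: (name, count, duration) — not the PR's real types.
    let mut data = vec![
        ("a", 1u64, Duration::from_millis(5)),
        ("b", 2u64, Duration::from_millis(50)),
        ("c", 3u64, Duration::from_millis(20)),
    ];
    // Reverse inverts the ordering, so this sorts by duration, longest first,
    // without a hand-written comparator.
    data.sort_by_key(|entry| Reverse(entry.2));
    let names: Vec<_> = data.iter().map(|e| e.0).collect();
    assert_eq!(names, vec!["b", "c", "a"]);
}
```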
@@ -0,0 +1,167 @@

```rust
// Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
```
2015? 😄
fixed in 07daf55
src/librustc/util/common.rs (outdated)

```rust
if let None = chan.borrow().as_ref() {
    true
} else { false }
;
```
`let is_none = chan.borrow().is_none();`
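The simplification in isolation (a sketch; `u32` stands in for the PR's `Sender` payload):

```rust
use std::cell::RefCell;

fn main() {
    let chan: RefCell<Option<u32>> = RefCell::new(None);
    // Option::is_none replaces the `if let None = ... { true } else { false }` pattern.
    let is_none = chan.borrow().is_none();
    assert!(is_none);
    *chan.borrow_mut() = Some(7);
    assert!(chan.borrow().is_some());
}
```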
src/librustc_driver/profile/mod.rs (outdated)

```rust
assert!(frame.parse_st == trace::ParseState::NoQuery);
{
    // write log
    if false {
```
Is the `if false` here expected?
Thanks for the PR @matthewhammer! We'll check in now and again to make sure @nikomatsakis or another reviewer gets to this soon.

cc @eddyb
src/librustc/ty/maps.rs (outdated)

```rust
// If enabled, send a message to the profile-queries thread
macro_rules! profq_msg {
    ($tcx:expr, $msg:expr) => {
        if $tcx.sess.opts.debugging_opts.profile_queries {
```
Could you wrap this `if` in `if cfg!(debug_assertions) { ... }`? Then it will be filtered out at compile time if debug assertions are disabled.
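A sketch of the suggested guard; the macro name follows the diff, but `Opts` and the message sink are simplified stand-ins for the real session types:

```rust
// Stand-in for the session's debugging options.
struct Opts { profile_queries: bool }

macro_rules! profq_msg {
    ($opts:expr, $sink:expr, $msg:expr) => {
        // cfg!(debug_assertions) is a compile-time constant, so when debug
        // assertions are disabled the optimizer removes this whole branch.
        if cfg!(debug_assertions) {
            if $opts.profile_queries {
                $sink.push($msg);
            }
        }
    }
}

fn main() {
    let opts = Opts { profile_queries: true };
    let mut sink: Vec<&str> = Vec::new();
    profq_msg!(opts, sink, "QueryBegin");
    // Under a debug build the message is recorded; in release it is skipped.
    if cfg!(debug_assertions) {
        assert_eq!(sink, vec!["QueryBegin"]);
    } else {
        assert!(sink.is_empty());
    }
}
```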
OK, will do; thanks!
src/librustc_driver/profile/mod.rs (outdated)

```rust
},
(ParseState::HaveQuery(q,_),
 ProfileQueriesMsg::ProviderEnd) => {
    panic!("parse error: unexpected ProviderEnd;
```
You can use a `\` to indicate that the string continues at the first non-whitespace character on the next line, as in:

```rust
panic!("parse error: unexpected QueryBegin; \
        earlier query is unfinished: {:?} and now {:?}",
       q1, Query{span:span2, msg:querymsg2})
```
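A standalone check (not the PR's code) showing what the continuation does: the `\` consumes the newline and the next line's leading whitespace, leaving a single unbroken string.

```rust
fn main() {
    let msg = "parse error: unexpected ProviderEnd; \
               earlier query is unfinished";
    assert_eq!(msg, "parse error: unexpected ProviderEnd; earlier query is unfinished");
}
```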
Thanks!
```rust
    return sum
}

fn duration_div(nom: Duration, den: Duration) -> f64 {
```
How about something like:

```rust
fn duration_div(nom: Duration, den: Duration) -> f64 {
    fn to_nanos(d: Duration) -> u64 {
        d.as_secs() * 1_000_000_000 + d.subsec_nanos() as u64
    }
    to_nanos(nom) as f64 / to_nanos(den) as f64
}
```
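As a quick sanity check of the suggested helper (a standalone sketch, not part of the PR), 500 ms out of 2 s should come out as 0.25:

```rust
use std::time::Duration;

fn duration_div(nom: Duration, den: Duration) -> f64 {
    // Convert both durations to whole nanoseconds before dividing.
    fn to_nanos(d: Duration) -> u64 {
        d.as_secs() * 1_000_000_000 + d.subsec_nanos() as u64
    }
    to_nanos(nom) as f64 / to_nanos(den) as f64
}

fn main() {
    assert_eq!(duration_div(Duration::from_millis(500), Duration::from_secs(2)), 0.25);
    assert_eq!(duration_div(Duration::from_secs(1), Duration::from_secs(1)), 1.0);
}
```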
Awesome! Thank you!
done 689d4ab
@@ -29,6 +33,59 @@ pub struct ErrorReported;

```rust
thread_local!(static TIME_DEPTH: Cell<usize> = Cell::new(0));

/// Initialized for -Z profile-queries
thread_local!(static PROFQ_CHAN: RefCell<Option<Sender<ProfileQueriesMsg>>> = RefCell::new(None));
```
Do we need a separate thread? Isn't the overhead of synchronization more than that of pushing to a `Vec` or something?
Good question. The separate thread design was @nikomatsakis's suggestion, so I'll let him respond first.
Ah, that explains a bunch. If I get a satisfactory explanation out of him this may be good to go.
Well, we can measure. Just to be sure we all share the same concerns: I presume you are worried that the overhead of making and sending messages will distort the measurements, right? I admit my inclination is that separate threads are basically useful and good overall, if we can fit them in easily enough. =)
I think the reason I suggested a thread was that, with incremental compilation, we found that when we moved processing of the dependency graph into one thread, there was a significant hit, and that copying data to a separate thread was cheaper. But it's not obviously comparable -- in the incremental case, we did some effort to keep overhead low, such as pushing a series of messages into a vec and then sending the entire vec over to the thread to be processed (and using a double-buffering scheme).
In this case, in contrast, using a separate thread causes some difficulty since our keys contain region pointers and hence have to be converted to strings. And of course it's just using a channel rather than a vector, so there may be some overhead there (I'm not sure).
One question also: how many events would we wind up accumulating in this vector? I could imagine it being quite a few! But I guess we're just accumulating in the current code too, so that doesn't make much difference. I think I was imagining before that we would be doing some "consolidation" (e.g., summing up data or whatever) in the separate thread.
Regardless, seems like we can measure the "distortion" of a separate thread (versus a vector) just by measuring how much time the "main thread" of compilation takes, right?
TL;DR: I am happy either way really, just want to do the thing that gives us the best results.
Ah, another point --
There are things we'd eventually like to profile that occur before the tcx is created, or after it is destroyed. I suspect that we will eventually refactor away all the things that occur before the tcx is created, but I'm not so sure about the stuff that comes after it is destroyed -- in particular, I think it'll always be an important win to drop the arenas before we do LLVM translation, to reduce peak memory usage. So if we have the vector containing references into those arenas, that will make it harder to include those LLVM optimizations in our measurements, presumably (but not impossible, I suppose).
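For readers following the thread-versus-vector question, the design under discussion looks roughly like this (the message type and names are simplified stand-ins for the PR's):

```rust
use std::sync::mpsc::channel;
use std::thread;

// Simplified stand-in for the PR's ProfileQueriesMsg.
enum ProfileQueriesMsg {
    QueryBegin(String),
    Halt,
}

fn main() {
    let (tx, rx) = channel();
    // A dedicated thread drains profiling messages off the main thread's
    // critical path; the main thread only pays for the send.
    let handle = thread::spawn(move || {
        let mut events = 0;
        while let Ok(msg) = rx.recv() {
            match msg {
                ProfileQueriesMsg::QueryBegin(_) => events += 1,
                ProfileQueriesMsg::Halt => break,
            }
        }
        events
    });
    tx.send(ProfileQueriesMsg::QueryBegin("typeck".to_string())).unwrap();
    tx.send(ProfileQueriesMsg::Halt).unwrap();
    assert_eq!(handle.join().unwrap(), 1);
}
```

The alternative being weighed is appending to a thread-local `Vec` and consolidating at the end, trading synchronization overhead for memory held on the main thread.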
src/librustc/ty/maps.rs (outdated)

@@ -580,10 +598,17 @@ macro_rules! define_maps {

```rust
key,
span);

profq_msg!(tcx,
    ProfileQueriesMsg::QueryBegin(span.clone(),
        QueryMsg::$name(format!("{:?}", key))));
```
I'm not entirely happy with formatting the key like this. I would prefer it if we started with just the query type, and not the key. Also, I would definitely not record individual keys except with a threshold of a minimal duration that someone might be interested in. And there is a way to measure time on the spot here, which could be placed in `tcx.maps`: basically, adjust the "last query start time" after finishing an inner query so the inner time is not included in the outer one.
Out of curiosity, is your concern here with the time required to format the key, the space required to store it (or time/space to write it out to output files), or all of the above?
The time; formatting in the compiler is not optimized for speed but rather for code size (to avoid bloat from the presence of every `debug!` etc.).
I wonder if we want multiple options. There are various things we might want data for:
- Which passes take the most time?
- For this, we don't care about cache hits, and we probably don't care about cache keys either.
- Which bits of the input took the most time?
- For this, we don't care about cache hits, but we do care about cache keys, but maybe only those that took significant time.
- An overview of what happened, e.g. because something went wrong with incremental (re-use failed to occur, or re-use occurred improperly) and we'd like to know why
- For this, knowing the precise cache keys is very important, and we do care I think about cache hits.
I think this PR started off targeting the last one, and kind of grew to target the others. I suspect we care about all three. We may want to allow for options choosing which one you want, I'm not sure.
Maybe we want to separate the mechanisms for debugging and profiling more -- it would mean losing the ability to answer "which bits of the input took the most time", but it's not 100% clear to me that this is a useful query. It seems useful in the abstract, but I can't remember asking that question very frequently. But that might be because there wasn't an easy way to do so. Certainly it could be useful when you get random perf regressions. I do often want "show me all debug logs pertaining to `foo`", but that's more a debugging thing.
ping @eddyb, @michaelwoerister, @nikomatsakis, I think this may be ready for another round of review?
@matthewhammer and I talked about this today. It seems like the "best approach" is not obvious. The plan then is to land something and open an issue with the various bits of feedback and iterate. That said, I believe @matthewhammer had planned to do the following before landing:
One thing I don't really know is what final set of command-line switches we want for this. I think in my ideal world we would remove
Using the second switch would subsume the first. Using either switch would dump information to stderr at the end showing where we spent most of our time. Before we just remove
I'm happy to help with the perf server update, though we'll need to keep both forms of perf data scraping for backwards compatibility reasons.
☔ The latest upstream changes (presumably #43506) made this pull request unmergeable. Please resolve the merge conflicts.
@nikomatsakis I've done what you describe above, that is:
Further, the output file

I'll now work on summarizing the discussion above in some detail. In brief, I think the main (unanswered) questions concern the time/space overhead of this tracing thread, and whether we should have a thread or some other place for this global state (a question raised by @eddyb). Also, I believe that I have resolved the merge issues mentioned just above by bors.
@matthewhammer You should rebase on master to remove the merge commits.

@matthewhammer great! However, the commit history looks really messy. I think it'd be fine to rebase and/or just squash everything together into one commit, this PR is not so big.

Looks like the rebase failed and duplicated master commits. This is bad.

Hey @nikomatsakis and @eddyb, I talked to eddyb briefly on IRC about rebasing, but I totally goofed it up, and haven't gotten back to fixing it. Obviously, I need to do some background reading about

@eddyb Thanks for your offer to help, I think I need it. :)

@matthewhammer if you just want to rebase and squash all this into a single commit, you can just

You'll see your changes in

@aidanhs I believe we already dealt with it.

@eddyb the merge is clean but I still see a ton of commits on the github UI?

@aidanhs I don't think @matthewhammer had a chance to push the result since we talked on IRC.
Force-pushed from 825166b to 0424a98.
@nikomatsakis I got some more help from @eddyb today, and fixed up the rebase in this PR

@matthewhammer: @nikomatsakis is on vacation until Aug 15, and will probably be busy afterwards. Do you want someone else to review this PR, or are you ok with waiting?

@arielb1 I'm OK with waiting for @nikomatsakis. Thanks!

☔ The latest upstream changes (presumably #43522) made this pull request unmergeable. Please resolve the merge conflicts.

@matthewhammer just a ping on the merge conflicts so this can be reviewed!

@aidanhs Thanks! I'll get to this rebase later today.

@nikomatsakis rebased.
src/librustc/ty/maps.rs (outdated)

@@ -510,6 +511,28 @@ impl<'tcx> QueryDescription for queries::extern_crate<'tcx> {

```rust
impl<'tcx> QueryDescription for queries::lint_levels<'tcx> {
    fn describe(_tcx: TyCtxt, _: CrateNum) -> String {
        format!("computing the lint levels for items in this crate")
    }
```
You forgot a `}`.
[00:07:43] error: this file contains an un-closed delimiter
[00:07:43] --> /checkout/src/librustc/ty/maps.rs:1116:3
[00:07:43] |
[00:07:43] 1116 | }
[00:07:43] | ^
[00:07:43] |
[00:07:43] help: did you mean to close this delimiter?
[00:07:43] --> /checkout/src/librustc/ty/maps.rs:511:60
[00:07:43] |
[00:07:43] 511 | impl<'tcx> QueryDescription for queries::lint_levels<'tcx> {
[00:07:43] | ^
@bors r+

📌 Commit 43335ae has been approved by

(The travis failure looks spurious.)
Profile queries

This PR implements the "profile queries" debugging feature described here: https://github.com/rust-lang-nursery/rust-forge/blob/master/profile-queries.md

In particular, it implements the debugging flag `-Z profile-queries`

FYI: This PR is my second attempt at pushing these changes. My original PR required a rebase; I have now done that rebase manually, after messing up with git's "interactive" rebase support. The original (now closed/cancelled) PR is this one: #43156

r? @nikomatsakis
☀️ Test successful - status-appveyor, status-travis