Profile queries #43345
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
src/librustc_driver/profile/trace.rs (outdated)

```rust
depth,
t.extent.len(),
/* Heuristic for 'important' CSS class: */
if t.extent.len() > 5 || percent >= 1.0 as f64 {
```
Why are these `as f64` casts needed? `1.0` should be `f64` by default, or you could force the literal with `1.0_f64`. (Similarly `100.0_f64` several lines above.)
src/librustc_driver/profile/trace.rs (outdated)

```rust
data.push((cons.clone(), qm.count.clone(), qm.duration.clone()));
};
data.sort_by(|&(_,_,d1),&(_,_,d2)|
    if d1 > d2 { Ordering::Less } else { Ordering::Greater } );
```
`data.sort_by_key(|entry| std::cmp::Reverse(entry.2))`
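A minimal sketch of the suggested rewrite, with placeholder data standing in for the PR's `(query, count, duration)` tuples:

```rust
use std::cmp::Reverse;
use std::time::Duration;

fn main() {
    // Placeholder entries: (name, count, duration) — not the PR's real types.
    let mut data = vec![
        ("a", 1u64, Duration::from_millis(5)),
        ("b", 2u64, Duration::from_millis(50)),
        ("c", 3u64, Duration::from_millis(20)),
    ];
    // Reverse inverts the ordering, so this sorts by duration, longest first,
    // without a hand-written comparator.
    data.sort_by_key(|entry| Reverse(entry.2));
    let names: Vec<_> = data.iter().map(|e| e.0).collect();
    assert_eq!(names, vec!["b", "c", "a"]);
}
```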
@@ -0,0 +1,167 @@

```rust
// Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
```
2015? 😄
fixed in 07daf55
src/librustc/util/common.rs (outdated)

```rust
if let None = chan.borrow().as_ref() {
    true
} else { false }
;
```
`let is_none = chan.borrow().is_none();`
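The simplification in isolation (a sketch; `u32` stands in for the PR's `Sender` payload):

```rust
use std::cell::RefCell;

fn main() {
    let chan: RefCell<Option<u32>> = RefCell::new(None);
    // Option::is_none replaces the `if let None = ... { true } else { false }` pattern.
    let is_none = chan.borrow().is_none();
    assert!(is_none);
    *chan.borrow_mut() = Some(7);
    assert!(chan.borrow().is_some());
}
```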
src/librustc_driver/profile/mod.rs (outdated)

```rust
assert!(frame.parse_st == trace::ParseState::NoQuery);
{
    // write log
    if false {
```
Is the `if false` here expected?
Thanks for the PR @matthewhammer! We'll check in now and again to make sure @nikomatsakis or another reviewer gets to this soon.

cc @eddyb
src/librustc/ty/maps.rs (outdated)

```rust
// If enabled, send a message to the profile-queries thread
macro_rules! profq_msg {
    ($tcx:expr, $msg:expr) => {
        if $tcx.sess.opts.debugging_opts.profile_queries {
```
Could you wrap this `if` in `if cfg!(debug_assertions) { ... }`? Then it will be filtered out at compile time if debug assertions are disabled.
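A sketch of the suggested guard; the macro name follows the diff, but `Opts` and the message sink are simplified stand-ins for the real session types:

```rust
// Stand-in for the session's debugging options.
struct Opts { profile_queries: bool }

macro_rules! profq_msg {
    ($opts:expr, $sink:expr, $msg:expr) => {
        // cfg!(debug_assertions) is a compile-time constant, so when debug
        // assertions are disabled the optimizer removes this whole branch.
        if cfg!(debug_assertions) {
            if $opts.profile_queries {
                $sink.push($msg);
            }
        }
    }
}

fn main() {
    let opts = Opts { profile_queries: true };
    let mut sink: Vec<&str> = Vec::new();
    profq_msg!(opts, sink, "QueryBegin");
    // Under a debug build the message is recorded; in release it is skipped.
    if cfg!(debug_assertions) {
        assert_eq!(sink, vec!["QueryBegin"]);
    } else {
        assert!(sink.is_empty());
    }
}
```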
OK, will do; thanks!
src/librustc_driver/profile/mod.rs (outdated)

```rust
},
(ParseState::HaveQuery(q,_),
 ProfileQueriesMsg::ProviderEnd) => {
    panic!("parse error: unexpected ProviderEnd;
```
You can use a `\` to indicate that the string continues at the first non-whitespace character on the next line, as in:

```rust
panic!("parse error: unexpected QueryBegin; \
        earlier query is unfinished: {:?} and now {:?}",
       q1, Query{span:span2, msg:querymsg2})
```
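A standalone check (not the PR's code) showing what the continuation does: the `\` consumes the newline and the next line's leading whitespace, leaving a single unbroken string.

```rust
fn main() {
    let msg = "parse error: unexpected ProviderEnd; \
               earlier query is unfinished";
    assert_eq!(msg, "parse error: unexpected ProviderEnd; earlier query is unfinished");
}
```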
Thanks!
```rust
    return sum
}

fn duration_div(nom: Duration, den: Duration) -> f64 {
```
How about something like:

```rust
fn duration_div(nom: Duration, den: Duration) -> f64 {
    fn to_nanos(d: Duration) -> u64 {
        d.as_secs() * 1_000_000_000 + d.subsec_nanos() as u64
    }
    to_nanos(nom) as f64 / to_nanos(den) as f64
}
```
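As a quick sanity check of the suggested helper (a standalone sketch, not part of the PR), 500 ms out of 2 s should come out as 0.25:

```rust
use std::time::Duration;

fn duration_div(nom: Duration, den: Duration) -> f64 {
    // Convert both durations to whole nanoseconds before dividing.
    fn to_nanos(d: Duration) -> u64 {
        d.as_secs() * 1_000_000_000 + d.subsec_nanos() as u64
    }
    to_nanos(nom) as f64 / to_nanos(den) as f64
}

fn main() {
    assert_eq!(duration_div(Duration::from_millis(500), Duration::from_secs(2)), 0.25);
    assert_eq!(duration_div(Duration::from_secs(1), Duration::from_secs(1)), 1.0);
}
```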
Awesome! Thank you!
done 689d4ab
@@ -29,6 +33,59 @@ pub struct ErrorReported;

```rust
thread_local!(static TIME_DEPTH: Cell<usize> = Cell::new(0));

/// Initialized for -Z profile-queries
thread_local!(static PROFQ_CHAN: RefCell<Option<Sender<ProfileQueriesMsg>>> = RefCell::new(None));
```
Do we need a separate thread? Isn't the overhead of synchronization more than that of pushing to a `Vec` or something?
Good question. The separate thread design was @nikomatsakis's suggestion, so I'll let him respond first.
Ah, that explains a bunch. If I get a satisfactory explanation out of him this may be good to go.
Well, we can measure. Just to be sure we all share the same concerns: I presume you are worried that the overhead of making and sending messages will distort the measurements, right? I admit my inclination is that separate threads are basically useful and good overall, if we can fit them in easily enough. =)
I think the reason I suggested a thread was that, with incremental compilation, we found that when we moved processing of the dependency graph into one thread, there was a significant hit, and that copying data to a separate thread was cheaper. But it's not obviously comparable -- in the incremental case, we did some effort to keep overhead low, such as pushing a series of messages into a vec and then sending the entire vec over to the thread to be processed (and using a double-buffering scheme).
In this case, in contrast, using a separate thread causes some difficulty since our keys contain region pointers and hence have to be converted to strings. And of course it's just using a channel rather than a vector, so there may be some overhead there (I'm not sure).
One question also: how many events would we wind up accumulating in this vector? I could imagine it being quite a few! But I guess we're just accumulating in the current code too, so that doesn't make much difference. I think I was imagining before that we would be doing some "consolidation" (e.g., summing up data or whatever) in the separate thread.
Regardless, seems like we can measure the "distortion" of a separate thread (versus a vector) just by measuring how much time the "main thread" of compilation takes, right?
TL;DR: I am happy either way really, just want to do the thing that gives us the best results.
Ah, another point --
There are things we'd eventually like to profile that occur before the tcx is created, or after it is destroyed. I suspect that we will eventually refactor away all the things that occur before the tcx is created, but I'm not so sure about the stuff that comes after it is destroyed -- in particular, I think it'll always be an important win to drop the arenas before we do LLVM translation, to reduce peak memory usage. So if we have the vector containing references into those arenas, that will make it harder to include those LLVM optimizations in our measurements, presumably (but not impossible, I suppose).
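For readers following the thread-versus-vector question, the design under discussion looks roughly like this (the message type and names are simplified stand-ins for the PR's):

```rust
use std::sync::mpsc::channel;
use std::thread;

// Simplified stand-in for the PR's ProfileQueriesMsg.
enum ProfileQueriesMsg {
    QueryBegin(String),
    Halt,
}

fn main() {
    let (tx, rx) = channel();
    // A dedicated thread drains profiling messages off the main thread's
    // critical path; the main thread only pays for the send.
    let handle = thread::spawn(move || {
        let mut events = 0;
        while let Ok(msg) = rx.recv() {
            match msg {
                ProfileQueriesMsg::QueryBegin(_) => events += 1,
                ProfileQueriesMsg::Halt => break,
            }
        }
        events
    });
    tx.send(ProfileQueriesMsg::QueryBegin("typeck".to_string())).unwrap();
    tx.send(ProfileQueriesMsg::Halt).unwrap();
    assert_eq!(handle.join().unwrap(), 1);
}
```

The alternative being weighed is appending to a thread-local `Vec` and consolidating at the end, trading synchronization overhead for memory held on the main thread.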
src/librustc/ty/maps.rs (outdated)

@@ -580,10 +598,17 @@ macro_rules! define_maps {

```rust
key,
span);

profq_msg!(tcx,
    ProfileQueriesMsg::QueryBegin(span.clone(),
        QueryMsg::$name(format!("{:?}", key))));
```
I'm not entirely happy with formatting the key like this. I would prefer it if we started with just the query type, and not the key. Also, I would definitely not record individual keys except with a threshold of a minimal duration that someone might be interested in. And there is a way to measure time on the spot here, which could be placed in `tcx.maps`: basically, adjust the "last query start time" after finishing an inner query so the inner time is not included in the outer one.
Out of curiosity, is your concern here with the time required to format the key, the space required to store it (or time/space to write it out to output files), or all of the above?
The time; formatting in the compiler is not optimized for speed but rather for code size (to avoid bloat from the presence of every `debug!` etc.).
I wonder if we want multiple options. There are various things we might want data for:
- Which passes take the most time?
- For this, we don't care about cache hits, and we probably don't care about cache keys either.
- Which bits of the input took the most time?
- For this, we don't care about cache hits, but we do care about cache keys, but maybe only those that took significant time.
- An overview of what happened, e.g. because something went wrong with incremental (re-use failed to occur, or re-use occurred improperly) and we'd like to know why
- For this, knowing the precise cache keys is very important, and we do care I think about cache hits.
I think this PR started off targeting the last one, and kind of grew to target the others. I suspect we care about all three. We may want to allow for options choosing which one you want, I'm not sure.
Maybe we want to separate the mechanisms for debugging and profiling more -- it would mean losing the ability to answer "which bits of the input took the most time", but it's not 100% clear to me that this is a useful query. It seems useful in the abstract, but I can't remember asking that question very frequently. But that might be because there wasn't an easy way to do so. Certainly it could be useful when you get random perf regressions. I do often want "show me all debug logs pertaining to `foo`", but that's more a debugging thing.
ping @eddyb, @michaelwoerister, @nikomatsakis, I think this may be ready for another round of review?
@matthewhammer and I talked about this today. It seems like the "best approach" is not obvious. The plan then is to land something and open an issue with the various bits of feedback and iterate. That said, I believe @matthewhammer had planned to do the following before landing:
One thing I don't really know is what final set of command-line switches we want for this. I think in my ideal world we would remove
Using the second switch would subsume the first. Using either switch would dump information to stderr at the end showing where we spent most of our time. Before we just remove
I'm happy to help with the perf server update, though we'll need to keep both forms of perf data scraping for backwards compatibility reasons.
☔ The latest upstream changes (presumably #43506) made this pull request unmergeable. Please resolve the merge conflicts.
@nikomatsakis I've done what you describe above, that is:
Further, the output file

I'll now work on summarizing the discussion above in some detail. In brief, I think the main (unanswered) questions concern the time/space overhead of this tracing thread, and whether we should have a thread or some other place for this global state (a question raised by @eddyb). Also, I believe that I have resolved the merge issues mentioned just above by bors.
@matthewhammer You should rebase on master to remove the merge commits.

@matthewhammer great! However, the commit history looks really messy. I think it'd be fine to rebase and/or just squash everything together into one commit, this PR is not so big.

Looks like the rebase failed and duplicated master commits. This is bad.

Hey @nikomatsakis and @eddyb, I talked to eddyb briefly on IRC about rebasing, but I totally goofed it up, and haven't gotten back to fixing it. Obviously, I need to do some background reading about

@eddyb Thanks for your offer to help, I think I need it. :)

@matthewhammer if you just want to rebase and squash all this into a single commit, you can just

You'll see your changes in

@aidanhs I believe we already dealt with it.

@eddyb the merge is clean but I still see a ton of commits on the github UI?

@aidanhs I don't think @matthewhammer had a chance to push the result since we talked on IRC.
Force-pushed from 825166b to 0424a98.
@nikomatsakis I got some more help from @eddyb today, and fixed up the rebase in this PR

@matthewhammer: @nikomatsakis is on vacation until Aug 15, and will probably be busy afterwards. Do you want someone else to review this PR, or are you ok with waiting?

@arielb1 I'm OK with waiting for @nikomatsakis. Thanks!

☔ The latest upstream changes (presumably #43522) made this pull request unmergeable. Please resolve the merge conflicts.

@matthewhammer just a ping on the merge conflicts so this can be reviewed!

@aidanhs Thanks! I'll get to this rebase later today.

@nikomatsakis rebased.
src/librustc/ty/maps.rs (outdated)

@@ -510,6 +511,28 @@ impl<'tcx> QueryDescription for queries::extern_crate<'tcx> {

```rust
impl<'tcx> QueryDescription for queries::lint_levels<'tcx> {
    fn describe(_tcx: TyCtxt, _: CrateNum) -> String {
        format!("computing the lint levels for items in this crate")
    }
```
You forgot a `}`.
[00:07:43] error: this file contains an un-closed delimiter
[00:07:43] --> /checkout/src/librustc/ty/maps.rs:1116:3
[00:07:43] |
[00:07:43] 1116 | }
[00:07:43] | ^
[00:07:43] |
[00:07:43] help: did you mean to close this delimiter?
[00:07:43] --> /checkout/src/librustc/ty/maps.rs:511:60
[00:07:43] |
[00:07:43] 511 | impl<'tcx> QueryDescription for queries::lint_levels<'tcx> {
[00:07:43] | ^
@bors r+

📌 Commit 43335ae has been approved by

(The travis failure looks spurious.)
Profile queries

This PR implements the "profile queries" debugging feature described here: https://github.com/rust-lang-nursery/rust-forge/blob/master/profile-queries.md

In particular, it implements the debugging flag `-Z profile-queries`

FYI: This PR is my second attempt at pushing these changes. My original PR required a rebase; I have now done that rebase manually, after messing up with git's "interactive" rebase support. The original (now closed/cancelled) PR is this one: #43156

r? @nikomatsakis
☀️ Test successful - status-appveyor, status-travis