Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing data: filling in #249

Closed
Mark-Simulacrum opened this issue Jun 26, 2018 · 3 comments
Closed

Missing data: filling in #249

Mark-Simulacrum opened this issue Jun 26, 2018 · 3 comments

Comments

@Mark-Simulacrum
Copy link
Member

@nnethercote left an excellent comment on the rustc tracking issue for compiler performance: rust-lang/rust#48750 (comment).

This issue tracks the implementation of the data fill-in described there.

  • The failure on run 5 occurred in the middle of the series. Use average of the values for runs 4 and 6 for run 5 of benchmark B.
  • The failure on run 1 occurred at the start of the series. Use the value from run 2 for run 1 of benchmark B.
  • The failures on runs 9 and 10 occurred at the end of the series. Use the value from run 8 for runs 9 and 10 of benchmark B.

These are the cases we need to support.

The data structure containing all data is currently a BTreeMap<Commit, CommitData>. The rustc-perf server is not aware of what commits exist (i.e., in rust-lang/rust master).

I make the contention that if we are missing data for a commit then we should not attempt to fill that data in, at least not yet. It will not affect graphs/summaries since no code will see that data. This also simplifies implementation.

In order to determine whether data needs to be "filled in" I think we'll want to be able to find the "previous" and "next" nodes for every commit. I don't believe the API exposed by BTreeMap trivially allows this for an arbitrary node -- at least not for our keys. As such, we'll want to have a separate list of keys (probably Vec<Commit>, maybe Vec<CommitHash>) and a HashMap<Commit, usize> to find ourselves in that vector. We can then find the previous/next commits.

This index should be populated when reloading data, not dynamically created on each lookup.

I believe that mostly settles the implementation concerns here; if a better way exists, I have not thought of it just now, but I'm of course open to it.

@nnethercote
Copy link
Contributor

On a related note: it's not obvious when a benchmark stops compiling. E.g. I just noticed that tokio-webpush-simple has been busted since June 9. If we could have some visual representation of failures on the per-benchmark graphs (interpolated dots in red, maybe?) that would make diagnosis much quicker.

(I'm mentioning that here because it seems related to this issue. But perhaps it should be filed as a separate issue.)

@nnethercote
Copy link
Contributor

This is relevant for the dashboard view, too. I was puzzled why the dashboard was showing regressions with recent versions. sentry-cli was added in late May, but its runtime is actually very close to the average runtime, so that probably didn't have much effect. But the omission of tokio-webpush-simple, whose runtime is substantially smaller than the average, would push the average up by something like 7%.

@nnethercote
Copy link
Contributor

nnethercote commented Jun 28, 2018

I was puzzled why the dashboard was showing regressions with recent versions.

After more investigation: most of the versions have measurements for 11 benchmarks, but "master" currently only has 9 because futures and tokio-webpush-simple aren't building. They are both substantially shorter-running than average, so the average gets pushed up to 5.1; it would be 4.3 if the missing values were interpolated from prior versions. (In contrast, if style-servo stopped building, the average would artificially plummet because it accounts for about 60% of the combined runtime.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants