This repository was archived by the owner on Dec 29, 2022. It is now read-only.

Running tests in parallel #524

Closed
DSpeckhals opened this issue Oct 17, 2017 · 9 comments

@DSpeckhals
Contributor

When running the RLS test suite through the rust-lang/rust submodule (./x.py test src/tools/rls), tests run in parallel by default. However, failures almost always occur with recent RLS commits (e.g. 96d7461).

The failures happen in either test_bin_lib_project or test_bin_lib_project_no_cfg_test (which one varies across runs). Sometimes one of the tests lags until it times out; other times the wrong configuration is set up in the environment for test_data/bin_lib (messages sent out of order, or the wrong message entirely).

I'm unable to reproduce the failures by running cargo test directly on the RLS; it only happens when testing the submodule in rustc. I can make the tests pass, though, by running ./x.py test src/tools/rls -j 1 (only one test at a time), which further indicates that this is an issue with separate threads using the same test data directory.

@nrc
Member

nrc commented Oct 20, 2017

This has happened in the past. The usual cause is either Cargo overwriting data (i.e., using the same target directory for multiple tests) or environment variable overlap (i.e., one test sets an env var to one thing and another test sets it to something else). I suspect the current case is one or the other. Whether it happens with cargo test or only in rustc is probably a timing issue. It might be possible to repro by changing the timeouts (either in the tests or the RLS itself).
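
(Illustration only, not actual RLS test code: two tests along these lines race on the shared process environment when the harness runs them on separate threads.)

```rust
use std::env;

// Both tests mutate the same process-wide environment, so when the test
// harness runs them on different threads, one test can have its
// CARGO_TARGET_DIR overwritten by the other mid-run.
#[test]
fn writes_target_dir_a() {
    env::set_var("CARGO_TARGET_DIR", "target/test_a");
    // ... kick off the in-process build, which reads the variable later ...
}

#[test]
fn writes_target_dir_b() {
    env::set_var("CARGO_TARGET_DIR", "target/test_b");
    // ... whichever test set the variable last wins ...
}
```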

cc @Xanewok - if you have some time to investigate this, that would be awesome.

cc @alexcrichton - while this problem exists, we should probably not update the RLS in the Rust repo, or, if we do, turn off the tests.

@nrc
Member

nrc commented Oct 30, 2017

Is this fixed by #532?

@nrc nrc added the testing label Oct 30, 2017
@nrc nrc added this to the 1.0 milestone Oct 30, 2017
@DSpeckhals
Contributor Author

I just commented out the code in that PR as a temporary fix. I think we should keep this open to keep track of the underlying problem.

@Xanewok Xanewok self-assigned this Oct 30, 2017
@DSpeckhals
Contributor Author

I was just looking at how Cargo tests projects, and was curious as to whether this would be a reasonable idea for the RLS: https://github.com/rust-lang/cargo/blob/d2d7b3008220d37d6960368954c274d826ca9437/tests/workspaces.rs#L15-L34

They essentially have a "project builder" that lays out whole projects in code: a test describes the directory and file tree, the builder creates it, and the test then runs against it. The RLS keeps its test data as actual files in the repo, which leads either to duplication of test data or to tests stomping on one another.

Does the Cargo approach seem reasonable for the RLS? If it does, I think we could incrementally move tests to the new format, perhaps starting with the ones that are giving us problems here. Or does this seem like overkill?
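
For illustration, a rough sketch of what such a builder could look like on the RLS side (names, paths, and layout here are hypothetical, not Cargo's actual helpers):

```rust
use std::fs;
use std::path::PathBuf;

// Hypothetical helper in the spirit of Cargo's test project builder: each
// test describes its project in code and gets its own fresh directory, so
// tests no longer share files under test_data/.
struct ProjectBuilder {
    root: PathBuf,
    files: Vec<(String, String)>,
}

impl ProjectBuilder {
    fn new(name: &str) -> ProjectBuilder {
        ProjectBuilder {
            root: std::env::temp_dir().join("rls-test-projects").join(name),
            files: Vec::new(),
        }
    }

    fn file(mut self, path: &str, contents: &str) -> ProjectBuilder {
        self.files.push((path.to_string(), contents.to_string()));
        self
    }

    // Writes the project tree to disk and returns its root.
    fn build(self) -> PathBuf {
        for (path, contents) in &self.files {
            let full = self.root.join(path);
            fs::create_dir_all(full.parent().unwrap()).unwrap();
            fs::write(&full, contents).unwrap();
        }
        self.root
    }
}

#[test]
fn bin_lib_project_builds_in_its_own_dir() {
    let project = ProjectBuilder::new("bin_lib")
        .file("Cargo.toml", "[package]\nname = \"bin_lib\"\nversion = \"0.1.0\"\n")
        .file("src/lib.rs", "pub fn foo() {}\n")
        .build();
    // ... point the RLS at `project` and run the usual assertions ...
    assert!(project.join("Cargo.toml").exists());
}
```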

@Xanewok
Member

Xanewok commented Nov 1, 2017

So right now the biggest problem with parallel tests is that running a single integration test requires exclusive access to the process environment: we run cargo and rustc in-process (sometimes even nested), and they rely heavily on environment variables and on those not changing externally during execution.

This often causes timeouts: 4 (or $RUST_TEST_THREADS) tests are scheduled at once, but in practice they execute almost sequentially because the environment lock is passed back and forth, and the timeout timer keeps ticking while a test waits to acquire that lock.

To solve the timeout issue I think we'll have to spawn a separate RLS process for each test, or at least run certain groups of tests sequentially.

Concurrent access to the same test_data directory should not be a problem now, since IIRC we also use a different Cargo target-dir for RLS artifacts.

@DSpeckhals yep, that might be a possible approach to consider.
While I agree it might be convenient to have a builder that creates a test directory structure for a given test, one thing to keep in mind is that recreating many files at once can incur some I/O overhead, and interrupting the test run could leave generated projects behind without cleanup. Nothing blocking, though.

@Xanewok
Member

Xanewok commented Nov 1, 2017

To ensure no problems on the Rust CI front, for now I'd move all the integration tests to the Cargo integration test format, or at least make sure they run sequentially (e.g. with the Mutex trick). Using the Cargo format would require splitting the RLS into a bin + lib, but it would also guarantee that the tests run in separate processes.
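
A minimal sketch of the Mutex trick, assuming a single process-wide lock that every environment-touching test takes first (the static name is illustrative):

```rust
use std::sync::Mutex;

// Process-wide lock: a test that mutates environment variables or drives the
// in-process cargo/rustc build holds it for its whole duration, so such tests
// are serialized even when the harness schedules them in parallel.
// (On toolchains without const Mutex::new, lazy_static can provide the same.)
static ENV_LOCK: Mutex<()> = Mutex::new(());

#[test]
fn test_bin_lib_project() {
    let _guard = ENV_LOCK.lock().unwrap();
    // ... set env vars, run the in-process build, assert on the messages ...
}

#[test]
fn test_bin_lib_project_no_cfg_test() {
    let _guard = ENV_LOCK.lock().unwrap();
    // ... runs strictly before or after the test above, never concurrently ...
}
```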

Right now we have very simple test projects, so the analysis per test is fairly fast, and I believe the Rust and RLS CI can afford the ~4-5 seconds of extra total runtime (measured on both my desktop and laptop with RUST_TEST_THREADS=1) in exchange for consistent runs with all our tests enabled.

With that, we can then further explore parallelizing those with a more custom harness or similar.
@nrc what do you think?

@nrc
Member

nrc commented Nov 2, 2017

Using a mutex is probably the easiest path going forward. I'm a bit worried because, in the past, failures surfaced by parallel test runs have turned out to be real bugs. But it sounds like this is probably a timeout issue.

@Xanewok
Member

Xanewok commented Jan 12, 2018

#607 allows us to write integration tests that are executed in parallel, each running its own RLS binary. Currently there are 2 integration tests in the new format; what's left is to move all of our integration tests to the new harness.
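
Not the actual harness from #607, but the general shape of a per-test RLS process looks roughly like this (the binary name, paths, and env vars are placeholders):

```rust
use std::process::{Command, Stdio};

// Sketch only: each test spawns its own RLS server process and talks LSP over
// stdin/stdout, so nothing shares the parent test process's environment.
#[test]
fn bin_lib_project_via_spawned_rls() {
    let mut child = Command::new("rls") // placeholder: path to the built rls binary
        .current_dir("test_data/bin_lib")
        .env("CARGO_TARGET_DIR", "target/bin_lib") // per-test target dir
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .expect("failed to spawn rls");

    // ... write an LSP `initialize` request to child.stdin, read diagnostics
    // back from child.stdout, then shut the server down ...
    let _ = child.kill();
}
```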

@nrc
Member

nrc commented Mar 22, 2018

This works now.

@nrc nrc closed this as completed Mar 22, 2018