* QA-273: Feature/python test runner (#401)
Co-authored-by: Jan <[email protected]>
Co-authored-by: Vadim <[email protected]>
* Feature/python test runner (#417)
* start implementing a python launch controller
* make it work for the first time
* try launching outside of oskar.
* no more pipes needed
* adjust report directory
* fix paths, thread naming.
* fallback if no env is configured
* lint
* more work on cluster etc
* silence, proper error message for missing variable
* convert params
* lint
* fix slot
* fix arangosh.conf, launching of subsequent testruns
* try to launch it from fish
* implement 7zip
* add modules to the docker container
* more printing
* fix handling
* Add pip3
* Fix typo
* Typo 2
* handle INNERWORKDIR
* fix missing line break
* export settings
* fix typo
* on windows skip !windows tests
* lint, refactor, simplify
* install 7z
* export core directory
* work on fish integration
* similarize for new python job scheduler
* work on reprot generating
* try to implement timeout
* also upload 7z and txt
* also upload 7z and txt
* fix deadline
* fix workspace handling
* fix temporary directory handling
* make sure out temp directory exists
* RTFM fail
* don't put it to the workspace
* implement gtest invoking
* cleanup
* sort, lint
* prefer INNERWORKDIR
* implement writing test.log
* implement html report
* bring back function deletet to early
* install the windows boomerang handler on top level
* fix include
* fix reference
* print before killing shit
* work on timeout
* finish deadline handling, rename script
* fix exit code handling
* lint
* thanks @mpoeter for ps aid
* make the thread identifier the test plus a growing number
* implement central final deadline, which will kick in after 2 minutes
* remove debug output
* use /usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin to locate python
* wintendo next try
* wintendo next try
* wintendo go home
* fix calculation of hard time limit
* make sure nobody changes the exit code to good
* add monkey patches
* cleanup deadline
* ignore exceptions if no process is there
* deadline handling: prioritize incomming lines over timeout counting
* fix directory handling
* work on result presentation
* cleanup
* let the file remain open for further info
* fix environment variable handling
* documentation
* fix port handling
* work on deadline
* fix hard deadline handling
* make it 20s
* need more time
* list processes so we may guess whats actually going on
* kill all, then waitpid all
* make threads provide half a slot.
* be sure to catch
* resume just in case, then kill
* resume just in case, then kill
* ignore resume errors
* increase volume
* lint
* lint
* catch more
* add multipliers
* more load, print load avg
* fix sorting by prio - biggest values first
* cleanup crash report for size
* if test indicates its been crashing create report as well.
* more threat to the machine.
* timeout
* fix typo
* delete tzdata subdir first
* use load and sockets for throttle control
* install required python libs
* only see for load [0, 1]
* increase container version
* anounce deadline at start
* don't print to logfile
* give better feedback if arangosh fails to launch in first place, thangs @maierlars for bringing up the topic
* Update helper.linux.fish
* tschuess ruby
* re-sync to be stock RTA
* fix container numbers, adjust #3
* sync to rta
* resync
* this is not needed anymore
* add --fix-missing
* fresh python?
* revert to tar.gz
* chaos tests in nightlies demand for longer timeouts, since tests run longer.
* Update README.md
Co-authored-by: Jan <[email protected]>
* Update README.md
Co-authored-by: Jan <[email protected]>
* Update README.md
Co-authored-by: Jan <[email protected]>
* Update README.md
Co-authored-by: Jan <[email protected]>
* Update README.md
Co-authored-by: Jan <[email protected]>
* remove more old stuff
* ignore encoding errors
* increase timeout to hard self kill
* switch to one environment variable name
* env
* limit the amount of coredumps
* ignore access denied to open sockets
* if we need to wait for the system to cool down on start...
* make sure we don't come back good if nothing launched at all
* them tiny boxes need more time
* need more time
* add deadline status to testfailurs.txt
* need more time
* beautify testfailures.txt
* give machine estimate reasons at the start of the run
* case may matter
* one more environment variable
Co-authored-by: Vadim <[email protected]>
Co-authored-by: Jan <[email protected]>
* Feature/python test runner (#418)
* anounce test directory
* switch sequence, print first
* one more var exported
Co-authored-by: Vadim <[email protected]>
Co-authored-by: Jan <[email protected]>
* Fixed 7z and Signing
Added fixes for signing and 7z
* Feature/python test runner (#419)
* add disk i/o to the output
* better work with M1 performance cores
* print other sequence; enable more load[1]
* more threads doesn't cut it
* print platform
* precise M1 detection
* two places on mac to collect cores
* properly append
* fix default directory
* use iso-ish datetime format for filenames
Co-authored-by: Vadim <[email protected]>
Co-authored-by: Jan <[email protected]>
Co-authored-by: Markus Pfeiffer <[email protected]>
Co-authored-by: Jan <[email protected]>
Co-authored-by: Vadim <[email protected]>
Co-authored-by: Sven Luschgy <[email protected]>
Co-authored-by: Markus Pfeiffer <[email protected]>
README.md (+94 lines)
@@ -234,6 +234,100 @@ if supported, set number of concurrent builds to `PARALLELISM`

## Testing

`jenkins/helper/test_launch_controller.py` is used to control multiple test executions.

### Its dependencies beyond stock python3 are:
- psutil to control subprocesses
- py7zr (optional) to build 7z reports instead of tar.bz2

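An optional dependency such as py7zr is typically handled with a guarded import. The following is a minimal sketch of that pattern, not the controller's actual code; the function name `archive_reports` and its arguments are hypothetical.

```python
import tarfile
from pathlib import Path

try:
    import py7zr  # optional dependency
except ImportError:
    py7zr = None


def archive_reports(report_dir, target_stem):
    """Pack report_dir into <target_stem>.7z if py7zr is installed,
    otherwise into <target_stem>.tar.bz2 (hypothetical helper)."""
    report_dir = Path(report_dir)
    if py7zr is not None:
        target = Path(f"{target_stem}.7z")
        with py7zr.SevenZipFile(target, "w") as archive:
            archive.writeall(report_dir, arcname=report_dir.name)
    else:
        target = Path(f"{target_stem}.tar.bz2")
        with tarfile.open(target, "w:bz2") as archive:
            archive.add(report_dir, arcname=report_dir.name)
    return target
```

The caller only sees the returned path, so the rest of the reporting code does not need to know which archive format was chosen.
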
### It reads these environment variables:
- `INNERWORKDIR` - the directory to place the report files
- `WORKDIR` - used instead if `INNERWORKDIR` hasn't been set.
- `TEMP` - temporary directory if not `INNERWORKDIR`/ArangoDB
- `TMPDIR` and `TEMP` are passed to the executors.
- `TSHARK` - passed as value to `--sniffProgram`
- `DUMPDEVICE` - passed as value to `--sniffDevice`
- `SKIPNONDETERMINISTIC` - passed on to the testing as value of `--skipNondeterministic`.
- `SKIPTIMECRITICAL` - passed on to the testing as value of `--skipTimeCritical`.
- `BUILDMODE` - passed on to the testing as value of `--buildType`.
- `DUMPAGENCYONERROR` - passed on to the testing as value of `--dumpAgencyOnError`.
- `PORTBASE` - passed on to the testing as value of `--minPort` and `--maxPort` (+99). Defaults to 7000
- `SKIPGREY` - passed on to the testing as value of `--skipGrey`.
- `ONLYGREY` - passed on to the testing as value of `--onlyGrey`.
- `TIMELIMIT` - the number of seconds allowed for the test run; used to calculate the execution deadline.
- `COREDIR` - the directory in which to look for coredumps of crashes
- `LDAPHOST` - enables the tests flagged `ldap`.
- any parameter in `test-definition.txt` that starts with a `$` is expanded to its value.

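The mapping from environment variables to test options listed above can be sketched as a small table-driven function. This is an illustration only: the names `ENV_TO_OPTION` and `build_testing_args` are hypothetical, and it assumes that unset variables are simply omitted and that a single port range is derived from `PORTBASE`.

```python
import os

# Environment variable -> option it is passed on to, as listed above.
ENV_TO_OPTION = {
    "TSHARK": "--sniffProgram",
    "DUMPDEVICE": "--sniffDevice",
    "SKIPNONDETERMINISTIC": "--skipNondeterministic",
    "SKIPTIMECRITICAL": "--skipTimeCritical",
    "BUILDMODE": "--buildType",
    "DUMPAGENCYONERROR": "--dumpAgencyOnError",
    "SKIPGREY": "--skipGrey",
    "ONLYGREY": "--onlyGrey",
}


def build_testing_args(env=os.environ):
    """Translate the environment into a list of command line arguments."""
    args = []
    for variable, option in ENV_TO_OPTION.items():
        if variable in env:  # assumption: unset variables add nothing
            args += [option, env[variable]]
    port_base = int(env.get("PORTBASE", "7000"))
    args += ["--minPort", str(port_base), "--maxPort", str(port_base + 99)]
    return args
```

Passing a plain dictionary instead of `os.environ` makes the translation easy to check in isolation, for example `build_testing_args({"BUILDMODE": "RelWithDebInfo"})`.
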
### Its parameters are:
- `PATH/test-definition.txt` - (first parameter) test definitions file from the arangodb source tree
  (also used to locate the arangodb source)
- `-f` `[launch|dump]` - use `dump` for syntax checking of `test-definition.txt` instead of executing the tests
- `--validate-only` - don't run the tests
- `--help-flags` - list the flags which can be used in `test-definition.txt`:
  - `cluster`: this test requires a cluster
  - `single`: this test requires a single server
  - `full`: this test is only executed in full tests
  - `!full`: this test is only executed in non-full tests
  - `gtest`: only the gtests are to be executed
  - `ldap`: this test requires LDAP (see `LDAPHOST`)
  - `enterprise`: this test is only executed with the enterprise version
  - `!windows`: test is excluded from ps1 output
- `--cluster` - filter `test-definition.txt` for all tests flagged as `cluster`
- `--full` - all tests including those flagged as `full` are executed.
- `--gtest` - only gtests are executed
- `--all` - output unfiltered

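How the filter options interact with the flags can be illustrated with a short sketch. The function `select_tests` is hypothetical, and the exact rules are assumptions read from the descriptions above: without `--cluster`, tests flagged `cluster` are dropped; `full` and `!full` are mutually exclusive with respect to `--full`; `--all` disables filtering.

```python
def select_tests(tests, cluster=False, full=False, gtest=False, show_all=False):
    """Filter (name, flags) pairs; `flags` is a set of strings.

    Hypothetical helper, the rules are assumptions based on the flag list above.
    """
    if show_all:
        return list(tests)
    selected = []
    for name, flags in tests:
        if gtest and "gtest" not in flags:
            continue  # --gtest: only gtests are executed
        if cluster != ("cluster" in flags):
            continue  # cluster tests only with --cluster, and vice versa
        if "full" in flags and not full:
            continue  # `full` tests only run in full test runs
        if "!full" in flags and full:
            continue  # `!full` tests only run in non-full test runs
        selected.append((name, flags))
    return selected
```

For example, with a test list containing one `single`, one `cluster`, and one `single full` entry, calling `select_tests(tests)` keeps only the plain `single` entry, while `select_tests(tests, full=True)` also keeps the `full` one.
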
### Syntax in `test-definition.txt`
Lines consist of these parts:
```
testingJsSuiteName flags params suffix -- args to testing.js
```
where
- `flags` are listed above in `--help-flags`
- params are:
  - weight - sequence priority of the test, 250 is the default.
  - wweight - execution slots to book. Defaults to 1, or to 4 for cluster tests.
  - buckets - split testcases to be launched in concurrent chunks
    Specifying a `*` in front of the number takes the default and multiplies it by the value.
- suffix - if a testsuite is launched several times, make it distinguishable
  like shell_aql => shell_aql_vst ; Bucket indexes are appended afterwards.
- `--` literally the two dashes to split the line at.
- `args to testing.js` - anything that `./scripts/unittest --help` would print you.

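A line in this format could be parsed roughly as follows. This is a sketch under loudly stated assumptions, not the real parser: it assumes params are written as `key=value` words, that every other word before `--` is a flag, that a `buckets` default of 1 exists, and it leaves out the suffix. The function name `parse_definition_line` is hypothetical.

```python
def parse_definition_line(line, env):
    """Split one test definition line into name, flags, params and args.

    Assumptions (not confirmed by the README): params look like key=value,
    all other words before `--` are flags, the suffix is not handled.
    """
    # Words starting with `$` are expanded from the environment.
    tokens = [env[t[1:]] if t.startswith("$") else t for t in line.split()]
    if "--" not in tokens:
        raise ValueError(f"missing '--' separator: {line!r}")
    split_at = tokens.index("--")
    name, *words = tokens[:split_at]
    args = tokens[split_at + 1:]

    flags, raw_params = [], {}
    for word in words:
        if "=" in word:
            key, value = word.split("=", 1)
            raw_params[key] = value
        else:
            flags.append(word)

    # Defaults from the list above; the buckets default is an assumption.
    defaults = {"weight": 250, "wweight": 4 if "cluster" in flags else 1, "buckets": 1}
    params = dict(defaults)
    for key, value in raw_params.items():
        if value.startswith("*"):
            params[key] = defaults[key] * int(value[1:])  # multiply the default
        else:
            params[key] = int(value)
    return {"name": name, "flags": flags, "params": params, "args": args}
```

With `env = {"DUMPAGENCYONERROR": "true"}`, the line `shell_aql cluster weight=*2 buckets=4 -- --dumpAgencyOnError $DUMPAGENCYONERROR` yields a weight of 500, the cluster default `wweight` of 4, and the expanded argument list `["--dumpAgencyOnError", "true"]`.
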
### job scheduling
To utilize all of the machine's resources, tests can be run in parallel. The `execution_slots` are
set to the number of physical cores of the machine (not threads).
`wweight` is used to keep the load currently expected from the running tests at no more than `execution_slots`.

For managing each of these parallel executions of testing.js, worker threads are used. The workers
themselves spawn a set of I/O threads to capture the output of testing.js into a report file.

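The slot accounting described above can be sketched in a few lines. This is an illustration, not the controller's scheduler: the function `next_job` and the job dictionaries are hypothetical, and it assumes that jobs are tried in descending `weight` order and that a smaller job may be launched when the highest-priority one does not fit.

```python
def next_job(pending, used_slots, execution_slots):
    """Return the highest-weight pending job that still fits, or None.

    `execution_slots` would be the number of physical cores, which psutil
    reports via psutil.cpu_count(logical=False); it is passed in here so
    the sketch has no third-party dependency.
    """
    for job in sorted(pending, key=lambda j: j["weight"], reverse=True):
        if used_slots + job["wweight"] <= execution_slots:
            return job
    return None
```

On a machine with 4 slots and nothing running, a cluster job with `wweight` 4 and the highest `weight` is picked first; once a single slot is in use, only a job with `wweight` of at most 3 can be launched.
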
The life cycle of a test run is as follows:

- the environment variable `TIMELIMIT` defines a *deadline* for all the tests: how many seconds the run is allowed to take.
- tests run in worker threads.
- the main thread keeps control and launches more worker threads once machine bandwidth permits, but at most one every 5s, so as not to overwhelm the machine while launching arangods.
- tests themselves have their own timeouts; `testing.js` will abort if they are reached.
- workers have a progressive timeout: if a worker doesn't hear back from `testing.js` for 999999999s, it will hard kill it and abort. [currently high / not used!]
- if workers have no output from `testing.js`, they check whether the *deadline* is reached.
- if the *deadline* is reached, `SIGINT` [\*nix] / `SIGBREAK` [windows] is sent to `testing.js` to trigger its *deadline* feature.
- the reached *deadline* is recorded in the `testfailures.txt` report file and in the logfile of the test in question.
- with the *deadline* engaged, `testing.js` can neither send subsequent requests nor spawn processes => eventually testing will abort.
- force shutdown of arangod instances will reset the deadline, send `SIGABRT` to the arangods, and try to do core dump analysis.
- workers continue reading pipes from `testing.js`, but once no more characters are coming, `waitpid()` checks with a 1s timeout whether `testing.js` is done and has exited.
- if the worker reaches `180` `waitpid()` invocations, it gives up: it hard kills `testing.js` and all other child processes it can find.
- this should unblock the worker's I/O threads, and they should exit.
- the `waitpid()` on `testing.js` should return, I/O threads should be joined, and results should be passed up to the main thread.
- so the workers have a sluggish interpretation of the *deadline*, giving them the chance to collect as much knowledge about the test execution as possible.
- meanwhile the main thread has a *fixed* deadline: 5 minutes after the `TIMELIMIT` is reached.
- if not all workers have indicated their exit before this final deadline:
  - the main thread will start killing any subprocesses of itself which it finds.
  - after this, it waits another 20s to see whether the workers have been unblocked by the killing.
  - if not, it shouts "Geronimoooo" and takes the big shotgun, and force-terminates the python process which is running it. This will kill all threads as well and terminate the process.
- if all workers have indicated their exit in time, their threads will be joined.
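
The worker-side part of this life cycle (gentle signal, polling with a 1s timeout, hard kill after 180 polls) can be sketched with the standard library alone. This is a simplified illustration, not the controller's code: the function `stop_at_deadline` is hypothetical, and it handles only one direct child, whereas the description above also kills all other child processes it can find.

```python
import signal
import subprocess
import sys


def stop_at_deadline(process, max_polls=180, poll_timeout=1.0):
    """Ask `process` to stop, poll for its exit, hard kill it if it stays alive.

    Returns True if the process exited on its own, False if it was killed.
    """
    # On Windows, CTRL_BREAK_EVENT only reaches children that were started
    # with creationflags=subprocess.CREATE_NEW_PROCESS_GROUP.
    gentle = signal.CTRL_BREAK_EVENT if sys.platform == "win32" else signal.SIGINT
    process.send_signal(gentle)
    for _ in range(max_polls):
        try:
            process.wait(timeout=poll_timeout)
            return True
        except subprocess.TimeoutExpired:
            continue  # still running, poll again
    process.kill()  # give up and hard kill
    process.wait()  # reap the child so no zombie remains
    return False
```

A worker would call this with a `subprocess.Popen` object once it has noticed that the deadline is reached; the default arguments give the process roughly three minutes to wind down before it is killed.
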