QA-273: Feature/python test runner #401

Merged
merged 140 commits into from
Aug 23, 2022

dothebart
Contributor

@dothebart dothebart commented Jul 5, 2022

Replace the fish and ps test launching facilities and report generators with a single Python implementation.

  • the environment variable TESTSUITE_TIMEOUT defines a deadline for the tests: how many seconds they are allowed to run.
  • tests run in worker threads.
  • the main thread keeps control and launches more worker threads once machine bandwidth permits, but at most once every 5s.
  • tests themselves have their timeouts; testing.js will abort if they are reached.
  • workers have a progressive timeout: if a worker doesn't hear back from testing.js for 999999999s, it will hard kill it and abort.
  • if workers have no output from testing.js, they check whether the deadline is reached.
  • if the deadline is reached, SIGINT [*nix] / SIGBREAK [windows] is sent to testing.js to trigger its deadline feature.
  • the reached deadline will be indicated in testfailures.txt and in the logfile of this test.
  • with the deadline engaged, testing.js can neither send subsequent requests nor spawn processes => eventually testing will abort.
  • force shutdown of instances will reset the deadline, send SIGABRT to the arangods, and try to do core dump analysis.
  • workers continue reading pipes from testing.js, but once no more chars are coming, waitpid() checks with a 1s timeout whether testing.js is done.
  • if the worker reaches a count of 180 waitpid() attempts, it will give up. It will hard kill testing.js and all other child processes.
  • this should unblock the worker's STDOUT/STDERR threads, and they should exit.
  • the waitpid() on testing.js should exit, I/O threads should be joined, and results should be passed up to the main thread.
  • so the workers still have a sluggish interpretation of the deadline, giving them the chance to collect as much knowledge as possible.
  • meanwhile the main thread has a fixed deadline: 5 minutes after the TESTSUITE_TIMEOUT is reached.
  • if not all workers have indicated their exit before this final deadline:
      • the main thread will start killing any subprocesses of itself which it finds.
      • after this, it waits another 20s to see whether the workers have been unblocked by the killing.
      • if not, it shouts "Geronimoooo", takes the big shotgun, and force-terminates the python process which is running it. This will kill all threads as well and terminate the process.
  • if all workers have indicated their exit in time, their threads will be joined.
  • reports will be generated.
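
The worker-side steps above (prefer incoming output, check the deadline only when the child is silent, interrupt it, then count 1s polls before a hard kill) could be sketched roughly as follows. This is an illustration only, not the code from this PR: `run_with_deadline`, `idle_timeout` and `max_waits` are hypothetical names, and a silent poll on a queue stands in for the `waitpid()` check. On Windows, delivering `CTRL_BREAK_EVENT` would additionally require starting the child in a new process group, which this sketch omits.

```python
import queue
import signal
import subprocess
import sys
import threading
import time


def _pump(stream, lines):
    """Forward each output line of the child into a queue, then mark EOF."""
    for line in iter(stream.readline, ""):
        lines.put(line)
    lines.put(None)


def run_with_deadline(cmd, deadline, idle_timeout=1.0, max_waits=180):
    """Run cmd and collect its output (hypothetical helper).

    `deadline` is an absolute time.time() value. It is only checked while
    the child is silent, so incoming lines always take priority.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lines = queue.Queue()
    reader = threading.Thread(target=_pump, args=(proc.stdout, lines),
                              daemon=True)
    reader.start()

    output, deadline_hit, killed, waits = [], False, False, 0
    while True:
        try:
            line = lines.get(timeout=idle_timeout)
        except queue.Empty:
            if not deadline_hit and time.time() > deadline:
                # Ask the child politely to engage its own deadline handling.
                deadline_hit = True
                sig = (signal.CTRL_BREAK_EVENT if sys.platform == "win32"
                       else signal.SIGINT)
                proc.send_signal(sig)
            elif deadline_hit and not killed:
                waits += 1
                if waits >= max_waits:
                    # Give up: the hard kill closes the pipe and thereby
                    # unblocks the reader thread.
                    proc.kill()
                    killed = True
            continue
        if line is None:  # EOF: the child closed its output
            break
        output.append(line)

    proc.wait()
    reader.join()
    return {"rc": proc.returncode, "deadline_hit": deadline_hit,
            "killed": killed, "output": output}
```

A caller would derive `deadline` from the suite timeout, for example `time.time() + int(os.environ["TESTSUITE_TIMEOUT"])`, and pass the returned dictionary up to the main thread. Killing only the direct child is a simplification; the description above also kills all other child processes.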

@dothebart dothebart marked this pull request as draft July 5, 2022 14:53
@dothebart dothebart marked this pull request as ready for review July 13, 2022 11:58
@dothebart dothebart requested review from fceller and KVS85 July 13, 2022 11:58
set s $status
set s (math $s + (getSanStatus))
else
runInContainer --cap-add SYS_NICE (findBuildImage) $SCRIPTSDIR/runTests.fish $argv
Contributor Author

For the moment, this is required so that we can spawn threads in containers based on later Ubuntu releases.

@dothebart dothebart changed the title Feature/python test runner QA-273: Feature/python test runner Jul 18, 2022
Contributor

@jsteemann jsteemann left a comment

Non-Python files LGTM.
I am not in a good position to review the Python code in this PR though.

@dothebart dothebart changed the base branch from master to main August 22, 2022 15:14
@jsteemann jsteemann merged commit 4c1b2a5 into main Aug 23, 2022
@jsteemann jsteemann deleted the feature/python_test_runner branch August 23, 2022 08:25
KVS85 added a commit that referenced this pull request Sep 8, 2022
* QA-273: Feature/python test runner (#401)

Co-authored-by: Jan <[email protected]>
Co-authored-by: Vadim <[email protected]>

* Feature/python test runner (#417)

* start implementing a python launch controller

* make it work for the first time

* try launching outside of oskar.

* no more pipes needed

* adjust report directory

* fix paths, thread naming.

* fallback if no env is configured

* lint

* more work on cluster etc

* silence, proper error message for missing variable

* convert params

* lint

* fix slot

* fix arangosh.conf, launching of subsequent testruns

* try to launch it from fish

* implement 7zip

* add modules to the docker container

* more printing

* fix handling

* Add pip3

* Fix typo

* Typo 2

* handle INNERWORKDIR

* fix missing line break

* export settings

* fix typo

* on windows skip !windows tests

* lint, refactor, simplify

* install 7z

* export core directory

* work on fish integration

* similarize for new python job scheduler

* work on reprot generating

* try to implement timeout

* also upload 7z and txt

* also upload 7z and txt

* fix deadline

* fix workspace handling

* fix temporary directory handling

* make sure out temp directory exists

* RTFM fail

* don't put it to the workspace

* implement gtest invoking

* cleanup

* sort, lint

* prefer INNERWORKDIR

* implement writing test.log

* implement html report

* bring back function deletet to early

* install the windows boomerang handler on top level

* fix include

* fix reference

* print before killing shit

* work on timeout

* finish deadline handling, rename script

* fix exit code handling

* lint

* thanks @mpoeter for ps aid

* make the thread identifier the test plus a growing number

* implement central final deadline, which will kick in after 2 minutes

* remove debug output

* use /usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin to locate python

* wintendo next try

* wintendo next try

* wintendo go home

* fix calculation of hard time limit

* make sure nobody changes the exit code to good

* add monkey patches

* cleanup deadline

* ignore exceptions if no process is there

* deadline handling: prioritize incomming lines over timeout counting

* fix directory handling

* work on result presentation

* cleanup

* let the file remain open for further info

* fix environment variable handling

* documentation

* fix port handling

* work on deadline

* fix hard deadline handling

* make it 20s

* need more time

* list processes so we may guess whats actually going on

* kill all, then waitpid all

* make threads provide half a slot.

* be sure to catch

* resume just in case, then kill

* resume just in case, then kill

* ignore resume errors

* increase volume

* lint

* lint

* catch more

* add multipliers

* more load, print load avg

* fix sorting by prio - biggest values first

* cleanup crash report for size

* if test indicates its been crashing create report as well.

* more threat to the machine.

* timeout

* fix typo

* delete tzdata subdir first

* use load and sockets for throttle control

* install required python libs

* only see for load [0, 1]

* increase container version

* anounce deadline at start

* don't print to logfile

* give better feedback if arangosh fails to launch in first place, thangs @maierlars for bringing up the topic

* Update helper.linux.fish

* tschuess ruby

* re-sync to be stock RTA

* fix container numbers, adjust #3

* sync to rta

* resync

* this is not needed anymore

* add --fix-missing

* fresh python?

* revert to tar.gz

* chaos tests in nightlies demand for longer timeouts, since tests run longer.

* Update README.md

Co-authored-by: Jan <[email protected]>

* Update README.md

Co-authored-by: Jan <[email protected]>

* Update README.md

Co-authored-by: Jan <[email protected]>

* Update README.md

Co-authored-by: Jan <[email protected]>

* Update README.md

Co-authored-by: Jan <[email protected]>

* remove more old stuff

* ignore encoding errors

* increase timeout to hard self kill

* switch to one environment variable name

* env

* limit the amount of coredumps

* ignore access denied to open sockets

* if we need to wait for the system to cool down on start...

* make sure we don't come back good if nothing launched at all

* them tiny boxes need more time

* need more time

* add deadline status to testfailurs.txt

* need more time

* beautify testfailures.txt

* give machine estimate reasons at the start of the run

* case may matter

* one more environment variable

Co-authored-by: Vadim <[email protected]>
Co-authored-by: Jan <[email protected]>

* Feature/python test runner (#418)

* start implementing a python launch controller

* make it work for the first time

* try launching outside of oskar.

* no more pipes needed

* adjust report directory

* fix paths, thread naming.

* fallback if no env is configured

* lint

* more work on cluster etc

* silence, proper error message for missing variable

* convert params

* lint

* fix slot

* fix arangosh.conf, launching of subsequent testruns

* try to launch it from fish

* implement 7zip

* add modules to the docker container

* more printing

* fix handling

* Add pip3

* Fix typo

* Typo 2

* handle INNERWORKDIR

* fix missing line break

* export settings

* fix typo

* on windows skip !windows tests

* lint, refactor, simplify

* install 7z

* export core directory

* work on fish integration

* similarize for new python job scheduler

* work on reprot generating

* try to implement timeout

* also upload 7z and txt

* also upload 7z and txt

* fix deadline

* fix workspace handling

* fix temporary directory handling

* make sure out temp directory exists

* RTFM fail

* don't put it to the workspace

* implement gtest invoking

* cleanup

* sort, lint

* prefer INNERWORKDIR

* implement writing test.log

* implement html report

* bring back function deletet to early

* install the windows boomerang handler on top level

* fix include

* fix reference

* print before killing shit

* work on timeout

* finish deadline handling, rename script

* fix exit code handling

* lint

* thanks @mpoeter for ps aid

* make the thread identifier the test plus a growing number

* implement central final deadline, which will kick in after 2 minutes

* remove debug output

* use /usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin to locate python

* wintendo next try

* wintendo next try

* wintendo go home

* fix calculation of hard time limit

* make sure nobody changes the exit code to good

* add monkey patches

* cleanup deadline

* ignore exceptions if no process is there

* deadline handling: prioritize incomming lines over timeout counting

* fix directory handling

* work on result presentation

* cleanup

* let the file remain open for further info

* fix environment variable handling

* documentation

* fix port handling

* work on deadline

* fix hard deadline handling

* make it 20s

* need more time

* list processes so we may guess whats actually going on

* kill all, then waitpid all

* make threads provide half a slot.

* be sure to catch

* resume just in case, then kill

* resume just in case, then kill

* ignore resume errors

* increase volume

* lint

* lint

* catch more

* add multipliers

* more load, print load avg

* fix sorting by prio - biggest values first

* cleanup crash report for size

* if test indicates its been crashing create report as well.

* more threat to the machine.

* timeout

* fix typo

* delete tzdata subdir first

* use load and sockets for throttle control

* install required python libs

* only see for load [0, 1]

* increase container version

* anounce deadline at start

* don't print to logfile

* give better feedback if arangosh fails to launch in first place, thangs @maierlars for bringing up the topic

* Update helper.linux.fish

* tschuess ruby

* re-sync to be stock RTA

* fix container numbers, adjust #3

* sync to rta

* resync

* this is not needed anymore

* add --fix-missing

* fresh python?

* revert to tar.gz

* chaos tests in nightlies demand for longer timeouts, since tests run longer.

* Update README.md

Co-authored-by: Jan <[email protected]>

* Update README.md

Co-authored-by: Jan <[email protected]>

* Update README.md

Co-authored-by: Jan <[email protected]>

* Update README.md

Co-authored-by: Jan <[email protected]>

* Update README.md

Co-authored-by: Jan <[email protected]>

* remove more old stuff

* ignore encoding errors

* increase timeout to hard self kill

* switch to one environment variable name

* env

* limit the amount of coredumps

* ignore access denied to open sockets

* if we need to wait for the system to cool down on start...

* make sure we don't come back good if nothing launched at all

* them tiny boxes need more time

* need more time

* add deadline status to testfailurs.txt

* need more time

* beautify testfailures.txt

* give machine estimate reasons at the start of the run

* case may matter

* one more environment variable

* anounce test directory

* switch sequence, print first

* one more var exported

Co-authored-by: Vadim <[email protected]>
Co-authored-by: Jan <[email protected]>

* Fixed 7z and Signing

Added fixes for signing and 7z

* Feature/python test runner (#419)

* start implementing a python launch controller

* make it work for the first time

* try launching outside of oskar.

* no more pipes needed

* adjust report directory

* fix paths, thread naming.

* fallback if no env is configured

* lint

* more work on cluster etc

* silence, proper error message for missing variable

* convert params

* lint

* fix slot

* fix arangosh.conf, launching of subsequent testruns

* try to launch it from fish

* implement 7zip

* add modules to the docker container

* more printing

* fix handling

* Add pip3

* Fix typo

* Typo 2

* handle INNERWORKDIR

* fix missing line break

* export settings

* fix typo

* on windows skip !windows tests

* lint, refactor, simplify

* install 7z

* export core directory

* work on fish integration

* similarize for new python job scheduler

* work on reprot generating

* try to implement timeout

* also upload 7z and txt

* also upload 7z and txt

* fix deadline

* fix workspace handling

* fix temporary directory handling

* make sure out temp directory exists

* RTFM fail

* don't put it to the workspace

* implement gtest invoking

* cleanup

* sort, lint

* prefer INNERWORKDIR

* implement writing test.log

* implement html report

* bring back function deletet to early

* install the windows boomerang handler on top level

* fix include

* fix reference

* print before killing shit

* work on timeout

* finish deadline handling, rename script

* fix exit code handling

* lint

* thanks @mpoeter for ps aid

* make the thread identifier the test plus a growing number

* implement central final deadline, which will kick in after 2 minutes

* remove debug output

* use /usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin to locate python

* wintendo next try

* wintendo next try

* wintendo go home

* fix calculation of hard time limit

* make sure nobody changes the exit code to good

* add monkey patches

* cleanup deadline

* ignore exceptions if no process is there

* deadline handling: prioritize incomming lines over timeout counting

* fix directory handling

* work on result presentation

* cleanup

* let the file remain open for further info

* fix environment variable handling

* documentation

* fix port handling

* work on deadline

* fix hard deadline handling

* make it 20s

* need more time

* list processes so we may guess whats actually going on

* kill all, then waitpid all

* make threads provide half a slot.

* be sure to catch

* resume just in case, then kill

* resume just in case, then kill

* ignore resume errors

* increase volume

* lint

* lint

* catch more

* add multipliers

* more load, print load avg

* fix sorting by prio - biggest values first

* cleanup crash report for size

* if test indicates its been crashing create report as well.

* more threat to the machine.

* timeout

* fix typo

* delete tzdata subdir first

* use load and sockets for throttle control

* install required python libs

* only see for load [0, 1]

* increase container version

* anounce deadline at start

* don't print to logfile

* give better feedback if arangosh fails to launch in first place, thangs @maierlars for bringing up the topic

* Update helper.linux.fish

* tschuess ruby

* re-sync to be stock RTA

* fix container numbers, adjust #3

* sync to rta

* resync

* this is not needed anymore

* add --fix-missing

* fresh python?

* revert to tar.gz

* chaos tests in nightlies demand for longer timeouts, since tests run longer.

* Update README.md

Co-authored-by: Jan <[email protected]>

* Update README.md

Co-authored-by: Jan <[email protected]>

* Update README.md

Co-authored-by: Jan <[email protected]>

* Update README.md

Co-authored-by: Jan <[email protected]>

* Update README.md

Co-authored-by: Jan <[email protected]>

* remove more old stuff

* ignore encoding errors

* increase timeout to hard self kill

* switch to one environment variable name

* env

* limit the amount of coredumps

* ignore access denied to open sockets

* if we need to wait for the system to cool down on start...

* make sure we don't come back good if nothing launched at all

* them tiny boxes need more time

* need more time

* add deadline status to testfailurs.txt

* need more time

* beautify testfailures.txt

* give machine estimate reasons at the start of the run

* case may matter

* one more environment variable

* anounce test directory

* switch sequence, print first

* one more var exported

* add disk i/o to the output

* better work with M1 performance cores

* print other sequence; enable more load[1]

* more threads doesn't cut it

* print platform

* precise M1 detection

* two places on mac to collect cores

* properly append

* fix default directory

* use iso-ish datetime format for filenames

Co-authored-by: Vadim <[email protected]>
Co-authored-by: Jan <[email protected]>
Co-authored-by: Markus Pfeiffer <[email protected]>

Co-authored-by: Jan <[email protected]>
Co-authored-by: Vadim <[email protected]>
Co-authored-by: Sven Luschgy <[email protected]>
Co-authored-by: Markus Pfeiffer <[email protected]>
Labels
None yet
Projects
None yet
4 participants