t/01-test-utilities.t fails in 'test would have failed' in unrelated PR
in https://circleci.com/gh/os-autoinst/openQA/60151?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link t/01-test-utilities.t fails with
not ok 1 - warnings in sub processes are fatal test failures Subtest: warnings in sub processes are fatal test failures # PID of test-process-1: 141 # waiting at most 42 seconds for SIGCHLD (sleep is supposed to be interrupted by SIGCHLD) ok 1 - warning logged not ok 2 - test would have failed ok 3 - message of test failure ok 4 - exit code of test failure is non-zero not ok 5 - manual termination via stop_service does not trigger _fail_and_exit 1..5
On my work notebook local environment 1k runs (!) have succeeded, in circleCI a second run failed: https://app.circleci.com/pipelines/github/os-autoinst/openQA/6386/workflows/ba9d9a66-05f7-4c14-b2ea-921dd5a2e272/jobs/60180/steps . Anyone an idea?
mkittler mentioned maybe it's because of coverage analysis. Trying with
runs=2000 count_fail_ratio make coverage KEEP_DB=1 TESTS=t/01-test-utilities.t
but that fails for me with
Can't read /home/okurz/local/os-autoinst/openQA/cover_db/runs/1620218733.13161.31806/cover.14 with Sereal: Sereal: Error: Bad Sereal header: Not a valid Sereal document. at offset 1 of input at srl_decoder.c line 600 at /usr/lib/perl5/vendor_perl/5.26.1/x86_64-linux-thread-multi/Devel/Cover/DB/IO/Sereal.pm line 34, <$fh> chunk 1.. I think I had something like this already in the past. What's missing now?
- Due date set to 2021-05-19
- Status changed from In Progress to Feedback
- Priority changed from High to Normal
runs=2000 count_fail_ratio make coverage KEEP_DB=1 TESTS=t/01-test-utilities.t yields:
## count_fail_ratio: Run: 2000. Fails: 0. Fail ratio 0%
so at least not easily reproducible this way with coverage locally, must be something else. Any other ideas what to try?
Maybe not a new observation, but at least I wanted to note it:
I ran the tests on CircleCI in a loop (with a dummy fail and RETRY), and in the failing cases the output
ok 1 - warning logged ...
always appeared around 40 seconds after the note
# waiting at most 42 seconds for SIGCHLD (sleep is supposed to be interrupted by SIGCHLD)
So at least we know that the SIGCHLD is not interrupting the sleep in this case.
Not sure whether it helped: https://github.com/os-autoinst/openQA/pull/3877#issuecomment-836523675