action #92164 (closed)

t/01-test-utilities.t fails in 'test would have failed' in unrelated PR

Added by okurz almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2021-05-05
Due date: 2021-05-19
% Done: 0%
Estimated time:

Description

Observation

In https://github.com/os-autoinst/openQA/pull/3877, in the CircleCI job https://circleci.com/gh/os-autoinst/openQA/60151?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link, t/01-test-utilities.t fails with:

not ok 1 - warnings in sub processes are fatal test failures
Subtest: warnings in sub processes are fatal test failures
    # PID of test-process-1: 141
    # waiting at most 42 seconds for SIGCHLD (sleep is supposed to be interrupted by SIGCHLD)
    ok 1 - warning logged
    not ok 2 - test would have failed
    ok 3 - message of test failure
    ok 4 - exit code of test failure is non-zero
    not ok 5 - manual termination via stop_service does not trigger _fail_and_exit
    1..5
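
For context, here is a minimal sketch of the mechanism the subtest relies on (the names and structure below are assumed for illustration, not copied from t/01-test-utilities.t or OpenQA::Test::Utils): a sub-process treats warnings as fatal test failures and exits, and the parent expects the resulting SIGCHLD to interrupt its sleep well before the 42-second timeout.

    use strict;
    use warnings;

    my $child_exited = 0;
    $SIG{CHLD} = sub { $child_exited = 1 };    # parent wants SIGCHLD to cut the sleep short

    my $pid = fork // die "fork failed: $!";
    if ($pid == 0) {
        # sub-process: any warning is treated as a fatal test failure and terminates it
        local $SIG{__WARN__} = sub { print STDERR "fatal warning: $_[0]"; exit 1 };
        warn "unexpected warning\n";
        exit 0;    # never reached
    }

    # Note: if the child exits before the SIGCHLD handler is in place (the ordering
    # problem found later in this ticket) or before this sleep starts, nothing
    # interrupts the sleep and it runs the full 42 seconds.
    my $slept = sleep 42;
    waitpid $pid, 0;
    printf "slept %ds, child exit code %d, SIGCHLD seen: %d\n",
      $slept, $? >> 8, $child_exited;
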
Actions #1

Updated by okurz almost 3 years ago

In the local environment on my work notebook 1k runs (!) have succeeded, but in CircleCI a second run failed: https://app.circleci.com/pipelines/github/os-autoinst/openQA/6386/workflows/ba9d9a66-05f7-4c14-b2ea-921dd5a2e272/jobs/60180/steps . Does anyone have an idea?

Actions #2

Updated by okurz almost 3 years ago

mkittler mentioned it might be because of coverage analysis. Trying with

runs=2000 count_fail_ratio make coverage KEEP_DB=1 TESTS=t/01-test-utilities.t

but that fails for me with

Can't read /home/okurz/local/os-autoinst/openQA/cover_db/runs/1620218733.13161.31806/cover.14 with Sereal: Sereal: Error: Bad Sereal header: Not a valid Sereal document. at offset 1 of input at srl_decoder.c line 600 at /usr/lib/perl5/vendor_perl/5.26.1/x86_64-linux-thread-multi/Devel/Cover/DB/IO/Sereal.pm line 34, <$fh> chunk 1.

I think I had something like this already in the past. What's missing now?

Actions #3

Updated by okurz almost 3 years ago

Using https://github.com/os-autoinst/openQA/pull/3885 as a workaround I can proceed. Now running tests with coverage in a loop locally …

Actions #4

Updated by okurz almost 3 years ago

  • Due date set to 2021-05-19
  • Status changed from In Progress to Feedback
  • Priority changed from High to Normal

The experiment finished; runs=2000 count_fail_ratio make coverage KEEP_DB=1 TESTS=t/01-test-utilities.t yields:

## count_fail_ratio: Run: 2000. Fails: 0. Fail ratio 0%

so at least it is not easily reproducible this way with coverage locally; it must be something else. Any other ideas what to try?
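
For reference, count_fail_ratio just runs the given command the requested number of times and reports how many runs failed. Below is a rough sketch of that idea in Perl; it is not the actual helper, and the default command is only an example.

    use strict;
    use warnings;

    # Run a command N times and report the fraction of failing runs,
    # similar in spirit to the summary line count_fail_ratio prints.
    my $runs  = $ENV{runs} // 20;
    my @cmd   = @ARGV ? @ARGV : ('prove', '-l', 't/01-test-utilities.t');
    my $fails = 0;

    for my $i (1 .. $runs) {
        system(@cmd);
        ++$fails if $? != 0;
    }
    printf "## count_fail_ratio: Run: %d. Fails: %d. Fail ratio %d%%\n",
      $runs, $fails, $runs ? 100 * $fails / $runs : 0;
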

Actions #5

Updated by tinita almost 3 years ago

Maybe not a new observation, but at least I wanted to note it:
I ran the tests on CircleCI in a loop (with a dummy fail and RETRY), and in the failing cases the output

ok 1 - warning logged
...

always appeared around 40 seconds after the note

# waiting at most 42 seconds for SIGCHLD (sleep is supposed to be interrupted by SIGCHLD)

So at least we know that SIGCHLD is not interrupting the sleep in this case.

Full log of 20 runs:
https://app.circleci.com/pipelines/github/perlpunk/openQA/167/workflows/a5c11032-e6b3-4bf4-a548-70a029d0f4c0/jobs/743

Actions #6

Updated by tinita almost 3 years ago

I added more logging, and it appears that the SIGWARN handler set up in OpenQA::Test::Utils::_setup_sub_process is simply called before the SIGCHLD handler has been installed.
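
A minimal stand-alone illustration of that ordering problem (not the openQA code; everything below is simplified for demonstration): when the sub-process warns and exits before the parent has installed its SIGCHLD handler, the signal hits the default disposition and is discarded, so the later sleep runs its full length, matching the roughly 40-second delays seen in the failing runs.

    use strict;
    use warnings;

    my $pid = fork // die "fork failed: $!";
    if ($pid == 0) {
        # sub-process: the warning is fatal, so it exits almost immediately
        # and SIGCHLD is sent to the parent right away
        local $SIG{__WARN__} = sub { exit 1 };
        warn "unexpected warning\n";
        exit 0;    # never reached
    }

    sleep 1;    # simulate the parent being slow: SIGCHLD arrives while the
                # default disposition is still in effect and is simply discarded

    my $interrupted = 0;
    $SIG{CHLD} = sub { $interrupted = 1 };    # handler installed too late

    my $slept = sleep 5;    # nothing interrupts this sleep; it runs the full 5 seconds
    waitpid $pid, 0;
    printf "interrupted: %d, slept: %ds\n", $interrupted, $slept;
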

Actions #7

Updated by tinita almost 3 years ago

  • Assignee changed from okurz to tinita
Actions #8

Updated by okurz almost 3 years ago

This is a great finding! I just hope we find a better way than a sleep 1, which I despise :D
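
For what it's worth, one standard race-free pattern that avoids signal timing altogether (just a sketch, assuming the parent only needs to notice the sub-process exiting within a timeout; not necessarily what was or should be done in openQA): let the parent block on the read end of a pipe that the kernel closes when the sub-process exits.

    use strict;
    use warnings;
    use IO::Select;

    # The parent waits on the read end of a pipe instead of on SIGCHLD: when the
    # sub-process exits, its copy of the write end is closed automatically and the
    # read end becomes readable (EOF), so no handler ordering matters.
    pipe(my $reader, my $writer) or die "pipe failed: $!";

    my $pid = fork // die "fork failed: $!";
    if ($pid == 0) {
        close $reader;
        local $SIG{__WARN__} = sub { exit 1 };    # warning is fatal in the sub-process
        warn "unexpected warning\n";
        exit 0;                                   # never reached
    }

    close $writer;                                # parent keeps only the read end
    my @ready = IO::Select->new($reader)->can_read(42);    # returns once the child is gone
    waitpid $pid, 0;
    printf "child exit code: %d, noticed within the timeout: %s\n",
      $? >> 8, @ready ? 'yes' : 'no';
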

Actions #9

Updated by tinita almost 3 years ago

  • Status changed from Feedback to Resolved
Actions #11

Updated by okurz almost 3 years ago

We assume that the last failure was in an older branch and that the problem is resolved; otherwise we can reopen or follow up in a new ticket.
