action #103422
closed[sporadic] os-autoinst: 13-osutils.t:167 Failed test 'Exit code appear in log' in GHA size:M
Description
Observation¶
First seen on https://github.com/os-autoinst/os-autoinst/pull/1871#issuecomment-984670874:
8: [11:23:53] ./12-bmwqemu.t ........................... ok 914 ms ( 0.01 usr 0.00 sys + 0.82 cusr 0.09 csys = 0.92 CPU)
8:
8: # Failed test 'Exit code appear in log'
8: # at ./13-osutils.t line 167.
8: # STDERR:
8: # [2021-12-02T11:23:54.401579Z] [debug] running `echo foo`
8: # [2021-12-02T11:23:54.405060Z] [debug] Command `echo foo` terminated with 1
8: # foo
8: #
8: # doesn't match:
8: # (?^u:terminated with 0)
8: # as expected
8: # Looks like you failed 1 test of 10.
8:
8: # Failed test 'run_diag'
8: # at ./13-osutils.t line 190.
8: # Looks like you failed 1 test of 9.
8: [11:23:54] ./13-osutils.t ...........................
8: Dubious, test returned 1 (wstat 256, 0x100)
8: Failed 1/9 subtests
Another failure in the same test, maybe related:
https://github.com/perlpunk/os-autoinst/runs/4408870749?check_suite_focus=true
8: runcmd 'rm image.qcow2' failed with exit code 1 at /opt/osutils.pm line 112.
8: # Tests were run but no plan was declared and done_testing() was not seen.
8: # Looks like your test exited with 255 just after 7.
8: [14:05:37] ./13-osutils.t ...........................
8: Dubious, test returned 255 (wstat 65280, 0xff00)
8: All 7 subtests passed
More occurrences:
I couldn't find another instance. What I did find was another failure that didn't fail CI in 14-isotovideo.t:128
.
Acceptance criteria¶
- AC1: Failure in 13-osutils.t fails CI
- AC2: 13-osutils.t passes w/o failures
Suggestions¶
- Confirm when 13-osutils.t started failing
- Fix 13-osutils.t
- Compare if 14-isotovideo.t and maybe other tests generally don't fail CI
- Ensure that the pipeline fails if any tests fail
Updated by okurz about 3 years ago
- Subject changed from os-autoinst: 13-osutils.t:167 Failed test 'Exit code appear in log' in GHA to [sporadic] os-autoinst: 13-osutils.t:167 Failed test 'Exit code appear in log' in GHA
So I understand it's sporadic?
Updated by livdywan about 3 years ago
cdywan wrote:
Suggestions¶
- Confirm when 13-osutils.t started failing
- Compare if 14-isotovideo.t and maybe other tests generally don't fail CI
btw I put this as a suggestion here rather than an additional ticket, since it's probably not feasible to ensure tests fail without fixing them first; of course we can re-consider this when we estimate it
Updated by okurz about 3 years ago
- Subject changed from [sporadic] os-autoinst: 13-osutils.t:167 Failed test 'Exit code appear in log' in GHA to [sporadic] os-autoinst: 13-osutils.t:167 Failed test 'Exit code appear in log' in GHA size:M
- Description updated (diff)
- Status changed from New to Workable
There is also
8: Error writing to virtio/svirt serial terminal: Broken pipe at ./10-terminal.t line 233.
8:
8: # Failed test 'Fake terminal process exit status is 0'
8: # at ./10-terminal.t line 318.
8: # got: '1'
8: # expected: '0'
8:
8: # Failed test 'Direct test VIRTIO_CONSOLE not set process exit status is 0'
8: # at ./10-terminal.t line 318.
8: # got: '32'
8: # expected: '0'
8:
8: # Failed test '[Child 2321] fake_terminal timed out while performing IO'
8: # at ./10-terminal.t line 365.
8:
8: # Failed test '[Child 2322] direct: find login prompt'
8: # at ./10-terminal.t line 365.
8: # Structures begin differing at:
8: # $got->{matched} = '0'
8: # $expected->{matched} = '1'
8: # Looks like you failed 4 tests of 11.
8: [20:34:52] ./10-terminal.t ..........................
8: Dubious, test returned 4 (wstat 1024, 0x400)
8: Failed 4/11 subtests
showing up in https://github.com/os-autoinst/os-autoinst/runs/4386959135?check_suite_focus=true but the pipeline job shows passed, see the green checkmark on https://github.com/os-autoinst/os-autoinst/pull/1789 where we would expect a red cross
Updated by mkittler about 3 years ago
- Assignee set to mkittler
The issue of 13-osutils.t
might be fixed by https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/pull/22 (released as RWP version 0.31).
If another failing test is ignored as well I suppose the problem isn't really test-specific, though.
Updated by tinita about 3 years ago
This (failures not leading to CI failure) has been going on for several weeks, and my check-for-git-status feature seems to be the culprit :(
Still trying to reproduce locally
https://github.com/perlpunk/os-autoinst/commit/9ef54a85a901a3a2340da53a8d0699de782322a3
edit: it happens only in combination with coverage:
CI=1 WITH_COVER_OPTIONS=1 CHECK_GIT_STATUS=1 ./tools/invoke-tests
now if I make a test fail, then the exit code of the first prove call is 0 and it will continue
Updated by tinita about 3 years ago
Fix for the unrecognized test failures: https://github.com/os-autoinst/os-autoinst/pull/1873 (merged)
Updated by tinita about 3 years ago
Quite embarrassing, but the previous PR (merged already) had a mistake: https://github.com/os-autoinst/os-autoinst/pull/1874 (merged)
Updated by mkittler about 3 years ago
- Assignee changed from mkittler to tinita
Ok, you're writing comments faster than I can update the ticket with my own finding. I've cross-checked the CMake code and bash syntax and unless #!/bin/bash -e
is not effective it must be the prove invocation itself which doesn't return the non-zero exit code correctly. So your PR likely fixes the issue, indeed. I've approved it and I'm assigning you to the ticket.
Updated by mkittler about 3 years ago
- Status changed from Workable to In Progress
Updated by tinita about 3 years ago
- Description updated (diff)
- Status changed from In Progress to Feedback
Updated by tinita about 3 years ago
- Copied to action #103467: [sporadic] os-autoinst: 14-isotovideo.t:133 'no fatal error recorded' size:M added
Updated by okurz about 3 years ago
- Has duplicate action #103485: [sporadic] os-autoinst: flaky test t/13-osutils.t added
Updated by okurz about 3 years ago
Observed in GHA and
locally
## Run 18
t/13-osutils.t .. 1/?
# Failed test 'Exit code appear in log'
# at t/13-osutils.t line 167.
# STDERR:
# [2021-12-05T11:03:37.186582+01:00] [debug] running `echo foo`
# [2021-12-05T11:03:37.194499+01:00] [debug] Command `echo foo` terminated with 1
# foo
#
# doesn't match:
# (?^u:terminated with 0)
# as expected
# Looks like you failed 1 test of 10.
# Failed test 'run_diag'
# at t/13-osutils.t line 190.
# Looks like you failed 1 test of 9.
t/13-osutils.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/9 subtests
Reproducible¶
Locally I could only reproduce it in 1/500 (!) runs with runs=500 count_fail_ratio prove -I. t/13-osutils.t
runs=500 count_fail_ratio make test-perl-testsuite WITH_COVER_OPTIONS=1 TESTS=t/13-osutils.t
yields
## count_fail_ratio: Run: 500. Fails: 5. Fail ratio 1%
Updated by tinita about 3 years ago
- Status changed from Feedback to In Progress
Updated by tinita about 3 years ago
I looked into all master CI runs since
"Merge pull request #1825 from perlpunk/check-for-git-status ci #1005" (Oct 28)
https://github.com/os-autoinst/os-autoinst/runs/4035778918?check_suite_focus=true
It started to fail for this one:
"Merge pull request #1855 from cfconrad/chattr_debugging ci #1145" (Nov 23)
https://github.com/os-autoinst/os-autoinst/runs/4298150735?check_suite_focus=true
For the record, all master runs:
1222: pass https://github.com/os-autoinst/os-autoinst/runs/4396177333?check_suite_focus=true
1216: pass https://github.com/os-autoinst/os-autoinst/runs/4383814133?check_suite_focus=true
1210: pass https://github.com/os-autoinst/os-autoinst/runs/4379650280?check_suite_focus=true
1204: 13-osutils.t Failed test 'Exit code appear in log'
https://github.com/os-autoinst/os-autoinst/runs/4370300894?check_suite_focus=true
1198: pass (*) https://github.com/os-autoinst/os-autoinst/runs/4367441256?check_suite_focus=true
1194: pass https://github.com/os-autoinst/os-autoinst/runs/4340445436?check_suite_focus=true
1192: pass https://github.com/os-autoinst/os-autoinst/runs/4332853295?check_suite_focus=true
1190: pass https://github.com/os-autoinst/os-autoinst/runs/4332545501?check_suite_focus=true
1188: pass https://github.com/os-autoinst/os-autoinst/runs/4331514911?check_suite_focus=true
1184: pass https://github.com/os-autoinst/os-autoinst/runs/4325174341?check_suite_focus=true
1180: 13-osutils.t Failed test 'Exit code appear in log'
https://github.com/os-autoinst/os-autoinst/runs/4324359196?check_suite_focus=true
1178: 13-osutils.t Failed test 'Exit code appear in log'
https://github.com/os-autoinst/os-autoinst/runs/4323176520?check_suite_focus=true
1174: 13-osutils.t Failed test 'Exit code appear in log'
https://github.com/os-autoinst/os-autoinst/runs/4318559803?check_suite_focus=true
1150: 13-osutils.t Failed test 'Exit code appear in log'
https://github.com/os-autoinst/os-autoinst/runs/4300097983?check_suite_focus=true
1145: 13-osutils.t Failed test 'Exit code appear in log'
https://github.com/os-autoinst/os-autoinst/runs/4298150735?check_suite_focus=true
1144: pass https://github.com/os-autoinst/os-autoinst/runs/4297730291?check_suite_focus=true
1143: pass https://github.com/os-autoinst/os-autoinst/runs/4297702670?check_suite_focus=true
1130: pass https://github.com/os-autoinst/os-autoinst/runs/4287579737?check_suite_focus=true
1099: pass https://github.com/os-autoinst/os-autoinst/runs/4210349893?check_suite_focus=true
1098: pass https://github.com/os-autoinst/os-autoinst/runs/4209742250?check_suite_focus=true
1082: pass https://github.com/os-autoinst/os-autoinst/runs/4165749243?check_suite_focus=true
1077: pass https://github.com/os-autoinst/os-autoinst/runs/4154751608?check_suite_focus=true
1073: pass https://github.com/os-autoinst/os-autoinst/runs/4154049211?check_suite_focus=true
1072: pass https://github.com/os-autoinst/os-autoinst/runs/4153523636?check_suite_focus=true
1068: pass https://github.com/os-autoinst/os-autoinst/runs/4153070582?check_suite_focus=true
1067: pass https://github.com/os-autoinst/os-autoinst/runs/4153068569?check_suite_focus=true
1059: pass https://github.com/os-autoinst/os-autoinst/runs/4151045057?check_suite_focus=true
1053: pass https://github.com/os-autoinst/os-autoinst/runs/4139738683?check_suite_focus=true
1050: pass https://github.com/os-autoinst/os-autoinst/runs/4139609023?check_suite_focus=true
1026: pass https://github.com/os-autoinst/os-autoinst/runs/4082501881?check_suite_focus=true
1024: pass https://github.com/os-autoinst/os-autoinst/runs/4079870003?check_suite_focus=true
1023: pass https://github.com/os-autoinst/os-autoinst/runs/4079327360?check_suite_focus=true
1016: pass https://github.com/os-autoinst/os-autoinst/runs/4049541911?check_suite_focus=true
1013: pass https://github.com/os-autoinst/os-autoinst/runs/4049078290?check_suite_focus=true
1010: pass https://github.com/os-autoinst/os-autoinst/runs/4041083982?check_suite_focus=true
1005: pass https://github.com/os-autoinst/os-autoinst/runs/4035778918?check_suite_focus=true
Updated by tinita about 3 years ago
Commented here: https://github.com/os-autoinst/os-autoinst/pull/1855#issuecomment-986742654
Looks like it could be related as the test calls run_diag
which was changed.
Updated by tinita about 3 years ago
- Status changed from In Progress to Feedback
Clemens confirmed Marius' suggestion that https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/pull/22
will fix this.
The version 0.31 is currently submitted to Factory: https://build.opensuse.org/request/show/935940
Updated by okurz about 3 years ago
As a temporary workaround I did osc linkpac devel:languages:perl perl-Mojo-IOLoop-ReadWriteProcess devel:openQA:Leap:15.3
as multiple PRs are impacted. Let's see if this helps. After the fix reached Factory we should update the link to point to the Factory version again.
Updated by tinita about 3 years ago
All 15.3 builds are failing now with the same error
perl-Mojo-IOLoop-ReadWriteProcess-0.31-lp153.30.1
https://build.opensuse.org/package/live_build_log/devel:openQA/os-autoinst/openSUSE_Leap_15.3/x86_64
[ 211s] 3: [17:11:16] ./28-signalblocker.t ..................... ok 686 ms ( 0.00 usr 0.00 sys + 0.56 cusr 0.14 csys = 0.70 CPU)
[ 212s] 3:
[ 212s] 3: # Failed test 'exit logged'
[ 212s] 3: # at ./29-backend-driver.t line 31.
[ 212s] 3: # STDOUT & STDERR:
[ 212s] 3: # [37m[2021-12-06T17:11:16.953486Z] [debug] received magic close
[ 212s] 3: # [0m
[ 212s] 3: # don't match:
[ 212s] 3: # (?^u:backend.*exited)
[ 212s] 3: # as expected
[ 212s] 3: [37m[2021-12-06T17:11:17.057808Z] [debug] backend process exited: 72057594037927935
[ 212s] 3: [0m# Looks like you failed 1 test of 12.
[ 212s] 3: [17:11:17] ./29-backend-driver.t ....................
[ 212s] 3: Dubious, test returned 1 (wstat 256, 0x100)
[ 212s] 3: Failed 1/12 subtests
Also the output of 10-virtio_terminal.t looks weird, but not sure if this was happening before.
It still succeeds, though:
[ 196s] 3: Can't open out pipe for reading Interrupted system call at ./10-virtio_terminal.t line 44.
[ 196s] 3:
[ 196s] 3: # Failed test 'no (unexpected) warnings (via END block)'
[ 196s] 3: # at /usr/lib/perl5/5.26.1/Test/Builder.pm line 135.
[ 196s] 3: # Got the following unexpected warnings:
[ 196s] 3: # 1: Can't open out pipe for reading Interrupted system call at ./10-virtio_terminal.t line 44.
[ 196s] 3: [33m[0m[33m[0m[33m[0m[33m[0m[33m[0m[33m[0m[33m[0m[33m[0m[33m[0m[17:11:01] ./10-virtio_terminal.t ................... ok 820 ms ( 0.01 usr 0.00 sys + 0.71 cusr 0.13 csys = 0.85 CPU)
This is how it looks like on 15.2:
[ 164s] 3: [33m[0m[33m[0m[33m[0m[33m[0m[33m[0m[33m[0m[33m[0m[33m[0m[33m[0m[10:16:34] ./10-virtio_terminal.t ................... ok 835 ms ( 0.01 usr 0.00 sys + 0.71 cusr 0.14 csys = 0.86 CPU)
In any case we should get rid of the ansi codes.
Created an issue here: https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/issues/29
Updated by okurz about 3 years ago
are you sure it's an upstream issue and not just what we expect in our tests and that should be made less strict?
Also, as perl-Mojo-IOLoop-ReadWriteProcess was accepted into openSUSE:Factory I now did osc linkpac -f openSUSE:Factory perl-Mojo-IOLoop-ReadWriteProcess devel:openQA:Leap:15.3
EDIT: Hm, I ensured that I have perl-Mojo-IOLoop-ReadWriteProcess-0.31-lp153.30.1 installed locally as well but I can only easily reproduce the 29-backend-driver.t problem, not easily the others of the above errors. So for me it's not happening or as sporadic as reported in before
EDIT: I know the realized the message "backend process exited: 72057594037927935". That looks certainly very wrong.
Updated by okurz about 3 years ago
- Priority changed from High to Urgent
The overall problem becomes more severe with perl-Mojo-IOLoop-ReadWriteProcess in Tumbleweed as well as in devel:openQA:Leap:15.3 and also blocking more pull requests, see e.g. https://github.com/os-autoinst/openQA/pull/4388 . Feel free to pull in more people to help or delegate.
Updated by tinita about 3 years ago
- Copied to action #103611: test failure: os-autoinst: 29-backend-driver.t:31 Failed test 'exit logged' added
Updated by tinita about 3 years ago
I copied the new issue with 29-backend-driver.t et al. over to #103611
I would say the issue in this ticket with 13-osutils.t is probably solved.
Updated by livdywan about 3 years ago
tinita wrote:
I copied the new issue with 29-backend-driver.t et al. over to #103611
I would say the issue in this ticket with 13-osutils.t is probably solved.
Would you mind running runs=500 count_fail_ratio make test-perl-testsuite WITH_COVER_OPTIONS=1 TESTS=t/13-osutils.t
before resolving?
Updated by okurz about 3 years ago
- Copied to action #103692: Only upgrade o3 workers if package checks are good, same as for o3 webui size:M added
Updated by okurz about 3 years ago
As we discussed this topic and tried to find improvements I created a ticket #103692 to improve
Updated by tinita about 3 years ago
- Status changed from Feedback to Blocked
For now this is blocked because we have the old RWP version, so tests are sporadically failing:
https://github.com/os-autoinst/os-autoinst/actions/runs/1559333666
https://github.com/os-autoinst/os-autoinst/actions/runs/1559285099
Waiting until RWP 0.32 is released and in Factory.
Updated by okurz about 3 years ago
- Status changed from Blocked to Resolved
https://build.opensuse.org/package/show/openSUSE:Factory/perl-Mojo-IOLoop-ReadWriteProcess is 0.32 now, tests in os-autoinst pass and any test failures would make the CI pipeline fail so both ACs covered and followup work defined in #103845