action #176148
coordination #102906 (closed): [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA
coordination #175515: [epic] incomplete jobs with "Failed to find an available port: Address already in use"
Ensure https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ has 100% passing unit tests in CI size:S
Description
Motivation
The regression tracked down in #175518 went undetected in part because Mojo-IOLoop-ReadWriteProcess currently has failing unit tests on the default branch in CI, and the repo is not considered well-maintained or easy to work with.
Acceptance Criteria
- AC1: https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/actions?query=branch%3Amaster shows consistently stable test results
- AC2: Test suite succeeds on Linux
- AC3: Code coverage is available for a typical Linux environment
Suggestions
- The Linux environment should preferably be an openSUSE OS rather than Ubuntu
- Ensure 100% passing unit tests in CI for Mojo-IOLoop-ReadWriteProcess
- Check the latest failures from https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/actions?query=branch%3Amaster, try to reproduce them locally or within CI (see the sketch after this list), and fix them
- Ensure relevant tests are included in all CI runs
- Some tests are currently macOS-specific
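A rough sketch of reproducing locally; the TEST_* variables match what the CI workflow further down in this ticket uses, and installing dependencies via cpanminus is just one option:

```sh
# Rough local reproduction sketch; cpanm usage is an assumption, use whatever
# dependency installation is convenient in your environment.
git clone https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess.git
cd Mojo-IOLoop-ReadWriteProcess
cpanm --installdeps .
TEST_SHARED=1 TEST_SUBREAPER=1 prove -l --timer t
```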
Updated by dheidler 22 days ago
- Subject changed from Ensure https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ has 100% passing unit tests in CI to Ensure https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ has 100% passing unit tests in CI size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by openqa_review 22 days ago
- Due date set to 2025-04-11
Setting due date based on mean cycle time of SUSE QE Tools
Updated by dheidler 21 days ago
Not sure how we could run openSUSE there - GitHub only provides Ubuntu, macOS and Windows images: https://github.com/actions/runner-images
Updated by okurz 21 days ago
dheidler wrote in #note-9:
Not sure how we could run openSUSE there - GitHub only provides Ubuntu, macOS and Windows images: https://github.com/actions/runner-images
We can use container-in-container like in os-autoinst, see
https://github.com/os-autoinst/os-autoinst/blob/master/.github/workflows/ci.yml#L14
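To illustrate the idea (the linked os-autoinst workflow configures this via the job's container setting; the sketch below expresses the same thing as a plain container invocation, and image and package names are assumptions):

```sh
# Sketch only: run the test suite inside an openSUSE container on the Ubuntu
# runner; see the linked ci.yml for the real configuration.
docker run --rm -v "$PWD:/src" -w /src registry.opensuse.org/opensuse/tumbleweed \
  sh -c 'zypper -n in perl perl-Mojolicious && prove -l t'
```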
Updated by okurz 21 days ago · Edited
- Status changed from In Progress to Feedback
The main fix was done by tinita with https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/pull/71. I have updates, smaller fixes and simplification for CI in
And related improvements:
- https://github.com/os-autoinst/os-autoinst/pull/2689
- https://github.com/os-autoinst/openQA/pull/6340
I also observed that t/01_run.t sporadically failed in CI, so I checked whether it can be reproduced easily. I got
## count-fail-ratio: Run: 2000. Fails: 0. Fail ratio 0±0%. No fails, computed failure probability < .15%
## mean runtime: 2484±182.24 ms
so no joy reproducing locally.
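The exact invocation is not recorded above; presumably something along these lines, matching the runs=2000 in the output:

```sh
# Presumed reproduction attempt using count-fail-ratio from
# https://github.com/okurz/retry (invocation reconstructed, not copied)
curl -O https://raw.githubusercontent.com/okurz/retry/refs/heads/main/count-fail-ratio
chmod +x count-fail-ratio
runs=2000 ./count-fail-ratio prove -l --timer t/01_run.t
```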
Updated by okurz 18 days ago
- Status changed from Feedback to In Progress
https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/actions?query=branch%3Amaster shows that most tests pass but some fail. I will try to reproduce with coverage enabled or directly within GitHub Actions.
Updated by tinita 18 days ago
- Related to action #178186: [sporadic] Failing OBS package check t/12_mocked_container.t for perl-Mojo-IOLoop-ReadWriteProcess on aarch64 size:S added
Updated by okurz 18 days ago
Created https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/pull/73 to fix the status branch. Preparing a release as well, as discussed in the daily: https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/pull/67
Updated by okurz 16 days ago
- Assignee changed from okurz to szarate
We first created a release 1.0.0 and uploaded it to https://metacpan.org/pod/Mojo::IOLoop::ReadWriteProcess with great help from tinita, but I am missing from the "authorized releasers" list at https://metacpan.org/dist/Mojo-IOLoop-ReadWriteProcess/permissions, so I asked szarate.
Updated by okurz 16 days ago · Edited
Trying to reproduce the failure locally with coverage enabled, also no joy, with runs=2000 env PERL5OPT="-MDevel::Cover" count-fail-ratio prove -l --timer t/01_run.t:
## count-fail-ratio: Run: 170. Fails: 0. Fail ratio 0±0%. No fails, computed failure probability < 1.76%
## mean runtime: 4453±332.37 ms
Running in CI now https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14217779071
Updated by szarate 16 days ago
okurz wrote in #note-19:
We first created a release 1.0.0 and uploaded it to https://metacpan.org/pod/Mojo::IOLoop::ReadWriteProcess with great help from tinita, but I am missing from the "authorized releasers" list at https://metacpan.org/dist/Mojo-IOLoop-ReadWriteProcess/permissions, so I asked szarate.
Asked Ettore via Telegram to add okurz as co-maintainer or transfer ownership
Updated by okurz 16 days ago
- Copied to action #179906: Proper release with current content for https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ size:S added
Updated by okurz 16 days ago
Ran this in CI now with 1000 runs each, based on https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/tree/refs/heads/fix/instability with the relevant changes in
https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/blob/refs/heads/fix/instability/.github/workflows/ci-tests.yaml:
build:
  runs-on: ${{ matrix.os }}
  strategy:
    …
    fail-fast: false
  name: 🐪 Perl ${{ matrix.perl }} on ${{ matrix.os }}
  steps:
    …
      run: |
        curl https://raw.githubusercontent.com/okurz/retry/refs/heads/main/count-fail-ratio > count-fail-ratio
        chmod +x count-fail-ratio
        TEST_SHARED=1 TEST_SUBREAPER=1 runs=1000 ./count-fail-ratio cover -test -report codecovbash
https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974 shows that all Linux builds ran through whereas the macOS builds fail, apparently due to how Unix time in seconds is calculated.
Maybe surprisingly, the Perl 5.16-based tests are super stable according to https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683543#step:6:27012
## count-fail-ratio: Run: 1000. Fails: 0. Fail ratio 0±0%. No fails, computed failure probability < .30%
Other runs show problems. First problem https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683540#step:6:886:
Errors in testing. Cannot continue.
t/01_run.t (Wstat: 139 Tests: 12 Failed: 0)
Non-zero wait status: 139
So probably an unclean shutdown of the complete test module, so we don't end up with a normal 0/1 exit status.
https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683540#step:6:23820 for 5.20 shows
## count-fail-ratio: Run: 876. Fails: 26. Fail ratio 2.96±1.12%
All of them are the same failure as above.
5.26 also stable https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683574#step:6:27012
## count-fail-ratio: Run: 1000. Fails: 0. Fail ratio 0±0%. No fails, computed failure probability < .30%
same for 5.30 https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683578#step:6:27012
5.34 showed "## count-fail-ratio: Run: 1000. Fails: 2. Fail ratio .20±.27%", same for 5.38, same failures as above. 5.40 stable, "latest" stable.
So it's surprising that 5.20 shows such significant problems. Retriggering all runs again to crosscheck: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974
Updated by okurz 15 days ago
Surprisingly the results this time are quite different from the above:
- 5.16 stable: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871863#step:6:27012
- 5.20: 24 fails with "(Wstat: 139 Tests: 12 Failed: 0)": https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871854#step:6:27156
- 5.26 stable: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871866
- 5.30 stable: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871865#step:6:27012
- 5.34: 3 fails with "(Wstat: 139 Tests: 12 Failed: 0)": https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871868
- 5.38: one fail with "(Wstat: 139 (Signal: SEGV, dumped core) Tests: 12 Failed: 0)": https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871873
- 5.40 stable: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871882#step:6:27012
- latest stable: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871886#step:6:27012
So the Perl version does not seem to have an impact; more likely it is the load on the node running the tests, e.g. less load likely producing no failures and higher load more likely causing problems.
Code "139" means segmentation fault so I assume either in this module or any other module we use unclean memory handling during termination.
Now running again, rebased onto tinita's latest changes, as well as with while (my $pid = waitpid(-1, WNOHANG) > 0) { …
and also without coverage collection: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14252642847
Updated by okurz 14 days ago
- Copied to action #180026: [sporadic] t/01_run.t can end with segfault in https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ CI tests added
Updated by okurz 10 days ago
- Due date deleted (2025-04-11)
- Status changed from Feedback to Resolved
https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/pull/79 merged. With that, and with retries, we should have sufficient stability. If there are more specific sporadic unit test failures we can apply retries there as well.
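Such a retry can be as simple as wrapping a flaky test file in a small loop, for example (just a sketch of the idea, not necessarily the mechanism of the merged PR):

```sh
# Retry a sporadically failing test file up to 3 times before failing the job
for attempt in 1 2 3; do
  prove -l t/01_run.t && break
  [ "$attempt" -eq 3 ] && exit 1
done
```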