action #176148

closed

coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA

coordination #175515: [epic] incomplete jobs with "Failed to find an available port: Address already in use"

Ensure https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ has 100% passing unit tests in CI size:S

Added by livdywan 3 months ago. Updated 10 days ago.

Status: Resolved
Priority: Low
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2025-01-24
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

The regression tracked down in #175518 went undetected in part because Mojo-IOLoop-ReadWriteProcess currently has failing unit tests on the default branch in CI, and the repo is not considered well-maintained or easy to work with.

Acceptance Criteria

Suggestions


Related issues 3 (2 open, 1 closed)

Related to openQA Project (public) - action #178186: [sporadic] Failing OBS package check t/12_mocked_container.t for perl-Mojo-IOLoop-ReadWriteProcess on aarch64 size:S (Resolved, gpuliti, 2025-03-03)

Copied to openQA Project (public) - action #179906: Proper release with current content for https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ size:S (Blocked, okurz)

Copied to openQA Project (public) - action #180026: [sporadic] t/01_run.t can end with segfault in https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ CI tests (New)

Actions #1

Updated by livdywan 3 months ago

  • Parent task changed from #175518 to #175515
Actions #2

Updated by okurz 3 months ago

  • Assignee set to okurz
Actions #3

Updated by okurz 3 months ago

  • Target version set to Ready
Actions #4

Updated by okurz 3 months ago

  • Tags set to rwp
  • Description updated (diff)
  • Category set to Regressions/Crashes
  • Assignee deleted (okurz)
  • Target version changed from Ready to future
Actions #5

Updated by okurz 23 days ago

  • Target version changed from future to Ready
Actions #6

Updated by dheidler 22 days ago

  • Subject changed from Ensure https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ has 100% passing unit tests in CI to Ensure https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ has 100% passing unit tests in CI size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #7

Updated by dheidler 22 days ago

  • Status changed from Workable to In Progress
  • Assignee set to dheidler
Actions #8

Updated by openqa_review 22 days ago

  • Due date set to 2025-04-11

Setting due date based on mean cycle time of SUSE QE Tools

Actions #9

Updated by dheidler 21 days ago

Not sure how we could run opensuse there - github only provides ubuntu, mac and windows images: https://github.com/actions/runner-images

Actions #10

Updated by dheidler 21 days ago

  • Assignee deleted (dheidler)
Actions #11

Updated by okurz 21 days ago

dheidler wrote in #note-9:

Not sure how we could run opensuse there - github only provides ubuntu, mac and windows images: https://github.com/actions/runner-images

We can use container-in-container like in os-autoinst, see
https://github.com/os-autoinst/os-autoinst/blob/master/.github/workflows/ci.yml#L14

Actions #12

Updated by okurz 21 days ago

  • Assignee set to okurz
Actions #13

Updated by okurz 21 days ago · Edited

  • Status changed from In Progress to Feedback

The main fix was done by tinita with https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/pull/71. I have updates, smaller fixes and simplification for CI in

And related improvements:

I also observed that t/01_run.t sporadically failed in CI so I checked if this can be reproduced easily. I got

## count-fail-ratio: Run: 2000. Fails: 0. Fail ratio 0±0%. No fails, computed failure probability < .15%
## mean runtime: 2484±182.24 ms

so no joy reproducing
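
The "computed failure probability" figures quoted in this ticket are consistent with the rule of three: after n runs without any failure, the 95% upper bound on the failure probability is roughly 3/n (3/2000 = .15%). A minimal sketch of that idea follows. It is not the actual count-fail-ratio script, and the function names count_fails and rule_of_three are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, NOT the real count-fail-ratio script:
# run a command $1 times and print how many of the runs failed.
count_fails() {
    runs=$1; shift
    fails=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails"
}

# Rule of three: with 0 failures in n runs, the failure probability is
# below roughly 3/n at 95% confidence. Prints the bound in percent.
rule_of_three() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 3 / n * 100 }'
}

# Example usage (commented out so that sourcing the file has no side effects):
#   count_fails 20 prove -l t/01_run.t
#   rule_of_three 2000
```

For 2000 clean runs this prints 0.15, so a sporadic failure that occurs less often than about once in 700 runs can easily stay invisible in such an experiment.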

Actions #14

Updated by okurz 19 days ago

All three merged. Awaiting results in master branch

Actions #15

Updated by okurz 18 days ago

  • Status changed from Feedback to In Progress

https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/actions?query=branch%3Amaster shows that most tests pass but some fail. I will try to reproduce with coverage enabled or directly within GitHub Actions.

Actions #16

Updated by tinita 18 days ago

  • Related to action #178186: [sporadic] Failing OBS package check t/12_mocked_container.t for perl-Mojo-IOLoop-ReadWriteProcess on aarch64 size:S added
Actions #17

Updated by okurz 18 days ago

Actions #18

Updated by okurz 16 days ago

  • Priority changed from Normal to Low
Actions #19

Updated by okurz 16 days ago

  • Assignee changed from okurz to szarate

First we created release 1.0.0 and uploaded it to https://metacpan.org/pod/Mojo::IOLoop::ReadWriteProcess with great help from tinita. However, I am missing the "authorized releasers" page at https://metacpan.org/dist/Mojo-IOLoop-ReadWriteProcess/permissions , so I asked szarate.

Actions #20

Updated by okurz 16 days ago

  • Status changed from In Progress to Feedback
  • Assignee changed from szarate to okurz
Actions #21

Updated by okurz 16 days ago · Edited

Trying to reproduce the failure locally with coverage enabled brought no joy either, using

runs=2000 env PERL5OPT="-MDevel::Cover" count-fail-ratio prove -l --timer t/01_run.t

## count-fail-ratio: Run: 170. Fails: 0. Fail ratio 0±0%. No fails, computed failure probability < 1.76%
## mean runtime: 4453±332.37 ms

Running in CI now https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14217779071

Actions #22

Updated by szarate 16 days ago

okurz wrote in #note-19:

First we created release 1.0.0 and uploaded it to https://metacpan.org/pod/Mojo::IOLoop::ReadWriteProcess with great help from tinita. However, I am missing the "authorized releasers" page at https://metacpan.org/dist/Mojo-IOLoop-ReadWriteProcess/permissions , so I asked szarate.

Asked Ettore via Telegram to add okurz as co-maintainer or transfer ownership.

Actions #23

Updated by okurz 16 days ago

  • Copied to action #179906: Proper release with current content for https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ size:S added
Actions #24

Updated by okurz 16 days ago

Ran in CI with 1000 runs each, based on https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/tree/refs/heads/fix/instability with the relevant changes in
https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/blob/refs/heads/fix/instability/.github/workflows/ci-tests.yaml

  build:
    runs-on: ${{ matrix.os }}
    strategy:
…
      fail-fast: false
    name: 🐪 Perl ${{ matrix.perl }} on ${{ matrix.os }}
    steps:
…
        run: |
          curl https://raw.githubusercontent.com/okurz/retry/refs/heads/main/count-fail-ratio > count-fail-ratio
          chmod +x count-fail-ratio
          TEST_SHARED=1 TEST_SUBREAPER=1 runs=1000 ./count-fail-ratio cover -test -report codecovbash

https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974 shows that all Linux builds produced results (see below), while the macOS builds fail, apparently due to how Unix time in seconds is calculated.

Maybe surprisingly, the Perl 5.16 based tests are very stable according to https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683543#step:6:27012

## count-fail-ratio: Run: 1000. Fails: 0. Fail ratio 0±0%. No fails, computed failure probability < .30%

Other runs show problems. First problem: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683540#step:6:886

Errors in testing.  Cannot continue.
t/01_run.t (Wstat: 139 Tests: 12 Failed: 0)
  Non-zero wait status: 139

So this is probably an unclean shutdown of the complete test module, so that we don't end up with 0/1.

https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683540#step:6:23820 for 5.20 shows

## count-fail-ratio: Run: 876. Fails: 26. Fail ratio 2.96±1.12%

All of these show the same failure as above.

5.26 is also stable: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683574#step:6:27012

## count-fail-ratio: Run: 1000. Fails: 0. Fail ratio 0±0%. No fails, computed failure probability < .30%

Same for 5.30: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683578#step:6:27012

5.34 showed "## count-fail-ratio: Run: 1000. Fails: 2. Fail ratio .20±.27%", same for 5.38, same failures as above. 5.40 stable, "latest" stable.

So it's surprising that 5.20 shows such significant problems. Retriggering all runs again to crosscheck: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974
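
The "Fail ratio" figures above appear consistent with a 95% normal-approximation interval, p ± 1.96 * sqrt(p * (1 - p) / n), where p is fails divided by runs. A minimal sketch under that assumption follows. The helper name fail_ratio is hypothetical, and the output is rounded rather than truncated, so the last digit can differ from the tool's own output (2.97 here versus 2.96 in the log).

```shell
#!/bin/sh
# Hypothetical helper: print "ratio margin" in percent for $1 fails out of
# $2 runs, assuming the normal approximation p +/- 1.96 * sqrt(p * (1 - p) / n).
fail_ratio() {
    awk -v f="$1" -v n="$2" 'BEGIN {
        p = f / n
        printf "%.2f %.2f\n", p * 100, 1.96 * sqrt(p * (1 - p) / n) * 100
    }'
}

# Example usage (commented out so that sourcing the file prints nothing):
#   fail_ratio 26 876
#   fail_ratio 2 1000
```

With 2 fails in 1000 runs the margin is larger than the ratio itself, so a result like that does not reliably distinguish one Perl version from a version that showed 0 fails.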

Actions #25

Updated by okurz 15 days ago

Results of the rerun:

  • 5.16: surprisingly stable again, https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871863#step:6:27012
  • 5.20: 24 fails with "(Wstat: 139 Tests: 12 Failed: 0)", https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871854#step:6:27156
  • 5.26: stable, https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871866
  • 5.30: stable, https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871865#step:6:27012
  • 5.34: 3 fails with "(Wstat: 139 Tests: 12 Failed: 0)", https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871868
  • 5.38: one fail with "(Wstat: 139 (Signal: SEGV, dumped core) Tests: 12 Failed: 0)", https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871873
  • 5.40: stable, https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871882#step:6:27012
  • latest: stable, https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871886#step:6:27012

So this is quite different to above, and the Perl version does not seem to have an impact. More likely it is the load on the node running the tests, e.g. less load likely producing no failures and higher load being more likely to cause problems.

Code "139" means segmentation fault, so I assume that either this module or another module we use has unclean memory handling during termination.
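
Why 139 points to a segfault: a raw wait status, as shown in the "Wstat" field, keeps the terminating signal in the low 7 bits, a core-dump flag in bit 7 and the exit code in the high byte. 139 = 128 + 11, and signal 11 is SIGSEGV on Linux. A minimal sketch of that decoding, with a hypothetical helper name decode_wstat:

```shell
#!/bin/sh
# Hypothetical helper: split a raw wait status (like the "Wstat" value
# above) into its parts.
decode_wstat() {
    wstat=$1
    signal=$((wstat & 127))          # low 7 bits: terminating signal, 0 if none
    core=$(( (wstat & 128) >> 7 ))   # bit 7: 1 if a core dump was flagged
    code=$((wstat >> 8))             # high byte: exit code of a normal exit
    echo "signal=$signal core=$core exit=$code"
}

# Example usage (commented out so that sourcing the file prints nothing):
#   decode_wstat 139
```

For 139 this reports signal 11 with the core flag set and exit code 0, which fits the "(Signal: SEGV, dumped core)" annotation and the "Failed: 0" count: all 12 subtests passed and the process crashed afterwards.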

Now running again, rebased on tinita's latest changes, with while (my $pid = waitpid(-1, WNOHANG) > 0) { … and without coverage collection: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14252642847

Actions #26

Updated by okurz 14 days ago

  • Copied to action #180026: [sporadic] t/01_run.t can end with segfault in https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ CI tests added
Actions #27

Updated by okurz 14 days ago · Edited

Moving the segfault fix to #180026. For the current ticket we should be OK to live with a workaround of retrying: https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/pull/79

Actions #28

Updated by okurz 10 days ago

  • Due date deleted (2025-04-11)
  • Status changed from Feedback to Resolved

https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/pull/79 merged. With retries in place we should have sufficient stability. If more specific sporadic unit test failures show up, we can apply retries there as well.
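
For illustration, a retry workaround of this kind could look like the sketch below. This is an assumption about the general technique, not the content of pull request 79. The function name retry_cmd and the RETRY_DELAY variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the pull request: run a command up to
# $1 times and stop at the first success.
retry_cmd() {
    attempts=$1; shift
    n=1
    while :; do
        "$@" && return 0                          # success: stop retrying
        [ "$n" -ge "$attempts" ] && return 1      # out of attempts: give up
        n=$((n + 1))
        sleep "${RETRY_DELAY:-1}"                 # pause before the next attempt
    done
}

# Example usage (commented out so that sourcing the file runs nothing):
#   retry_cmd 3 prove -l t/01_run.t
```

Retrying hides a sporadic failure rather than fixing it, which is why the segfault itself is tracked separately in #180026.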
