action #176148
coordination #102906 (closed): [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA
coordination #175515: [epic] incomplete jobs with "Failed to find an available port: Address already in use"
Ensure https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ has 100% passing unit tests in CI size:S
Description
Motivation
The regression tracked down in #175518 went undetected in part because Mojo-IOLoop-ReadWriteProcess currently has failing unit tests on the default branch in CI, and the repo is not considered well-maintained or easy to work with.
Acceptance Criteria
- AC1: https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/actions?query=branch%3Amaster shows consistently stable test results
- AC2: Test suite succeeds on Linux
- AC3: Code coverage is available for a typical Linux environment
Suggestions
- The Linux environment should preferably be an openSUSE OS rather than Ubuntu
- Ensure 100% passing unit tests in CI for Mojo-IOLoop-ReadWriteProcess
- Check the latest failures from https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/actions?query=branch%3Amaster, try to reproduce them locally or within CI (see the sketch after this list), and fix them
- Ensure relevant tests are included in all CI runs
- Some tests are currently macOS-specific
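A rough sketch of reproducing locally; the TEST_* variables match what the CI workflow further down in this ticket uses, and installing dependencies via cpanminus is just one option:

```sh
# Rough local reproduction sketch; cpanm usage is an assumption, use whatever
# dependency installation is convenient in your environment.
git clone https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess.git
cd Mojo-IOLoop-ReadWriteProcess
cpanm --installdeps .
TEST_SHARED=1 TEST_SUBREAPER=1 prove -l --timer t
```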
Updated by dheidler 22 days ago
- Subject changed from Ensure https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ has 100% passing unit tests in CI to Ensure https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ has 100% passing unit tests in CI size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by openqa_review 22 days ago
- Due date set to 2025-04-11
Setting due date based on mean cycle time of SUSE QE Tools
Updated by dheidler 21 days ago
Not sure how we could run openSUSE there - GitHub only provides Ubuntu, macOS and Windows images: https://github.com/actions/runner-images
Updated by okurz 21 days ago
dheidler wrote in #note-9:
Not sure how we could run openSUSE there - GitHub only provides Ubuntu, macOS and Windows images: https://github.com/actions/runner-images
We can use container-in-container like in os-autoinst, see
https://github.com/os-autoinst/os-autoinst/blob/master/.github/workflows/ci.yml#L14
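To illustrate the idea (the linked os-autoinst workflow configures this via the job's container setting; the sketch below expresses the same thing as a plain container invocation, and image and package names are assumptions):

```sh
# Sketch only: run the test suite inside an openSUSE container on the Ubuntu
# runner; see the linked ci.yml for the real configuration.
docker run --rm -v "$PWD:/src" -w /src registry.opensuse.org/opensuse/tumbleweed \
  sh -c 'zypper -n in perl perl-Mojolicious && prove -l t'
```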
Updated by okurz 21 days ago · Edited
- Status changed from In Progress to Feedback
The main fix was done by tinita with https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/pull/71. I have updates, smaller fixes and simplification for CI in
And related improvements:
- https://github.com/os-autoinst/os-autoinst/pull/2689
- https://github.com/os-autoinst/openQA/pull/6340
I also observed that t/01_run.t sporadically failed in CI, so I checked whether it can be reproduced easily. I got
## count-fail-ratio: Run: 2000. Fails: 0. Fail ratio 0±0%. No fails, computed failure probability < .15%
## mean runtime: 2484±182.24 ms
so no joy reproducing locally.
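The exact invocation is not recorded above; presumably something along these lines, matching the runs=2000 in the output:

```sh
# Presumed reproduction attempt using count-fail-ratio from
# https://github.com/okurz/retry (invocation reconstructed, not copied)
curl -O https://raw.githubusercontent.com/okurz/retry/refs/heads/main/count-fail-ratio
chmod +x count-fail-ratio
runs=2000 ./count-fail-ratio prove -l --timer t/01_run.t
```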
Updated by okurz 18 days ago
- Status changed from Feedback to In Progress
https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/actions?query=branch%3Amaster shows that most tests pass but some fail. I will try to reproduce with coverage enabled or directly within GitHub Actions.
Updated by tinita 18 days ago
- Related to action #178186: [sporadic] Failing OBS package check t/12_mocked_container.t for perl-Mojo-IOLoop-ReadWriteProcess on aarch64 size:S added
Updated by okurz 18 days ago
Created https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/pull/73 to fix the status branch. Preparing a release as well, as discussed in the daily: https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/pull/67
Updated by okurz 16 days ago
- Assignee changed from okurz to szarate
We first created a release 1.0.0 and uploaded it to https://metacpan.org/pod/Mojo::IOLoop::ReadWriteProcess with great help from tinita, but I am missing from the "authorized releasers" list at https://metacpan.org/dist/Mojo-IOLoop-ReadWriteProcess/permissions, so I asked szarate.
Updated by okurz 16 days ago · Edited
Trying to reproduce the failure locally with coverage enabled, also no joy, with runs=2000 env PERL5OPT="-MDevel::Cover" count-fail-ratio prove -l --timer t/01_run.t:
## count-fail-ratio: Run: 170. Fails: 0. Fail ratio 0±0%. No fails, computed failure probability < 1.76%
## mean runtime: 4453±332.37 ms
Running in CI now https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14217779071
Updated by szarate 16 days ago
okurz wrote in #note-19:
We first created a release 1.0.0 and uploaded it to https://metacpan.org/pod/Mojo::IOLoop::ReadWriteProcess with great help from tinita, but I am missing from the "authorized releasers" list at https://metacpan.org/dist/Mojo-IOLoop-ReadWriteProcess/permissions, so I asked szarate.
Asked Ettore via Telegram to add okurz as co-maintainer or transfer ownership
Updated by okurz 16 days ago
- Copied to action #179906: Proper release with current content for https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ size:S added
Updated by okurz 16 days ago
Ran this in CI now with 1000 runs each, based on https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/tree/refs/heads/fix/instability with the relevant changes in
https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/blob/refs/heads/fix/instability/.github/workflows/ci-tests.yaml:
build:
  runs-on: ${{ matrix.os }}
  strategy:
    …
    fail-fast: false
  name: 🐪 Perl ${{ matrix.perl }} on ${{ matrix.os }}
  steps:
    …
      run: |
        curl https://raw.githubusercontent.com/okurz/retry/refs/heads/main/count-fail-ratio > count-fail-ratio
        chmod +x count-fail-ratio
        TEST_SHARED=1 TEST_SUBREAPER=1 runs=1000 ./count-fail-ratio cover -test -report codecovbash
https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974 shows that all Linux builds ran through whereas the macOS builds fail, apparently due to how Unix time in seconds is calculated.
Maybe surprisingly, the Perl 5.16-based tests are super stable according to https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683543#step:6:27012
## count-fail-ratio: Run: 1000. Fails: 0. Fail ratio 0±0%. No fails, computed failure probability < .30%
Other runs show problems. First problem https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683540#step:6:886:
Errors in testing. Cannot continue.
t/01_run.t (Wstat: 139 Tests: 12 Failed: 0)
Non-zero wait status: 139
So probably an unclean shutdown of the complete test module, so we don't end up with a normal 0/1 exit status.
https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683540#step:6:23820 for 5.20 shows
## count-fail-ratio: Run: 876. Fails: 26. Fail ratio 2.96±1.12%
All of them are the same failure as above.
5.26 also stable https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683574#step:6:27012
## count-fail-ratio: Run: 1000. Fails: 0. Fail ratio 0±0%. No fails, computed failure probability < .30%
same for 5.30 https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39846683578#step:6:27012
5.34 showed "## count-fail-ratio: Run: 1000. Fails: 2. Fail ratio .20±.27%", same for 5.38, same failures as above. 5.40 stable, "latest" stable.
So it's surprising that 5.20 shows such significant problems. Retriggering all runs again to crosscheck: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974
Updated by okurz 15 days ago
Surprisingly the results this time are quite different from the above:
- 5.16 stable: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871863#step:6:27012
- 5.20: 24 fails with "(Wstat: 139 Tests: 12 Failed: 0)": https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871854#step:6:27156
- 5.26 stable: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871866
- 5.30 stable: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871865#step:6:27012
- 5.34: 3 fails with "(Wstat: 139 Tests: 12 Failed: 0)": https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871868
- 5.38: one fail with "(Wstat: 139 (Signal: SEGV, dumped core) Tests: 12 Failed: 0)": https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871873
- 5.40 stable: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871882#step:6:27012
- latest stable: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14220387974/job/39864871886#step:6:27012
So the Perl version does not seem to have an impact; more likely it is the load on the node running the tests, e.g. less load likely producing no failures and higher load more likely causing problems.
Code "139" means segmentation fault so I assume either in this module or any other module we use unclean memory handling during termination.
Now running again, rebased onto tinita's latest changes, as well as with while (my $pid = waitpid(-1, WNOHANG) > 0) { …
and also without coverage collection: https://github.com/okurz/Mojo-IOLoop-ReadWriteProcess/actions/runs/14252642847
Updated by okurz 14 days ago
- Copied to action #180026: [sporadic] t/01_run.t can end with segfault in https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/ CI tests added
Updated by okurz 10 days ago
- Due date deleted (2025-04-11)
- Status changed from Feedback to Resolved
https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/pull/79 merged. With that, and with retries, we should have sufficient stability. If there are more specific sporadic unit test failures we can apply retries there as well.
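Such a retry can be as simple as wrapping a flaky test file in a small loop, for example (just a sketch of the idea, not necessarily the mechanism of the merged PR):

```sh
# Retry a sporadically failing test file up to 3 times before failing the job
for attempt in 1 2 3; do
  prove -l t/01_run.t && break
  [ "$attempt" -eq 3 ] && exit 1
done
```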