action #170209

open

coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA

coordination #175515: [epic] incomplete jobs with "Failed to find an available port: Address already in use"

[sporadic] auto_review:"Failed to find an available port: Address already in use":retry, produces incomplete jobs on OSD, multiple machines size:M

Added by nicksinger 4 months ago. Updated 1 day ago.

Status:
Workable
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2024-11-25
Due date:
% Done:

0%

Estimated time:

Description

Observation

I received an alert mail about an incomplete job: https://openqa.suse.de/tests/15996418
It fails with:

[2024-11-25T00:19:57.525877Z] [warn] [pid:103270] !!! : qemu-system-x86_64: -vnc :102,share=force-shared: Failed to find an available port: Address already in use

I asked in Slack and @tinita has observed the same for "a few hours", apparently all on worker39.

https://openqa.suse.de/admin/workers/2898 shows that this started about 12 hours ago with this job: https://openqa.suse.de/tests/15995445
The qemu-system-x86_64 process 1963 has been running for 12 hours with -vnc :102,share=force-shared.
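
To illustrate the observation: QEMU's VNC display :N listens on TCP port 5900 + N, so :102 corresponds to port 6002. The following is a minimal Python sketch (not part of openQA; the function names are made up for illustration) of how one might check on a worker host whether that port is still held by another process:

```python
import socket

VNC_BASE_PORT = 5900  # VNC display :N listens on TCP port 5900 + N


def vnc_port(display: int) -> int:
    """Return the TCP port for a VNC display number, e.g. :102 -> 6002."""
    return VNC_BASE_PORT + display


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port; a failed bind means something else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    port = vnc_port(102)
    print(f"display :102 -> port {port}, free: {port_is_free(port)}")
```

If the port is reported as busy while no job is assigned to the worker slot, a leftover qemu-system-x86_64 process like the one above is a likely candidate.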

Acceptance criteria

  • AC1: Affected jobs are restarted automatically
  • AC2: We have a better understanding of situations where this can happen (if at all)
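
For AC1, the subject of this ticket already carries an auto_review:"…":retry marker. As a rough Python sketch of the idea only (this is not the actual auto-review script; the marker syntax is assumed from the subject above and the function names are hypothetical), matching a job log against such a marker could look like this:

```python
import re

# Assumed marker syntax, taken from the ticket subject: auto_review:"<regex>"[:retry]
AUTO_REVIEW = re.compile(r'auto_review:"(?P<pattern>.+?)"(?P<retry>:retry)?')


def parse_auto_review(subject: str):
    """Return (regex, retry_flag) from a ticket subject, or None if no marker is present."""
    match = AUTO_REVIEW.search(subject)
    if match is None:
        return None
    return match.group("pattern"), match.group("retry") is not None


def should_retry(subject: str, log_text: str) -> bool:
    """True if the subject requests a retry and its regex matches the log text."""
    parsed = parse_auto_review(subject)
    if parsed is None:
        return False
    pattern, retry = parsed
    return retry and re.search(pattern, log_text) is not None


if __name__ == "__main__":
    subject = ('[sporadic] auto_review:"Failed to find an available port: '
               'Address already in use":retry, produces incomplete jobs')
    log = ("qemu-system-x86_64: -vnc :102,share=force-shared: "
           "Failed to find an available port: Address already in use")
    print(should_retry(subject, log))
```

Note that the match depends on the exact wording of the QEMU message, so any change in that wording would require updating the subject.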

Suggestion

  • Check one more time for bugs – also consider testing (!) – in the code for handling leftover QEMU processes
  • Check one more time for bugs – also consider testing (!) – in terminating/killing the process group of isotovideo (in Mojo::…::ReadWriteProcess)
  • Add/enable debug logging when starting/stopping isotovideo (maybe on ReadWriteProcess level)
  • Consider starting/stopping isotovideo in a process group with low-level Perl code to replicate the error and investigate and potentially replace the problematic Mojo::ReadWriteProcess?
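
The last two suggestions can be sketched in isolation. The example below is Python rather than Perl and does not use Mojo::ReadWriteProcess or any openQA code; it only shows the general POSIX technique of starting a child in its own process group, logging start and stop at debug level, and signalling the whole group on shutdown. The function names and the grace period are assumptions made for illustration.

```python
import logging
import os
import signal
import subprocess

log = logging.getLogger("procgroup")


def start_in_own_group(argv):
    """Start a child as leader of a new session and process group (POSIX only)."""
    proc = subprocess.Popen(argv, start_new_session=True)
    log.debug("started %s as pid/pgid %d", argv, proc.pid)
    return proc


def stop_group(proc, grace=5.0):
    """Send SIGTERM to the whole group, then SIGKILL whatever is left after `grace` seconds."""
    pgid = proc.pid  # the child called setsid(), so its pgid equals its pid
    for sig in (signal.SIGTERM, signal.SIGKILL):
        try:
            log.debug("sending %s to process group %d", sig.name, pgid)
            os.killpg(pgid, sig)
        except ProcessLookupError:
            log.debug("process group %d is already gone", pgid)
            break
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            log.debug("leader %d still alive after %.1fs", proc.pid, grace)
    proc.wait()  # reap the leader so no zombie is left behind
    log.debug("leader %d exited with %s", proc.pid, proc.returncode)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    child = start_in_own_group(["sleep", "60"])
    stop_group(child, grace=2.0)
```

The second pass with SIGKILL is sent to the group even if the leader has already exited, because grandchildren (such as a QEMU process started by the leader) may outlive it and keep holding resources like a VNC port.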

Related issues 2 (0 open, 2 closed)

Related to openQA Project (public) - action #166025: qemu process hanging indefinitely blocking the corresponding VNC port size:S (Resolved, mkittler, 2024-08-30)

Related to openQA Project (public) - action #175464: jobs incomplete with auto_review:"setup failure: isotovideo can not be started" (Resolved, okurz, 2025-01-15)
