action #180110

open

coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA

coordination #175515: [epic] incomplete jobs with "Failed to find an available port: Address already in use"

[sporadic] auto_review:"Failed to find an available port: Address already in use":retry, produces incomplete jobs on OSD, multiple machines

Added by mkittler about 1 month ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assignee: -
Category: Regressions/Crashes
Start date:
Due date:
% Done: 0%
Estimated time:
Description

Observation

This error message is caused by leftover QEMU processes. This ticket is a continuation of ticket #170209. As part of that ticket we:

Acceptance criteria

  • AC1: We know why there are sometimes still leftover QEMU processes and RWP is able to terminate them as far as possible.
  • AC2: The worker does not run further openQA jobs while there are leftover QEMU processes, so a process that is stuck for good no longer causes incomplete jobs. Instead, an alert fires for the broken/unavailable worker so we can take care of the situation manually.
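
The gate described in AC2 could look roughly like the sketch below. It is only an illustration, not the openQA worker's actual self-check: it assumes that the backend leaves a PID file in the pool directory (the name `qemu.pid` is an assumption here), and the function names `leftover_pid` and `may_accept_job` are hypothetical.

```python
import os
from pathlib import Path


def leftover_pid(pool_dir, pid_file_name="qemu.pid"):
    """Return the PID recorded in the pool directory if that process is
    still alive, otherwise None.

    Assumption: a PID file named `pid_file_name` is written into the pool
    directory. This does not verify that the PID still belongs to QEMU
    (PIDs can be reused), so it can report false positives.
    """
    pid_file = Path(pool_dir) / pid_file_name
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None  # no PID file, or unreadable content
    if pid <= 0:
        return None
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return None  # process is gone
    except PermissionError:
        return pid  # process exists but belongs to another user
    return pid


def may_accept_job(pool_dir):
    """Hypothetical gate: refuse new jobs while a leftover process exists."""
    return leftover_pid(pool_dir) is None
```

A worker that keeps returning `False` here would stay idle and show up as broken, which is the state the alert in AC2 is meant to catch.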

Suggestions

  • Maybe there are more improvements to make in RWP, e.g. fixing some race condition.
  • There must be something wrong with the self-check. Maybe implementing a fullstack test for that feature would help figure out what. Maybe spawning multiple worker instances locally that use the same pool directory (and hence conflict with each other) would also help reproduce this issue.
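
To make the failure mode from the observation concrete: a process that still holds a listening socket makes every later bind on the same port fail with "Address already in use". The following is a minimal, self-contained sketch of that effect. It is not the port search used by openQA or os-autoinst; `try_bind` and `find_available_port` are hypothetical helpers, and the error text is merely copied from the ticket title.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Bind and listen on host:port. Return the socket on success,
    or None if the address is already in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise
    return sock


def find_available_port(start, count, host="127.0.0.1"):
    """Scan `count` ports from `start`; return (port, socket) for the
    first free one, or raise if all are taken."""
    for port in range(start, start + count):
        sock = try_bind(port, host)
        if sock is not None:
            return port, sock
    raise RuntimeError(
        "Failed to find an available port: Address already in use"
    )
```

Holding the socket returned by the first call plays the role of the leftover process: as long as it stays open, a second call for the same port range fails, and it succeeds again once the socket is closed.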

Related issues 3 (1 open, 2 closed)

Has duplicate openQA Project (public) - action #180641: [sporadic] Tests fail with auto_review:"hostfwd.*Could not set up host forwarding rule":retry (Resolved, mkittler, 2025-04-09)

Follows openQA Project (public) - action #170209: [sporadic] auto_review:"Failed to find an available port: Address already in use":retry, produces incomplete jobs on OSD, multiple machines size:M (Resolved, mkittler, 2024-11-25)

Copied to openQA Project (public) - action #180116: Do not run openQA jobs if there are leftover QEMU processes (New, 2025-04-07)

Actions #1

Updated by mkittler about 1 month ago

  • Due date set to 2024-11-26
  • Start date changed from 2025-04-07 to 2024-11-26
  • Follows action #170209: [sporadic] auto_review:"Failed to find an available port: Address already in use":retry, produces incomplete jobs on OSD, multiple machines size:M added
Actions #2

Updated by okurz about 1 month ago

  • Due date deleted (2024-11-26)
  • Category set to Regressions/Crashes
  • Target version set to Tools - Next
  • Start date deleted (2024-11-26)
  • Parent task set to #175515
Actions #3

Updated by okurz about 1 month ago

  • Copied to action #180116: Do not run openQA jobs if there are leftover QEMU processes added
Actions #4

Updated by mkittler about 1 month ago

  • Has duplicate action #180641: [sporadic] Tests fail with auto_review:"hostfwd.*Could not set up host forwarding rule":retry added