action #108953

action #107062: Multiple failures due to network issues

[tools] Performance issues in some s390 workers

Added by jlausuch 3 months ago. Updated 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Infrastructure
Target version:
Start date: 2022-03-25
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

This ticket collects examples of jobs that fail due to performance degradation, especially on s390x workers.

Installation jobs:
(some of these issues seem to be caused by slow key presses, so the test does not reach the target needle in time):

Other jobs:

Boot failures:


Related issues

Related to openQA Project - action #106685: Test using svirt backend incomplete with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection timed out":retry (New)

Related to openQA Infrastructure - action #108845: Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*qa.suse.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms size:M (Resolved, 2022-03-24, 2022-04-15)

Related to openQA Infrastructure - action #108266: grenache: script_run() commands randomly time out since server room move (New, 2022-03-14)

History

#1 Updated by okurz 3 months ago

  • Related to action #106685: Test using svirt backend incomplete with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection timed out":retry added

#2 Updated by okurz 3 months ago

  • Parent task set to #107062

jlausuch, I think it's important to add relations. I am also adding your generic "network problems" ticket as parent to give more context.

#3 Updated by jlausuch 3 months ago

okurz wrote:

jlausuch, I think it's important to add relations. I am also adding your generic "network problems" ticket as parent to give more context.

Thanks!

#4 Updated by maritawerner 3 months ago

What is the correct label here? Infrastructure?

#5 Updated by okurz 3 months ago

  • Related to action #108845: Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*qa.suse.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms size:M added

#6 Updated by okurz 3 months ago

  • Subject changed from Performance issues in some s390 workers to [tools] Performance issues in some s390 workers
  • Category set to Infrastructure
  • Status changed from Workable to Blocked
  • Assignee set to okurz
  • Target version set to Ready

Yes, could be. I am taking it for [tools]. Blocked by #108845.

#7 Updated by jlausuch 3 months ago

  • Related to action #108266: grenache: script_run() commands randomly time out since server room move added

#8 Updated by jlausuch 3 months ago

This could be a duplicate of #108266.

#9 Updated by okurz 3 months ago

  • Status changed from Blocked to Resolved

As the main problem was identified and fixed in #108845, I checked the results of all latest jobs in the scenarios of the original job failures and found 10 stable jobs showing no network problems (one failure to write the chrony config file, not related to network performance). I am confident this specific problem is resolved now.

#10 Updated by jlausuch 3 months ago

okurz wrote:

As the main problem was identified and fixed in #108845, I checked the results of all latest jobs in the scenarios of the original job failures and found 10 stable jobs showing no network problems (one failure to write the chrony config file, not related to network performance). I am confident this specific problem is resolved now.

Agree. Thanks!

#11 Updated by openqa_review 2 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_offline_scc_sle15sp3_ha_alpha_node02
https://openqa.suse.de/tests/8557057#step/patch_sle/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
