action #108953

closed

action #107062: Multiple failures due to network issues

[tools] Performance issues in some s390 workers

Added by jlausuch over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Infrastructure
Start date: 2022-03-25
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

This ticket collects examples of jobs that are failing due to performance degradation, especially on s390x workers.

Installation jobs:
(some of these failures seem to be caused by slow key presses, so the test does not reach the target needle in time):

Other jobs:

Boot failures:


Related issues 3 (2 open, 1 closed)

Related to openQA Project (public) - action #106685: Test using svirt backend incomplete with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection timed out":retry (New)

Related to openQA Infrastructure (public) - action #108845: Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*qa.suse.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms size:M (Resolved, nicksinger, 2022-03-24)

Related to openQA Infrastructure (public) - action #108266: grenache: script_run() commands randomly time out since server room move (New, 2022-03-14)

Actions #1

Updated by okurz over 2 years ago

  • Related to action #106685: Test using svirt backend incomplete with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection timed out":retry added
Actions #2

Updated by okurz over 2 years ago

  • Parent task set to #107062

@jlausuch I think it's important to add relations and also I am adding your generic "network problems" ticket as parent to give more context

Actions #3

Updated by jlausuch over 2 years ago

okurz wrote:

@jlausuch I think it's important to add relations and also I am adding your generic "network problems" ticket as parent to give more context

Thanks!

Actions #4

Updated by maritawerner over 2 years ago

What is the correct label here? Infrastructure?

Actions #5

Updated by okurz over 2 years ago

  • Related to action #108845: Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*qa.suse.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms size:M added
Actions #6

Updated by okurz over 2 years ago

  • Subject changed from Performance issues in some s390 workers to [tools] Performance issues in some s390 workers
  • Category set to Infrastructure
  • Status changed from Workable to Blocked
  • Assignee set to okurz
  • Target version set to Ready

Yes, could be. I am taking it for [tools]. Blocked by #108845.

Actions #7

Updated by jlausuch over 2 years ago

  • Related to action #108266: grenache: script_run() commands randomly time out since server room move added
Actions #8

Updated by jlausuch over 2 years ago

This could be a duplicate of #108266.

Actions #9

Updated by okurz over 2 years ago

  • Status changed from Blocked to Resolved

As the main problem was identified in #108845 and fixed, I checked the results of all the latest jobs in the scenarios of the original job failures and found 10 stable jobs showing no network problems (one failure to write the chrony config file, not related to network performance). I am confident this specific problem is resolved now.

Actions #10

Updated by jlausuch over 2 years ago

okurz wrote:

As the main problem was identified in #108845 and fixed, I checked the results of all the latest jobs in the scenarios of the original job failures and found 10 stable jobs showing no network problems (one failure to write the chrony config file, not related to network performance). I am confident this specific problem is resolved now.

Agree. Thanks!

Actions #11

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_offline_scc_sle15sp3_ha_alpha_node02
https://openqa.suse.de/tests/8557057#step/patch_sle/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
