action #126188
closed [openQA][infra][worker][sut] openQA infra performance fluctuates to the level that leads to tangible test run failure size:M
Description
Observation
It seems that the current openQA infra performance is still not good enough to run openQA test suites smoothly, because there are still lots of test runs failing due to environment issues, for example:
grenache-1:13/gonzo failed at host_upgrade_generate_run_file. Can not resolve host grenache-1.qa.suse.de
grenache-1:10/openqaipmi5 failed at boot_from_pxe Error connecting to root@openqaipmi5.qa.suse.de
grenache-1:12/kermit failed at boot_from_pxe Can not find kernel image
grenache-1:19/amd-zen3-gpu-sut1-1 failed at host_upgrade_generate_run_file. Can not resolve host grenache-1.qa.suse.de
grenache-1:10/openqaipmi5 failed at boot_from_pxe Command timed out
grenache-1:15/scooter failed at update_package Can not resolve host grenache-1.qa.suse.de
By the way, I do not link an openQA report directly because many different cases are involved here.
Steps to reproduce
- Run virtualization openQA test suites
Impact
- Virtualization openQA test runs cannot complete (pass or fail) smoothly and in a timely manner.
Problem
- Generally speaking, this looks like an infra or environment issue.
Suggestion
- Check network segmentation
- Check firewalls and routes traversed
- Check infra performance
- Check infra configuration
Workaround
n/a
Updated by waynechen55 almost 2 years ago
- Subject changed from [openQA][infra][worker][sut] openQA infra performance fluctuates to the level that leads to test run failure tangible to [openQA][infra][worker][sut] openQA infra performance fluctuates to the level that leads to tangible test run failure
Updated by mkittler almost 2 years ago
We have two different issues here (which you normally shouldn't mix):
- Tests using the IPMI backend are failing very soon: https://openqa.suse.de/tests/10728059#step/boot_from_pxe/11
- Not sure yet why that is.
- Tests failing to upload logs later on: https://openqa.suse.de/tests/10728055#step/update_package/8
- Previous uploads work. It looks like the SUT might not be ready at the point the upload is attempted (although the SSH connection to it could be established).
Updated by livdywan almost 2 years ago
- Subject changed from [openQA][infra][worker][sut] openQA infra performance fluctuates to the level that leads to tangible test run failure to [openQA][infra][worker][sut] openQA infra performance fluctuates to the level that leads to tangible test run failure size:M
- Status changed from New to Feedback
- Assignee set to mkittler
- Target version set to Ready
Updated by mkittler almost 2 years ago
Regarding openqaipmi5.qa.suse.de: It looks like https://openqa.suse.de/tests/10731111 is running now. Also previous jobs that have already finished seemed to get past the setup like https://openqa.suse.de/tests/10730149. That job then ended up running into the second issue (uploading doesn't work after reboot). Not sure whether we need to do anything regarding the setup issue considering it seems to work again.
And the uploading issue really looks like a network issue within the SUT itself, especially since the test actually was able to upload some other logs.
Updated by okurz over 1 year ago
One important point to mention is that openqaipmi5 is located in NUE1-SRV2, so it is not just a simple problem related to the move to the FC Basement.
Updated by mkittler over 1 year ago
I say it is not about our setup because:
- Other uploads as part of that job worked. So the firewall is generally not blocking traffic.
- It happens shortly after rebooting the SUT indicating hostname resolution is just not ready at this point.
So can you check the state of the SUT, e.g. using the developer mode?
Updated by MDoucha over 1 year ago
My first guess would be that one of the switches in the server room is running in hub mode. Most likely the one that IPMI SUTs are plugged into. That'll cause serious performance issues on busy networks.
Updated by waynechen55 over 1 year ago
mkittler wrote:
I say it is not about our setup because:
- Other uploads as part of that job worked. So the firewall is generally not blocking traffic.
- It happens shortly after rebooting the SUT indicating hostname resolution is just not ready at this point.
So can you check the state of the SUT, e.g. using the developer mode?
What can developer mode do for this case? I cannot figure it out.
And which case supports this judgement?
It happens shortly after rebooting the SUT indicating hostname resolution is just not ready at this point.
I have been keeping an eye on test runs but still cannot find a clue.
Updated by mkittler over 1 year ago
What can developer mode do for this case? I cannot figure it out.
You could make the test pause at host_upgrade_generate_run_file and investigate the networking problems on the SUT manually.
And which case supports this judgement?
If it were a firewall issue or the command server of os-autoinst were broken, it would likely also affect other uploads. However, other uploads (e.g. in the previous test module update_package) work fine (e.g. https://openqa.suse.de/tests/10730149#step/update_package/63). So, since a general problem with the setup is unlikely, this speaks for some issue within the SUT (after rebooting it, considering it is rebooted just in the test module reboot_and_wait_up_normal before the failure). Maybe it is the simple case of hostname resolution not being ready yet, or there's some other misconfiguration. (This is mainly about the case https://openqa.suse.de/tests/10730149#step/host_upgrade_generate_run_file/4. I've seen other types of failures as well. They also look like something's broken on the test side, though.)
Updated by waynechen55 over 1 year ago
mkittler wrote:
What can developer mode do for this case? I cannot figure it out.
You could make the test pause at host_upgrade_generate_run_file and investigate the networking problems on the SUT manually.
And which case supports this judgement?
If it were a firewall issue or the command server of os-autoinst were broken, it would likely also affect other uploads. However, other uploads (e.g. in the previous test module update_package) work fine (e.g. https://openqa.suse.de/tests/10730149#step/update_package/63). So, since a general problem with the setup is unlikely, this speaks for some issue within the SUT (after rebooting it, considering it is rebooted just in the test module reboot_and_wait_up_normal before the failure). Maybe it is the simple case of hostname resolution not being ready yet, or there's some other misconfiguration. (This is mainly about the case https://openqa.suse.de/tests/10730149#step/host_upgrade_generate_run_file/4. I've seen other types of failures as well. They also look like something's broken on the test side, though.)
- I will do as you instructed.
- I think we'd better use type_command => 1 with the script_output API in our tests, so the command is not downloaded from the worker machine; that should reduce failures (see the sketch below).
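A rough sketch of that idea (assuming a typical os-autoinst test module; the command and module structure are only illustrative, not the actual test code):
use strict;
use warnings;
use testapi;

sub run {
    # With type_command => 1 the command is typed into the console directly
    # instead of being downloaded as a script from the worker, so it does not
    # depend on the SUT being able to resolve the worker's hostname.
    my $output = script_output('hostnamectl status', 120, type_command => 1);
    record_info('host status', $output);
}

1;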
Updated by waynechen55 over 1 year ago
And I cannot establish a SOL session to some machines at the moment:
waynechen:~ # ipmitool -I lanplus -C 3 -H sp.gonzo.qa.suse.de xxxxx chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
waynechen:~ # ipmitool -I lanplus -C 3 -H sp.kermit.qa.suse.de xxxxx chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
waynechen:~ # ipmitool -I lanplus -C 3 -H amd-zen3-gpu-sut1-sp.qa.suse.de xxxx chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
waynechen:~ # ipmitool -I lanplus -C 3 -H sp.scooter.qa.suse.de xxxxx chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
Updated by mgriessmeier over 1 year ago
waynechen55 wrote:
And I cannot establish a SOL session to some machines at the moment:
waynechen:~ # ipmitool -I lanplus -C 3 -H sp.gonzo.qa.suse.de xxxxx chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
waynechen:~ # ipmitool -I lanplus -C 3 -H sp.kermit.qa.suse.de xxxxx chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
waynechen:~ # ipmitool -I lanplus -C 3 -H amd-zen3-gpu-sut1-sp.qa.suse.de xxxx chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
waynechen:~ # ipmitool -I lanplus -C 3 -H sp.scooter.qa.suse.de xxxxx chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
There have been reports of general network issues at the moment in the Frankencampus office (that's also where those machines are located).
I cannot ping any host in this network; it says "Destination unreachable", which supports my assumption that something is wrong with the whole network.
Updated by mgriessmeier over 1 year ago
Works again (for all but amd-zen3...):
matthi@paramore:~(:|✔) # for i in gonzo kermit scooter; do ipmitool -I lanplus -C 3 -H sp.$i.qa.suse.de -U xxx -P xxx chassis power status; done
Chassis Power is on
Chassis Power is on
Chassis Power is on
matthi@paramore:~(:|✔) # ipmitool -I lanplus -C 3 -H amd-zen3-gpu-sut1-sp.qa.suse.de -U xxx -P xxx chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
Updated by waynechen55 over 1 year ago
I did some investigation on three machines (scooter, ix64ph1087 and amd-zen3) with the same test suite. You can refer to https://openqa.suse.de/tests/10797074#step/host_upgrade_step2_run/14.
- The "Can not resolve host grenache-1.qa.suse.de" issue still happens to scooter. For the error you can refer to https://openqa.suse.de/tests/10797074#step/host_upgrade_step2_run/14.
- It passed the host_upgrade_generate_run_file step, because it uses type_command => 1 there. But it still failed in the next step host_upgrade_step2_run, which needs to upload a log again, so the test run failed and scooter was powered off.
- In the steps host_upgrade_generate_run_file and host_upgrade_step2_run, I also checked manually and I cannot ping grenache-1 and openqa.suse.de from inside scooter.
- Next I powered scooter on again and logged in to it manually. I still cannot ping grenache-1 and openqa.suse.de from inside scooter.
- At last, I found its /etc/resolv.conf is empty, as below:
### /etc/resolv.conf file autogenerated by netconfig!
#
# Before you change this file manually, consider to define the
# static DNS configuration using the following variables in the
# /etc/sysconfig/network/config file:
# NETCONFIG_DNS_STATIC_SEARCHLIST
# NETCONFIG_DNS_STATIC_SERVERS
# NETCONFIG_DNS_FORWARDER
# or disable DNS configuration updates via netconfig by setting:
# NETCONFIG_DNS_POLICY=''
#
# See also the netconfig(8) manual page and other documentation.
#
# Note: Manual change of this file disables netconfig too, but
# may get lost when this file contains comments or empty lines
# only, the netconfig settings are same with settings in this
# file and in case of a "netconfig update -f" call.
#
### Please remove (at least) this line when you modify the file!
search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz
- After adding some nameserver entries to this file, I can ping grenache-1 and openqa.suse.de successfully.
- So I think /etc/resolv.conf was emptied after the reboot and cannot be populated successfully again.
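A small diagnostic sketch that could be added to the test right after the reboot so such failures are easier to triage from the job logs (assuming the os-autoinst testapi; the subroutine name is made up for illustration):
use testapi;

sub record_resolv_conf {
    # type_command avoids downloading the script from the worker, which would
    # itself require working name resolution; proceed_on_failure keeps the
    # test going so the actual failure is still reported at the usual place.
    my $resolv = script_output('cat /etc/resolv.conf', 60, type_command => 1, proceed_on_failure => 1);
    record_info('resolv.conf', $resolv);
    record_soft_failure('poo#126188: no nameserver entry in /etc/resolv.conf after reboot')
        unless $resolv =~ /^nameserver\s+\S+/m;
}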
Updated by cachen over 1 year ago
- Next I powered scooter on again and logged in to it manually. I still cannot ping grenache-1 and openqa.suse.de from inside scooter.
- At last, I found its /etc/resolv.conf is empty, as below:
### /etc/resolv.conf file autogenerated by netconfig! [...] search qe.nue2.suse.org nue2.suse.org suse.de arch.suse.de nue.suse.com suse.cz
- After adding some nameserver entries to this file, I can ping grenache-1 and openqa.suse.de successfully.
- So I think /etc/resolv.conf was emptied after the reboot and cannot be populated successfully again.
Then it looks like sometimes the SUT cannot get the correct DNS setup from the DHCP server? Can we check the DHCP server to see if anything blocks it? Thanks a lot!
Updated by mkittler over 1 year ago
- Status changed from Feedback to In Progress
So I think /etc/resolv.conf was emptied after the reboot and cannot be populated successfully again.
This is exactly the kind of network misconfiguration within the SUT I was getting at.
Then it looks like sometimes the SUT cannot get the correct DNS setup from the DHCP server?
Either the DHCP/DNS setup in the SUT is misconfigured or it is the DHCP server, indeed. I would suspect the former considering we don't have general issues with DHCP.
What is the VLAN of scooter? The racktables page (https://racktables.suse.de/index.php?object_id=10124&page=object&tab=default) lacks this information. Considering other hosts in the same rack are in VLAN 12 I suspect that it is VLAN 12. That would mean the relevant DHCP server is hosted on qanet.qa.suse.de. The IP of scooter-1.qa.suse.de itself resolves to 10.168.192.87 at this point. I've checked logs on qanet via journalctl | grep -i 10.168.192.87 but couldn't find anything. Maybe that DHCP server is not used after all. I would do some further digging on the SUT/scooter itself. Does it get a network link at all? Is it even configured to make DHCP requests? If so, do you see any problems from the SUT's side?
It looks like scooter-1 is even online and accessible via SSH so I'll have a look to answer those questions.
Updated by mkittler over 1 year ago
- Status changed from In Progress to Feedback
scooter is currently executing this job: https://openqa.suse.de/tests/10818998
It is at module sriov_network_card_pci_passthrough, which comes after reboot_after_installation. The networking setup is currently working just fine. This test was also able to upload logs to grenache-1 after the reboot. So there is not really much to investigate. This means that our DHCP server works in general (unless this test run applies some workaround).
Updated by okurz over 1 year ago
- Parent task set to #115502
mkittler wrote:
What is the VLAN of scooter? The racktables page (https://racktables.suse.de/index.php?object_id=10124&page=object&tab=default) lacks this information. Considering other hosts in the same rack are in VLAN 12 I suspect that it is VLAN 12. That would mean the relevant DHCP server is hosted on qanet.qa.suse.de. The IP of scooter-1.qa.suse.de itself resolves to 10.168.192.87 at this point. I've checked logs on qanet via journalctl | grep -i 10.168.192.87 but couldn't find anything. Maybe that DHCP server is not used after all. I would do some further digging on the SUT/scooter itself.
That cannot be correct. VLAN 12 is a VLAN that is only used within NUE1, i.e. Maxtorhof. The rack with the machine is part of the FC Basement. Do you remember when we did the racktables walkthrough over machines and networks? We should revisit, update and complete the information if entries state a wrong or missing VLAN. See https://wiki.suse.net/index.php/SUSE-Quality_Assurance/Labs#Current_management_of_FC_Basement_lab_network_config for information regarding the network of the FC Basement.
EDIT: Completing the information regarding IP and network is part of #124637
Does it get a network link at all? Is it even configured to make DHCP requests? If so, do you see any problems from the SUT's side?
It looks like scooter-1 is even online and accessible via SSH so I'll have a look to answer those questions.
Updated by okurz over 1 year ago
- Project changed from openQA Infrastructure (public) to openQA Project (public)
- Category set to Support
- Priority changed from Urgent to High
@waynechen55 @cachen We discussed this in our daily infra call 2023-03-30. As far as we can see, the DHCP server within the FC Basement works as expected. We do not currently have administrative access to the DHCP server. This is planned to be worked on in #125450 and the corresponding SD tickets linked in there (internal reference for myself: specifically https://sd.suse.com/servicedesk/customer/portal/1/SD-113959). As explained by mkittler e.g. in #126188#note-10, we assume the problem is within the SUT and the test design, where the SUT might not have consistent access to the network yet immediately after bootup.
We suggest you look into that from test perspective and either debug what is visible in system journals of the bootup process or adjust the test code to ensure the test execution only continues after the network initialization is finished.
With that I am reducing prio to High as we feel we did what we could provide from our side so far. Keeping mkittler assigned to support you in the investigation process and answer questions as needed if any.
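A rough sketch of what that test-side adjustment could look like (assuming the script_retry helper from the test distribution's utils.pm; the probed hostname and the timings are only illustrative):
use testapi;
use utils 'script_retry';

sub wait_for_network_ready {
    # Keep probing name resolution of the openQA/worker host until it succeeds,
    # instead of continuing immediately after the SUT has rebooted.
    script_retry('getent hosts openqa.suse.de', delay => 15, retry => 20);
}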
Updated by cachen over 1 year ago
okurz wrote:
@waynechen55 @cachen We discussed this in our daily infra call 2023-03-30. As far as we can see, the DHCP server within the FC Basement works as expected. We do not currently have administrative access to the DHCP server. This is planned to be worked on in #125450 and the corresponding SD tickets linked in there (internal reference for myself: specifically https://sd.suse.com/servicedesk/customer/portal/1/SD-113959). As explained by mkittler e.g. in #126188#note-10, we assume the problem is within the SUT and the test design, where the SUT might not have consistent access to the network yet immediately after bootup.
We suggest you look into that from test perspective and either debug what is visible in system journals of the bootup process or adjust the test code to ensure the test execution only continues after the network initialization is finished.
With that I am reducing prio to High as we feel we did what we could provide from our side so far. Keeping mkittler assigned to support you in the investigation process and answer questions as needed if any.
Understood the challenge if you don't have access permission to the DHCP server. Thank you, @okurz, @mkittler and @nicksinger for taking care of those machines' infra and network issues.
I tracked some test jobs; the disconnection between the SUT and the worker grenache-1.qa.suse.de does not reproduce every time. Before the tools team can dig into the DHCP/DNS settings, just like Oliver suggested, @waynechen55 @xlai let's see if any workaround can be placed in the test, e.g. add a step to check the network setup and correct the nameserver if it is detected to be missing.
Again, thank you all for working together on these challenging tickets, much appreciated!
Updated by waynechen55 over 1 year ago
cachen wrote:
okurz wrote:
@waynechen55 @cachen We discussed this in our daily infra call 2023-03-30. As far as we can see, the DHCP server within the FC Basement works as expected. We do not currently have administrative access to the DHCP server. This is planned to be worked on in #125450 and the corresponding SD tickets linked in there (internal reference for myself: specifically https://sd.suse.com/servicedesk/customer/portal/1/SD-113959). As explained by mkittler e.g. in #126188#note-10, we assume the problem is within the SUT and the test design, where the SUT might not have consistent access to the network yet immediately after bootup.
We suggest you look into that from test perspective and either debug what is visible in system journals of the bootup process or adjust the test code to ensure the test execution only continues after the network initialization is finished.
With that I am reducing prio to High as we feel we did what we could provide from our side so far. Keeping mkittler assigned to support you in the investigation process and answer questions as needed if any.
Understood the challenge if you don't have access permission to the DHCP server. Thank you, @okurz, @mkittler and @nicksinger for taking care of those machines' infra and network issues.
I tracked some test jobs; the disconnection between the SUT and the worker grenache-1.qa.suse.de does not reproduce every time. Before the tools team can dig into the DHCP/DNS settings, just like Oliver suggested, @waynechen55 @xlai let's see if any workaround can be placed in the test, e.g. add a step to check the network setup and correct the nameserver if it is detected to be missing. Again, thank you all for working together on these challenging tickets, much appreciated!
- I was considering this yesterday; maybe "netconfig update -f" will do the trick. But this only works if the failure is not caused by a network breakdown and is only caused by a temporary DHCP glitch. If there is a severe communication issue, nothing will help. (See the sketch after this list.)
- I think the issue is really worth further investigation on the infra side. Although it cannot be reproduced every time, it happens frequently. The latest Build 88.1 test run hit the issue again; please refer to example1 and example2.
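A rough sketch of that workaround (assuming the os-autoinst testapi; the probed hostname and the helper name repair_dns are made up for illustration):
use testapi;

sub repair_dns {
    # script_run types the command directly on the SUT, so it works even when
    # the SUT cannot resolve the worker's hostname.
    my $rc = script_run('getent hosts openqa.suse.de');
    return if defined($rc) && $rc == 0;    # name resolution already works
    record_soft_failure('poo#126188: DNS broken after reboot, re-running netconfig');
    assert_script_run('netconfig update -f');
    assert_script_run('getent hosts openqa.suse.de');
}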
Updated by okurz over 1 year ago
waynechen55 wrote:
- I was considering this yesterday; maybe "netconfig update -f" will do the trick. But this only works if the failure is not caused by a network breakdown and is only caused by a temporary DHCP glitch. If there is a severe communication issue, nothing will help.
I still doubt there is a breakdown on the DHCP server side. It might behave differently and hence make such issues more apparent, but I still assume the problem is that tests try to continue too fast while the SUT is not yet ready.
- I think the issue is really worth further investigation on the infra side. Although it cannot be reproduced every time, it happens frequently.
We would look into this issue as soon as Eng-Infra gives us access. Based on previous experience I assume this will take more months (!). If the issue is happening frequently, then I strongly recommend you follow up from the test side.
Updated by mkittler over 1 year ago
Considering what we've already found out, the new ticket #127256 might be the same. It is very explicit about what's not working (nameserver missing from the DHCP response), and the test failures reported here have a matching symptom. Unfortunately, this doesn't change what @okurz wrote in the last paragraph of the previous comment.
Updated by mkittler over 1 year ago
- Related to action #127256: missing nameservers in dhcp response for baremetal machines in NUE-FC-B 2 size:M added
Updated by okurz over 1 year ago
- Related to action #122983: [alert] openqa/monitor-o3 failing because openqaworker1 is down size:M added
Updated by openqa_review over 1 year ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: virt-guest-migration-developing-from-developing-to-developing-kvm-dst@virt-mm-64bit-ipmi
https://openqa.suse.de/tests/10933402#step/guest_migration_dst/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by livdywan over 1 year ago
Related branch in progress: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16774
Updated by okurz over 1 year ago
https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3456 was merged and is deployed to both our DHCP servers, walter1.qe.nue2.suse.org and walter2.qe.nue2.suse.org. We assume this fixes the problem.
Updated by mkittler over 1 year ago
It would be nice if you could confirm that it works now.
Updated by okurz over 1 year ago
- Status changed from Feedback to In Progress
@mkittler please check the history of openQA jobs. If you find any obvious errors of the same kind, please note that and work on it; otherwise resolve the ticket.
Updated by openqa_review over 1 year ago
- Due date set to 2023-05-18
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler over 1 year ago
- Status changed from In Progress to Resolved
I've checked the scenarios from the ticket description. Some tests pass now. I haven't seen any obvious failures due to name resolution in the recent history. Since I haven't seen the problem on openqaworker1 anymore either, I suppose it can be considered resolved.
Updated by xlai over 1 year ago
mkittler wrote:
I've checked the scenarios from the ticket description. Some tests pass now. I haven't seen any obvious failures due to name resolution in the recent history. Since I haven't seen the problem on openqaworker1 anymore either, I suppose it can be considered resolved.
Thanks for the fix! We will also follow the results. If it is reproduced again, we will share info here and reopen the ticket.