
action #108845

openQA Tests - action #107062: Multiple failures due to network issues

Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*qa.suse.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms size:M

Added by okurz 3 months ago. Updated 19 days ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Target version:
Start date:
2022-03-24
Due date:
2022-04-15
% Done:

0%

Estimated time:

Description

Rollback steps


  • Restore connection to core2 for the following switches:
    • qanet10 - port gi52 is disabled on qanet10
    • qanet13 - port is disabled on core2
    • qanet15 - port is disabled on core2
    • qanet20 - port gi52 is disabled on qanet20

Examples to disable ports:

qanet20nue#configure terminal
qanet20nue(config)#interface GigabitEthernet 52
qanet20nue(config-if)#shutdown
qanet20nue(config-if)#exit
qanet20nue(config)#exit

no shutdown enables the port again.
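For the rollback itself, the same CLI flow with no shutdown re-enables a port (a sketch mirroring the example above; the prompt and interface number vary per switch):

```
qanet20nue#configure terminal
qanet20nue(config)#interface GigabitEthernet 52
qanet20nue(config-if)#no shutdown
qanet20nue(config-if)#exit
qanet20nue(config)#exit
```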

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label ,
call openqa-query-for-job-label poo#108845


Related issues

Related to openQA Infrastructure - action #108266: grenache: script_run() commands randomly time out since server room move (New, 2022-03-14)

Related to openQA Infrastructure - action #108896: [ppc64le] auto_review:"(?s)Size of.*differs, expected.*but downloaded.*Download.*failed: 521 Connect timeout":retry (Resolved, 2022-03-24)

Related to openQA Tests - action #108953: [tools] Performance issues in some s390 workers (Resolved, 2022-03-25)

Related to openQA Infrastructure - action #109028: [openqa][worker][sut] Very severe stability and connectivity issues of openqa workers and suts (Resolved, 2022-03-28)

Related to openQA Tests - action #108737: [sle][security][backlog][powerVM]test fails in bootloader, any issue with install server or network performance issue? (Resolved, 2022-03-22)

Related to openQA Infrastructure - action #109055: Broken workers alert (Resolved, 2022-03-28)

Copied to openQA Infrastructure - action #108872: Outdated information on openqaw5-xen https://racktables.suse.de/index.php?page=object&tab=default&object_id=3468 (New)

Copied to openQA Infrastructure - action #109241: Prefer to use domain names rather than IPv4 in salt pillars (New)

History

#1 Updated by okurz 3 months ago

  • Project changed from openQA Tests to openQA Infrastructure
  • Category deleted (Infrastructure)

#2 Updated by okurz 3 months ago

On openqaw5-xen.qa.suse.de I installed tmux and, within a tmux session, started in separate splits mtr scc.suse.com, mtr openqaw5-xen, mtr 2620:113:80c0:80a0:10:162:0:1 and mtr 10.162.0.19, the latter being qanetnue.qa.suse.de.

We found that there is no loss to scc.suse.com and no loss to download.opensuse.org, but a 30% loss to both the IPv6 and IPv4 addresses of qanet from openqaw5-xen.qa.suse.de. We started mtr qanet14nue.qa.suse.de, which looks ok. An ssh connection from openqaw5-xen.qa.suse.de to qanet14nue.qa.suse.de, established with ssh -oKexAlgorithms=+diffie-hellman-group1-sha1 -oHostKeyAlgorithms=+ssh-rsa admin@10.162.0.74, stalled a few seconds after connecting.

I started a screen session on osd running an ssh connection to qanet14nue, the rack switch for openqaw5-xen.qa.suse.de (see https://racktables.nue.suse.com/index.php?page=rack&rack_id=928), and in there ran ping qanet count 0, which resolves to IPv4 and runs a continuous ping. I see gaps in the connection:

18 bytes from 10.162.0.1: icmp_seq=1. time=0 ms
… (no outage)
18 bytes from 10.162.0.1: icmp_seq=975. time=0 ms
18 bytes from 10.162.0.1: icmp_seq=976. time=0 ms
PING: no reply from 10.162.0.1
… (repeated)
PING: no reply from 10.162.0.1
PING: timeout
18 bytes from 10.162.0.1: icmp_seq=990. time=0 ms
… (no outage)
18 bytes from 10.162.0.1: icmp_seq=1016. time=0 ms
PING: no reply from 10.162.0.1
PING: timeout
… (repeated)
PING: timeout
18 bytes from 10.162.0.1: icmp_seq=1028. time=0 ms

and the cycle repeats, so there are intermittent outages on the connection. Next step: check the connection between qanet14 and the switch in the rack of qanet, that is qanet15. I see the same outages there, so there is a problem in the connection between qanet14 and qanet15. I suggest crosschecking between two other switches and then bisecting.
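Outage windows like the ones above can be spotted automatically by looking for gaps in the icmp_seq numbers. A minimal sketch; the function name and the log-line format assumptions are mine, based on the output shown above:

```python
import re

def find_outages(lines):
    """Return (start_seq, end_seq) pairs for gaps in the icmp_seq numbers,
    i.e. windows in which ping replies were missing."""
    seqs = [int(m.group(1)) for line in lines
            if (m := re.search(r"icmp_seq=(\d+)", line))]
    outages = []
    for prev, cur in zip(seqs, seqs[1:]):
        if cur > prev + 1:  # sequence numbers were skipped -> replies lost
            outages.append((prev + 1, cur - 1))
    return outages

log = [
    "18 bytes from 10.162.0.1: icmp_seq=975. time=0 ms",
    "18 bytes from 10.162.0.1: icmp_seq=976. time=0 ms",
    "PING: no reply from 10.162.0.1",
    "PING: timeout",
    "18 bytes from 10.162.0.1: icmp_seq=990. time=0 ms",
]
print(find_outages(log))  # [(977, 989)]
```

Feeding in the full capture would list every outage window, making it easy to compare the cycles between switch pairs.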

#3 Updated by okurz 3 months ago

  • Status changed from New to In Progress
  • Priority changed from High to Urgent

#4 Updated by okurz 3 months ago

  • Copied to action #108872: Outdated information on openqaw5-xen https://racktables.suse.de/index.php?page=object&tab=default&object_id=3468 added

#5 Updated by okurz 3 months ago

The information in racktables was not up to date. The current position of openqaw5-xen.qa.suse.de is (obviously) not in the QA labs anymore; it is in NUE-SRV2-B-3. I moved the machine in racktables. So the switch that openqaw5-xen.qa.suse.de is connected to is actually qanet20nue.qa.suse.de. That switch feels partially unresponsive, compared to other switches, when we execute any command there, e.g. show system to show general system parameters. We found that the switch has likely been running for 490 days already, so we tried to trigger a reboot with the reload command. That got stuck, so we decided that the switch should be manually power cycled with physical access. After that we can check whether the responsiveness changes and whether there is still packet loss from openqaw5-xen.qa.suse.de to qanet.qa.suse.de. If there is, then clean up the switch configuration, crosscheck with other switches, with other machines in the same rack connected to the same switch, etc.

According to
https://www.cisco.com/c/de_de/support/switches/sg300-28-28-port-gigabit-managed-switch/model.html#~tab-downloads
the switch models we use still have a year of product support.
In addition to the above we should document the important configuration of all switches that we maintain, e.g. crosscheck information in racktables about ports, VLAN entries, uplink port aggregation, etc. Then ensure we have updated firmware on all our switches, clean out old entries, e.g. for machines that are not even there anymore, maybe do some factory resets, and start from scratch to configure the switches in a clean way.

#6 Updated by okurz 3 months ago

  • Related to action #108266: grenache: script_run() commands randomly time out since server room move added

#7 Updated by okurz 3 months ago

  • Related to action #108896: [ppc64le] auto_review:"(?s)Size of.*differs, expected.*but downloaded.*Download.*failed: 521 Connect timeout":retry added

#8 Updated by okurz 3 months ago

  • Subject changed from Network performance problems, DNS, DHCP, within SUSE QA network to Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"Error connecting to VNC server.*openqaw5-xen.*Connection timed out":retry but also other symptoms

#10 Updated by okurz 3 months ago

  • Subject changed from Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"Error connecting to VNC server.*openqaw5-xen.*Connection timed out":retry but also other symptoms to Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*openqaw5-xen.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms

#11 Updated by okurz 3 months ago

  • Subject changed from Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*openqaw5-xen.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms to Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*qa.suse.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms

#12 Updated by openqa_review 3 months ago

  • Due date set to 2022-04-08

Setting due date based on mean cycle time of SUSE QE Tools

#13 Updated by nicksinger 3 months ago

okurz wrote:

So the switch that openqaw5-xen.qa.suse.de is connected to is actually qanet20nue.qa.suse.de. That switch feels partially unresponsive, compared to other switches, when we execute any command there, e.g. show system to show general system parameters. We found that the switch has likely been running for 490 days already, so we tried to trigger a reboot with the reload command. That got stuck, so we decided that the switch should be manually power cycled with physical access.

I did so yesterday evening. The switch came up again and our address-table is now more populated than before. I could confirm that openqaw5-xen is connected to port 47 on that switch (physically as well as with show mac address-table address 0c:c4:7a:*:c2). It doesn't crash any more when you access the address-table, but it isn't really that much more responsive. This could be because qanet20 has far more ports than our reference switches have.

When I run a ping from that switch to qanet I still see packet loss for a few seconds every now and then. The switch is directly connected to the core switches:

qanet20nue#show cdp neighbors
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
                  S - Switch, H - Host, I - IGMP, r - Repeater, P - VoIP Phone
                  M - Remotely-Managed Device, C - CAST Phone Port,
                  W - Two-Port MAC Relay

    Device ID       Local      Adv  Time To Capability   Platform     Port ID
                    Interface  Ver. Live
------------------ ----------- ---- ------- ---------- ------------ -----------
Nx5696Q-Core1.mgmt    gi51      2     146     R S C     N5K-C5696Q  Ethernet101
.suse.de.mgmt.suse                                                  /1/39
.de.mgmt.suse.de.m
gmt.suse.de(FOC222
0R2B1)
Nx5696Q-Core2.mgmt    gi52      2     147     R S C     N5K-C5696Q  Ethernet102
.suse.de.mgmt.suse                                                  /1/39
.de.mgmt.suse.de.m
gmt.suse.de(FOC222
2R07L)

#14 Updated by nicksinger 3 months ago

I ran a few more pings from and between qanet20 and qanet15. w5-xen connects to qanet20, qanet connects to qanet15.
Both switches get their upstream from Nx5696Q-Core1.mgmt.suse.de and Nx5696Q-Core2.mgmt.suse.de.

Ping from qanet15 -> qanet: everything fine, no timeouts visible after over 7k pings
Ping from qanet20 -> qanet15: everything fine, no timeouts visible after over 6.5k pings
Ping from qanet20 -> qanet: timeouts every couple of seconds for 1-2 seconds. Several timeouts observed in ~2k pings

To me this suggests that qanet15 struggles to switch requests through to qanet.

#16 Updated by okurz 3 months ago

Given that we "discovered" the HTML configuration pages for http://qanet15nue.qa.suse.de/, and assuming that most if not all other switches offer these as well, I recommend we:

  • Review the configuration and settings on all switches
  • Ensure to have enabled time synced clocks
  • Backup current configuration
  • Reboot all switches to give them a fresh start
  • Conduct ping tests between all switches
  • Update configuration, e.g. mailing list addresses for current contact persons

#17 Updated by okurz 3 months ago

  • Related to action #108953: [tools] Performance issues in some s390 workers added

#18 Updated by okurz 3 months ago

  • Related to action #109028: [openqa][worker][sut] Very severe stability and connectivity issues of openqa workers and suts added

#19 Updated by nicksinger 3 months ago

  • Related to action #108737: [sle][security][backlog][powerVM]test fails in bootloader, any issue with install server or network performance issue? added

#20 Updated by okurz 3 months ago

#21 Updated by nicksinger 3 months ago

I asked Gerhard in a private slack message if he can check the core switch for qanet15 (the one qanet is attached to). What we saw:

core1: no errors
core2: 16359 output errors
both show ~15k "output discards"

Neither of us knows what these error counters indicate. But he was kind enough to remove the second link of qanet15 (the one to core2) for the time being.

#22 Updated by nicksinger 3 months ago

nicksinger wrote:

Ping from qanet20 -> qanet: timeouts every couple of seconds for 1-2 seconds. Several timeouts observed in ~2k pings

After the change I see a 100% success rate pinging from qanet20 -> qanet:

1000 packets transmitted, 1000 packets received, 0% packet loss

#23 Updated by nicksinger 3 months ago

We also saw:

Core1: 0error 470k discard
Core2: 6150error 76k discard

for qanet13 (which powerqaworker-qam-1.qa.suse.de is connected to). I previously saw a packet loss of ~50%. Now, after removing the connection to core2, I get a pretty constant ping with only ~0.1% loss.

#25 Updated by okurz 3 months ago

Thank you.
I sent an announcement to https://suse.slack.com/archives/C02CANHLANP/p1648459597479859 as well:

@here update about the QA lab related openQA network problems that seem to have appeared or significantly increased since last week. We could pinpoint the problem to network switches, in particular the core switches maintained by SUSE IT EngInfra which provide the connections for the QA switches. gschlotter (EngInfra) and nsinger (QE Tools) have decided to remove the second link between QA switch qanet15 and core2. Now we have a 100% success rate pinging from the switches to the machine qanet. We will monitor the situation and coordinate with EngInfra to also look into the other switch combinations as well as further followups for the future. See https://progress.opensuse.org/issues/108845 for details.

#26 Updated by nicksinger 3 months ago

  • Description updated (diff)

#27 Updated by nicksinger 3 months ago

okurz wrote:

Given that we "discovered" the HTML configuration pages for http://qanet15nue.qa.suse.de/, and assuming that most if not all other switches offer these as well, I recommend we:

  • Review the configuration and settings on all switches
  • Ensure to have enabled time synced clocks
  • Backup current configuration
  • Reboot all switches to give them a fresh start
  • Conduct ping tests between all switches
  • Update configuration, e.g. mailing list addresses for current contact persons

Please be aware that a complete reset also kills our web and ssh access to the switches, so we would need serial access to configure them again initially. What I did for now is the following:

  • Review the configuration and settings on all switches
  • Ensure to have enabled time synced clocks
  • Update configuration, e.g. mailing list addresses for current contact persons (+ Location field update)

#28 Updated by nicksinger 3 months ago

  • Description updated (diff)

#29 Updated by nicksinger 3 months ago

  • Assignee changed from okurz to nicksinger

#30 Updated by nicksinger 3 months ago

nicksinger wrote:

I created https://sd.suse.com/servicedesk/customer/portal/1/SD-81499 for a proper fix.

One fiber was broken. It got replaced today and we enabled the 2nd link for qanet15 again. Pings to grenache (which is connected to qanet15) and qanet15 itself work flawlessly now.

#31 Updated by okurz 3 months ago

OK, given that, I think a good metric could be to run a ping from one switch to another switch and also to central components, e.g.

ssh -oKexAlgorithms=+diffie-hellman-group1-sha1 -oHostKeyAlgorithms=+ssh-rsa admin@10.162.0.74 "ping qanet count 600"

and check for any gaps. WDYT?
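If such a check were scripted, the summary line that the switch's ping prints (as seen in #22, e.g. "1000 packets transmitted, 1000 packets received, 0% packet loss") could be parsed to alert on loss. A hedged sketch; loss_percent and check are hypothetical helper names and the summary format is assumed from the output quoted in this ticket:

```python
import re

def loss_percent(summary_line):
    """Extract the packet-loss percentage from a ping summary line such as
    '1000 packets transmitted, 1000 packets received, 0% packet loss'."""
    m = re.search(r"([\d.]+)% packet loss", summary_line)
    if m is None:
        raise ValueError(f"no packet-loss figure in: {summary_line!r}")
    return float(m.group(1))

def check(summary_line, threshold=1.0):
    """True if the measured loss is at or below the alerting threshold (%)."""
    return loss_percent(summary_line) <= threshold

print(check("1000 packets transmitted, 1000 packets received, 0% packet loss"))
```

The summary line itself would come from the ssh invocation above, e.g. by taking the last line of its output.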

#32 Updated by nicksinger 3 months ago

okurz wrote:

OK, given that, I think a good metric could be to run a ping from one switch to another switch and also to central components, e.g.

ssh -oKexAlgorithms=+diffie-hellman-group1-sha1 -oHostKeyAlgorithms=+ssh-rsa admin@10.162.0.74 "ping qanet count 600"

and check for any gaps. WDYT?

Yeah I forgot to mention that I already did this check right after Gerhard told me they fixed it. I did the ping to grenache and qanet from qanet20 and had 0% loss in 1k pings.

#33 Updated by okurz 3 months ago

  • Description updated (diff)

#34 Updated by okurz 3 months ago

  • Copied to action #109241: Prefer to use domain names rather than IPv4 in salt pillars added

#35 Updated by okurz 3 months ago

nicksinger wrote:

Yeah I forgot to mention that I already did this check right after Gerhard told me they fixed it. I did the ping to grenache and qanet from qanet20 and had 0% loss in 1k pings.

Yes, and I mean to conduct this check as an automatic continuous monitoring step. Can we do that?

So far no more problems have been observed. We can now focus on introducing more monitoring checks in our infrastructure. I see two things as necessary before resolving: 1. check whether more recent jobs are still labeled by auto-review, 2. have at least the improvements planned in separate tickets, e.g. additional telegraf ping checks, mtr checks, monitoring for the switches etc. If you move those out to separate tickets then that is covered.
If no auto-review labeled jobs show up I assume the DNS problems are gone. Right now openqa-query-for-job-label poo#108845 returns

8435334|2022-03-30 10:13:06|done|failed|gi-guest_developing-on-host_developing-xen||grenache-1
8435341|2022-03-30 10:11:09|done|failed|gi-guest_win2019-on-host_developing-kvm||grenache-1
8435342|2022-03-30 09:31:04|done|failed|virt-guest-migration-developing-from-developing-to-developing-kvm-dst||openqaworker2
8434694|2022-03-30 08:08:46|done|incomplete|qam-minimal-full|backend died: Error connecting to VNC server <s390qa105.qa.suse.de:5901>: IO::Socket::INET: connect: Connection timed out|openqaworker2
8430129|2022-03-29 11:33:42|done|failed|qam-minimal|backend done: Error connecting to VNC server <s390qa101.qa.suse.de:5901>: IO::Socket::INET: connect: Connection timed out|openqaworker2
8429701|2022-03-29 09:01:25|done|incomplete|qam-minimal-full|backend died: Error connecting to VNC server <s390qa106.qa.suse.de:5901>: IO::Socket::INET: connect: Connection timed out|openqaworker2
8429644|2022-03-29 08:35:44|done|incomplete|qam-minimal-full|backend died: Error connecting to VNC server <s390qa101.qa.suse.de:5901>: IO::Socket::INET: connect: Connection timed out|openqaworker2
8429626|2022-03-29 08:34:52|done|incomplete|jeos-containers|backend died: Error connecting to VNC server <openqaw5-xen-1.qa.suse.de:5911>: IO::Socket::INET: connect: Connection timed out|openqaworker2
8429631|2022-03-29 08:34:40|done|incomplete|jeos-filesystem|backend died: Error connecting to VNC server <openqaw5-xen-1.qa.suse.de:5914>: IO::Socket::INET: connect: Connection timed out|openqaworker2
8429627|2022-03-29 08:34:38|done|incomplete|jeos-base+sdk+desktop|backend died: Error connecting to VNC server <openqaw5-xen-1.qa.suse.de:5913>: IO::Socket::INET: connect: Connection timed out|openqaworker2

so there are recent results from today. The most recent is https://openqa.suse.de/tests/8435334#step/boot_from_pxe/9, failing with

Test died: Error connecting to <root@10.162.2.87>: No route to host at /usr/lib/os-autoinst/testapi.pm line 1761.

but this was not a job labeled by auto-review and it might even be unrelated. 10.162.2.87 is actually "quinn.qa.suse.de". Reverse DNS seems to be missing, as host 10.162.2.87 returns NXDOMAIN. Also, I wonder whether we cannot just use domain names in the openQA worker config. I will try to handle that in #109241
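As a side note, the pipe-separated output of openqa-query-for-job-label (id|finished|state|result|test|reason|worker) can be filtered programmatically, e.g. to keep only the VNC-timeout jobs that this ticket's auto_review expression targets. A sketch with a hypothetical helper name:

```python
def vnc_timeout_jobs(rows):
    """From pipe-separated openqa-query-for-job-label output
    (id|finished|state|result|test|reason|worker), keep jobs whose
    failure reason shows a VNC connection timeout."""
    hits = []
    for row in rows:
        fields = row.split("|")
        if len(fields) >= 7 and "Error connecting to VNC server" in fields[5]:
            hits.append((fields[0], fields[4], fields[6]))  # id, test, worker
    return hits

rows = [
    "8435334|2022-03-30 10:13:06|done|failed|gi-guest_developing-on-host_developing-xen||grenache-1",
    "8434694|2022-03-30 08:08:46|done|incomplete|qam-minimal-full|backend died: Error connecting to VNC server <s390qa105.qa.suse.de:5901>: IO::Socket::INET: connect: Connection timed out|openqaworker2",
]
print(vnc_timeout_jobs(rows))  # [('8434694', 'qam-minimal-full', 'openqaworker2')]
```

Jobs with an empty reason field (plain test failures) drop out, leaving only the backend connection errors.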

#37 Updated by okurz 3 months ago

Discussed with nicksinger:

  • This ticket will focus on rollback steps which nicksinger will carefully conduct and check with the inter-switch ping and ping to qanet from various sources
  • Clarify with mgmt about missing network monitoring in EngInfra domain -> #109250
  • Add monitoring, e.g. ping checks in telegraf from each openQA worker (or monitor.qa as source) to qanet.qa, dist.suse.de, download.opensuse.org, scc.suse.com, proxy.scc.suse.de -> #109253
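The planned telegraf ping checks from the last bullet could look roughly like the following sketch; the option names should be verified against the telegraf inputs.ping plugin documentation, the count and timeout values are placeholders, and the target list is taken from the bullet above:

```
[[inputs.ping]]
  ## targets from the monitoring plan above
  urls = ["qanet.qa.suse.de", "dist.suse.de", "download.opensuse.org", "scc.suse.com", "proxy.scc.suse.de"]
  count = 5      # pings per gather interval
  timeout = 2.0  # seconds to wait for each reply
```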

#38 Updated by mkittler 3 months ago

  • Subject changed from Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*qa.suse.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms to Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*qa.suse.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms size:M

#39 Updated by geor 3 months ago

Are the following failures related to this ticket? (or should I open a ticket for performance issues with QA-Power8 kvm workers?)

https://openqa.suse.de/tests/8450511#step/validate_partition_table_via_blkid/5 : blkid command times out (QA-Power8-5-kvm:7)
https://openqa.suse.de/tests/8450952#step/yast2_system_settings/7 : slow typing (QA-Power8-5-kvm:2)
https://openqa.suse.de/tests/8450500#step/zypper_in/2 : slow typing (QA-Power8-4-kvm:4)
https://openqa.suse.de/tests/8449706#step/shutdown/3 : stall detected (QA-Power8-5-kvm:7)

#40 Updated by rfan1 3 months ago

nicksinger
Can you please help check this as well?
https://openqa.suse.de/tests/8456680#step/firefox_nss/16 [powerVM VNC console connection takes more time than other platforms]

#41 Updated by nicksinger 3 months ago

geor wrote:

Are the following failures related to this ticket? (or should I open a ticket for performance issues with QA-Power8 kvm workers?)

https://openqa.suse.de/tests/8450511#step/validate_partition_table_via_blkid/5 : blkid command times out (QA-Power8-5-kvm:7)
https://openqa.suse.de/tests/8450952#step/yast2_system_settings/7 : slow typing (QA-Power8-5-kvm:2)
https://openqa.suse.de/tests/8450500#step/zypper_in/2 : slow typing (QA-Power8-4-kvm:4)
https://openqa.suse.de/tests/8449706#step/shutdown/3 : stall detected (QA-Power8-5-kvm:7)

Hey geor, I checked the machines and both are connected to qanet13. Since this is the only switch where the second uplink is not restored yet (this was our previous workaround to fix the unstable connection) I don't think this is related.

rfan1 wrote:

nicksinger
Can you please help check this as well?
https://openqa.suse.de/tests/8456680#step/firefox_nss/16 [powerVM VNC console connection takes more time than other platforms]

Could you please confirm that more tests are failing on redcurrant? As these machines are not connected to the qanet switches I think we might have a different problem here.

#42 Updated by nicksinger 3 months ago

  • Status changed from In Progress to Blocked

The links on qanet{10,15,20} are restored. show interfaces po 1 shows that 2 ports are active on each switch, despite the cdp neighbor data missing for the second link.
I did a ping check to 10.162.0.1 (qanet) and have 0% loss over 1000 packets.

I asked Gerhard over slack to restore the second link for qanet13, but there is no response yet. Therefore I am setting this to blocked until I receive further feedback from him.

#43 Updated by okurz 3 months ago

  • Status changed from Blocked to Feedback

Sounds good! Please use "Feedback" except for cases where there is another ticket, or anything else public, which we can track. "Feedback" can also mean "busy-waiting" with polling for a response :)

#44 Updated by cdywan 3 months ago

  • Due date changed from 2022-04-08 to 2022-04-15

Still waiting for confirmation from Gerhard

#45 Updated by nicksinger 3 months ago

  • Status changed from Feedback to Resolved

Got a response from Gerhard today and he brought back the second interface on qanet13. 1k pings to qanet (10.162.0.1) shows 0% loss. I'm therefore considering this here as done :)

#46 Updated by openqa_review 2 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: sles+sdk+proxy_SCC_via_YaST_ncurses@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/8569074#step/addon_products_via_SCC_yast2_ncurses/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

#47 Updated by okurz 2 months ago

I removed the comment from https://openqa.suse.de/tests/8569074#comments to not use this ticket as label.

#48 Updated by openqa_review about 2 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: prj4_guest_upgrade_sles12sp5_on_sles12sp5-kvm
https://openqa.suse.de/tests/8685507#step/boot_from_pxe/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

#49 Updated by openqa_review 21 days ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: prj4_guest_upgrade_sles12sp5_on_sles12sp5-kvm
https://openqa.suse.de/tests/8751600#step/boot_from_pxe/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 56 days if nothing changes in this ticket.

#50 Updated by okurz 20 days ago

  • Status changed from Resolved to Feedback

We need to handle the openqa-review comments

#51 Updated by nicksinger 20 days ago

okurz wrote:

We need to handle the openqa-review comments

I deleted the two mentioned comments because the tests failing are not related to this ticket here. Anything else which needs to be done to "handle the openqa-review comments"?

#52 Updated by cdywan 19 days ago

nicksinger wrote:

okurz wrote:

We need to handle the openqa-review comments

I deleted the two mentioned comments because the tests failing are not related to this ticket here. Anything else which needs to be done to "handle the openqa-review comments"?

They also don't seem to match the autoreview expression. I would assume this can be closed

#53 Updated by okurz 19 days ago

nicksinger wrote:

I deleted the two mentioned comments because the tests failing are not related to this ticket here. Anything else which needs to be done to "handle the openqa-review comments"?

cdywan wrote:

They also don't seem to match the autoreview expression. I would assume this can be closed

Neither helps if carry-over is happening. So it would be good to follow the URLs pointing to this ticket and understand why there is a reference to it. With the comments deleted that's a bit harder :) But querying the database and looking for jobs that have this ticket in a comment can help with that.

#54 Updated by cdywan 19 days ago

okurz wrote:

nicksinger wrote:

I deleted the two mentioned comments because the tests failing are not related to this ticket here. Anything else which needs to be done to "handle the openqa-review comments"?

cdywan wrote:

They also don't seem to match the autoreview expression. I would assume this can be closed

Neither helps if carry-over is happening. So it would be good to follow the URLs pointing to this ticket and understand why there is a reference to it. With the comments deleted that's a bit harder :) But querying the database and looking for jobs that have this ticket in a comment can help with that.

The comments have been deleted already - what else would cause a carry-over now?

#55 Updated by cdywan 19 days ago

cdywan wrote:

okurz wrote:

nicksinger wrote:

I deleted the two mentioned comments because the tests failing are not related to this ticket here. Anything else which needs to be done to "handle the openqa-review comments"?

cdywan wrote:

They also don't seem to match the autoreview expression. I would assume this can be closed

Neither helps if carry-over is happening. So it would be good to follow the URLs pointing to this ticket and understand why there is a reference to it. With the comments deleted that's a bit harder :) But querying the database and looking for jobs that have this ticket in a comment can help with that.

The comments have been deleted already - what else would cause a carry-over now?

I guess this answers my question:

./openqa-query-for-job-label poo#108845
8766983|2022-05-16 19:31:03|done|incomplete|qam-minimal|backend died: Error connecting to VNC server <s390qa102.qa.suse.de:5901>: IO::Socket::INET: connect: timeout|openqaworker2
8751821|2022-05-13 23:35:51|done|failed|uefi-gi-guest_sles12sp5-on-host_developing-xen||openqaworker2
8737947|2022-05-12 07:43:09|done|failed|prj4_guest_upgrade_sles12sp5_on_sles12sp5-kvm||grenache-1
8729421|2022-05-10 12:46:06|done|failed|modify_existing_partition||grenache-1
8722867|2022-05-09 07:25:56|done|failed|qam-minimal-full|backend done: Error connecting to VNC server <s390qa104.qa.suse.de:5901>: IO::Socket::INET: connect: Connection timed out|openqaworker2
8722866|2022-05-09 07:25:44|done|incomplete|qam-minimal|backend died: Error connecting to VNC server <s390qa106.qa.suse.de:5901>: IO::Socket::INET: connect: Connection timed out|openqaworker2

#56 Updated by cdywan 19 days ago

  • Status changed from Feedback to Resolved

cdywan wrote:

okurz wrote:

Both does not help if there is carry-over happening. So would be good to follow the URLs pointing to this ticket and understand why there is a reference to this ticket. With the comments deleting that's a bit harder :) But querying the database and looking for jobs that have this ticket in a comment can help for that.

All references are gone now
