action #120112
worker worker2.oqa.suse.de auto_review:"Error connecting to <root@win2k19.qa.suse.cz>: Connection timed out":retry size:M
Description
Observation
openQA test in scenario sle-15-SP5-JeOS-for-MS-HyperV-x86_64-jeos-containers-docker@svirt-hyperv-uefi fails in
bootloader_hyperv
Test suite description
worker2:~> ping -c 10 win2k19.qa.suse.cz
PING win2k19.qa.suse.cz (10.100.101.33) 56(84) bytes of data.
--- win2k19.qa.suse.cz ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 9195ms
Connection from my local machine to win2k19.qa.suse.cz works fine.
Reproducible
Fails since (at least) Build 1.95
Find jobs referencing this ticket with the help of https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label, call openqa-query-for-job-label poo#120112
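A minimal sketch of running that query locally, assuming curl is available and the script's own prerequisites (e.g. access to the openQA host) are in place; the exact local invocation is not taken from this ticket:

# fetch the helper script and query all jobs labeled with this ticket
curl -sO https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label
chmod +x openqa-query-for-job-label
./openqa-query-for-job-label poo#120112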
Expected result
Last good: (unknown) (or more recent)
Further details
Always latest result in this scenario: latest
History
#4
Updated by okurz 3 months ago
- Due date set to 2022-11-22
- Status changed from New to Feedback
On OSD
sudo salt --no-color --state-output=changes \* cmd.run 'command -v nmap>/dev/null || zypper -n in nmap && nmap -p 22 win2k19.qa.suse.cz'
shows that the non-migrated workers can still reach it while the migrated ones can't.
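A lighter single-host variant of the same check, sketched here on the assumption that one can ssh into a worker directly; -Pn is added so that a filtered port 22 is still reported instead of the host being skipped:

# hedged sketch: probe only 22/tcp from one worker, skipping ICMP host discovery
ssh worker2.oqa.suse.de 'sudo nmap -Pn -p 22 win2k19.qa.suse.cz'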
okurz@openqa:~> sudo salt --no-color --state-output=changes \* cmd.run 'command -v mtr>/dev/null || zypper -n in mtr && mtr -r win2k19.qa.suse.cz'
openqaworker3.suse.de:
    Start: 2022-11-08T18:40:20+0100
    HOST: worker3              Loss%   Snt   Last   Avg  Best  Wrst StDev
      1.|-- gateway.oqa.suse.de  0.0%    10    0.2   0.2   0.2   0.2   0.0
      2.|-- 10.136.0.12          0.0%    10    0.4   0.4   0.4   0.5   0.1
      3.|-- vpn20.open.ch        0.0%    10   11.0  11.4  11.0  13.3   0.7
      4.|-- 10.156.234.201       0.0%    10   12.6  13.2  11.7  22.1   3.2
      5.|-- win2k19.qa.suse.cz   0.0%    10   12.0  14.1  11.5  29.0   5.5
worker10.oqa.suse.de:
    Start: 2022-11-08T18:40:20+0100
    HOST: worker10             Loss%   Snt   Last   Avg  Best  Wrst StDev
      1.|-- gateway.oqa.suse.de  0.0%    10    0.3   0.3   0.2   0.4   0.1
      2.|-- 10.136.0.12          0.0%    10    0.3   0.7   0.3   2.6   0.7
      3.|-- vpn20.open.ch        0.0%    10   11.0  12.4  11.0  18.4   2.2
      4.|-- 10.156.234.201       0.0%    10   11.6  14.2  11.4  28.4   5.1
      5.|-- win2k19.qa.suse.cz   0.0%    10   11.6  14.2  11.5  28.5   5.4
shows that all can reach the host though. I can ping from worker2 as well.
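ping and mtr only prove ICMP reachability, so a plain TCP connect to the SSH port separates the two cases; a generic sketch with netcat, which may need to be installed on the worker first:

# hedged sketch: -z only tests whether the TCP connection opens, -w 5 sets a 5-second timeout
nc -vz -w 5 win2k19.qa.suse.cz 22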
I could log in to the Windows host and start Wireshark. I don't see any incoming packets on TCP port 22 when I try to connect from a migrated host, but I do see packets from other hosts. So this seems to be a problem on the side of the PRG incoming firewall. lhaleplidis is informed and on it.
#6
Updated by okurz 3 months ago
(Oliver Kurz) @Lazaros Haleplidis what is the status on access to win2k19.suse.cz?
(Lazaros Haleplidis) Trying to identify the device blocking the traffic so that we can resolve it. (We resolved the problem of the tunnel/firewall blocking this (opensystems); now we reach PRG, but there is something there, trying to identify the device first.)
#7
Updated by okurz 3 months ago
(Lazaros Haleplidis) trying to troubleshoot the access to win2k19.suse.cz. You said you can access it from other locations, can you give me a copy of its routing table?
From worker3, one of the migrated machines that now cannot access win2k19.qa.suse.cz over ssh port 22:
worker3:/home/okurz # ip r
default via 10.137.10.254 dev br0 proto dhcp
10.0.0.0/15 dev br1 proto kernel scope link src 10.0.2.2
10.137.10.0/24 dev br0 proto kernel scope link src 10.137.10.3
In contrast to OSD, which can reach 22/tcp (ssh) on the host just fine:
okurz@openqa:~> ip r
default via 149.44.183.254 dev eth1
10.136.0.0/14 via 10.160.255.254 dev eth0
10.137.10.0/24 via 10.160.255.254 dev eth0
10.160.0.0/16 dev eth0 proto kernel scope link src 10.160.0.207
127.0.0.0/8 dev lo scope link
149.44.176.0/21 dev eth1 proto kernel scope link src 149.44.176.58
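To see which of these routes each host actually selects for the target (10.100.101.33, as resolved earlier), plain iproute2 can report the chosen route, source address and outgoing interface; a sketch, not taken from the ticket:

# hedged sketch: print the route the kernel would use for the Windows host
ip route get 10.100.101.33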
#9
Updated by okurz 3 months ago
Compare a port scan of win2k19.qa.suse.cz from my notebook:
okurz@linux-28d7:~ 0 (master) $ sudo nmap 10.100.101.33
Starting Nmap 7.92 ( https://nmap.org ) at 2022-11-10 17:02 CET
Nmap scan report for win2k19.qa.suse.cz (10.100.101.33)
Host is up (0.041s latency).
Not shown: 992 closed tcp ports (reset)
PORT      STATE SERVICE
22/tcp    open  ssh
135/tcp   open  msrpc
139/tcp   open  netbios-ssn
445/tcp   open  microsoft-ds
2179/tcp  open  vmrdp
3389/tcp  open  ms-wbt-server
5357/tcp  open  wsdapi
10012/tcp open  unknown
Nmap done: 1 IP address (1 host up) scanned in 2.17 seconds
to the same scan done from worker2:
okurz@worker2:~> sudo nmap win2k19.qa.suse.cz
Starting Nmap 7.92 ( https://nmap.org ) at 2022-11-10 17:00 CET
Nmap scan report for win2k19.qa.suse.cz (10.100.101.33)
Host is up (0.011s latency).
Not shown: 995 filtered tcp ports (no-response)
PORT     STATE  SERVICE
53/tcp   closed domain
113/tcp  closed ident
2000/tcp open   cisco-sccp
5060/tcp open   sip
8008/tcp open   http
Nmap done: 1 IP address (1 host up) scanned in 4.69 seconds
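Given that ICMP gets through but most TCP ports show up as filtered, a TCP traceroute towards port 22 could reveal at which hop the SYN packets are dropped; a sketch assuming Linux traceroute is installed (needs root for -T):

# hedged sketch: -T sends TCP SYN probes instead of UDP, -p 22 targets the SSH port
sudo traceroute -T -p 22 win2k19.qa.suse.cz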
In Wireshark I can see ICMP, but as TCP traffic only the packets to/from port 53.
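A rough command-line equivalent of that Wireshark view, run from the worker side instead of the Windows host; the filter expression here is an assumption, not taken from the ticket:

# hedged sketch: capture ICMP plus TCP traffic on ports 22 and 53 towards the Windows host
sudo tcpdump -ni any 'host win2k19.qa.suse.cz and (icmp or tcp port 22 or tcp port 53)'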
#10
Updated by okurz 3 months ago
- Status changed from Feedback to In Progress
(Lazaros Haleplidis) ok, final attempt. Thank you @Martin Caj for pointing me in the right direction, so @Oliver Kurz please test one final time
(Marius Kittler) Works now, tested from worker2.
(Lazaros Haleplidis) kudos go to @Martin Caj for pointing me in the right direction
(Oliver Kurz) let us learn, what was it?
(Lazaros Haleplidis) on PRG, on the L3 core, there were ACLs in place
(Oliver Kurz) Well, ok. That's what I meant by saying it becomes tedious if one does not have access to the controlling systems. But please, let's handle it better for other problems. You don't need to wait for us to execute a simple ping or nmap call; we will provide all the necessary access so that you can check yourself.
#13
Updated by okurz 3 months ago
I assume https://openqa.suse.de/tests/9920506 is the same problem but trying to connect to esxi7.qa.suse.cz, labeled and retriggered.
#15
Updated by okurz 3 months ago
Running the query now:
$ openqa-query-for-job-label poo#120112
9918133|2022-11-10 15:57:43|done|failed|select_modules_and_patterns+registration_dev|backend done: Error connecting to <root@win2k19.qa.suse.cz>: Connection timed out|worker2
9918139|2022-11-10 15:50:17|done|failed|select_modules_and_patterns+registration_dev|backend done: Error connecting to <root@win2k19.qa.suse.cz>: Connection timed out|worker2
9920079|2022-11-10 13:47:51|done|failed|jeos-containers-docker|backend done: Error connecting to <root@win2k19.qa.suse.cz>: Connection timed out|worker2
9919801|2022-11-10 13:39:19|done|failed|msdos_dev|backend done: Error connecting to <root@win2k19.qa.suse.cz>: Connection timed out|worker2
9919668|2022-11-10 13:39:17|done|failed|jeos-main|backend done: Error connecting to <root@win2k19.qa.suse.cz>: Connection timed out|worker2
9919869|2022-11-10 13:39:16|done|failed|jeos-filesystem|backend done: Error connecting to <root@win2k19.qa.suse.cz>: Connection timed out|worker2
9919863|2022-11-10 13:39:16|done|failed|jeos-main|backend done: Error connecting to <root@win2k19.qa.suse.cz>: Connection timed out|worker2
9919662|2022-11-10 13:39:16|done|failed|jeos-base+sdk+desktop|backend done: Error connecting to <root@win2k19.qa.suse.cz>: Connection timed out|worker2
9919606|2022-11-10 13:28:43|done|failed|default|backend done: Error connecting to <root@win2k19.qa.suse.cz>: Connection timed out|worker2
9919342|2022-11-10 13:13:20|done|failed|online_upgrade_sles15sp4_hyperv|backend done: Error connecting to <root@win2k19.qa.suse.cz>: Connection timed out|worker2
so no failures for more than 12h now, a good sign.
#16
Updated by okurz 3 months ago
- Due date deleted (2022-11-22)
- Status changed from Feedback to Resolved
https://openqa.suse.de/tests/9922599 from the original scenario passes at least bootloader_hyperv.
#17
Updated by openqa_review 2 months ago
- Status changed from Resolved to Feedback
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: online_upgrade_sles15sp4_vmware@svirt-vmware70
https://openqa.suse.de/tests/10031922#step/bootloader_svirt/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
#18
Updated by okurz 2 months ago
- Status changed from Feedback to Resolved
That one test was labeled via carry-over but failed in
# Test died: {
    "console" => "svirt",
    "function" => "define_and_start",
    "json_cmd_token" => "plFjuRzC",
    "args" => [],
    "wantarray" => undef,
    "cmd" => "backend_proxy_console_call"
} virsh define failed at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 523.
which is unrelated. I removed the comment from the openQA job.