action #103575
Closed: [virtualization][3rd party hypervisor] Worker openqaw8-vmware.qa.suse.de is not reachable
Description
The following OSD jobs failed at bootloader_svirt because the worker openqaw8-vmware.qa.suse.de could not be reached or logged into:
https://openqa.nue.suse.com/tests/7794974
https://openqa.nue.suse.com/tests/7799143
https://openqa.nue.suse.com/tests/7799141
https://openqa.nue.suse.com/tests/7799136
Updated by jlausuch almost 3 years ago
Updated by okurz almost 3 years ago
- Status changed from New to In Progress
- Assignee set to okurz
- Priority changed from Normal to High
- Target version set to Ready
I think this is related to a recent IPMI firmware change done by bmwiedemann from EngInfra. I just powered on the host with
ipmitool -I lanplus -H sp.openqaw8-vmware.qa.suse.de -U ADMIN -P $password power on
and will check. Also https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/372 is related. The IPMI SoL console shows that the machine is up, so I cloned https://openqa.suse.de/tests/7794713#step/bootloader_svirt/7 as https://openqa.suse.de/tests/7802844#live and will monitor it.
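The power-on command above can be wrapped so that it only acts when the host is actually off. A minimal sketch, assuming ipmitool with the lanplus interface and the ADMIN user from the command; the `ensure_powered_on` helper is hypothetical, and `IPMITOOL` can be overridden (e.g. with a stub) for a dry run:

```shell
#!/usr/bin/env bash
# Hypothetical helper: power on a host via its BMC only if it is currently off.
# Assumes the BMC password is exported as IPMI_PASSWORD (read by "ipmitool -E").
# IPMITOOL may be overridden, e.g. with a stub function, for testing.
IPMITOOL=${IPMITOOL:-ipmitool}

ensure_powered_on() {
    local bmc=$1 status
    # "power status" prints e.g. "Chassis Power is on" or "Chassis Power is off"
    status=$("$IPMITOOL" -I lanplus -H "$bmc" -U ADMIN -E power status) || return 1
    case $status in
        *"is on"*)
            echo "$bmc: already on" ;;
        *"is off"*)
            echo "$bmc: powering on"
            "$IPMITOOL" -I lanplus -H "$bmc" -U ADMIN -E power on ;;
        *)
            echo "$bmc: unexpected status: $status" >&2
            return 1 ;;
    esac
}
```

Usage would be `IPMI_PASSWORD=... ensure_powered_on sp.openqaw8-vmware.qa.suse.de`. Using `-E` instead of `-P $password` keeps the password off the process list; this is a design choice of the sketch, not what was run in the ticket.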
https://openqa.suse.de/tests/7802844#step/bootloader_svirt/44 shows that we reached the SUT so I went ahead and executed
openqa-label-all --verbose --openqa-host https://openqa.suse.de --label '* bootloader_svirt: https://progress.opensuse.org/issues/103575' --module bootloader_svirt
with openqa-label-all from the package openQA-python-scripts
The complete output including all jobs that have been triggered:
Updated by openqa_review almost 3 years ago
- Due date set to 2021-12-22
Setting due date based on mean cycle time of SUSE QE Tools
Updated by nanzhang almost 3 years ago
Thank you for the fix!
The latest run looks good. - https://openqa.nue.suse.com/tests/7806054
Updated by okurz almost 3 years ago
- Due date deleted (2021-12-22)
- Status changed from In Progress to Resolved
All referenced jobs seem to have passed at least the initial step.
Updated by openqa_review almost 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: jeos-fips@svirt-vmware65
https://openqa.suse.de/tests/7888369
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
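Replacing a bugref as suggested means posting a new label comment on the affected jobs. A minimal sketch, assuming `openqa-cli` from the openQA client package with API credentials configured for the host, and the job comments route of the openQA REST API; the `post_label` helper is hypothetical, and `OPENQA_CLI` can be overridden (e.g. with `echo`) for a dry run:

```shell
#!/usr/bin/env bash
# Hypothetical helper: post the same label comment to each given openQA job ID.
# Assumes openqa-cli is installed and an API key/secret is configured for the host.
OPENQA_CLI=${OPENQA_CLI:-openqa-cli}

post_label() {
    local host=$1 label=$2 id
    shift 2
    for id in "$@"; do
        # POST jobs/<id>/comments with the comment body in the "text" parameter
        "$OPENQA_CLI" api --host "$host" -X POST "jobs/$id/comments" "text=$label" || return 1
    done
}
```

For example, `OPENQA_CLI=echo post_label https://openqa.suse.de 'label:wontfix:boo1234' 7888369` only prints the request it would send; `boo1234` is the placeholder from the message above, not a real bug.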
Updated by openqa_review almost 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: jeos-fips@svirt-vmware65
https://openqa.suse.de/tests/7924906
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Updated by openqa_review almost 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: jeos-fs_stress@svirt-vmware65
https://openqa.suse.de/tests/7998874
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Updated by okurz almost 3 years ago
- Status changed from Resolved to New
- Assignee deleted (okurz)
We apparently need to check this again; see the openQA test references.
Updated by mkittler almost 3 years ago
- Status changed from New to Feedback
The tests which fail now look like VMware tests, but they're actually (successfully) connecting to openqaw5-xen.qa.suse.de (instead of openqaw8-vmware.qa.suse.de). Judging by its hostname, I assume openqaw5-xen.qa.suse.de only works for Xen tests, so this test setup seems simply wrong. So the failures have nothing to do with openqaw8-vmware.qa.suse.de being unreachable. In fact, I can connect to that host just fine (via VPN).
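Checking by hand whether the host accepts connections on its SSH port can be done with a plain TCP probe. A minimal sketch, assuming bash (for its `/dev/tcp` redirection), coreutils `timeout`, and an already established VPN; `port_open` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened
# within the given number of seconds (defaults: port 22, 5 seconds).
# Relies on bash's /dev/tcp pseudo-device; exits non-zero on refusal or timeout.
port_open() {
    local host=$1 port=${2:-22} secs=${3:-5}
    timeout "$secs" bash -c ': < "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Usage: `port_open openqaw8-vmware.qa.suse.de 22 && echo reachable || echo unreachable`. This only shows that something listens on the port; logging in still has to be verified with SSH itself.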
Updated by mkittler almost 3 years ago
OK, it is actually using the VMware host. However, the host still seems to be reachable and responds to SSH commands:
[2022-01-26T18:33:11.275678+01:00] [debug] SSH connection to root@openqaw8-vmware.qa.suse.de established
[2022-01-26T18:33:11.364395+01:00] [debug] [run_ssh_cmd(set -x; rm -f /vmfs/volumes/datastore1/openQA/*openQA-SUT-3*)] stderr:
+ rm -f /vmfs/volumes/datastore1/openQA/SLES15-SP1-JeOS.x86_64-15.1-VMware-Build37.8.53_openQA-SUT-3_thinfile-flat.vmdk /vmfs/volumes/datastore1/openQA/SLES15-SP1-JeOS.x86_64-15.1-VMware-Build37.8.53_openQA-SUT-3_thinfile.vmdk /vmfs/volumes/datastore1/openQA/openQA-SUT-3.vmsd /vmfs/volumes/datastore1/openQA/openQA-SUT-3.vmx
[2022-01-26T18:33:11.367458+01:00] [debug] [run_ssh_cmd(set -x; rm -f /vmfs/volumes/datastore1/openQA/*openQA-SUT-3*)] exit-code: 0
…
[2022-01-26T18:33:11.831266+01:00] [debug] <<< backend::baseclass::new_ssh_connection(keep_open=1, username="root", hostname="openqaw8-vmware.qa.suse.de", password="SECRET", blocking=1, wantarray=1)
[2022-01-26T18:33:11.945655+01:00] [debug] Use existing SSH connection (key:hostname=openqaw8-vmware.qa.suse.de,username=root,port=22)
[2022-01-26T18:33:33.357052+01:00] [debug] [run_ssh_cmd(find /vmfs/volumes/openqa/hdd /vmfs/volumes/openqa/hdd/fixed -name SLES15-SP2-JeOS.x86_64-15.2-VMware-Build15.106.vmdk.xz | head -n1 | awk 1 ORS='')] stdout:
/vmfs/volumes/openqa/hdd/SLES15-SP2-JeOS.x86_64-15.2-VMware-Build15.106.vmdk.xz
[2022-01-26T18:33:33.359679+01:00] [debug] [run_ssh_cmd(find /vmfs/volumes/openqa/hdd /vmfs/volumes/openqa/hdd/fixed -name SLES15-SP2-JeOS.x86_64-15.2-VMware-Build15.106.vmdk.xz | head -n1 | awk 1 ORS='')] exit-code: 0
[2022-01-26T18:33:33.518226+01:00] [debug] Image found: /vmfs/volumes/openqa/hdd/SLES15-SP2-JeOS.x86_64-15.2-VMware-Build15.106.vmdk.xz
[2022-01-26T18:33:33.518469+01:00] [debug] tests/installation/bootloader_svirt.pm:137 called bootloader_svirt::search_image_on_svirt_host -> tests/installation/bootloader_svirt.pm:49 called testapi::enter_cmd
[2022-01-26T18:33:33.518686+01:00] [debug] <<< testapi::type_string(string="# Copying image SLES15-SP2-JeOS.x86_64-15.2-VMware-Build15.106.vmdk.xz...", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-01-26T18:33:36.283921+01:00] [debug] tests/installation/bootloader_svirt.pm:142 called backend::console_proxy::__ANON__
[2022-01-26T18:33:36.284202+01:00] [debug] <<< backend::console_proxy::__ANON__(wrapped_call={
"function" => "run_cmd",
"console" => "svirt",
"wantarray" => "",
"args" => [
"test -e /vmfs/volumes/datastore1/openQA//SLES15-SP2-JeOS.x86_64-15.2-VMware-Build15.106.vmdk",
"domain",
"sshVMwareServer"
]
})
[2022-01-26T18:33:36.285235+01:00] [debug] <<< backend::baseclass::run_ssh_cmd(cmd="test -e /vmfs/volumes/datastore1/openQA//SLES15-SP2-JeOS.x86_64-15.2-VMware-Build15.106.vmdk", wantarray=0, keep_open=1, username="root", password="SECRET", hostname="openqaw8-vmware.qa.suse.de")
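The image lookup in the log above boils down to a `find` over the asset directories, taking the first match and dropping the trailing newline with `awk 1 ORS=''`. A minimal local sketch of that pipeline, with a hypothetical `find_image` helper and temporary directories standing in for `/vmfs/volumes/openqa/hdd`:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the lookup seen in the log: print the first path
# named $1 found under the given directories, without a trailing newline.
find_image() {
    local name=$1
    shift
    # Errors for missing directories are silenced (an addition of this sketch).
    # "awk 1" prints every input line; ORS='' replaces the newline after it with nothing.
    find "$@" -name "$name" 2>/dev/null | head -n1 | awk 1 ORS=''
}
```

An empty result means "not found", which is what the test module would have to handle before copying the image into the datastore.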
The problem is apparently that some file exists which shouldn't:
[2022-01-26T18:34:41.051751+01:00] [debug] Use existing SSH connection (key:hostname=openqaw8-vmware.qa.suse.de,username=root,port=22)
[2022-01-26T18:34:41.064536+01:00] [debug] [run_ssh_cmd(xz --decompress --keep --verbose /vmfs/volumes/datastore1/openQA//SLES15-SP2-JeOS.x86_64-15.2-VMware-Build15.106.vmdk.xz)] stderr:
xz: /vmfs/volumes/datastore1/openQA//SLES15-SP2-JeOS.x86_64-15.2-VMware-Build15.106.vmdk: File exists
[2022-01-26T18:34:41.067476+01:00] [debug] [run_ssh_cmd(xz --decompress --keep --verbose /vmfs/volumes/datastore1/openQA//SLES15-SP2-JeOS.x86_64-15.2-VMware-Build15.106.vmdk.xz)] exit-code: 1
[2022-01-26T18:34:41.299347+01:00] [info] ::: basetest::runtest: # Test died: Image decompress in datastore failed!
at sle/tests/installation/bootloader_svirt.pm line 146.
bootloader_svirt::run(bootloader_svirt=HASH(0x560cf842d6e0)) called at /usr/lib/os-autoinst/basetest.pm line 360
eval {...} called at /usr/lib/os-autoinst/basetest.pm line 354
basetest::runtest(bootloader_svirt=HASH(0x560cf842d6e0)) called at /usr/lib/os-autoinst/autotest.pm line 372
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 372
autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 242
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 242
autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 296
autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x560cfa2895f0)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x560cfa2895f0), CODE(0x560cfa73a5a0)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 488
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x560cfa2895f0)) called at /usr/lib/os-autoinst/autotest.pm line 298
autotest::start_process() called at /usr/bin/isotovideo line 261
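The xz error in the log above is easy to reproduce: `xz --decompress --keep` refuses to overwrite an existing output file and exits with status 1, so a decompressed `.vmdk` left over from an earlier run breaks the next one. A minimal sketch of a guard against that, with a hypothetical `decompress_image` helper that removes a stale target first (`xz --force` would overwrite it instead); it is an illustration, not the code of bootloader_svirt.pm:

```shell
#!/usr/bin/env bash
# Hypothetical helper: decompress an .xz image next to itself, keeping the archive.
# A leftover decompressed file from an earlier run is removed first, because
# "xz --decompress --keep" otherwise fails with "File exists".
decompress_image() {
    local archive=$1
    case $archive in
        *.xz) ;;
        *) echo "not an .xz file: $archive" >&2; return 1 ;;
    esac
    local target=${archive%.xz}
    if [ -e "$target" ]; then
        echo "removing stale $target" >&2
        rm -f -- "$target"
    fi
    xz --decompress --keep -- "$archive"
}
```

The `.xz` check keeps the helper from deleting the archive itself when it is called with an uncompressed file by mistake.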
Updated by mkittler almost 3 years ago
Maybe
[root@openqaw8-vmware:~] mv /vmfs/volumes/datastore1/openQA//SLES15-SP2-JeOS.x86_64-15.2-VMware-Build15.106.vmdk.xz /vmfs/volumes/datastore1/openQA//SLES15-SP2-JeOS.x86_64-15.2-VMware-Build15.106.vmdk.xz.bak
helped, but I cannot retry the openQA job to test that because the assets are missing.
Handling such details in the test setup is also more likely something the test writers should take care of.
I've removed the wrong bugrefs from the jobs.
Updated by mloviska almost 3 years ago
Seems like you are trying to use assets that we are not testing anymore. :)
Updated by mkittler almost 3 years ago
- Status changed from Feedback to Resolved
I am not testing anything. I am only taking care of this ticket, which was reopened due to these failing jobs. However, it turns out to be unrelated, so I'm resolving the ticket again. Of course it would make sense to avoid creating those jobs if they are not relevant anymore. (Maybe that's already the case; the last job is from 7 days ago.)