action #15378
closedppc worker died somewhere in the middle
100%
Description
Observation¶
openQA test in scenario sle-12-SP3-Server-DVD-ppc64le-om_proxyscc_sles12sp2_sdk_full_update_by_yast_ppc@ppc64le fails in
bootloader_ofw to show any content from the virtual machine. It looks like the virtual machine never started or was already shutdown at this time.
https://openqa.suse.de/tests/659770/file/autoinst-log.txt does not show anything obvious.
logs from worker on malbec at that time:
Dec 08 09:34:30 malbec worker[162548]: setting job 659232 to done
Dec 08 09:34:30 malbec worker[162548]: got job 659770: 00659770-sle-12-SP3-Server-DVD-ppc64le-Build0185-om_proxyscc_sles12sp2_sdk_full_update_by_yast_ppc@ppc64le
Dec 08 09:34:30 malbec worker[162548]: 47042: WORKING 659770
Dec 08 09:35:16 malbec worker[162548]: can't open /var/lib/openqa/pool/3/testresults/result-online_migration_setup.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 626.
Dec 08 09:35:21 malbec worker[162548]: cleaning up 00659770-sle-12-SP3-Server-DVD-ppc64le-Build0185-om_proxyscc_sles12sp2_sdk_full_update_by_yast_ppc@ppc64le...
Dec 08 09:35:21 malbec worker[162548]: Received command livelog_stop for job 659770, but we do not have any assigned. Ignoring!
Dec 08 09:36:48 malbec worker[162548]: Use of uninitialized value $op in numeric eq (==) at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Transaction/WebSocket.pm line 110.
Dec 08 09:37:21 malbec worker[162548]: Use of uninitialized value $op in numeric eq (==) at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Transaction/WebSocket.pm line 110.
Dec 08 09:37:38 malbec worker[162548]: Use of uninitialized value $op in numeric eq (==) at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Transaction/WebSocket.pm line 110.
Dec 08 09:38:50 malbec worker[162548]: Use of uninitialized value $op in numeric eq (==) at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Transaction/WebSocket.pm line 110.
Dec 08 09:39:18 malbec worker[162548]: Use of uninitialized value $op in numeric eq (==) at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Transaction/WebSocket.pm line 110.
Dec 08 09:41:18 malbec worker[162548]: Use of uninitialized value $op in numeric eq (==) at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Transaction/WebSocket.pm line 110.
Dec 08 09:48:40 malbec worker[162548]: Use of uninitialized value $op in numeric eq (==) at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Transaction/WebSocket.pm line 110.
Dec 08 09:48:55 malbec worker[162548]: Use of uninitialized value $op in numeric eq (==) at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Transaction/WebSocket.pm line 110.
Dec 08 09:49:33 malbec worker[162548]: Use of uninitialized value $op in numeric eq (==) at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Transaction/WebSocket.pm line 110.
Dec 08 09:50:14 malbec worker[162548]: Use of uninitialized value $op in numeric eq (==) at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Transaction/WebSocket.pm line 110.
Dec 08 09:50:27 malbec worker[162548]: Use of uninitialized value $op in numeric eq (==) at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Transaction/WebSocket.pm line 110.
Dec 08 09:51:10 malbec worker[162548]: Use of uninitialized value $op in numeric eq (==) at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Transaction/WebSocket.pm line 110.
Dec 08 09:56:35 malbec worker[162548]: Use of uninitialized value $op in numeric eq (==) at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/Transaction/WebSocket.pm line 110.
Dec 08 09:56:35 malbec worker[162548]: setting job 659770 to done
Dec 08 09:56:35 malbec worker[162548]: got job 659814: 00659814-sle-12-SP2-Server-DVD-Incidents-ppc64le-Build:125282.perl-DBD-mysql.3690-qam-minimal-full@ppc64le
Dec 08 09:56:35 malbec worker[162548]: 47727: WORKING 659814
Problem¶
H1. we might not have enough logging output from the worker to understand what could have happened here.
H2. worker dead detection?
Further details¶
Always latest result in this scenario: latest
Updated by okurz about 8 years ago
- Priority changed from Normal to High
It's not just happening once: https://openqa.suse.de/tests/658864#step/bootloader_ofw/2 and https://openqa.suse.de/tests/658866#step/bootloader_ofw/2
The job should probably be reported as incomplete, not "failed". why is that?
Updated by okurz about 8 years ago
- Priority changed from High to Urgent
And it's reproducible a lot: https://openqa.suse.de/tests/659899
Updated by mkittler about 8 years ago
- Subject changed from ppc worker did somewhere in the middle to ppc worker died somewhere in the middle
Updated by mkittler about 8 years ago
to H2: Then we would receive warnings like https://progress.opensuse.org/issues/13952 in web UI log.
Updated by okurz about 8 years ago
- Status changed from New to In Progress
mkittler wrote:
to H2: Then we would receive warnings like https://progress.opensuse.org/issues/13952 in web UI log.
Who said we don't have? I did not look for the logs from the web ui at around this time but I did now and could not find anything wrong regarding job 659770.
@mgriessmeier can you help to better understand why we did not see the issue in the logs properly as you found it in #15396?
Updated by coolo almost 8 years ago
- Status changed from In Progress to Closed
"match=bootloader-ofw timed out after 15" doesn't sound like 'in the middle'. As this immediate issue is now untouched for 10 days, I assume the panic faded off and we can close this
Updated by okurz almost 8 years ago
- Status changed from Closed to In Progress
The child ticket #15396 was more about the "immediate" panic but this ticket still should be open (can't change prio right now?!?). I was saying "in the middle" because os-autoinst was just acting super-stupid here. It's like if some DAU calls you that he can not see anything on his screen and after some debugging over the phone it turns out that the display was stolen. In all the tests failing linked to this ticket the problem was that the firmware was not loadable at all and the qemu instance did not display anything. I guess this can still be improved.
Updated by coolo almost 8 years ago
- Status changed from In Progress to Closed
then please create a new low priority issue - immediate priority issues are locked to it.