action #96272
closedTest failed incomplete: "Reason: backend died: QEMU exited unexpectedly, see log for details" size:M
0%
Description
Observation¶
This error is seen every now and then, a test seems to be running all right, and suddenly goes 'incomplete' with an error like
Reason: backend died: QEMU exited unexpectedly, see log for details
Example: https://openqa.opensuse.org/tests/1860795
Acceptance criteria¶
- AC1: Reason should be sth like "QEMU startup failed: samba not installed" or similar
Updated by okurz over 3 years ago
- Subject changed from Test failed incomplete: auto_review:"Reason: backend died: QEMU exited unexpectedly, see log for details":retry to Test failed incomplete: "Reason: backend died: QEMU exited unexpectedly, see log for details"
- Category set to Infrastructure
- Priority changed from Normal to High
The details are actually very important. Here it says:
Could not find '/usr/sbin/smbd', please install it
That's a pretty clear message :)
I tried to make that line more clear but unfortunately it's not very obvious as a "debug" message. Will create separate ticket for that.
Updated by okurz over 3 years ago
- Copied to action #96311: qemu error message is still "debug", should be "warn" or more severe size:S added
Updated by okurz over 3 years ago
- Project changed from openQA Tests (public) to openQA Infrastructure (public)
- Category deleted (
Infrastructure) - Target version set to Ready
Updated by livdywan over 3 years ago
- Subject changed from Test failed incomplete: "Reason: backend died: QEMU exited unexpectedly, see log for details" to Test failed incomplete: "Reason: backend died: QEMU exited unexpectedly, see log for details" size:M
- Description updated (diff)
- Status changed from New to Workable
- Priority changed from High to Normal
Lowering Priority as it's basically about the error, not a problem with failing tests
Updated by osukup over 3 years ago
this is missing samba on worker in WSL (Windows) test, which was problem on imgtester. Now is probably irrelevant
Updated by livdywan about 3 years ago
I'm unsure what the best approach is, so I'm documenting what I was pondering on:
The logs reveal the use of Samba in two places, the arguments passed to qemu as well as the error which looks like a "debug" message:
[2021-07-29T11:35:34.940 UTC] [debug] starting: /usr/bin/qemu-system-x86_64 -vga qxl -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 -global isa-fdc.fdtypeA=none -m 2048 -machine q35 -cpu host -netdev user,id=qanet0,smb=/var/lib/openqa/pool/2 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -device usb-ehci -device usb-tablet -smp 1 -enable-kvm -no-shutdown -vnc :92,share=force-shared -device virtio-serial -chardev
[...]
[2021-07-29T11:35:35.132 UTC] [debug] QEMU: QEMU emulator version 4.2.1 (openSUSE Leap 15.2)
[2021-07-29T11:35:35.132 UTC] [debug] QEMU: Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers
[2021-07-29T11:35:35.132 UTC] [debug] QEMU: qemu-system-x86_64: Could not find '/usr/sbin/smbd', please install it
- The arguments are added in
qemu.pm
iffQEMU_ENABLE_SMBD
is defined. We could presumably check wether Samba's available here before we even do that. - We could
die "QEMU: $line" if $line =~ m/Could not find.+please install it/;
inprocess_qemu_output
.- One step further, we could error on anything that's not related to version or copyright.
- A test based on what's already in
18-qemu-options.t
could assert that "please install" or other messages result in a clean failure.
- Should we make Samba a dep and use it in tests? I can't think of an easy way to mock it.
Updated by osukup about 3 years ago
cdywan wrote:
- Should we make Samba a dep and use it in tests? I can't think of an easy way to mock it.
is should be dependency of os-autoinst-distri-opensuse
Updated by okurz about 3 years ago
please keep https://progress.opensuse.org/issues/96272#Acceptance-criteria in mind, the primary object is to make the error message appear in the job reason, not necessarily fix the problem :)
Updated by osukup about 3 years ago
- Status changed from Workable to In Progress
Updated by osukup about 3 years ago
It seems be bigger problem than I excepted.
Code for this is in os-autoinst/OpenQA/Qemu/Proc.pm
:
sub exec_qemu {
my ($self) = @_;
my @params = $self->gen_cmdline();
session->enable;
bmwqemu::diag('starting: ' . join(' ', @params));
session->enable_subreaper;
my $process = $self->_process;
$self->{_qemu_terminated} = 0;
$process->on(
collected => sub {
$self->{_qemu_terminated} = 1;
unless ($self->{_stopping}) {
my $msg = 'QEMU exited unexpectedly, see log for details';
$msg = 'QEMU was killed due to the system being out of memory' if ($self->check_qemu_oom == 0);
bmwqemu::serialize_state(component => 'backend', msg => $msg);
}
});
$process->code(sub {
$SIG{__DIE__} = undef; # overwrite the default - just exit
system $self->qemu_bin, '-version';
# don't try to talk to the host's PA
$ENV{QEMU_AUDIO_DRV} = "none";
exec(@params);
});
we can get stdout from $proc
but this will break our logging to autoinst-log.txt
which is unacceptable --> see http://quasar.suse.cz/tests/7268#details
$process->read_all_stdout
flush stdoud .. so channel for our logger will be empty..
Updated by openqa_review about 3 years ago
- Due date set to 2021-10-21
Setting due date based on mean cycle time of SUSE QE Tools
Updated by livdywan about 3 years ago
osukup wrote:
we can get stdout from
$proc
but this will break our logging toautoinst-log.txt
which is unacceptable --> see http://quasar.suse.cz/tests/7268#details
$process->read_all_stdout
flush stdoud .. so channel for our logger will be empty..
What I see here is this:
[2021-10-06T14:44:21.320 CEST] [debug] DIAGGG:'QEMU emulator version 5.2.0 (SUSE Linux Enterprise 15)
'
'Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers
'
'qemu-system-x86_64: Could not find \'/usr/sbin/smbd\', please install it
'
dmesg: read kernel buffer failed: Operation not permitted
So I can see the error we want to catch. The whitespace just looks garbled. Maybe you can just strip it?
I'd say push what you have and it should make it easier to spot the problem
Updated by osukup about 3 years ago
cdywan wrote:
osukup wrote:
we can get stdout from
$proc
but this will break our logging toautoinst-log.txt
which is unacceptable --> see http://quasar.suse.cz/tests/7268#details
$process->read_all_stdout
flush stdoud .. so channel for our logger will be empty..What I see here is this:
[2021-10-06T14:44:21.320 CEST] [debug] DIAGGG:'QEMU emulator version 5.2.0 (SUSE Linux Enterprise 15) ' 'Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers ' 'qemu-system-x86_64: Could not find \'/usr/sbin/smbd\', please install it ' dmesg: read kernel buffer failed: Operation not permitted
So I can see the error we want to catch. The whitespace just looks garbled. Maybe you can just strip it?
I'd say push what you have and it should make it easier to spot the problem
Look at autoinst-log.txt
... exactly this info is missing from log (DIAGG: ....
is Data::Dumper in Callback) , which is exactly why we can't use this way
Updated by osukup about 3 years ago
- Status changed from In Progress to Workable
- Assignee deleted (
osukup)
I'm giving up on this poo:(
there isn't the way to get needed info without broking our logging, or a bigger rewrite of how is qemu executed and controlled.
Other way would be analysis of logs in openQA ... but this change will be also pretty big. So I don't think is worth this amount of work for too small effect in webgui which applies only in small amount cases
Updated by okurz about 3 years ago
So originally I hoped we could improve the "log gathering" within os-autoinst but I would be ok to parse the logs in openQA and update the reason retroactively. We already do something similar in https://github.com/os-autoinst/openQA/blob/7ea206aff0f8cf33abb039bb12ab0da25e90fcbc/lib/OpenQA/Worker/Job.pm#L567 for special cases
Updated by Xiaojing_liu about 3 years ago
- Status changed from Workable to In Progress
- Assignee set to Xiaojing_liu
Here is a pr to override the reason with more details: https://github.com/os-autoinst/openQA/pull/4298
Updated by openqa_review about 3 years ago
- Due date set to 2021-11-02
Setting due date based on mean cycle time of SUSE QE Tools
Updated by Xiaojing_liu about 3 years ago
- Status changed from Feedback to Resolved
We could track this ticket by seeing monitor results.