Project

General

Profile

action #96272

Test failed incomplete: "Reason: backend died: QEMU exited unexpectedly, see log for details" size:M

Added by dimstar 4 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
Start date:
2021-07-29
Due date:
2021-11-02
% Done:

0%

Estimated time:

Description

Observation

This error is seen every now and then, a test seems to be running all right, and suddenly goes 'incomplete' with an error like

Reason: backend died: QEMU exited unexpectedly, see log for details 

Example: https://openqa.opensuse.org/tests/1860795

Acceptance criteria

  • AC1: Reason should be sth like "QEMU startup failed: samba not installed" or similar

Related issues

Copied to openQA Project - action #96311: qemu error message is still "debug", should be "warn" or more severe size:SResolved2021-07-29

History

#1 Updated by okurz 4 months ago

  • Subject changed from Test failed incomplete: auto_review:"Reason: backend died: QEMU exited unexpectedly, see log for details":retry to Test failed incomplete: "Reason: backend died: QEMU exited unexpectedly, see log for details"
  • Category set to Infrastructure
  • Priority changed from Normal to High

The details are actually very important. Here it says:

Could not find '/usr/sbin/smbd', please install it

That's a pretty clear message :)

I tried to make that line more clear but unfortunately it's not very obvious as a "debug" message. Will create separate ticket for that.

#2 Updated by okurz 4 months ago

  • Copied to action #96311: qemu error message is still "debug", should be "warn" or more severe size:S added

#3 Updated by okurz 3 months ago

  • Project changed from openQA Tests to openQA Infrastructure
  • Category deleted (Infrastructure)
  • Target version set to Ready

#4 Updated by cdywan 3 months ago

  • Subject changed from Test failed incomplete: "Reason: backend died: QEMU exited unexpectedly, see log for details" to Test failed incomplete: "Reason: backend died: QEMU exited unexpectedly, see log for details" size:M
  • Description updated (diff)
  • Status changed from New to Workable
  • Priority changed from High to Normal

Lowering Priority as it's basically about the error, not a problem with failing tests

#5 Updated by osukup 3 months ago

this is missing samba on worker in WSL (Windows) test, which was problem on imgtester. Now is probably irrelevant

#6 Updated by cdywan about 2 months ago

I'm unsure what the best approach is, so I'm documenting what I was pondering on:

The logs reveal the use of Samba in two places, the arguments passed to qemu as well as the error which looks like a "debug" message:

[2021-07-29T11:35:34.940 UTC] [debug] starting: /usr/bin/qemu-system-x86_64 -vga qxl -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 -global isa-fdc.fdtypeA=none -m 2048 -machine q35 -cpu host -netdev user,id=qanet0,smb=/var/lib/openqa/pool/2 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -device usb-ehci -device usb-tablet -smp 1 -enable-kvm -no-shutdown -vnc :92,share=force-shared -device virtio-serial -chardev 
[...]
[2021-07-29T11:35:35.132 UTC] [debug] QEMU: QEMU emulator version 4.2.1 (openSUSE Leap 15.2)
[2021-07-29T11:35:35.132 UTC] [debug] QEMU: Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers
[2021-07-29T11:35:35.132 UTC] [debug] QEMU: qemu-system-x86_64: Could not find '/usr/sbin/smbd', please install it
  • The arguments are added in qemu.pm iff QEMU_ENABLE_SMBD is defined. We could presumably check wether Samba's available here before we even do that.
  • We could die "QEMU: $line" if $line =~ m/Could not find.+please install it/; in process_qemu_output.
    • One step further, we could error on anything that's not related to version or copyright.
    • A test based on what's already in 18-qemu-options.t could assert that "please install" or other messages result in a clean failure.
  • Should we make Samba a dep and use it in tests? I can't think of an easy way to mock it.

#7 Updated by osukup about 2 months ago

cdywan wrote:

  • Should we make Samba a dep and use it in tests? I can't think of an easy way to mock it.

is should be dependency of os-autoinst-distri-opensuse

#8 Updated by okurz about 2 months ago

please keep https://progress.opensuse.org/issues/96272#Acceptance-criteria in mind, the primary object is to make the error message appear in the job reason, not necessarily fix the problem :)

#9 Updated by osukup about 2 months ago

  • Assignee set to osukup

#10 Updated by osukup about 2 months ago

  • Status changed from Workable to In Progress

#11 Updated by osukup about 2 months ago

It seems be bigger problem than I excepted.

Code for this is in os-autoinst/OpenQA/Qemu/Proc.pm:

sub exec_qemu {
    my ($self) = @_;

    my @params = $self->gen_cmdline();
    session->enable;
    bmwqemu::diag('starting: ' . join(' ', @params));
    session->enable_subreaper;

    my $process = $self->_process;
    $self->{_qemu_terminated} = 0;
    $process->on(
        collected => sub {
            $self->{_qemu_terminated} = 1;
            unless ($self->{_stopping}) {
                my $msg = 'QEMU exited unexpectedly, see log for details';
                $msg = 'QEMU was killed due to the system being out of memory' if ($self->check_qemu_oom == 0);
                bmwqemu::serialize_state(component => 'backend', msg => $msg);
            }   
        }); 
    $process->code(sub {
            $SIG{__DIE__} = undef;    # overwrite the default - just exit
            system $self->qemu_bin, '-version';
            # don't try to talk to the host's PA
            $ENV{QEMU_AUDIO_DRV} = "none";
            exec(@params);
    });     


we can get stdout from $proc but this will break our logging to autoinst-log.txt which is unacceptable --> see http://quasar.suse.cz/tests/7268#details

$process->read_all_stdout flush stdoud .. so channel for our logger will be empty..

#12 Updated by openqa_review about 2 months ago

  • Due date set to 2021-10-21

Setting due date based on mean cycle time of SUSE QE Tools

#13 Updated by cdywan about 2 months ago

osukup wrote:

we can get stdout from $proc but this will break our logging to autoinst-log.txt which is unacceptable --> see http://quasar.suse.cz/tests/7268#details

$process->read_all_stdout flush stdoud .. so channel for our logger will be empty..

What I see here is this:

[2021-10-06T14:44:21.320 CEST] [debug] DIAGGG:'QEMU emulator version 5.2.0 (SUSE Linux Enterprise 15)
  '
  'Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers
  '
  'qemu-system-x86_64: Could not find \'/usr/sbin/smbd\', please install it
  '

dmesg: read kernel buffer failed: Operation not permitted

So I can see the error we want to catch. The whitespace just looks garbled. Maybe you can just strip it?

I'd say push what you have and it should make it easier to spot the problem

#14 Updated by osukup about 2 months ago

cdywan wrote:

osukup wrote:

we can get stdout from $proc but this will break our logging to autoinst-log.txt which is unacceptable --> see http://quasar.suse.cz/tests/7268#details

$process->read_all_stdout flush stdoud .. so channel for our logger will be empty..

What I see here is this:

[2021-10-06T14:44:21.320 CEST] [debug] DIAGGG:'QEMU emulator version 5.2.0 (SUSE Linux Enterprise 15)
  '
  'Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers
  '
  'qemu-system-x86_64: Could not find \'/usr/sbin/smbd\', please install it
  '

dmesg: read kernel buffer failed: Operation not permitted

So I can see the error we want to catch. The whitespace just looks garbled. Maybe you can just strip it?

I'd say push what you have and it should make it easier to spot the problem

Look at autoinst-log.txt ... exactly this info is missing from log (DIAGG: .... is Data::Dumper in Callback) , which is exactly why we can't use this way

#15 Updated by osukup about 2 months ago

  • Status changed from In Progress to Workable
  • Assignee deleted (osukup)

I'm giving up on this poo:(

there isn't the way to get needed info without broking our logging, or a bigger rewrite of how is qemu executed and controlled.

Other way would be analysis of logs in openQA ... but this change will be also pretty big. So I don't think is worth this amount of work for too small effect in webgui which applies only in small amount cases

#16 Updated by okurz about 2 months ago

  • Due date deleted (2021-10-21)

#17 Updated by okurz about 2 months ago

So originally I hoped we could improve the "log gathering" within os-autoinst but I would be ok to parse the logs in openQA and update the reason retroactively. We already do something similar in https://github.com/os-autoinst/openQA/blob/7ea206aff0f8cf33abb039bb12ab0da25e90fcbc/lib/OpenQA/Worker/Job.pm#L567 for special cases

#18 Updated by Xiaojing_liu about 1 month ago

  • Status changed from Workable to In Progress
  • Assignee set to Xiaojing_liu

Here is a pr to override the reason with more details: https://github.com/os-autoinst/openQA/pull/4298

#19 Updated by openqa_review about 1 month ago

  • Due date set to 2021-11-02

Setting due date based on mean cycle time of SUSE QE Tools

#20 Updated by Xiaojing_liu about 1 month ago

  • Status changed from In Progress to Feedback

PR got merged

#21 Updated by Xiaojing_liu about 1 month ago

  • Status changed from Feedback to Resolved

We could track this ticket by seeing monitor results.

Also available in: Atom PDF