action #71188

open

coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA

coordination #102909: [epic] Prevent more incompletes already within os-autoinst or openQA

job incomplete with auto_review:"backend died: QEMU exited unexpectedly, see log for details" and no other obvious information in the logfile what went wrong

Added by ilausuch over 3 years ago. Updated about 2 years ago.

Status:
Workable
Priority:
Low
Assignee:
-
Category:
Feature requests
Target version:
Start date:
2020-09-10
Due date:
% Done:

0%

Estimated time:

Description

Observation

https://openqa.suse.de/tests/4667727 incomplete reason "backend died: QEMU exited unexpectedly, see log for details"

https://openqa.suse.de/tests/4667727/file/autoinst-log.txt shows just:


[2020-09-09T19:53:48.385 CEST] [debug] <<< testapi::wait_serial(quiet=undef, timeout=1000, expect_not_found=0, record_output=undef, buffer_size=undef, regexp=qr/cmy9h-\d+-/, no_regex=0)
[2020-09-09T19:53:49.454 CEST] [debug] >>> testapi::wait_serial: (?^:cmy9h-\d+-): ok
[2020-09-09T19:53:49.454 CEST] [debug] tests/console/force_scheduled_tasks.pm:54 called testapi::assert_script_run
[2020-09-09T19:53:49.454 CEST] [debug] <<< testapi::assert_script_run(cmd="for i in \$(systemctl list-units --type=timer --state=active --no-legend | sed -e 's/\\(\\S\\+\\)\\.timer\\s.*/\\1/'); do echo \"Triggering systemd timed service \$i\" && systemctl start \$i && systemctl mask \$i.{service,timer}; done", quiet=undef, fail_message="", timeout=1000)
[2020-09-09T19:53:49.454 CEST] [debug] tests/console/force_scheduled_tasks.pm:54 called testapi::assert_script_run
[2020-09-09T19:53:49.454 CEST] [debug] <<< testapi::type_string(string="for i in \$(systemctl list-units --type=timer --state=active --no-legend | sed -e 's/\\(\\S\\+\\)\\.timer\\s.*/\\1/'); do echo \"Triggering systemd timed service \$i\" && systemctl start \$i && systemctl mask \$i.{service,timer}; done", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2020-09-09T19:53:52.124 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2020-09-09T19:53:52.127 CEST] [debug] sending magic and exit
[2020-09-09T19:53:52.127 CEST] [debug] received magic close
[2020-09-09T19:53:52.128 CEST] [debug] THERE IS NOTHING TO READ 16 5 4
[2020-09-09T19:53:52.128 CEST] [debug] stopping command server 28437 because test execution ended
[2020-09-09T19:53:52.128 CEST] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20053/Bx3fOTDMBl3VvYnl/broadcast
[2020-09-09T19:53:52.142 CEST] [debug] commands process exited: 0
[2020-09-09T19:53:52.151 CEST] [debug] backend process exited: 0
[2020-09-09T19:53:52.177 CEST] [debug] Driver backend collected unknown process with pid 28559 and exit status: 0
[2020-09-09T19:53:52.177 CEST] [debug] done with command server
[2020-09-09T19:53:52.177 CEST] [debug] stopping autotest process 28441
[2020-09-09T19:53:52.177 CEST] [debug] autotest received signal TERM, saving results of current test before exiting
[2020-09-09T19:53:52.187 CEST] [debug] [autotest] process exited: 1
[2020-09-09T19:53:53.288 CEST] [debug] done with autotest process
[2020-09-09T19:53:53.288 CEST] [debug] isotovideo failed
[2020-09-09T19:53:53.288 CEST] [debug] stopping backend process 28500
[2020-09-09T19:53:53.288 CEST] [debug] done with backend process
28430: EXIT 1
[2020-09-09T19:53:53.0361 CEST] [info] [pid:1576] Isotovideo exit status: 1
[2020-09-09T19:53:53.0387 CEST] [info] [pid:1576] +++ worker notes +++
[2020-09-09T19:53:53.0387 CEST] [info] [pid:1576] End time: 2020-09-09 17:53:53
[2020-09-09T19:53:53.0387 CEST] [info] [pid:1576] Result: died
[2020-09-09T19:53:53.0393 CEST] [info] [pid:37797] Uploading logs_from_installation_system-y2logs.tar.bz2
[2020-09-09T19:54:22.0706 CEST] [info] [pid:37797] Uploading video.ogv
[2020-09-09T19:59:29.0220 CEST] [info] [pid:37797] Uploading vars.json
[2020-09-09T19:59:29.0786 CEST] [info] [pid:37797] Uploading autoinst-log.txt

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label ,
for example to look for this ticket call openqa-query-for-job-label poo#71188

Acceptance criteria

  • AC1: A user better understands what could have gone wrong, and it is clear what the next step is

Suggestions

  • Regardless of what happened, maybe just restarting would have helped
  • Check if there is any information on the worker host for this time
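
For the second suggestion, a minimal sketch of how the time around the failure could be turned into a journal query on the worker host. The helper name journalctl_window and the 60 s margin are hypothetical. It assumes that journalctl is available on the worker and that the timestamps in autoinst-log.txt are in the worker's local time; the timezone abbreviation is simply ignored:

```python
from datetime import datetime, timedelta


def journalctl_window(log_line, margin_s=60):
    """Build a journalctl command covering +/- margin_s around a log line.

    Hypothetical helper: expects a line starting with a bracketed timestamp
    such as "[2020-09-09T19:53:52.128 CEST] [debug] ...".
    """
    # Take the text inside the first brackets and drop the timezone token.
    stamp = log_line[1:log_line.index("]")].split()[0]
    when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")
    margin = timedelta(seconds=margin_s)
    fmt = "%Y-%m-%d %H:%M:%S"
    return [
        "journalctl",
        "--since", (when - margin).strftime(fmt),
        "--until", (when + margin).strftime(fmt),
    ]


line = "[2020-09-09T19:53:52.151 CEST] [debug] backend process exited: 0"
print(" ".join(journalctl_window(line)))
```

The resulting argument list could be passed to subprocess.run on the worker host, or the printed command copied into a shell there.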

Related issues 1 (1 open, 0 closed)

Related to openQA Project - action #75091: incomplete jobs with one of the isotovideo sub-processes receiving a signal and just terminating, no clue why or who/what triggered the termination (New, 2020-10-22)

Actions #1

Updated by okurz over 3 years ago

  • Subject changed from job incomplete with "backend died: QEMU exited unexpectedly, see log for details" to job incomplete with "backend died: QEMU exited unexpectedly, see log for details" and no other obvious information in the logfile what went wrong
  • Description updated (diff)
  • Category set to Feature requests
  • Status changed from New to Workable
  • Priority changed from Normal to Low
  • Target version set to future

This was found by https://gitlab.suse.de/openqa/auto-review/pipelines when ilausuch reviewed it together with me. I asked him to create the ticket initially. I have now extended the description, but I think it is unlikely that we will be able to do much here. It also seems that this problem does not happen often and does not have a high impact, given that this time it was only one job and we have not seen others failing with the same issue so far.

Actions #2

Updated by Xiaojing_liu over 3 years ago

  • Subject changed from job incomplete with "backend died: QEMU exited unexpectedly, see log for details" and no other obvious information in the logfile what went wrong to job incomplete with auto_review:"backend died: QEMU exited unexpectedly, see log for details" and no other obvious information in the logfile what went wrong
Actions #3

Updated by okurz over 3 years ago

  • Target version changed from future to Ready

https://gitlab.suse.de/openqa/auto-review/-/jobs/271952#L45 shows that we had this 17 times in the last 24h on osd, so we should not ignore this for long but plan it as part of our backlog.

Actions #4

Updated by okurz over 3 years ago

  • Related to action #75091: incomplete jobs with one of the isotovideo sub-processes receiving a signal and just terminating, no clue why or who/what triggered the termination added
Actions #5

Updated by okurz over 3 years ago

  • Description updated (diff)
  • Parent task set to #62420

The regex is too generic and also matches https://openqa.suse.de/tests/5046475# which has:

[2020-11-21T15:33:08.515 CET] [debug] The built-in video encoder (pid 64116) terminated
[2020-11-21T15:33:08.517 CET] [debug] QEMU: QEMU emulator version 3.1.1.1 (openSUSE Leap 15.1)
[2020-11-21T15:33:08.517 CET] [debug] QEMU: Copyright (c) 2003-2018 Fabrice Bellard and the QEMU Project developers
[2020-11-21T15:33:08.518 CET] [debug] QEMU: qemu-system-x86_64: cannot set up guest memory 'pc.ram': Cannot allocate memory

which is a problem that the test maintainers are aware of, but we still mark the job with this ticket. I have kept the auto_review keyword here though, because without it the auto_review pipeline would fail and alerts would appear again.
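
A sketch of how the match could be narrowed, not what auto_review does today: only label a job with this ticket if the reason matches and the log has no explicit QEMU error line. The function name classify and the returned labels are hypothetical, and the sketch assumes that QEMU's own error output always shows up as "[debug] QEMU: qemu-system-<arch>: <message>", as in the excerpt above:

```python
import re

# Expression taken from the subject of this ticket.
GENERIC = re.compile(r"backend died: QEMU exited unexpectedly, see log for details")
# Assumption: QEMU error output is logged as "[debug] QEMU: qemu-system-<arch>: <message>".
QEMU_ERROR = re.compile(r"\[debug\] QEMU: (qemu-system-\S+: .+)$", re.MULTILINE)


def classify(reason, log_text):
    """Return a hypothetical label for an incomplete job."""
    if not GENERIC.search(reason):
        return "unrelated"
    match = QEMU_ERROR.search(log_text)
    if match:
        # The log already says what went wrong, so surface that detail instead.
        return "qemu error: " + match.group(1)
    # No further detail in the log: this is the case this ticket is about.
    return "poo#71188"


reason = "backend died: QEMU exited unexpectedly, see log for details"
log = (
    "[2020-11-21T15:33:08.517 CET] [debug] QEMU: QEMU emulator version 3.1.1.1 (openSUSE Leap 15.1)\n"
    "[2020-11-21T15:33:08.518 CET] [debug] QEMU: qemu-system-x86_64: "
    "cannot set up guest memory 'pc.ram': Cannot allocate memory\n"
)
print(classify(reason, log))
```

With the log of https://openqa.suse.de/tests/4667727 from the description, which has no such QEMU line, the same function would fall through to the ticket label.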

Actions #6

Updated by okurz over 3 years ago

  • Target version changed from Ready to future
Actions #7

Updated by okurz over 2 years ago

  • Parent task changed from #62420 to #102909
Actions #8

Updated by okurz about 2 years ago

https://openqa.opensuse.org/tests/2144277# has this ticket linked, but the error detail is pretty clear: "qemu-system-aarch64: -drive id=pflash-code-overlay0,if=pflash,file=/var/lib/openqa/pool/2/raid/pflash-code-overlay0,unit=0,readonly=on: Failed to lock byte 100"
