action #71188
coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA
coordination #102909: [epic] Prevent more incompletes already within os-autoinst or openQA
job incomplete with auto_review:"backend died: QEMU exited unexpectedly, see log for details" and no other obvious information in the logfile what went wrong
Description
Observation
https://openqa.suse.de/tests/4667727 incomplete reason "backend died: QEMU exited unexpectedly, see log for details"
https://openqa.suse.de/tests/4667727/file/autoinst-log.txt shows just:
[2020-09-09T19:53:48.385 CEST] [debug] <<< testapi::wait_serial(quiet=undef, timeout=1000, expect_not_found=0, record_output=undef, buffer_size=undef, regexp=qr/cmy9h-\d+-/, no_regex=0)
[2020-09-09T19:53:49.454 CEST] [debug] >>> testapi::wait_serial: (?^:cmy9h-\d+-): ok
[2020-09-09T19:53:49.454 CEST] [debug] tests/console/force_scheduled_tasks.pm:54 called testapi::assert_script_run
[2020-09-09T19:53:49.454 CEST] [debug] <<< testapi::assert_script_run(cmd="for i in \$(systemctl list-units --type=timer --state=active --no-legend | sed -e 's/\\(\\S\\+\\)\\.timer\\s.*/\\1/'); do echo \"Triggering systemd timed service \$i\" && systemctl start \$i && systemctl mask \$i.{service,timer}; done", quiet=undef, fail_message="", timeout=1000)
[2020-09-09T19:53:49.454 CEST] [debug] tests/console/force_scheduled_tasks.pm:54 called testapi::assert_script_run
[2020-09-09T19:53:49.454 CEST] [debug] <<< testapi::type_string(string="for i in \$(systemctl list-units --type=timer --state=active --no-legend | sed -e 's/\\(\\S\\+\\)\\.timer\\s.*/\\1/'); do echo \"Triggering systemd timed service \$i\" && systemctl start \$i && systemctl mask \$i.{service,timer}; done", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2020-09-09T19:53:52.124 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2020-09-09T19:53:52.127 CEST] [debug] sending magic and exit
[2020-09-09T19:53:52.127 CEST] [debug] received magic close
[2020-09-09T19:53:52.128 CEST] [debug] THERE IS NOTHING TO READ 16 5 4
[2020-09-09T19:53:52.128 CEST] [debug] stopping command server 28437 because test execution ended
[2020-09-09T19:53:52.128 CEST] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20053/Bx3fOTDMBl3VvYnl/broadcast
[2020-09-09T19:53:52.142 CEST] [debug] commands process exited: 0
[2020-09-09T19:53:52.151 CEST] [debug] backend process exited: 0
[2020-09-09T19:53:52.177 CEST] [debug] Driver backend collected unknown process with pid 28559 and exit status: 0
[2020-09-09T19:53:52.177 CEST] [debug] done with command server
[2020-09-09T19:53:52.177 CEST] [debug] stopping autotest process 28441
[2020-09-09T19:53:52.177 CEST] [debug] autotest received signal TERM, saving results of current test before exiting
[2020-09-09T19:53:52.187 CEST] [debug] [autotest] process exited: 1
[2020-09-09T19:53:53.288 CEST] [debug] done with autotest process
[2020-09-09T19:53:53.288 CEST] [debug] isotovideo failed
[2020-09-09T19:53:53.288 CEST] [debug] stopping backend process 28500
[2020-09-09T19:53:53.288 CEST] [debug] done with backend process
28430: EXIT 1
[2020-09-09T19:53:53.0361 CEST] [info] [pid:1576] Isotovideo exit status: 1
[2020-09-09T19:53:53.0387 CEST] [info] [pid:1576] +++ worker notes +++
[2020-09-09T19:53:53.0387 CEST] [info] [pid:1576] End time: 2020-09-09 17:53:53
[2020-09-09T19:53:53.0387 CEST] [info] [pid:1576] Result: died
[2020-09-09T19:53:53.0393 CEST] [info] [pid:37797] Uploading logs_from_installation_system-y2logs.tar.bz2
[2020-09-09T19:54:22.0706 CEST] [info] [pid:37797] Uploading video.ogv
[2020-09-09T19:59:29.0220 CEST] [info] [pid:37797] Uploading vars.json
[2020-09-09T19:59:29.0786 CEST] [info] [pid:37797] Uploading autoinst-log.txt
Steps to reproduce
Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label
For example, to look for this ticket, call: openqa-query-for-job-label poo#71188
Acceptance criteria
- AC1: A user better understands what could have gone wrong, and it is clear what to do next
Suggestions
- Regardless of what happened, maybe just restarting the job would have helped
- Check if there is any information on the worker host for this time
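To act on the second suggestion, it can help to turn the time of the failure into a window for searching the worker host's journal. A minimal sketch, assuming the autoinst-log.txt timestamps are local time (CEST = UTC+2, CET = UTC+1; the worker notes above report the end time as 17:53:53 while the log lines show 19:53:53 CEST). The helper name journal_window and the 60-second margin are illustrative only and not part of openQA:

```python
from datetime import datetime, timedelta

# Assumption: the only zone abbreviations seen in this ticket's logs.
TZ_OFFSETS = {"CEST": timedelta(hours=2), "CET": timedelta(hours=1)}


def journal_window(stamp, margin_s=60):
    """Convert a log stamp like '2020-09-09T19:53:52.124 CEST' into a
    (since, until) pair of UTC strings around that moment."""
    ts, tz = stamp.split()
    local = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f")
    utc = local - TZ_OFFSETS[tz]
    margin = timedelta(seconds=margin_s)
    fmt = "%Y-%m-%d %H:%M:%S"
    return (utc - margin).strftime(fmt), (utc + margin).strftime(fmt)


if __name__ == "__main__":
    # Timestamp of the last line before the shutdown sequence in the log above.
    since, until = journal_window("2020-09-09T19:53:52.124 CEST")
    # Command to run manually on the worker host.
    print(f'journalctl --utc --since "{since} UTC" --until "{until} UTC"')
```

The printed journalctl command can then be run on the worker host to look for OOM kills, service restarts or similar events within a minute of the QEMU exit.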
Updated by okurz about 4 years ago
- Subject changed from job incomplete with "backend died: QEMU exited unexpectedly, see log for details" to job incomplete with "backend died: QEMU exited unexpectedly, see log for details" and no other obvious information in the logfile what went wrong
- Description updated
- Category set to Feature requests
- Status changed from New to Workable
- Priority changed from Normal to Low
- Target version set to future
This was found by https://gitlab.suse.de/openqa/auto-review/pipelines when ilausuch reviewed this together with me. I asked him to create the ticket initially. I have now extended the description, but I think it is unlikely we will be able to do much here. Also, this problem does not seem to happen often or to have a high impact, given that this time it was only one job and we have not seen others failing with the same issue so far.
Updated by Xiaojing_liu about 4 years ago
- Subject changed from job incomplete with "backend died: QEMU exited unexpectedly, see log for details" and no other obvious information in the logfile what went wrong to job incomplete with auto_review:"backend died: QEMU exited unexpectedly, see log for details" and no other obvious information in the logfile what went wrong
Updated by okurz almost 4 years ago
- Target version changed from future to Ready
https://gitlab.suse.de/openqa/auto-review/-/jobs/271952#L45 shows that we had this 17 times in the last 24h on osd, so we should not ignore this for long but plan it as part of our backlog.
Updated by okurz almost 4 years ago
- Related to action #75091: incomplete jobs with one of the isotovideo sub-processes receiving a signal and just terminating, no clue why or who/what triggered the termination added
Updated by okurz almost 4 years ago
- Description updated
- Parent task set to #62420
The regex is too generic and also matches https://openqa.suse.de/tests/5046475# which has:
[2020-11-21T15:33:08.515 CET] [debug] The built-in video encoder (pid 64116) terminated
[2020-11-21T15:33:08.517 CET] [debug] QEMU: QEMU emulator version 3.1.1.1 (openSUSE Leap 15.1)
[2020-11-21T15:33:08.517 CET] [debug] QEMU: Copyright (c) 2003-2018 Fabrice Bellard and the QEMU Project developers
[2020-11-21T15:33:08.518 CET] [debug] QEMU: qemu-system-x86_64: cannot set up guest memory 'pc.ram': Cannot allocate memory
This is a problem that the test maintainers are aware of, yet we still mark the job with this ticket. I have kept the auto_review keyword here though, because without it the auto_review pipeline would fail and alerts would appear again.
Updated by okurz over 2 years ago
https://openqa.opensuse.org/tests/2144277# has this ticket linked, but the error detail is pretty clear: "qemu-system-aarch64: -drive id=pflash-code-overlay0,if=pflash,file=/var/lib/openqa/pool/2/raid/pflash-code-overlay0,unit=0,readonly=on: Failed to lock byte 100"
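One way to keep the generic match from swallowing jobs whose log already explains the failure is to check for an explicit QEMU error first. A minimal sketch, assuming the job reason and the content of autoinst-log.txt are available as strings. The pattern list is hypothetical and only covers the two errors quoted in this ticket, and classify is an illustrative helper, not part of auto-review:

```python
import re

# The reason text this ticket's auto_review expression is built around.
GENERIC_REASON = re.compile(
    r"backend died: QEMU exited unexpectedly, see log for details"
)

# Hypothetical list: explicit QEMU errors taken from the jobs quoted above.
KNOWN_QEMU_ERRORS = [
    re.compile(r"Cannot allocate memory"),
    re.compile(r"Failed to lock byte \d+"),
]


def classify(reason, log_text):
    """Return 'other', 'explained' or 'unexplained' for an incomplete job."""
    if not GENERIC_REASON.search(reason):
        return "other"
    for pattern in KNOWN_QEMU_ERRORS:
        if pattern.search(log_text):
            # The log names a concrete cause, so this ticket is the wrong label.
            return "explained"
    return "unexplained"
```

Only jobs classified as "unexplained" would then correspond to what this ticket describes: a QEMU exit with no other obvious information in the log file.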
Updated by openqa_review about 1 month ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: security_tpm2
https://openqa.suse.de/tests/15186210
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Expect the next reminder at the earliest in 220 days if nothing changes in this ticket.