action #60443
closedjob incomplete with "(?s)process exited: 0.*isotovideo failed.*EXIT 1":retry but no further details what is wrong
Description
Observation¶
https://openqa.suse.de/tests/3641268 first failed due to a problem in tests vs. product behaviour but then the job incompleted somewhere in the middle of the post_fail_hook:
[2019-11-29T01:42:19.595 CET] [debug] <<< testapi::send_key(key='ret', do_wait=0, wait_screen_change=0)
[2019-11-29T01:42:19.996 CET] [debug] <<< testapi::wait_serial(record_output=undef, no_regex=0, regexp='Reached target Shutdown', quiet=undef, timeout=90, expect_not_found=0, buffer_size=undef)
[2019-11-29T01:43:51.136 CET] [debug] >>> testapi::wait_serial: Reached target Shutdown: fail
[2019-11-29T01:43:51.136 CET] [debug] <<< testapi::send_key(key='esc', wait_screen_change=0, do_wait=0)
[2019-11-29T01:43:51.647 CET] [debug] <<< testapi::send_key(key='spc', do_wait=0, wait_screen_change=0)
[2019-11-29T01:43:52.055 CET] [debug] [autotest] process exited: 0
[2019-11-29T01:43:52.056 CET] [debug] terminating command server 77732 because test execution ended
[2019-11-29T01:43:52.056 CET] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20033/WYckk7Uc8qxL6MAU/broadcast
[2019-11-29T01:43:53.066 CET] [debug] commands process exited: 0
[2019-11-29T01:43:54.066 CET] [debug] done with command server
[2019-11-29T01:43:54.066 CET] [debug] stopping autotest process 77744
[2019-11-29T01:43:54.066 CET] [debug] done with autotest process
[2019-11-29T01:43:54.066 CET] [debug] isotovideo failed
[2019-11-29T01:43:54.067 CET] [debug] stopping backend process 77745
[2019-11-29T01:43:55.068 CET] [debug] backend got TERM
[2019-11-29T01:43:55.068 CET] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2019-11-29T01:43:58.068 CET] [debug] done with backend process
77728: EXIT 1
[2019-11-29T01:43:58.0106 CET] [info] [pid:95895] Isotovideo exit status: 1
[2019-11-29T01:44:01.0134 CET] [info] [pid:95895] +++ worker notes +++
[2019-11-29T01:44:01.0134 CET] [info] [pid:95895] End time: 2019-11-29 00:44:01
[2019-11-29T01:44:01.0134 CET] [info] [pid:95895] Result: died
It seems the VM did not reach shutdown in time which one can see in "testapi::wait_serial: Reached target Shutdown: fail" and after the next two key presses of "esc" and "spc" the autotest process exited and the rest follows.
Expected Result¶
Test should fail not incomplete
Updated by okurz about 5 years ago
- Subject changed from job incomplete with "isotovideo failed" but no further details what is wrong to job incomplete with auto_review:"(?s)isotovideo failed.*EXIT 1" but no further details what is wrong
Updated by okurz almost 5 years ago
- Subject changed from job incomplete with auto_review:"(?s)isotovideo failed.*EXIT 1" but no further details what is wrong to job incomplete with auto_review:"(?s).autotest. process exited: 0.*isotovideo failed.*EXIT 1" but no further details what is wrong
Restricting to regex to not match on #60161 cases, e.g. as in https://openqa.suse.de/tests/3772385/file/autoinst-log.txt where it says:
The console 'sut' is not responding (half-open socket?). Make sure the console is reachable or disable stall detection on expected disconnects with '$console->disable_vnc_stalls', for example in case of intended machine shutdown
[2020-01-09T16:22:59.576 CET] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2020-01-09T16:23:01.744 CET] [debug] flushing frames
[2020-01-09T16:23:01.782 CET] [debug] QEMU: QEMU emulator version 3.1.1.1 (openSUSE Leap 15.1)
[2020-01-09T16:23:01.782 CET] [debug] QEMU: Copyright (c) 2003-2018 Fabrice Bellard and the QEMU Project developers
[2020-01-09T16:23:01.782 CET] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 30026 (/usr/bin/isotovideo: backen)
[2020-01-09T16:23:01.783 CET] [debug] sending magic and exit
[2020-01-09T16:23:01.784 CET] [debug] received magic close
[2020-01-09T16:23:01.785 CET] [debug] THERE IS NOTHING TO READ 15 4 3
[2020-01-09T16:23:01.785 CET] [debug] terminating command server 29985 because test execution ended
[2020-01-09T16:23:01.785 CET] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20193/raztdqRUz9de9kg8/broadcast
[2020-01-09T16:23:01.800 CET] [debug] commands process exited: 0
[2020-01-09T16:23:01.800 CET] [debug] done with command server
[2020-01-09T16:23:01.800 CET] [debug] stopping autotest process 30006
[2020-01-09T16:23:01.805 CET] [debug] backend process exited: 0
[2020-01-09T16:23:01.808 CET] [debug] done with autotest process
[2020-01-09T16:23:01.808 CET] [debug] isotovideo failed
[2020-01-09T16:23:01.809 CET] [debug] stopping backend process 30026
[2020-01-09T16:23:01.809 CET] [debug] done with backend process
29979: EXIT 1
so I added the '[autotest] process exited: 0', maybe that's enough.
Updated by okurz almost 5 years ago
- Related to action #59926: test incompletes in middle of execution with auto_review:"Unexpected end of data 0":retry, system journal shows 'kernel: traps: /usr/bin/isotov[2300] general protection ip:7fd5ef11771e sp:7ffe066f2200 error:0 in libc-2.26.so[7fd5ef094000+1b1000]' added
Updated by okurz almost 5 years ago
https://openqa.suse.de/tests/3772621/file/autoinst-log.txt was caught by auto-review but it does provide a bit more details:
[2020-01-09T18:04:19.617 CET] [debug] no match: 295.6s, best candidate: displaymanager-sles4sap-nousers-20170117 (0.28)
[2020-01-09T18:04:19.623 CET] [debug] no change: 291.6s
[2020-01-09T18:04:19.867 CET] [debug] considering VNC stalled, no update for 6.14 seconds
[2020-01-09T18:04:20.121 CET] [debug] pointer type 1 0 1024 768 -257
[2020-01-09T18:04:20.121 CET] [debug] led state 0 1 1 -261
[2020-01-09T18:04:20.618 CET] [debug] no change: 290.6s
[2020-01-09T18:04:21.620 CET] [debug] no change: 289.6s
/usr/bin/isotovideo: backen: malloc.c:3829: _int_malloc: Assertion `chunk_main_arena (bck->bk)' failed.
Unexpected end of data 0
[2020-01-09T18:04:22.640 CET] [debug] backend process exited: 0
[2020-01-09T18:04:22.641 CET] [debug] terminating command server 32299 because test execution ended
[2020-01-09T18:04:22.641 CET] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20153/0NP1yqsAinoUXvjN/broadcast
[2020-01-09T18:04:22.642 CET] [debug] Driver backend collected unknown process with pid 32378 and exit status: 1
[2020-01-09T18:04:22.656 CET] [debug] commands process exited: 0
[2020-01-09T18:04:22.656 CET] [debug] done with command server
[2020-01-09T18:04:22.657 CET] [debug] stopping autotest process 32314
[2020-01-09T18:04:22.657 CET] [debug] autotest received signal TERM, saving results of current test before exiting
[2020-01-09T18:04:23.243 CET] [debug] [autotest] process exited: 0
[2020-01-09T18:04:23.243 CET] [debug] done with autotest process
[2020-01-09T18:04:23.243 CET] [debug] isotovideo failed
[2020-01-09T18:04:23.244 CET] [debug] stopping backend process 32346
[2020-01-09T18:04:23.244 CET] [debug] done with backend process
32296: EXIT 1
see the line
/usr/bin/isotovideo: backen: malloc.c:3829: _int_malloc: Assertion `chunk_main_arena (bck->bk)' failed.
maybe the underlying issue is actually the same as #59926 , that isotovideo just crashes, sometimes with a timely message, sometimes silent.
Updated by okurz almost 5 years ago
- Has duplicate action #56261: [tools] test fails in yast2_bootloader- isotovideo failed, test execuition interrupted on xen added
Updated by okurz almost 5 years ago
- Related to coordination #62420: [epic] Distinguish all types of incompletes added
Updated by okurz almost 5 years ago
- Has duplicate action #60200: "considering VNC stalled, no update for" job incompletes just after VNC stall detection, no clear error added
Updated by okurz almost 5 years ago
https://openqa.suse.de/tests/3850098/file/autoinst-log.txt looks like the same to me but it does not have ".autotest. process exited: 0".
system journal on the system shows:
Jan 31 02:30:02 openqaworker3 kernel: perl[30128]: segfault at 514 ip 0000563252eb16a0 sp 00007f72fb89d830 error 4 in perl[563252df1000+1f9000]
Jan 31 02:30:02 openqaworker3 kernel: perl[30129]: segfault at 514 ip 0000563252eb16a0 sp 00007f72fb09c830 error 4
Jan 31 02:30:02 openqaworker3 kernel: perl[30130]: segfault at 514 ip 0000563252eb16a0 sp 00007f72fa89b830 error 4 in perl[563252df1000+1f9000]
Jan 31 02:30:02 openqaworker3 kernel: perl[30131]: segfault at 514 ip 0000563252eb16a0 sp 00007f72fa09a830 error 4
Jan 31 02:30:02 openqaworker3 kernel: in perl[563252df1000+1f9000]
Jan 31 02:30:02 openqaworker3 kernel: in perl[563252df1000+1f9000]
Jan 31 02:30:02 openqaworker3 systemd-coredump[30520]: Failed to get EXE, ignoring: No such process
Jan 31 02:30:02 openqaworker3 systemd[1]: Started Process Core Dump (PID 30520/UID 0).
Jan 31 02:30:02 openqaworker3 systemd-coredump[30521]: Failed to compress (unnamed temporary file): Invalid argument
Jan 31 02:30:02 openqaworker3 systemd-coredump[30521]: Failed to generate stack trace: invalid `Elf' handle
Jan 31 02:30:02 openqaworker3 systemd-coredump[30521]: Process 30127 (/usr/bin/isotov) of user 489 dumped core.
Jan 31 02:30:02 openqaworker3 worker[28728]: [debug] [pid:30490] Uploading artefact yast2_nfs_server-117.png as f0023ce600fba82ebe50d0a7e9486532
Jan 31 02:30:03 openqaworker3 worker[28722]: [debug] [pid:28722] REST-API call: POST http://openqa.suse.de/api/v1/jobs/3850089/status
Jan 31 02:30:03 openqaworker3 worker[28177]: [debug] [pid:28177] REST-API call: POST http://openqa.suse.de/api/v1/jobs/3849157/status
Jan 31 02:30:03 openqaworker3 systemd-coredump[30493]: Process 30143 (/usr/bin/isotov) of user 489 dumped core.
Stack trace of thread 30261:
#0 0x0000563252eb16a0 Perl_csighandler (perl)
#1 0x00007f7307b1f300 __restore_rt (libpthread.so.0)
#2 0x00007f7307846269 syscall (libc.so.6)
so it looks like it's actually the same as #59926
Updated by okurz almost 5 years ago
- Subject changed from job incomplete with auto_review:"(?s).autotest. process exited: 0.*isotovideo failed.*EXIT 1" but no further details what is wrong to job incomplete with auto_review:"(?s)process exited: 0.*isotovideo failed.*EXIT 1" but no further details what is wrong
I guess we can live with a bit more greedy regex.
Updated by cfconrad almost 5 years ago
- Status changed from New to In Progress
- Assignee set to cfconrad
This may be the same issue as we have in Wicked https://openqa.suse.de/tests/3925667
I'm not able to reproduce such issue in my own test environment. So I asked to get two worker instances on one of our workers.
I agreed with okurz to get two like this:
systemctl disable --now salt-minion openqa-worker@{23,24}
Updated by cfconrad almost 5 years ago
- Status changed from In Progress to Workable
- Assignee deleted (
cfconrad)
Duo to other tasks I reenabled the worker instances again. Will follow up on this later.
sudo systemctl enable --now salt-minion openqa-worker@{23,24}
sudo systemctl start --now salt-minion openqa-worker@{23,24}
Updated by okurz almost 5 years ago
cfconrad wrote:
Duo to other tasks I reenabled the worker instances again. Will follow up on this later.
sudo systemctl enable --now salt-minion openqa-worker@{23,24} sudo systemctl start --now salt-minion openqa-worker@{23,24}
Just a small hint: enable --now
already means "enable and start".
Updated by okurz almost 5 years ago
- Subject changed from job incomplete with auto_review:"(?s)process exited: 0.*isotovideo failed.*EXIT 1" but no further details what is wrong to job incomplete with auto_review:"(?s)process exited: 0.*isotovideo failed.*EXIT 1":retry but no further details what is wrong
Updated by okurz over 4 years ago
recent example: https://openqa.opensuse.org/tests/1219514/file/autoinst-log.txt
[2020-03-31T04:23:41.093 CEST] [debug] tests/installation/boot_encrypt.pm:26 called utils::unlock_if_encrypted -> lib/utils.pm:320 called testapi::send_key
[2020-03-31T04:23:41.093 CEST] [debug] <<< testapi::send_key(key="ret", do_wait=0, wait_screen_change=0)
[2020-03-31T04:23:41.365 CEST] [debug] ||| finished boot_encrypt installation at 2020-03-31 02:23:41 (16 s)
[2020-03-31T04:23:41.365 CEST] [debug] ||| starting first_boot tests/installation/first_boot.pm
[2020-03-31T04:23:41.366 CEST] [debug] tests/installation/first_boot.pm:25 called opensusebasetest::wait_boot_past_bootloader -> lib/opensusebasetest.pm:898 called opensusebasetest::handle_displaymanager_login -> lib/opensusebasetest.pm:731 called testapi::assert_screen
[2020-03-31T04:23:41.366 CEST] [debug] <<< testapi::assert_screen(mustmatch=[
"displaymanager",
"emergency-shell",
"emergency-mode"
], timeout=300)
[2020-03-31T04:23:42.674 CEST] [debug] WARNING: check_asserted_screen took 0.67 seconds for 39 candidate needles - make your needles more specific
[2020-03-31T04:23:42.674 CEST] [debug] no match: 299.3s, best candidate: displaymanager-gdm-20170703 (0.17)
[2020-03-31T04:23:43.003 CEST] [debug] no change: 298.3s
[2020-03-31T04:23:44.004 CEST] [debug] no change: 297.3s
[2020-03-31T04:23:45.005 CEST] [debug] no change: 296.3s
[2020-03-31T04:23:50.214 CEST] [debug] WARNING: check_asserted_screen took 4.21 seconds for 39 candidate needles - make your needles more specific
[2020-03-31T04:23:50.214 CEST] [debug] no match: 295.3s, best candidate: displaymanager-gdm-20170703 (0.17)
[2020-03-31T04:23:50.222 CEST] [debug] no change: 291.1s
[2020-03-31T04:23:50.465 CEST] [debug] considering VNC stalled, no update for 4.79 seconds
[2020-03-31T04:23:50.719 CEST] [debug] pointer type 1 0 1024 768 -257
[2020-03-31T04:23:50.719 CEST] [debug] led state 0 1 1 -261
[2020-03-31T04:23:51.217 CEST] [debug] no change: 290.1s
[2020-03-31T04:23:52.217 CEST] [debug] no change: 289.1s
[2020-03-31T04:23:53.278 CEST] [debug] backend process exited: 0
[2020-03-31T04:23:53.278 CEST] [debug] terminating command server 6427 because test execution ended
[2020-03-31T04:23:53.278 CEST] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/YvCTf_zBVTT1rF2g/broadcast
[2020-03-31T04:23:53.289 CEST] [debug] commands process exited: 0
[2020-03-31T04:23:53.289 CEST] [debug] done with command server
[2020-03-31T04:23:53.289 CEST] [debug] stopping autotest process 6516
[2020-03-31T04:23:53.289 CEST] [debug] autotest received signal TERM, saving results of current test before exiting
[2020-03-31T04:23:53.295 CEST] [debug] [autotest] process exited: 0
[2020-03-31T04:23:53.295 CEST] [debug] done with autotest process
[2020-03-31T04:23:53.295 CEST] [debug] isotovideo failed
[2020-03-31T04:23:53.296 CEST] [debug] stopping backend process 6532
[2020-03-31T04:23:53.296 CEST] [debug] done with backend process
6421: EXIT 1
[2020-03-31T04:23:53.0345 CEST] [info] Isotovideo exit status: 1
[2020-03-31T04:23:56.0381 CEST] [info] +++ worker notes +++
[2020-03-31T04:23:56.0381 CEST] [info] End time: 2020-03-31 02:23:56
[2020-03-31T04:23:56.0381 CEST] [info] Result: died: terminated prematurely, see log output for details
Updated by okurz over 4 years ago
- Related to action #53999: openqa-worker (isotovideo) dumps core / segfaults quite often on several workers and distributions for cancelled jobs added
Updated by asmorodskyi over 4 years ago
#66619 - cross posting here too , there might be some connection between issues
Updated by okurz over 4 years ago
I do not see a helpful relation to that ticket.
https://openqa.opensuse.org/tests/1268572/file/autoinst-log.txt is a recent example:
[2020-05-16T18:34:07.132 CEST] [debug] >>> testapi::_check_backend_response: match=emergency-mode,emergency-shell,generic-desktop,opensuse-welcome timed out after 30 (check_screen)
[2020-05-16T18:34:07.155 CEST] [debug] tests/installation/setup_zdup.pm:24 called opensusebasetest::wait_boot -> lib/opensusebasetest.pm:1003 called opensusebasetest::wait_boot_past_bootloader -> lib/opensusebasetest.pm:932 called testapi::check_screen
[2020-05-16T18:34:07.155 CEST] [debug] <<< testapi::check_screen(mustmatch=[
"generic-desktop",
"emergency-shell",
"emergency-mode",
"opensuse-welcome"
], timeout=30)
[2020-05-16T18:34:08.767 CEST] [debug] WARNING: check_asserted_screen took 1.61 seconds for 37 candidate needles - make your needles more specific
[2020-05-16T18:34:08.768 CEST] [debug] no match: 29.9s, best candidate: generic-desktop-boo1170586_welcome_in_taskbar_but_no_window-20200503 (0.00)
[2020-05-16T18:34:08.775 CEST] [debug] no change: 28.3s
…
[2020-05-16T18:34:31.161 CEST] [debug] no change: 5.9s
[2020-05-16T18:34:31.662 CEST] [debug] considering VNC stalled, no update for 4.00 seconds
[2020-05-16T18:34:31.913 CEST] [debug] pointer type 1 0 1024 768 -257
[2020-05-16T18:34:31.914 CEST] [debug] led state 0 1 1 -261
[2020-05-16T18:34:32.162 CEST] [debug] no change: 4.9s
[2020-05-16T18:34:34.880 CEST] [debug] WARNING: check_asserted_screen took 1.72 seconds for 37 candidate needles - make your needles more specific
[2020-05-16T18:34:34.880 CEST] [debug] no match: 3.9s, best candidate: generic-desktop-boo1170586_welcome_in_taskbar_but_no_window-20200503 (0.00)
[2020-05-16T18:34:34.896 CEST] [debug] no change: 2.2s
[2020-05-16T18:34:37.626 CEST] [debug] WARNING: check_asserted_screen took 1.73 seconds for 37 candidate needles - make your needles more specific
[2020-05-16T18:34:37.626 CEST] [debug] no match: 1.2s, best candidate: generic-desktop-boo1170586_welcome_in_taskbar_but_no_window-20200503 (0.00)
[2020-05-16T18:34:39.530 CEST] [debug] WARNING: check_asserted_screen took 1.90 seconds for 37 candidate needles - make your needles more specific
[2020-05-16T18:34:39.530 CEST] [debug] no match: -0.5s, best candidate: generic-desktop-boo1170586_welcome_in_taskbar_but_no_window-20200503 (0.00)
[2020-05-16T18:34:39.578 CEST] [debug] backend process exited: 0
[2020-05-16T18:34:39.579 CEST] [debug] terminating command server 15026 because test execution ended
…
15021: EXIT 1
Is maybe the last, negative timeout value related or the "considering VNC stalled"?
Updated by okurz over 4 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: offline_sles15_pscc_lp-basesys-srv-desk-dev-contm-lgm-wsm_all_full
https://openqa.suse.de/tests/4297153
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by openqa_review over 4 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: offline_sles12sp5_pscc_sdk-tcm-wsm_all_full
https://openqa.suse.de/tests/4328541
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by okurz over 4 years ago
- Target version set to Ready
this should be crosschecked against the current state as we have better reporting with the incomplete "reason". Also check current state within https://gitlab.suse.de/openqa/auto-review/pipelines
Updated by okurz over 4 years ago
- Subject changed from job incomplete with auto_review:"(?s)process exited: 0.*isotovideo failed.*EXIT 1":retry but no further details what is wrong to job incomplete with "(?s)process exited: 0.*isotovideo failed.*EXIT 1":retry but no further details what is wrong
- Status changed from Workable to Resolved
- Assignee set to okurz
situation looks better. Because of the rather generic catch-expression https://gitlab.suse.de/openqa/auto-review still referenced this ticket and e.g. SQL query select jobs.id,t_finished,test,reason,result from jobs,comments where result = 'incomplete' and t_finished >= '2020-07-20' and comments.job_id = jobs.id and text ~ 'poo#60443';
shows some but they are for another reason, e.g. https://openqa.opensuse.org/tests/1342177# with more details and linked to video encoder config failures.