Project

General

Profile

Actions

action #43631

closed

[tools] Job terminated by a SIGTERM, ending up incomplete, unclear reason for stopping even though test could have looked green so far, "Result: done"

Added by szarate over 5 years ago. Updated almost 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Target version:
-
Start date:
2018-11-09
Due date:
% Done:

0%

Estimated time:

Description

I only could get this from the logs, last call to script_output seemed to work fine... couldn't find any more information on the worker host :/


[2018-11-09T10:26:12.0480 CET] [debug] /var/lib/openqa/cache/tests/sle/tests/console/zypper_lifecycle.pm:38 called testapi::script_output
[2018-11-09T10:26:12.0481 CET] [debug] <<< testapi::wait_serial(timeout=90, record_output=1, regexp='SCRIPT_FINISHEDFetKi-\\d+-')
[2018-11-09T10:26:13.0367 CET] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 13082 (perl)
[2018-11-09T10:26:13.0389 CET] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2018-11-09T10:26:13.0391 CET] [debug] sending magic and exit
[2018-11-09T10:26:13.0391 CET] [debug] received magic close
[2018-11-09T10:26:13.0391 CET] [debug] THERE IS NOTHING TO READ 15 4 3
[2018-11-09T10:26:13.0404 CET] [debug] backend process exited: 0
[2018-11-09T10:26:13.0408 CET] [debug] commands process exited: 0
last frame
[2018-11-09T10:26:13.0451 CET] [debug] Driver backend collected unknown process with pid 12959 and exit status: 0
[2018-11-09T10:26:13.0451 CET] [debug] sysread failed: 
[2018-11-09T10:26:14.0451 CET] [debug] autotest received signal TERM, saving results of current test before exiting
[2018-11-09T10:26:14.0459 CET] [debug] [autotest] process exited: 1
[2018-11-09T10:26:15.0459 CET] [debug] isotovideo failed
[2018-11-09T10:26:15.0460 CET] [debug] killing backend process 12958
[2018-11-09T10:26:15.0460 CET] [debug] done with backend process
12909: EXIT 1
[2018-11-09T10:26:15.0534 CET] [info] Isotovideo exit status: 1

https://openqa.suse.de/tests/2248260#downloads


Related issues 2 (0 open2 closed)

Related to openQA Tests - action #55595: [cloud][pcm] debug memory dump - placeholderResolvedcfconrad2019-08-15

Actions
Related to openQA Project - coordination #62420: [epic] Distinguish all types of incompletesResolvedokurz2018-12-12

Actions
Actions #1

Updated by szarate over 5 years ago

  • Description updated (diff)
Actions #2

Updated by szarate over 5 years ago

  • Subject changed from Job terminated by a SIGTERM to [tools][u] Job terminated by a SIGTERM
Actions #3

Updated by szarate over 5 years ago

Also happens in https://openqa.suse.de/tests/2248379 while just waiting for install:


[2018-11-09T10:18:39.0516 CET] [debug] no match: 1863.6s, best candidate: rebootnow-20131217 (0.00)
[2018-11-09T10:18:40.0516 CET] [debug] no match: 1862.6s, best candidate: rebootnow-20131217 (0.00)
[2018-11-09T10:18:41.0100 CET] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 25510 (perl)
[2018-11-09T10:18:41.0133 CET] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2018-11-09T10:18:41.0136 CET] [debug] sending magic and exit
[2018-11-09T10:18:41.0137 CET] [debug] received magic close
[2018-11-09T10:18:41.0138 CET] [debug] THERE IS NOTHING TO READ 15 4 3
[2018-11-09T10:18:41.0160 CET] [debug] backend process exited: 0
[2018-11-09T10:18:41.0167 CET] [debug] commands process exited: 0
last frame
[2018-11-09T10:18:41.0221 CET] [debug] Driver backend collected unknown process with pid 7117 and exit status: 0
[2018-11-09T10:18:41.0222 CET] [debug] sysread failed: 
[2018-11-09T10:18:41.0224 CET] [debug] syswrite failed Broken pipe at /usr/lib/os-autoinst/myjsonrpc.pm line 38.
    myjsonrpc::send_json('GLOB(0x66dec68)', 'HASH(0x39167a0)') called at /usr/lib/os-autoinst/autotest.pm line 297
    autotest::query_isotovideo('backend_last_screenshot_data') called at /usr/lib/os-autoinst/basetest.pm line 510
    basetest::_result_add_screenshot('await_install=HASH(0x654a978)', 'HASH(0x654b038)') called at /usr/lib/os-autoinst/basetest.pm line 356
    basetest::runtest('await_install=HASH(0x654a978)') called at /usr/lib/os-autoinst/autotest.pm line 341
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 341
    autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 198
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 198
    autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 251
    autotest::__ANON__('Mojo::IOLoop::ReadWriteProcess=HASH(0x6873900)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop/ReadWriteProcess.pm line 325
    eval {...} called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop/ReadWriteProcess.pm line 325
    Mojo::IOLoop::ReadWriteProcess::_fork('Mojo::IOLoop::ReadWriteProcess=HASH(0x6873900)', 'CODE(0x68195b8)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop/ReadWriteProcess.pm line 476
    Mojo::IOLoop::ReadWriteProcess::start('Mojo::IOLoop::ReadWriteProcess=HASH(0x6873900)') called at /usr/lib/os-autoinst/autotest.pm line 252
    autotest::start_process() called at /usr/bin/isotovideo line 247

[2018-11-09T10:18:41.0237 CET] [debug] [autotest] process exited: 0
[2018-11-09T10:18:42.0237 CET] [debug] isotovideo failed
Actions #4

Updated by SLindoMansilla over 5 years ago

  • Subject changed from [tools][u] Job terminated by a SIGTERM to [tools] Job terminated by a SIGTERM

As spoken with okurz when grooming tickets, this ticket has no update since a month ago and after it doesn't need the [u] tag.

Actions #5

Updated by okurz almost 5 years ago

  • Category set to Regressions/Crashes
Actions #6

Updated by asmorodskyi over 4 years ago

not sure if this is exactly same problem but I notice such behavior :

[2019-08-13T21:32:41.838 CEST] [debug] barrier wait 'test_t05_ipip_tunnel_legacy_ready'
[2019-08-13T21:32:41.838 CEST] [debug] <<< testapi::record_info(title='Paused', output='Wait for test_t05_ipip_tunnel_legacy_ready (on parent job)', result='ok')
[2019-08-13T21:32:42.199 CEST] [debug] barrier 'test_t05_ipip_tunnel_legacy_ready' not released, sleeping 5s
[2019-08-13T21:32:48.012 CEST] [debug] barrier 'test_t05_ipip_tunnel_legacy_ready' not released, sleeping 5s
[2019-08-13T21:32:50.168 CEST] [debug] backend got TERM
[2019-08-13T21:32:50.168 CEST] [debug] autotest received signal TERM, saving results of current test before exiting
[2019-08-13T21:32:50.168 CEST] [debug] signalhandler got TERM
[2019-08-13T21:32:50.168 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2019-08-13T21:32:50.168 CEST] [debug] killing command server 17479 because test execution ended
[2019-08-13T21:32:50.168 CEST] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20163/9Ew8aRp7ecDYlo_R/broadcast
[2019-08-13T21:32:50.170 CEST] [debug] Collected unknown process with pid 17570 and exit status: 0
[2019-08-13T21:32:50.170 CEST] [debug] Collected unknown process with pid 17570 and exit status: 0
[2019-08-13T21:32:50.170 CEST] [debug] Collected unknown process with pid 17570 and exit status: 0
[2019-08-13T21:32:50.170 CEST] [debug] Collected unknown process with pid 17570 and exit status: 0
[2019-08-13T21:32:50.170 CEST] [debug] Collected unknown process with pid 17570 and exit status: 0
[2019-08-13T21:32:50.171 CEST] [debug] commands process exited: 0
[2019-08-13T21:32:50.173 CEST] [debug] [autotest] process exited: 1
[2019-08-13T21:32:50.174 CEST] [debug] isotovideo: unable to inform websocket clients about stopping command server: Connection refused at /usr/bin/isotovideo line 172.

[2019-08-13T21:32:50.174 CEST] [debug] done with command server
[2019-08-13T21:32:50.174 CEST] [debug] killing autotest process 17483
[2019-08-13T21:32:50.174 CEST] [debug] done with autotest process
[2019-08-13T21:32:50.174 CEST] [debug] isotovideo failed
Actions #7

Updated by okurz over 4 years ago

  • Subject changed from [tools] Job terminated by a SIGTERM to [tools] Job terminated by a SIGTERM, ending up incomplete, unclear reason for stopping even though test could have looked green so far

Looks related. Something similar I found in https://openqa.suse.de/tests/3256895/file/autoinst-log.txt . The job seems to be fine till the end, all modules passed and present but still the job is ending up as incomplete:

[2019-08-16T06:43:50.157 CEST] [debug] <<< testapi::check_shutdown(timeout=60)
[2019-08-16T06:43:50.158 CEST] [debug] QEMU status is not shutdown it is running
[2019-08-16T06:43:51.159 CEST] [debug] QEMU status is not shutdown it is running
[2019-08-16T06:43:52.160 CEST] [debug] QEMU status is not shutdown it is running
[2019-08-16T06:43:53.161 CEST] [debug] EVENT {"event":"SHUTDOWN","timestamp":{"microseconds":551889,"seconds":1565930632}}
[2019-08-16T06:43:53.161 CEST] [debug] EVENT {"event":"STOP","timestamp":{"microseconds":557204,"seconds":1565930632}}
[2019-08-16T06:43:53.289 CEST] [debug] ||| finished shutdown_ltp kernel at 2019-08-16 04:43:53 (32 s)
[2019-08-16T06:43:53.290 CEST] [debug] killing autotest process 38744
[2019-08-16T06:43:53.291 CEST] [debug] [autotest] process exited: 0
[2019-08-16T06:43:54.292 CEST] [debug] done with autotest process
[2019-08-16T06:43:54.292 CEST] [debug] killing command server 38742 because test execution ended
[2019-08-16T06:43:54.292 CEST] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/tRqH57ey5tQpTge9/broadcast
[2019-08-16T06:44:09.299 CEST] [debug] isotovideo: unable to inform websocket clients about stopping command server: Request timeout at /usr/bin/isotovideo line 175.

[2019-08-16T06:44:10.300 CEST] [error] can_read received kill signal at /usr/lib/os-autoinst/myjsonrpc.pm line 91.

[2019-08-16T06:44:10.304 CEST] [debug] commands process exited: 0
[2019-08-16T06:44:11.304 CEST] [debug] done with command server
[2019-08-16T06:44:11.304 CEST] [debug] isotovideo done
[2019-08-16T06:44:11.305 CEST] [debug] BACKEND SHUTDOWN 1
[2019-08-16T06:44:11.305 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2019-08-16T06:44:12.485 CEST] [debug] Driver backend collected unknown process with pid 38754 and exit status: 0
[2019-08-16T06:44:12.486 CEST] [debug] Error during stop_vm: can_read received kill signal at /usr/lib/os-autoinst/myjsonrpc.pm line 91.

[2019-08-16T06:44:12.486 CEST] [debug] killing backend process 38746
last frame
[2019-08-16T06:44:14.486 CEST] [debug] done with backend process
38737: EXIT 1
[2019-08-16T06:44:14.0516 CEST] [info] [pid:4124] Isotovideo exit status: 1
[2019-08-16T06:44:14.0518 CEST] [info] [pid:4124] Collected unknown process with pid 38747 and exit status: 0
[2019-08-16T06:44:14.0649 CEST] [info] [pid:4124] +++ worker notes +++
[2019-08-16T06:44:14.0649 CEST] [info] [pid:4124] End time: 2019-08-16 04:44:14
[2019-08-16T06:44:14.0649 CEST] [info] [pid:4124] Result: died
Actions #8

Updated by cfconrad over 4 years ago

  • Related to action #55595: [cloud][pcm] debug memory dump - placeholder added
Actions #9

Updated by okurz over 4 years ago

  • Subject changed from [tools] Job terminated by a SIGTERM, ending up incomplete, unclear reason for stopping even though test could have looked green so far to [tools] Job terminated by a SIGTERM, ending up incomplete, unclear reason for stopping even though test could have looked green so far, "Result: done"

https://openqa.suse.de/tests/3638779/file/autoinst-log.txt is not exactly the same as the job is incomplete but there is "EXIT 0" but I doubt we can move forward with the other older issues. In the above we have:

[2019-11-28T14:43:02.190 CET] [debug] /var/lib/openqa/cache/openqa.suse.de/tests/sle-12-SP1/tests/kernel/shutdown_ltp.pm:46 called power_action_utils::power_action
[2019-11-28T14:43:02.190 CET] [debug] <<< testapi::check_shutdown(timeout=60)
[2019-11-28T14:43:02.193 CET] [debug] QEMU status is not 'shutdown', it is 'running'
…
[2019-11-28T14:43:15.231 CET] [debug] QEMU status is not 'shutdown', it is 'running'
…
[2019-11-28T14:43:24.671 CET] [debug] ||| finished shutdown_ltp kernel at 2019-11-28 13:43:24 (51 s)
[2019-11-28T14:43:24.675 CET] [debug] stopping autotest process 28775
…
[2019-11-28T14:43:27.727 CET] [debug] BACKEND SHUTDOWN 1
[2019-11-28T14:43:27.728 CET] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2019-11-28T14:43:29.768 CET] [debug] flushing frames
[2019-11-28T14:43:29.813 CET] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 28792 (/usr/bin/isotovideo: backen)
[2019-11-28T14:43:29.814 CET] [debug] sending magic and exit
[2019-11-28T14:43:29.815 CET] [debug] received magic close
[2019-11-28T14:43:29.841 CET] [debug] backend process exited: 0
[2019-11-28T14:43:30.842 CET] [debug] stopping backend process 28792
[2019-11-28T14:43:30.843 CET] [debug] done with backend process
28743: EXIT 0
[2019-11-28T14:43:30.0970 CET] [info] [pid:24123] Isotovideo exit status: 0
[2019-11-28T14:43:31.0073 CET] [info] [pid:24123] +++ worker notes +++
[2019-11-28T14:43:31.0073 CET] [info] [pid:24123] End time: 2019-11-28 13:43:31
[2019-11-28T14:43:31.0073 CET] [info] [pid:24123] Result: done
[2019-11-28T14:43:31.0096 CET] [info] [pid:17039] Uploading boot_ltp-config-3.12.74-60.64.124-default

so an incomplete job but "Result: done" and not "Result: died", that's weird.

Actions #10

Updated by ggardet_arm over 4 years ago

This job https://openqa.opensuse.org/tests/1137084 is flagged by this ticket and indeed, all looks ok, but it is incomplete. And there is not even a SIGTERM.

Actions #11

Updated by okurz over 4 years ago

Actions #12

Updated by okurz about 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: ltp_syscalls_ipc
https://openqa.suse.de/tests/3918806

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #13

Updated by openqa_review about 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: ltp_syscalls_ipc
https://openqa.suse.de/tests/3918806

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #14

Updated by openqa_review about 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: ltp_syscalls
https://openqa.suse.de/tests/3980979

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #15

Updated by okurz about 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: ltp_syscalls
https://openqa.suse.de/tests/3980979

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #16

Updated by okurz about 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: ltp_syscalls
https://openqa.suse.de/tests/4044569

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #17

Updated by openqa_review about 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: ltp_syscalls
https://openqa.suse.de/tests/4108480

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #18

Updated by okurz almost 4 years ago

  • Status changed from New to Resolved
  • Assignee set to okurz

The referenced jobs are gone but since then we have implemented the incomplete "reason" which is available from DB, API and UI any case like the above should be either gone or more clear in what's going on.

Actions

Also available in: Atom PDF