Project

General

Profile

Actions

action #6958

closed

qemu not dying

Added by coolo over 9 years ago. Updated over 7 years ago.

Status:
Closed
Priority:
Urgent
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2015-03-24
Due date:
% Done:

0%

Estimated time:

Description

Mär 17 12:29:04 ix64ph1073 worker[7561]: got job 18945: 00018945-sle-11-SP4-Alpha-Desktop-DVD-i586-Build0818-mediacheck
Mär 17 12:29:04 ix64ph1073 worker[7561]: 27047: WORKING 18945
Mär 17 12:43:36 ix64ph1073 worker[7561]: killing 27047
Mär 17 12:43:36 ix64ph1073 worker[7561]: waitpid returned error: No child processes
Mär 17 12:43:41 ix64ph1073 worker[7561]: cleaning up 00018945-sle-11-SP4-Alpha-Desktop-DVD-i586-Build0818-mediacheck...
Mär 17 12:45:04 ix64ph1073 worker[7561]: setting job 18945 to done
Mär 17 12:45:04 ix64ph1073 worker[7561]: got job 18977: 00018977-sle-11-SP4-Alpha-Server-DVD-x86_64-Build0975-ext3
Mär 17 12:45:04 ix64ph1073 worker[7561]: 27384: WORKING 18977
Mär 17 12:51:50 ix64ph1073 worker[7561]: backend crashed ...
Mär 17 12:51:50 ix64ph1073 worker[7561]: killing qemu 27413
Mär 17 12:51:55 ix64ph1073 worker[7561]: pid 27413 didn't die, sending KILL at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 48.
Mär 17 12:51:55 ix64ph1073 worker[7561]: killing os-autoinst 27384
Mär 17 12:52:00 ix64ph1073 worker[7561]: pid 27384 didn't die, sending KILL at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 48.
Mär 17 12:52:00 ix64ph1073 worker[7561]: killing 27384
Mär 17 12:52:50 ix64ph1073 worker[7561]: 2: Connection error: Connect timeout at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/UserAgent.pm line 269.
Mär 17 12:52:55 ix64ph1073 worker[7561]: Use of uninitialized value in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 305.
Mär 17 12:52:55 ix64ph1073 worker[7561]: Unable to upgrade connection to WebSocket: . proxy_wstunnel enabled? at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 305.
Mär 17 12:52:55 ix64ph1073 worker[7561]: Use of uninitialized value in string eq at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 306.
Mär 17 12:53:05 ix64ph1073 worker[7561]: 1: Connection error: Connect timeout at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/UserAgent.pm line 269.
Mär 17 12:53:15 ix64ph1073 worker[7561]: Use of uninitialized value in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 305.
Mär 17 12:53:15 ix64ph1073 worker[7561]: Unable to upgrade connection to WebSocket: . proxy_wstunnel enabled? at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 305.
Mär 17 12:53:15 ix64ph1073 worker[7561]: Use of uninitialized value in string eq at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 306.
Mär 17 12:53:20 ix64ph1073 worker[7561]: 0: Connection error: Connect timeout at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/UserAgent.pm line 269.
Mär 17 12:53:20 ix64ph1073 worker[7561]: cleaning up 00018977-sle-11-SP4-Alpha-Server-DVD-x86_64-Build0975-ext3...
Mär 17 12:53:20 ix64ph1073 worker[7561]: duplicating job 18977
Mär 17 12:53:20 ix64ph1073 worker[7561]: QEMU should be dead - WASUP?


Related issues 1 (0 open1 closed)

Related to openQA Project (public) - action #6770: worker does not recover from early qemu failClosedlnussel2015-03-18

Actions
Actions #1

Updated by oholecek over 9 years ago

I was wondering, what about using systemd-run to automatically create cgroup scope for qemu. Then later when we'll stop the unit, systemd should take care of any misbehaving qemu.
Draw back I so far see is missing perl bindings so either we call external binary (systemd-run) or add and use DBUS iterface.

Actions #2

Updated by oholecek over 9 years ago

  • Related to action #6770: worker does not recover from early qemu fail added
Actions #3

Updated by coolo over 9 years ago

  • Assignee set to lnussel
Actions #4

Updated by lnussel over 9 years ago

  • Status changed from New to Closed

doesn't happen anymore so closing

Actions #5

Updated by szarate over 7 years ago

This one just happened on mustang1

Jul 14 16:54:37 mustang1 worker[14548]: [ERROR] backend crashed ...
Jul 14 16:54:37 mustang1 worker[14548]: killed 22178 - waiting for exit at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 51.
Jul 14 16:54:42 mustang1 worker[14548]: can't open /var/lib/openqa/pool/1/testresults/result-force_cron_run.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 743.
Jul 14 16:54:42 mustang1 worker[14548]: [INFO] killing os-autoinst 22178
Jul 14 16:54:56 mustang1 worker[14548]: [INFO] cleaning up 00007926-sle-12-SP3-Server-DVD-aarch64-Build_MUSTANG1_0473-toolchain_zypper@aarch64
Jul 14 16:54:56 mustang1 worker[14548]: [INFO] got job 7927: 00007927-sle-12-SP3-Server-DVD-aarch64-Build_MUSTANG1_0473-xfs@aarch64
Jul 14 16:54:56 mustang1 worker[14548]: [DEBUG] CACHE: Health: Real size: 6192080896, Configured limit: 53687091200
Jul 14 16:54:56 mustang1 worker[14548]: [INFO] OpenQA::Worker::Cache: Initialized with http://deimos.suse.de at /var/lib/openqa/cache, current size is 6192080896
Jul 14 16:54:56 mustang1 worker[14548]: [DEBUG] Found ISO, caching SLE-12-SP3-Server-DVD-aarch64-Build0473-Media1.iso
Jul 14 16:54:56 mustang1 worker[14548]: [DEBUG] CACHE: Aquiring lock for /var/lib/openqa/cache/SLE-12-SP3-Server-DVD-aarch64-Build0473-Media1.iso in the database
Jul 14 16:54:57 mustang1 worker[14548]: [DEBUG] Calling flock /var/lib/openqa/cache/deimos.suse.de/needleslock rsync -avHP rsync://deimos.suse.de/openqa-tests/ --delete /var/lib/openqa/cache/deimos.suse.de/tests/
Jul 14 16:54:57 mustang1 worker[14548]: receiving incremental file list
Jul 14 16:54:57 mustang1 worker[14548]: sent 324 bytes  received 374,360 bytes  749,368.00 bytes/sec
Jul 14 16:54:57 mustang1 worker[14548]: total size is 1,795,374,775  speedup is 4,791.70
Jul 14 16:54:57 mustang1 worker[14548]: [DEBUG] RSYNC: Synchronization of tests directory took 0.00 seconds
Jul 14 16:54:58 mustang1 worker[14548]: [INFO] Waiting for subprocess
Jul 14 16:54:58 mustang1 worker[14548]: [INFO] Waiting for subprocess
Jul 14 16:54:58 mustang1 worker[14548]: [INFO] 22401: WORKING 7927
Jul 14 17:14:01 mustang1 worker[14548]: [INFO] cleaning up 00007927-sle-12-SP3-Server-DVD-aarch64-Build_MUSTANG1_0473-xfs@aarch64
Jul 14 17:14:01 mustang1 worker[14548]: [INFO] got job 7928: 00007928-sle-12-SP3-Server-DVD-aarch64-Build_MUSTANG1_0473-ext4@aarch64
Jul 14 17:14:01 mustang1 worker[14548]: [DEBUG] CACHE: Health: Real size: 6192080896, Configured limit: 53687091200
Jul 14 17:14:01 mustang1 worker[14548]: [INFO] OpenQA::Worker::Cache: Initialized with http://deimos.suse.de at /var/lib/openqa/cache, current size is 6192080896
Jul 14 17:14:01 mustang1 worker[14548]: [DEBUG] Found ISO, caching SLE-12-SP3-Server-DVD-aarch64-Build0473-Media1.iso
Jul 14 17:14:01 mustang1 worker[14548]: [DEBUG] CACHE: Aquiring lock for /var/lib/openqa/cache/SLE-12-SP3-Server-DVD-aarch64-Build0473-Media1.iso in the database
Jul 14 17:14:02 mustang1 worker[14548]: [DEBUG] Calling flock /var/lib/openqa/cache/deimos.suse.de/needleslock rsync -avHP rsync://deimos.suse.de/openqa-tests/ --delete /var/lib/openqa/cache/deimos.suse.de/tests/
Jul 14 17:14:02 mustang1 worker[14548]: receiving incremental file list
Jul 14 17:14:02 mustang1 worker[14548]: sent 324 bytes  received 374,356 bytes  749,360.00 bytes/sec
Jul 14 17:14:02 mustang1 worker[14548]: total size is 1,795,374,775  speedup is 4,791.76
Jul 14 17:14:02 mustang1 worker[14548]: [DEBUG] RSYNC: Synchronization of tests directory took 0.00 seconds
Jul 14 17:14:02 mustang1 worker[14548]: [INFO] Waiting for subprocess
Jul 14 17:14:02 mustang1 worker[14548]: [INFO] 22965: WORKING 7928
Jul 15 02:07:22 mustang1 worker[14548]: [WARN] max job time exceeded, aborting 00007928-sle-12-SP3-Server-DVD-aarch64-Build_MUSTANG1_0473-ext4@aarch64
Jul 15 02:07:22 mustang1 worker[14548]: killed 22965
Jul 15 02:07:23 mustang1 worker[14548]: [WARN] job 7928 spent more time than MAX_JOB_TIME
Jul 15 02:07:32 mustang1 worker[14548]: [ERROR] QEMU (22980 -> /usr/bin/qemu-system-aarch64) should be dead - WASUP?
Jul 15 02:07:32 mustang1 worker[14548]: [INFO] cleaning up 00007928-sle-12-SP3-Server-DVD-aarch64-Build_MUSTANG1_0473-ext4@aarch64
Jul 15 02:07:32 mustang1 systemd[1]: openqa-worker@1.service: Main process exited, code=exited, status=1/FAILURE
Jul 15 02:07:32 mustang1 systemd[1]: openqa-worker@1.service: Unit entered failed state.
Jul 15 02:07:32 mustang1 systemd[1]: openqa-worker@1.service: Failed with result 'exit-code'.
Actions

Also available in: Atom PDF