Project

General

Profile

Actions

action #10208

closed

Tests sometimes fail with 'timeout reading hmp socket'

Added by dimstar over 8 years ago. Updated about 8 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
Start date:
2016-01-13
Due date:
% Done:

100%

Estimated time:

Description

from autoinst-log.txt (https://openqa.opensuse.org/tests/113600/file/autoinst-log.txt)

DIE ERROR: timeout reading hmp socket
at /usr/lib/os-autoinst/backend/baseclass.pm line 58 thread 1.
backend::baseclass::die_handler("ERROR: timeout reading hmp socket\x{a}") called at /usr/lib/os-autoinst/backend/qemu.pm line 723 thread 1
backend::qemu::_read_hmp(backend::qemu=HASH(0x57e1d98)) called at /usr/lib/os-autoinst/backend/qemu.pm line 790 thread 1
backend::qemu::_send_hmp(backend::qemu=HASH(0x57e1d98), "savevm lastgood") called at /usr/lib/os-autoinst/backend/qemu.pm line 138 thread 1
backend::qemu::do_savevm(backend::qemu=HASH(0x57e1d98), HASH(0x7f7b781f65d0)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 53 thread 1
backend::baseclass::handle_command(backend::qemu=HASH(0x57e1d98), HASH(0x7f7b781f28a0)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 374 thread 1
backend::baseclass::check_socket(backend::qemu=HASH(0x57e1d98), IO::Handle=GLOB(0x45d9820)) called at /usr/lib/os-autoinst/backend/qemu.pm line 815 thread 1
backend::qemu::check_socket(backend::qemu=HASH(0x57e1d98), IO::Handle=GLOB(0x45d9820)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 176 thread 1
eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 143 thread 1
backend::baseclass::run_capture_loop(backend::qemu=HASH(0x57e1d98), IO::Select=ARRAY(0x5433ed0)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 98 thread 1
backend::baseclass::run(backend::qemu=HASH(0x57e1d98), 14, 17) called at /usr/lib/os-autoinst/backend/driver.pm line 66 thread 1
backend::driver::_run(backend::qemu=HASH(0x57e1d98), 14, 17) called at /usr/lib/os-autoinst/backend/driver.pm line 53 thread 1
eval {...} called at /usr/lib/os-autoinst/backend/driver.pm line 53 thread 1
waitpid for 28972 returned 28972
QEMU: qemu-system-x86_64: /home/abuild/rpmbuild/BUILD/qemu-2.1.3/hw/i386/kvm/clock.c:69: kvmclock_current_nsec: Assertion `time.tsc_timestamp <= migration_tsc' failed.
sending magic and exit
received magic close

openQA is nice and auto-reschedules those tests, but it would be even better to find why those fail

Actions #1

Updated by coolo over 8 years ago

It saves the vm and doesn't get an answer from qemu. This is also part of your log:

QEMU: qemu-system-x86_64: /home/abuild/rpmbuild/BUILD/qemu-2.1.3/hw/i386/kvm/clock.c:69: kvmclock_current_nsec: Assertion `time.tsc_timestamp <= migration_tsc' failed.

And I think that's the problem. Now I'm not a qemu hacker, so I can't tell you what the meaning of this is.

Actions #2

Updated by dimstar over 8 years ago

Some pointers I found so far seem to hint at a kernel issue, that should be fixed with:

commit 7f187922ddf6b67f2999a76dcb71663097b75497
KVM: x86: update masterclock values on TSC writes

of course I'm not sure if our 3.16.7 from openSUSE 13.2 has that commit - I might just open a boo# entry to check with the kernel team

Actions #3

Updated by dimstar over 8 years ago

Filed

https://bugzilla.opensuse.org/show_bug.cgi?id=961739

My quick glance at the kernel comes to the conclusion that we do not have the patch in our kernel

Actions #4

Updated by okurz over 8 years ago

I did not know openQA already has a "re-scheduling" feature. Is the "reschedule" somehow reported in a way easily visible in the UI?

Actions #5

Updated by dimstar over 8 years ago

It might also be an external script that restarted the 'incomplete' tests
(check openqa.opensuse.org/tests, search for tests containing "update_leap_" (including obsolete tests)

The audit trail lists those restarted ones as:
about 2 hours ago coolo efc296cb75b729269943cb097320711e job_duplicate { auto => 1, id => 113600, result => 1...

Actions #6

Updated by coolo over 8 years ago

that's just the worker running under my credentionals. The restart is done by the worker if incomplete, i.e. crashes os-autoinst

Actions #7

Updated by RBrownSUSE over 8 years ago

  • Target version set to Milestone 1
Actions #8

Updated by dimstar over 8 years ago

As per https://bugzilla.novell.com/show_bug.cgi?id=961739 : maintenance update is being prepared. Once out, the kernel update is hopefully eliminating that issue for us.

Actions #9

Updated by RBrownSUSE over 8 years ago

  • Assignee set to dimstar
Actions #10

Updated by dimstar about 8 years ago

openSUSE-SU-2016:0301-1: An update that solves 57 vulnerabilities and has 21
fixes is now available.

Category: security (important)
Bug References:
814440,851610,869564,873385,906545,907818,909077,909477,911326,912202,915517,915577,917830,918333,919007,919018,919463,919596,921313,921949,922583,922936,922944,926238,926240,927780,927786,928130,929525,930399,931988,932348,933896,933904,933907,933934,935542,935705,936502,936831,937032,937033,937969,938706,940338,944296,945825,947155,949936,950998,951194,951440,951627,952384,952579,952976,953052,953527,954138,954404,955224,955354,955422,956708,956934,957988,957990,958504,958510,958886,958951,959190,959399,959568,960839,961509,961739,962075
CVE References:
CVE-2014-2568,CVE-2014-8133,CVE-2014-8989,CVE-2014-9090,CVE-2014-9419,CVE-2014-9529,CVE-2014-9683,CVE-2014-9715,CVE-2014-9728,CVE-2014-9729,CVE-2014-9730,CVE-2014-9731,CVE-2015-0272,CVE-2015-0777,CVE-2015-1420,CVE-2015-1421,CVE-2015-2041,CVE-2015-2042,CVE-2015-2150,CVE-2015-2666,CVE-2015-2830,CVE-2015-2922,CVE-2015-2925,CVE-2015-3212,CVE-2015-3339,CVE-2015-3636,CVE-2015-4001,CVE-2015-4002,CVE-2015-4003,CVE-2015-4004,CVE-2015-4036,CVE-2015-4167,CVE-2015-4692,CVE-2015-4700,CVE-2015-5157,CVE-2015-5283,CVE-2015-5307,CVE-2015-5364,CVE-2015-5366,CVE-2015-5707,CVE-2015-6937,CVE-2015-7550,CVE-2015-7799,CVE-2015-7833,CVE-2015-7872,CVE-2015-7885,CVE-2015-7990,CVE-2015-8104,CVE-2015-8215,CVE-2015-8543,CVE-2015-8550,CVE-2015-8551,CVE-2015-8552,CVE-2015-8569,CVE-2015-8575,CVE-2015-8767,CVE-2016-0728
Sources used:

Actions #11

Updated by dimstar about 8 years ago

  • Status changed from New to Resolved

The kernel has been released and I have not seen this issue surface in a while now - seems it was the right fix in the end

Actions #12

Updated by dimstar about 8 years ago

  • % Done changed from 0 to 100
Actions

Also available in: Atom PDF