action #12344

closed

sporadic "corrupt images" in svirt based test on zkvm

Added by okurz almost 8 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Low
Assignee: -
Category: Feature requests
Target version: -
Start date: 2016-06-15
Due date:
% Done: 0%
Estimated time:
Description

observation

svirt-based tests can die during boot, before the installation starts, because the disk image is unusable, see e.g.
http://lord.arch/tests/1048/file/autoinst-log.txt

Domain openQA-SUT-12 defined from /var/lib/libvirt/images/openQA-SUT-12.xml


20:13:09.4589 Command's stderr:

20:13:10.7595 Command's stdout:


20:13:10.7596 Command's stderr:
error: Failed to start domain openQA-SUT-12
error: internal error: process exited while connecting to monitor: 2016-06-13T20:17:10.483081Z qemu-system-s390x: -drive file=/var/lib/libvirt/images/openQA-SUT-12.img,if=none,id=drive-virtio-disk0,format=qcow2: qcow2: Image is corrupt; cannot be opened read/write


20:13:10.8867 # Test died:
{
  'args' => [],
  'console' => 'svirt',
  'function' => 'define_and_start'
}
virsh start failed at /local/os-autoinst/consoles/sshVirtsh.pm line 392.

also seen in
http://lord.arch/tests/1055
http://lord.arch/tests/1087

steps to reproduce

Run zkvm tests repeatedly. The failure shows up in about 3 of 20 runs, at least in my setup.

problem

H1. The problem is specific to my openQA + reserved_s390x@zkvm worker (VIRSH_HOSTNAME=s390pb.suse.de, VIRSH_GUEST=10.161.145.7, VIRSH_INSTANCE=12).
H2. The s390x guest is "reused", or the image is cleaned up by someone else.
H3. The host runs out of disk space, and the return value of the qemu-img command is not properly checked?
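H3 could be confirmed or ruled out by making the image setup fail loudly. The following Python sketch is a hypothetical helper, not code from os-autoinst (whose svirt backend is Perl and runs these commands over SSH on VIRSH_HOSTNAME). It checks free space in the image directory before the image is created and raises if a command such as qemu-img exits non-zero, instead of carrying on to virsh start. The names `has_free_space`, `run_checked` and `ImageSetupError` and the `needed_bytes` threshold are illustrative assumptions. Since qcow2 images grow as the guest writes, a check at creation time is only a rough guard.

```python
import shutil
import subprocess
from typing import Callable, Sequence


class ImageSetupError(RuntimeError):
    """Hypothetical error type for a failed image preparation step."""


def has_free_space(directory: str, needed_bytes: int,
                   disk_usage: Callable = shutil.disk_usage) -> bool:
    # Compare the free space of the filesystem holding `directory` with an
    # assumed minimum; `disk_usage` is injectable so this can be tested.
    return disk_usage(directory).free >= needed_bytes


def run_checked(cmd: Sequence[str], run: Callable = subprocess.run) -> str:
    # Run a command and turn a non-zero exit status into an exception,
    # so a failed qemu-img call cannot silently be followed by virsh start.
    result = run(list(cmd), capture_output=True, text=True)
    if result.returncode != 0:
        raise ImageSetupError(
            f"{' '.join(cmd)} exited with {result.returncode}: "
            f"{result.stderr.strip()}")
    return result.stdout
```

A caller would first test `has_free_space("/var/lib/libvirt/images", needed_bytes)` and only then call `run_checked(["qemu-img", "create", "-f", "qcow2", image_path, "20G"])`. The 20G size and `image_path` are arbitrary example values.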

suggestion

We could improve the check for the image being present before we try to access it again after reboot.
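One way to make this check stricter would be to verify that the image is consistent, not just present, right before virsh start. Below is a minimal Python sketch under that assumption. `check_image` and `image_is_usable` are hypothetical names, and running qemu-img locally (rather than over SSH on the svirt host) is a simplification. It relies on the exit codes that the QEMU documentation lists for `qemu-img check`. Without `-r`, the command only reports problems and does not repair them.

```python
import subprocess
from typing import Callable

# Exit codes documented for `qemu-img check`.
QEMU_IMG_CHECK_STATUS = {
    0: "consistent",
    1: "check not completed (internal error)",
    2: "corrupted",
    3: "leaked clusters, not corrupted",
    63: "format does not support checks",
}


def image_is_usable(returncode: int) -> bool:
    # Leaked clusters only waste space, so 3 is accepted. Corruption, an
    # aborted check and an unsupported format are all treated as unusable
    # (a conservative choice; the images here are qcow2, which supports checks).
    return returncode in (0, 3)


def check_image(path: str, run: Callable = subprocess.run) -> bool:
    # Run a read-only consistency check and report the result by exit code.
    result = run(["qemu-img", "check", path], capture_output=True, text=True)
    status = QEMU_IMG_CHECK_STATUS.get(result.returncode, "unknown exit code")
    print(f"{path}: {status}")
    return image_is_usable(result.returncode)
```

If `check_image` returns False, the caller could recreate the image or abort with an explicit error, rather than letting virsh start fail with "Image is corrupt; cannot be opened read/write".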

workaround

sporadic, so retrigger


Related issues 1 (0 open, 1 closed)

Copied to openQA Project - action #12838: sporadic "corrupt images" in various tests or fails uploading, e.g. with "Premature connection close" (Resolved, oholecek, 2016-06-15)

#1

Updated by okurz almost 8 years ago

  • Description updated (diff)
  • Assignee set to mgriessmeier
#2

Updated by mgriessmeier almost 8 years ago

I cloned your test 20 times, see https://opeth.suse.de/tests/{2763..2783}

None of them failed with the described issue.
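A rough way to weigh this result: if the failure rate from the description (about 3 in 20 runs) applied on every setup and runs were independent, 20 clean runs in a row would be fairly unlikely. The small Python calculation below assumes exactly that. The 3/20 estimate itself comes from a small sample, so the result is only indicative.

```python
def prob_all_pass(failure_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass at a fixed per-run failure rate."""
    return (1.0 - failure_rate) ** runs


# Failure rate reported in the description: about 3 of 20 runs.
p = prob_all_pass(3 / 20, 20)
print(f"{p:.3f}")  # prints 0.039
```

That is roughly a 4% chance, which is consistent with H1 (a cause specific to the original worker setup) but does not prove it.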

#3

Updated by okurz almost 8 years ago

  • Copied to action #12838: sporadic "corrupt images" in various tests or fails uploading, e.g. with "Premature connection close" added
#4

Updated by okurz almost 8 years ago

  • Assignee changed from mgriessmeier to okurz
  • Priority changed from Normal to Low

So far it looks like this only happens locally.

#5

Updated by mgriessmeier over 7 years ago

This has now also happened on o.s.d (openqa.suse.de):
https://openqa.suse.de/tests/567756

I'll retrigger it to see if it happens again

#6

Updated by michalnowak over 7 years ago

I can't access the OP's autoinst-log.txt, but my guess is that qemu-img had not finished before virsh start was called. Also, the execution of commands over SSH in the svirt backend was improved about a month ago (e.g. usage of EOF).

#7

Updated by okurz over 7 years ago

  • Assignee deleted (okurz)
#8

Updated by michalnowak about 7 years ago

Has anyone seen this recently?

#9

Updated by okurz@suse.de about 7 years ago

no

#10

Updated by coolo over 6 years ago

  • Status changed from New to Resolved

then let's close it
