action #112742

[tools] aarch64 - qemu-img: /var/lib/openqa/pool/14/raid/hd0-overlay0: Image is not in qcow2 format

Added by punkioudi about 2 months ago. Updated about 2 months ago.

Status: New
Priority: High
Assignee: -
Category: Concrete Bugs
Target version:
Start date: 2022-06-20
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

Many aarch64 jobs that use qcow2 images are failing with the following error:
[2022-06-20T12:00:57.994995+02:00] [info] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
runcmd '/usr/bin/qemu-img create -f qcow2 -F qcow2 -b /var/lib/openqa/pool/14/SLES-15-SP1-aarch64-mru-install-minimal-with-addons-Build20220619-1-Server-DVD-Updates-aarch64-virtio.qcow2 /var/lib/openqa/pool/14/raid/hd0-overlay0 1016659968' failed with exit code 1: 'qemu-img: /var/lib/openqa/pool/14/raid/hd0-overlay0: Image is not in qcow2 format
Could not open backing image.' at /usr/lib/os-autoinst/osutils.pm line 89.

https://openqa.suse.de/tests/8985743
https://openqa.suse.de/tests/8985737
https://openqa.suse.de/tests/8985736
https://openqa.suse.de/tests/8985742
https://openqa.suse.de/tests/8985741
https://openqa.suse.de/tests/8985740
https://openqa.suse.de/tests/8985739

Workaround

Compare the checksums of the affected files on the openQA web server and on the worker, and retrigger the image generation jobs as necessary.
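
A minimal sketch of that comparison, assuming both copies of the image are reachable from one shell (for example after fetching the web server copy). The helper names `digest_of` and `same_checksum` and the paths in the usage comment are hypothetical, not part of openQA:

```shell
#!/bin/sh
# Print the SHA-256 digest of a file (digest only, without the file name).
digest_of() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# Return 0 when both copies have the same digest, 1 otherwise.
same_checksum() {
    [ "$(digest_of "$1")" = "$(digest_of "$2")" ]
}

# Usage (both paths are placeholders, adjust to the actual asset locations):
# same_checksum /path/to/webui/copy.qcow2 /path/to/worker/cache/copy.qcow2 \
#     && echo "copies match" \
#     || echo "copies differ, retrigger the image generation job"
```

If the digests differ, the copy that also fails `qemu-img info` is most likely the mangled one.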


Related issues

Related to openQA Project - action #109319: [qe-core] aarch64 tests failing in qemu-img due to broken image (was: "with cache error") size:S | Resolved | 2022-03-31

History

#1 Updated by szarate about 2 months ago

  • Project changed from openQA Infrastructure to openQA Project
  • Subject changed from aarch64 - qemu-img: /var/lib/openqa/pool/14/raid/hd0-overlay0: Image is not in qcow2 format to [tools] aarch64 - qemu-img: /var/lib/openqa/pool/14/raid/hd0-overlay0: Image is not in qcow2 format

While it doesn't happen all the time, images often get mangled during the upload phase. I can't recall how the worker code handles the image upload, but adding an extra integrity check on the webui side, before marking the job as passed, wouldn't be bad.
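
As a rough illustration of such a check (not how openQA does it today), the sketch below only tests whether a file starts with the qcow magic bytes `QFI\xfb`. The function name `has_qcow_magic` is hypothetical. A truncated or partially mangled image can still pass this test, so treat it as a cheap first filter only:

```shell
#!/bin/sh
# Return 0 when the file starts with the qcow magic bytes "QFI\xfb"
# (hex 51 46 49 fb), 1 otherwise.
has_qcow_magic() {
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "514649fb" ]
}

# Usage (path is a placeholder):
# has_qcow_magic /path/to/uploaded.qcow2 || echo "not a qcow image, reject upload"
```

A more thorough check would run `qemu-img info` or `qemu-img check` on the uploaded file, at the cost of requiring qemu tooling on the host that performs the check.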

#2 Updated by okurz about 2 months ago

  • Related to action #109319: [qe-core] aarch64 tests failing in qemu-img due to broken image (was: "with cache error") size:S added

#3 Updated by okurz about 2 months ago

  • Description updated (diff)
  • Category set to Concrete Bugs
  • Target version set to future

Might be related to #109319. With https://github.com/os-autoinst/openQA/pull/4597 we should already be able to prevent corrupted assets being uploaded.

I checked on openqaworker-arm-2 with

find /var/lib/openqa/cache -name 'SLES-15-SP1-aarch64-mru-install-minimal-with-addons-Build20220619-1-Server-DVD-Updates-aarch64-virtio.qcow2'
file /var/lib/openqa/cache/openqa.suse.de/SLES-15-SP1-aarch64-mru-install-minimal-with-addons-Build20220619-1-Server-DVD-Updates-aarch64-virtio.qcow2
sha256sum /var/lib/openqa/cache/openqa.suse.de/SLES-15-SP1-aarch64-mru-install-minimal-with-addons-Build20220619-1-Server-DVD-Updates-aarch64-virtio.qcow2

and got

/var/lib/openqa/cache/openqa.suse.de/SLES-15-SP1-aarch64-mru-install-minimal-with-addons-Build20220619-1-Server-DVD-Updates-aarch64-virtio.qcow2: QEMU QCOW Image (v3), 42949672960 bytes
67b3e8f3dafad6d5230160f5204afa4008ddf1914d337e73812efe2004477cd4  /var/lib/openqa/cache/openqa.suse.de/SLES-15-SP1-aarch64-mru-install-minimal-with-addons-Build20220619-1-Server-DVD-Updates-aarch64-virtio.qcow2

The file is newer than the failing jobs, with a timestamp of Jun 20 14:20.

I think we should have checksums uploaded as assets as well and use them to check integrity for each transfer process.
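
A sketch of what that could look like, assuming the checksum travels as a sidecar file named `<asset>.sha256` next to the asset. The sidecar naming and the helpers `write_checksum` and `verify_transfer` are assumptions for illustration, not existing openQA behaviour:

```shell
#!/bin/sh
# Producer side: write "<digest>  <basename>" into "<asset>.sha256".
# Using the basename keeps the sidecar valid after it is copied elsewhere.
write_checksum() {
    ( cd "$(dirname "$1")" &&
      sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# Consumer side: after each transfer, verify the asset against its sidecar.
# Returns 0 when the digest matches, non-zero otherwise.
verify_transfer() {
    ( cd "$(dirname "$1")" &&
      sha256sum --status -c "$(basename "$1").sha256" )
}

# Usage (paths are placeholders):
# write_checksum /path/on/producer/image.qcow2
# ... transfer image.qcow2 and image.qcow2.sha256 ...
# verify_transfer /path/on/consumer/image.qcow2 || echo "transfer corrupted"
```

Running the verification after every hop (worker upload, web server storage, worker cache download) would show at which step the digest first stops matching.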

punkioudi, szarate: if you see more cases, I suggest you check the checksums before anything or anyone overwrites the files, so that we can narrow down where the problem happens.
