action #99336

[qe-core][migration] test fails in bootloader_zkvm - no space on openqa s390x worker auto_review:"rsync: write failed on.*/var/lib/libvirt/images/.*s390x.*No space left on device":retry

Added by leli 2 months ago. Updated 2 months ago.

Status: Resolved
Priority: Normal
Assignee: nicksinger
Category: Bugs in existing tests
Target version: -
Start date: 2021-09-27
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

Found by Matthias:
There are 4 .img files for each instance. This was not the case before and could be the culprit here: why are 4 disks created for each job?

So we need to figure out why this happens.
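To see how many .img files each instance has and how much space they take, a small inventory of the images directory can help. The sketch below is an assumption-laden starting point: it uses /var/lib/libvirt/images from this ticket, but the naming scheme `<instance><letter>.img` (e.g. `sut1a.img`) and the function name `img_files_per_instance` are hypothetical, since the ticket does not say how the files are named.

```python
import os
import re
from collections import defaultdict
from pathlib import Path

# HYPOTHETICAL naming scheme: "<instance><letter>.img", e.g. "sut1a.img".
# The real file names on the worker are not stated in the ticket; adjust the
# pattern to whatever the directory actually contains.
DEFAULT_PATTERN = re.compile(r"^(?P<instance>.+?)[a-z]\.img$")


def img_files_per_instance(directory, pattern=DEFAULT_PATTERN):
    """Group *.img files in `directory` by instance: file count and total bytes."""
    groups = defaultdict(lambda: {"count": 0, "bytes": 0})
    for path in sorted(Path(directory).glob("*.img")):
        match = pattern.match(path.name)
        if match is None:
            continue  # file does not follow the assumed naming scheme
        entry = groups[match.group("instance")]
        entry["count"] += 1
        # Apparent size; sparse images may occupy less space on disk.
        entry["bytes"] += path.stat().st_size
    return dict(groups)


if __name__ == "__main__":
    target = "/var/lib/libvirt/images"  # path taken from the ticket
    if os.path.isdir(target):
        for instance, info in sorted(img_files_per_instance(target).items()):
            print(instance, info["count"], info["bytes"])
```

An instance that unexpectedly shows a count of 4 would point at the job configuration that creates the extra disks.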

openQA test in scenario sle-15-SP4-Migration-from-SLE12-SPx-s390x-offline_sles12sp5_pscc_base_all_minimal@s390x-kvm-sle12 fails in bootloader_zkvm

Test suite description

The base test suite is used for job templates defined in YAML documents. It has no settings of its own.

Reproducible

Fails since (at least) Build 38.1

Expected result

Last good: 36.1 (or more recent)

Further details

Always latest result in this scenario: latest

History

#1 Updated by leli 2 months ago

  • Subject changed from [qe-core][migration] test fails in bootloader_zkvm - no space on openqa worker to [qe-core][migration] test fails in bootloader_zkvm - no space on openqa s390x worker

#2 Updated by mgriessmeier 2 months ago

first suggestion: increase the disk size to mitigate the current bottleneck - I'll take care of that on the mainframe side

second thing would be to find out where those .img files come from and whether they are really needed

#3 Updated by nicksinger 2 months ago

  • Status changed from New to In Progress
  • Assignee set to nicksinger

Disks got resized on the Z side. /usr/bin/rescan-scsi-bus.sh -s reports a resize from 200G to 400G.

#4 Updated by vsvecova 2 months ago

We are encountering failures in Maintenance jobs that seem very similar to the one you described, such as this one:
https://openqa.suse.de/tests/7239697#step/bootloader_start/19

Do you think it could be a similar issue?

#5 Updated by nicksinger 2 months ago

  • Status changed from In Progress to Feedback

Steps to enlarge:

  1. stop the workers on the related jump host (grenache-1)
  2. umount /var/lib/libvirt/images
  3. multipathd resize map 36005076307ffd3b30000000000000148
  4. fdisk /dev/mapper/36005076307ffd3b30000000000000148 (delete the partition, create a new one with maximum size)
  5. partprobe
  6. e2fsck -f /dev/mapper/36005076307ffd3b30000000000000148-part1
  7. resize2fs /dev/mapper/36005076307ffd3b30000000000000148-part1
  8. mount /var/lib/libvirt/images
  9. start the workers on grenache-1 again
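The ordering of the steps matters: the filesystem must be unmounted before the partition is recreated, and e2fsck -f must run before resize2fs. The following dry-run sketch only builds the command lines in that order and executes nothing. The map name and mount point come from the ticket; the map resize is expressed as `multipathd resize map`, the multipathd(8) command for resizing a multipath map. The fdisk step is interactive, so it is emitted as a manual marker. The helper name `enlarge_plan` is hypothetical.

```python
import shlex

# Dry-run sketch: returns the enlargement steps as strings, runs nothing.
# Lines starting with "#" are manual steps (worker stop/start, interactive fdisk).


def enlarge_plan(map_name, mountpoint="/var/lib/libvirt/images"):
    device = f"/dev/mapper/{map_name}"
    partition = f"{device}-part1"
    return [
        "# stop the openQA workers on the jump host (grenache-1)",
        f"umount {shlex.quote(mountpoint)}",
        f"multipathd resize map {shlex.quote(map_name)}",
        f"# MANUAL: fdisk {device} (delete the partition, recreate it with maximum size)",
        "partprobe",
        f"e2fsck -f {shlex.quote(partition)}",  # a forced check is required before resize2fs
        f"resize2fs {shlex.quote(partition)}",  # grow ext* to fill the new partition
        f"mount {shlex.quote(mountpoint)}",
        "# start the openQA workers on grenache-1 again",
    ]


if __name__ == "__main__":
    for line in enlarge_plan("36005076307ffd3b30000000000000148"):
        print(line)
```

Keeping it as a printed plan rather than a script that runs the commands avoids accidentally repartitioning a live disk.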

#6 Updated by nicksinger 2 months ago

vsvecova wrote:

We are encountering failures in Maintenance jobs that seem very similar to the one you described, such as this one:
https://openqa.suse.de/tests/7239697#step/bootloader_start/19

Do you think it could be a similar issue?

Likely. Both disks are enlarged now, so you could retrigger the job to see whether the issue still persists.

#7 Updated by okurz 2 months ago

  • Subject changed from [qe-core][migration] test fails in bootloader_zkvm - no space on openqa s390x worker to [qe-core][migration] test fails in bootloader_zkvm - no space on openqa s390x worker auto_review:"rsync: write failed on.*/var/lib/libvirt/images/.*s390x.*No space left on device":retry
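The auto_review tag makes the tooling label and retry jobs whose logs match the quoted regular expression. To check which log lines the pattern catches, it can be tried out with Python's `re` as a stand-in; the actual auto-review tooling may use a different regex engine and multiline handling. The sample log line below is hypothetical and only illustrates the shape of an rsync error.

```python
import re

# Pattern copied verbatim from the auto_review tag in the ticket subject.
AUTO_REVIEW = re.compile(
    r"rsync: write failed on.*/var/lib/libvirt/images/.*s390x.*No space left on device"
)

# HYPOTHETICAL log line, for illustration only.
SAMPLE_LINE = (
    'rsync: write failed on "/var/lib/libvirt/images/SLES-15-SP4-s390x-Build38.1.qcow2": '
    "No space left on device (28)"
)


def matches_ticket(log_text):
    """True if the text contains a line matching the auto_review pattern."""
    # "." does not match newlines, so the whole match must sit on one line.
    return AUTO_REVIEW.search(log_text) is not None


if __name__ == "__main__":
    print(matches_ticket(SAMPLE_LINE))
```

Because every part of the pattern is required, the same error on a non-s390x image or outside /var/lib/libvirt/images is not picked up.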

#8 Updated by nicksinger 2 months ago

  • Status changed from Feedback to Resolved

Checked the most recent jobs on each worker instance manually: the workers seem to complete jobs successfully, so we can assume the extension worked.
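A complementary check (not the one performed in this ticket) is to read the size and free space of the filesystem behind /var/lib/libvirt/images directly. After the resize, the total should roughly reflect the enlarged 400G LUN rather than the old 200G. The sketch uses `shutil.disk_usage`; the 50 GiB threshold and the helper name `free_space_report` are hypothetical.

```python
import os
import shutil

GIB = 1024 ** 3


def free_space_report(path, min_free_gib=50):
    """Return (total_gib, free_gib, ok) for the filesystem holding `path`.

    min_free_gib is a HYPOTHETICAL threshold, not a value from the ticket.
    """
    usage = shutil.disk_usage(path)
    total, free = usage.total / GIB, usage.free / GIB
    return round(total, 1), round(free, 1), free >= min_free_gib


if __name__ == "__main__":
    target = "/var/lib/libvirt/images"  # path taken from the ticket
    if os.path.isdir(target):
        print(free_space_report(target))
```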
