action #99336

closed

[qe-core][migration] test fails in bootloader_zkvm - no space on openqa s390x worker auto_review:"rsync: write failed on.*/var/lib/libvirt/images/.*s390x.*No space left on device":retry

Added by leli over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Bugs in existing tests
Target version:
-
Start date:
2021-09-27
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

Found by Matthias:
There are 4 .img files for each instance. This was not the case before and could be the culprit here. Why are 4 disks created for each job?

So we need to figure out why this happens.

openQA test in scenario sle-15-SP4-Migration-from-SLE12-SPx-s390x-offline_sles12sp5_pscc_base_all_minimal@s390x-kvm-sle12 fails in
bootloader_zkvm

Test suite description

The base test suite is used for job templates defined in YAML documents. It has no settings of its own.

Reproducible

Fails since (at least) Build 38.1

Expected result

Last good: 36.1 (or more recent)

Further details

Always latest result in this scenario: latest

Actions #1

Updated by leli over 2 years ago

  • Subject changed from [qe-core][migration] test fails in bootloader_zkvm - no space on openqa worker to [qe-core][migration] test fails in bootloader_zkvm - no space on openqa s390x worker
Actions #2

Updated by mgriessmeier over 2 years ago

first suggestion: increase the disk size to mitigate the current bottleneck - I'll take care of that on the mainframe side

second thing would be to find out where those img files come from and if they are really needed
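
A minimal sketch for the second point, assuming the images sit directly in /var/lib/libvirt/images (the path from the ticket subject). It only shows the free space and counts the .img files; the helper names `list_img_files` and `count_img_files` are made up for illustration, and since the ticket does not give the file naming scheme, no per-instance grouping is attempted.

```shell
#!/bin/sh
# Directory taken from the ticket subject; override via IMAGES_DIR if needed.
IMAGES_DIR="${IMAGES_DIR:-/var/lib/libvirt/images}"

# Hypothetical helper: list the .img files directly inside directory $1.
list_img_files() {
    find "$1" -maxdepth 1 -type f -name '*.img' | sort
}

# Hypothetical helper: print how many .img files directory $1 contains.
count_img_files() {
    list_img_files "$1" | wc -l | tr -d ' '
}

# Only report when the directory exists on this host.
if [ -d "$IMAGES_DIR" ]; then
    df -h "$IMAGES_DIR"
    echo "img files: $(count_img_files "$IMAGES_DIR")"
fi
```

Comparing the count against the number of worker instances on the host would show whether the "4 per instance" observation holds.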

Actions #3

Updated by nicksinger over 2 years ago

  • Status changed from New to In Progress
  • Assignee set to nicksinger

Disks got resized on the Z-side. /usr/bin/rescan-scsi-bus.sh -s reports a resize from 200G -> 400G

Actions #4

Updated by vsvecova over 2 years ago

We are encountering failures in Maintenance jobs that seem very similar to the one you described, such as this one:
https://openqa.suse.de/tests/7239697#step/bootloader_start/19

Do you think it could be a similar issue?

Actions #5

Updated by nicksinger over 2 years ago

  • Status changed from In Progress to Feedback

Steps to enlarge:

  1. stop workers on related jump-host (grenache-1)
  2. umount /var/lib/libvirt/images
  3. multipath resize map 36005076307ffd3b30000000000000148
  4. fdisk /dev/mapper/36005076307ffd3b30000000000000148 (delete partition, create new with max size)
  5. partprobe
  6. e2fsck -f /dev/mapper/36005076307ffd3b30000000000000148-part1
  7. resize2fs /dev/mapper/36005076307ffd3b30000000000000148-part1
  8. mount /var/lib/libvirt/images
  9. start workers on grenache-1 again
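
The steps above can be sketched as a script. This is a hedged sketch, not what was actually run: it defaults to a dry run that only prints the commands, the `run` helper and the function name are made up for illustration, and it assumes that "multipath resize map" in step 3 refers to the `multipathd resize map` command. Stopping and starting the workers (steps 1 and 9) is left manual because the ticket does not give the commands, and the fdisk step stays interactive.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Values copied from the steps in this ticket.
WWID="36005076307ffd3b30000000000000148"
DEV="/dev/mapper/${WWID}"
PART="${DEV}-part1"
MOUNTPOINT="/var/lib/libvirt/images"

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enlarge_images_disk() {
    echo "# step 1: stop the workers on grenache-1 first (manual)"
    run umount "$MOUNTPOINT" || return 1
    # Assumption: step 3 of the ticket means the multipathd command.
    run multipathd resize map "$WWID" || return 1
    # Interactive: delete the partition, create a new one with max size.
    run fdisk "$DEV" || return 1
    run partprobe || return 1
    run e2fsck -f "$PART" || return 1
    run resize2fs "$PART" || return 1
    run mount "$MOUNTPOINT" || return 1
    echo "# step 9: start the workers on grenache-1 again (manual)"
}

enlarge_images_disk
```

Each step aborts the sequence on failure, so the filesystem is not resized if the check in step 6 reports errors.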
Actions #6

Updated by nicksinger over 2 years ago

vsvecova wrote:

We are encountering failures in Maintenance jobs that seem very similar to the one you described, such as this one:
https://openqa.suse.de/tests/7239697#step/bootloader_start/19

Do you think it could be a similar issue?

Likely. Both disks are enlarged now, so you could retrigger to see if the issue still persists.

Actions #7

Updated by okurz over 2 years ago

  • Subject changed from [qe-core][migration] test fails in bootloader_zkvm - no space on openqa s390x worker to [qe-core][migration] test fails in bootloader_zkvm - no space on openqa s390x worker auto_review:"rsync: write failed on.*/var/lib/libvirt/images/.*s390x.*No space left on device":retry
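
To illustrate what the auto_review expression in the new subject is meant to catch, here is a small sketch that tests the regex with `grep -E`. The sample log line and its file name are hypothetical, written in the usual rsync error format; the function name `matches_auto_review` is also made up for illustration.

```shell
#!/bin/sh
# Regex copied verbatim from the ticket subject.
pattern='rsync: write failed on.*/var/lib/libvirt/images/.*s390x.*No space left on device'

# Hypothetical helper: succeed if the given line matches the auto_review regex.
matches_auto_review() {
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Hypothetical log line; the image file name is an assumption.
sample='rsync: write failed on "/var/lib/libvirt/images/example-s390x.qcow2": No space left on device (28)'

if matches_auto_review "$sample"; then
    echo "match"
else
    echo "no match"
fi
```

A write failure outside /var/lib/libvirt/images, or one without "s390x" in the path, would not match, so unrelated out-of-space errors are not retried under this ticket.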
Actions #8

Updated by nicksinger over 2 years ago

  • Status changed from Feedback to Resolved

Checked the most recent jobs on each worker instance manually - workers seem to be able to complete jobs successfully so we can assume the extension worked.
