action #19080

closed

[s390x][zkvm] test cases fails by no space left on device to download zkvm-image

Added by JWSun almost 7 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
Infrastructure
Target version:
-
Start date:
2017-05-10
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

Most of the s390x test cases fail with "no space left on device" while downloading the zkvm image.

09:31:31.3702 26462 Command's stderr:
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: write failed on "/var/lib/libvirt/images/sle-12-SP1-Server-DVD-s390x-allpatterns.qcow2": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(322) [receiver=3.0.9]
rsync: connection unexpectedly closed (28 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]

openQA test in scenario sle-12-SP3-Server-DVD-POST-s390x-migration_offline_sle12sp1+lgm_allpatterns_fullupdate_s390x@zkvm-image fails in
bootloader_zkvm
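
For reference, a quick way to confirm the condition is to check the free space on the image partition of the svirt host; this is a minimal diagnostic sketch, assuming shell access to s390pb and using the path from the rsync error above (the 10 GiB threshold is only an assumption, not part of the test code):

# check free space on the partition holding the zkvm images
df -h /var/lib/libvirt/images

# optional pre-check before downloading another image: abort early if
# less than ~10 GiB are available (threshold is an assumption)
avail_kb=$(df --output=avail /var/lib/libvirt/images | tail -n 1 | tr -d ' ')
if [ "$avail_kb" -lt $((10 * 1024 * 1024)) ]; then
    echo "not enough space left to download the zkvm image" >&2
    exit 1
fi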

Reproducible

Fails since (at least) Build 0336

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 2 (1 open, 1 closed)

Has duplicate openQA Tests - action #19116: test fails in bootloader_zkvm (Rejected, 2017-05-11)

Precedes openQA Infrastructure - action #18608: [qe-core][tools][sle][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring (Blocked, assignee: okurz)

Actions #1

Updated by okurz almost 7 years ago

  • Subject changed from s390x test cases fails by no space left on device to download zkvm-image to [s390x][zkvm] test cases fails by no space left on device to download zkvm-image
  • Category changed from Bugs in existing tests to Infrastructure

I am sure that's an infrastructure issue, not a problem in the test.

Seems like the (current) limit on the number of migration scenarios for s390x-zkvm has been reached. I suggest talking to "ihno" about a potential increase of hard disk space on the host "s390pb", which is probably the culprit here.

Actions #2

Updated by okurz almost 7 years ago

  • Priority changed from Normal to Urgent
Actions #3

Updated by okurz almost 7 years ago

  • Has duplicate action #19116: test fails in bootloader_zkvm added
Actions #4

Updated by mgriessmeier almost 7 years ago

  • Status changed from New to Feedback
  • Assignee set to mgriessmeier

With the help of gschlotter from the infra team I increased the size of the physical disk, and with the help of okurz we recreated the partition.

The increase was not that much, but I will monitor this to see whether it is enough or whether we need more.
Additionally, I adjusted the cleanup cronjob to run more often (see the sketch below).

Setting to Feedback for now.
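
For illustration, the more frequent cleanup run could look roughly like the cron entry below. The script path /usr/local/bin/cleanup-openqa-assets appears later in this ticket and the 2-hour interval is mentioned in a later comment; the crontab location and mail recipient here are assumptions, not the real configuration on s390pb:

# /etc/cron.d/cleanup-openqa-assets  (sketch only, not the actual file)
# mail the script output to the registered recipient
MAILTO=root
# run the asset cleanup every two hours
0 */2 * * * root /usr/local/bin/cleanup-openqa-assets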

Actions #5

Updated by mgriessmeier almost 7 years ago

Still valid: in the longer run we need a more thorough cleanup on s390pb, e.g. deleting the image after the job has run -> see the related poo ticket.
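
A minimal sketch of such a post-job cleanup, assuming the per-job artifacts follow the openQA-SUT-<N> naming visible in the listing further down and that the libvirt domain carries the same name (both assumptions, this is not the actual implementation):

# hypothetical post-job hook on s390pb: drop a job's artifacts once its
# libvirt domain is gone
SUT=openQA-SUT-21   # example instance name, taken from the listing below
if ! virsh list --all --name | grep -qx "$SUT"; then
    rm -f /var/lib/libvirt/images/"$SUT".{initrd,kernel,xml} \
          /var/lib/libvirt/images/"$SUT"a.img
fi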

Actions #6

Updated by mgriessmeier almost 7 years ago

  • Precedes action #18608: [qe-core][tools][sle][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring added
Actions #7

Updated by mgriessmeier almost 7 years ago

  • Status changed from Feedback to Resolved

100 GB seems to be enough for the current number of migration jobs, so I'm closing this.
It might hit us again when more migration jobs are added.

Actions #8

Updated by JWSun almost 7 years ago

  • Status changed from Resolved to New

Reopened. Found a case that failed again with build 0417.
https://openqa.suse.de/tests/979557#

Actions #9

Updated by okurz almost 7 years ago

  • Status changed from New to Feedback

I confirmed that the disk space was depleted; the full 100G had been used:

[root@s390pb ~]# ls -ltrah /var/lib/libvirt/images/
total 94G
-rw-r--r--.  1 qemu qemu 4.7G May 14  2015 sle-12-Server-DVD-x86_64-gnome.qcow2
drwxr-xr-x. 10 root root 4.0K Mar  3  2016 ..
-rw-r--r--.  1 qemu qemu 6.9G Apr  1 10:03 sle-12-SP1-Server-DVD-s390x-allpatterns.qcow2
-rw-r--r--.  1 qemu qemu 3.7G Apr  5 11:36 sle-12-SP2-Server-DVD-s390x-gnome.qcow2
-rw-r--r--.  1 qemu qemu 6.2G May  4 12:13 sle-12-SP2-Server-DVD-s390x-ha+allpatterns.qcow2
-rw-r--r--.  1 qemu qemu 7.0G May  4 14:05 SLES-12-SP1-GM-ha+geo+allpatterns-s390x.qcow2
-rw-r--r--.  1 qemu qemu 7.0G May  4 14:37 sle-12-SP1-Server-DVD-s390x-ha+allpatterns.qcow2
-rw-r--r--.  1 qemu qemu 7.9G May  4 16:27 sle-12-SP2-Server-DVD-s390x-sdk+allpatterns.qcow2
-rw-r--r--.  1 qemu qemu 4.3G May  4 16:49 sle-12-SP2-gnome-s390x-ha.qcow2
-rw-r--r--.  1 qemu qemu 7.8G May  5 13:37 sle-12-SP1-Server-DVD-s390x-ha+geo+sdk+allpatterns.qcow2
-rw-r--r--.  1 qemu qemu 5.1G May  5 13:59 SLES-12-SP1-GM-gnome-s390x-ha-geo.qcow2
-rw-r--r--.  1 qemu qemu 6.2G May  5 14:29 sle-12-SP2-Server-DVD-s390x-ha+geo+allpatterns.qcow2
drwx------.  2 root root 4.0K May 24 13:29 lost+found
-rw-r--r--.  1 qemu qemu  32M Jun  1 15:31 openQA-SUT-21.initrd
-rw-r--r--.  1 qemu qemu  11M Jun  1 15:31 openQA-SUT-21.kernel
-rw-r--r--.  1 root root  720 Jun  1 15:46 openQA-SUT-21.xml
-rw-r--r--.  1 root root 6.2G Jun  1 15:48 openQA-SUT-21a.img
-rw-r--r--.  1 qemu qemu  32M Jun  2 22:28 openQA-SUT-12.initrd
-rw-r--r--.  1 qemu qemu  11M Jun  2 22:28 openQA-SUT-12.kernel
-rw-r--r--.  1 root root  720 Jun  2 22:37 openQA-SUT-12.xml
-rw-r--r--.  1 root root 2.9G Jun  2 22:57 openQA-SUT-12a.img
-rw-r--r--.  1 qemu qemu  32M Jun  4 01:55 openQA-SUT-3.initrd
-rw-r--r--.  1 qemu qemu  11M Jun  4 01:55 openQA-SUT-3.kernel
-rw-r--r--.  1 qemu qemu  32M Jun  4 02:00 openQA-SUT-2.initrd
-rw-r--r--.  1 qemu qemu  11M Jun  4 02:00 openQA-SUT-2.kernel
-rw-r--r--.  1 root root  716 Jun  4 02:05 openQA-SUT-3.xml
-rw-r--r--.  1 root root 4.2G Jun  4 02:06 openQA-SUT-3a.img
-rw-r--r--.  1 root root  716 Jun  4 02:18 openQA-SUT-2.xml
-rw-r--r--.  1 root root 3.9G Jun  4 02:20 openQA-SUT-2a.img
-rw-r--r--.  1 qemu qemu 4.2G Jun  5 00:58 SLES-12-SP3-s390x-Build0409-gnome.qcow2
-rw-r--r--.  1 qemu qemu  32M Jun  5 05:35 openQA-SUT-1.initrd
-rw-r--r--.  1 qemu qemu  11M Jun  5 05:35 openQA-SUT-1.kernel
drwxr-xr-x.  3 root root 4.0K Jun  5 07:38 .
-rw-r--r--.  1 root root  716 Jun  5 07:38 openQA-SUT-1.xml
-rw-r--r--.  1 qemu qemu 5.4G Jun  5 07:55 openQA-SUT-1a.img

I executed /usr/local/bin/cleanup-openqa-assets manually now and we are down to 31G:

[root@s390pb ~]# ls -ltrah /var/lib/libvirt/images/
total 31G
drwxr-xr-x. 10 root root 4.0K Mar  3  2016 ..
-rw-r--r--.  1 qemu qemu 7.9G May  4 16:27 sle-12-SP2-Server-DVD-s390x-sdk+allpatterns.qcow2
drwx------.  2 root root 4.0K May 24 13:29 lost+found
-rw-r--r--.  1 qemu qemu  32M Jun  1 15:31 openQA-SUT-21.initrd
-rw-r--r--.  1 qemu qemu  11M Jun  1 15:31 openQA-SUT-21.kernel
-rw-r--r--.  1 root root  720 Jun  1 15:46 openQA-SUT-21.xml
-rw-r--r--.  1 root root 6.2G Jun  1 15:48 openQA-SUT-21a.img
-rw-r--r--.  1 qemu qemu  32M Jun  2 22:28 openQA-SUT-12.initrd
-rw-r--r--.  1 qemu qemu  11M Jun  2 22:28 openQA-SUT-12.kernel
-rw-r--r--.  1 root root  720 Jun  2 22:37 openQA-SUT-12.xml
-rw-r--r--.  1 root root 2.9G Jun  2 22:57 openQA-SUT-12a.img
-rw-r--r--.  1 qemu qemu  32M Jun  4 01:55 openQA-SUT-3.initrd
-rw-r--r--.  1 qemu qemu  11M Jun  4 01:55 openQA-SUT-3.kernel
-rw-r--r--.  1 qemu qemu  32M Jun  4 02:00 openQA-SUT-2.initrd
-rw-r--r--.  1 qemu qemu  11M Jun  4 02:00 openQA-SUT-2.kernel
-rw-r--r--.  1 root root  716 Jun  4 02:05 openQA-SUT-3.xml
-rw-r--r--.  1 root root 4.2G Jun  4 02:06 openQA-SUT-3a.img
-rw-r--r--.  1 root root  716 Jun  4 02:18 openQA-SUT-2.xml
-rw-r--r--.  1 root root 3.9G Jun  4 02:20 openQA-SUT-2a.img
-rw-r--r--.  1 qemu qemu  32M Jun  5 05:35 openQA-SUT-1.initrd
-rw-r--r--.  1 qemu qemu  11M Jun  5 05:35 openQA-SUT-1.kernel
-rw-r--r--.  1 root root  716 Jun  5 07:38 openQA-SUT-1.xml
-rw-r--r--.  1 qemu qemu 5.4G Jun  5 07:55 openQA-SUT-1a.img
drwxr-xr-x.  3 root root 4.0K Jun  5 09:06 .

@mgriessmeier as you registered as an email recipient for the cleanup cron job: Did you receive an email from previous runs of the cleanup script? I wonder why it deleted so much now. Do we need to call this more often?

Actions #10

Updated by mgriessmeier almost 7 years ago

okurz wrote:

@mgriessmeier as you registered as an email recipient for the cleanup cron job: Did you receive an email from previous runs of the cleanup script? I wonder why it deleted so much now. Do we need to call this more often?

Hmm... actually I did not receive an email - valid point.
I already set it to run more often (every 2 hours), but it still checks whether an asset is in use before deleting it,
so my assumption would be that we use too many assets in parallel right now, which prevents the cron job from deleting them...
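
The cleanup-openqa-assets script itself is not shown in this ticket, but the in-use check it performs could look roughly like the following sketch (assuming "in use" means referenced as a block device by a defined libvirt domain, and that an age threshold of one day applies - both assumptions):

#!/bin/bash
# sketch of an asset cleanup with an in-use check, not the real script
for image in /var/lib/libvirt/images/*.qcow2; do
    [ -e "$image" ] || continue
    # skip assets still referenced by any libvirt domain
    if virsh list --all --name | grep -v '^$' | \
         xargs -r -n1 virsh domblklist 2>/dev/null | grep -qF "$image"; then
        continue
    fi
    # only delete assets that have not been touched for at least a day
    find "$image" -mtime +1 -delete
done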

Actions #11

Updated by mgriessmeier almost 7 years ago

  • Status changed from Feedback to Resolved

It's good for now, reopen if it happens again.
