action #19080
[s390x][zkvm] test cases fails by no space left on device to download zkvm-image
Status: closed
Description
Observation
Most s390x test cases fail with "no space left on device" while downloading the zkvm image.
09:31:31.3702 26462 Command's stderr:
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: write failed on "/var/lib/libvirt/images/sle-12-SP1-Server-DVD-s390x-allpatterns.qcow2": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(322) [receiver=3.0.9]
rsync: connection unexpectedly closed (28 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]
openQA test in scenario sle-12-SP3-Server-DVD-POST-s390x-migration_offline_sle12sp1+lgm_allpatterns_fullupdate_s390x@zkvm-image fails in bootloader_zkvm
Reproducible
Fails since (at least) Build 0336
Expected result
Last good: (unknown) (or more recent)
Further details
Always latest result in this scenario: latest
Updated by okurz almost 8 years ago
- Subject changed from "s390x test cases fails by no space left on device to download zkvm-image" to "[s390x][zkvm] test cases fails by no space left on device to download zkvm-image"
- Category changed from Bugs in existing tests to Infrastructure
I am sure that's an infrastructure issue, not a problem in the test.
It seems the (current) limit on the number of migration scenarios for s390x-zkvm has been reached. I suggest talking to "ihno" about a potential increase of hard disk space on the host "s390pb", which is probably the culprit here.
Updated by okurz almost 8 years ago
- Has duplicate action #19116: test fails in bootloader_zkvm added
Updated by mgriessmeier almost 8 years ago
- Status changed from New to Feedback
- Assignee set to mgriessmeier
With the help of gschlotter from the infra team, I increased the size of the physical disk, and with the help of okurz we recreated the partition.
The increase was not that much, but I will monitor this to see if it's enough or if we need more.
Additionally, I adjusted the cleanup cron job to run more often (a sketch of such an entry is below).
Setting to Feedback for now.
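For reference, a rough sketch of what such a cron entry could look like - the cron.d layout, the schedule and the mail recipient are assumptions, only the script path matches the one used later in this ticket:

# /etc/cron.d/cleanup-openqa-assets - hypothetical layout, not the actual file on s390pb
# MAILTO makes cron mail the script output to the registered recipient
# (the real recipient on s390pb differs; "root" is just a placeholder)
MAILTO=root
# run every 2 hours; the script itself decides which images are safe to remove
0 */2 * * * root /usr/local/bin/cleanup-openqa-assets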
Updated by mgriessmeier almost 8 years ago
Still valid: in the longer run we need a proper cleanup on s390pb, e.g. deleting the image after the job has run -> see the related poo.
Updated by mgriessmeier almost 8 years ago
- Precedes action #18608: [tools][sle][functional][research][medium] Implement proper clean up for images on jump hosts, e.g. s390pb, hyperv host, svirt and a proper monitoring added
Updated by mgriessmeier almost 8 years ago
- Status changed from Feedback to Resolved
100 GB seems to be enough for the number of migration jobs at the moment, so I'm closing this.
It might hit us again when more migration jobs are added.
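To notice early if it does hit us again, a trivial threshold check could sit next to the cleanup job; this is only a sketch, nothing in this ticket says such a check exists on s390pb, and the threshold value is an assumption:

#!/bin/bash
# hypothetical disk usage warning for the libvirt images partition, e.g. run from cron
THRESHOLD=90   # percent of the partition that may be used before we warn
usage=$(df --output=pcent /var/lib/libvirt/images | tail -1 | tr -dc '0-9')
if [ "$usage" -ge "$THRESHOLD" ]; then
    echo "WARNING: /var/lib/libvirt/images is ${usage}% full on $(hostname)"
fi

With a MAILTO set, cron mails the echoed warning, so nothing else would be needed.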
Updated by JWSun over 7 years ago
- Status changed from Resolved to New
Reopened. Found a case that failed again with build 0417:
https://openqa.suse.de/tests/979557#
Updated by okurz over 7 years ago
- Status changed from New to Feedback
I confirmed that the disk space was depleted; nearly all of the 100 GB are used:
[root@s390pb ~]# ls -ltrah /var/lib/libvirt/images/
total 94G
-rw-r--r--. 1 qemu qemu 4.7G May 14 2015 sle-12-Server-DVD-x86_64-gnome.qcow2
drwxr-xr-x. 10 root root 4.0K Mar 3 2016 ..
-rw-r--r--. 1 qemu qemu 6.9G Apr 1 10:03 sle-12-SP1-Server-DVD-s390x-allpatterns.qcow2
-rw-r--r--. 1 qemu qemu 3.7G Apr 5 11:36 sle-12-SP2-Server-DVD-s390x-gnome.qcow2
-rw-r--r--. 1 qemu qemu 6.2G May 4 12:13 sle-12-SP2-Server-DVD-s390x-ha+allpatterns.qcow2
-rw-r--r--. 1 qemu qemu 7.0G May 4 14:05 SLES-12-SP1-GM-ha+geo+allpatterns-s390x.qcow2
-rw-r--r--. 1 qemu qemu 7.0G May 4 14:37 sle-12-SP1-Server-DVD-s390x-ha+allpatterns.qcow2
-rw-r--r--. 1 qemu qemu 7.9G May 4 16:27 sle-12-SP2-Server-DVD-s390x-sdk+allpatterns.qcow2
-rw-r--r--. 1 qemu qemu 4.3G May 4 16:49 sle-12-SP2-gnome-s390x-ha.qcow2
-rw-r--r--. 1 qemu qemu 7.8G May 5 13:37 sle-12-SP1-Server-DVD-s390x-ha+geo+sdk+allpatterns.qcow2
-rw-r--r--. 1 qemu qemu 5.1G May 5 13:59 SLES-12-SP1-GM-gnome-s390x-ha-geo.qcow2
-rw-r--r--. 1 qemu qemu 6.2G May 5 14:29 sle-12-SP2-Server-DVD-s390x-ha+geo+allpatterns.qcow2
drwx------. 2 root root 4.0K May 24 13:29 lost+found
-rw-r--r--. 1 qemu qemu 32M Jun 1 15:31 openQA-SUT-21.initrd
-rw-r--r--. 1 qemu qemu 11M Jun 1 15:31 openQA-SUT-21.kernel
-rw-r--r--. 1 root root 720 Jun 1 15:46 openQA-SUT-21.xml
-rw-r--r--. 1 root root 6.2G Jun 1 15:48 openQA-SUT-21a.img
-rw-r--r--. 1 qemu qemu 32M Jun 2 22:28 openQA-SUT-12.initrd
-rw-r--r--. 1 qemu qemu 11M Jun 2 22:28 openQA-SUT-12.kernel
-rw-r--r--. 1 root root 720 Jun 2 22:37 openQA-SUT-12.xml
-rw-r--r--. 1 root root 2.9G Jun 2 22:57 openQA-SUT-12a.img
-rw-r--r--. 1 qemu qemu 32M Jun 4 01:55 openQA-SUT-3.initrd
-rw-r--r--. 1 qemu qemu 11M Jun 4 01:55 openQA-SUT-3.kernel
-rw-r--r--. 1 qemu qemu 32M Jun 4 02:00 openQA-SUT-2.initrd
-rw-r--r--. 1 qemu qemu 11M Jun 4 02:00 openQA-SUT-2.kernel
-rw-r--r--. 1 root root 716 Jun 4 02:05 openQA-SUT-3.xml
-rw-r--r--. 1 root root 4.2G Jun 4 02:06 openQA-SUT-3a.img
-rw-r--r--. 1 root root 716 Jun 4 02:18 openQA-SUT-2.xml
-rw-r--r--. 1 root root 3.9G Jun 4 02:20 openQA-SUT-2a.img
-rw-r--r--. 1 qemu qemu 4.2G Jun 5 00:58 SLES-12-SP3-s390x-Build0409-gnome.qcow2
-rw-r--r--. 1 qemu qemu 32M Jun 5 05:35 openQA-SUT-1.initrd
-rw-r--r--. 1 qemu qemu 11M Jun 5 05:35 openQA-SUT-1.kernel
drwxr-xr-x. 3 root root 4.0K Jun 5 07:38 .
-rw-r--r--. 1 root root 716 Jun 5 07:38 openQA-SUT-1.xml
-rw-r--r--. 1 qemu qemu 5.4G Jun 5 07:55 openQA-SUT-1a.img
I executed /usr/local/bin/cleanup-openqa-assets manually now and we are down to 31G:
[root@s390pb ~]# ls -ltrah /var/lib/libvirt/images/
total 31G
drwxr-xr-x. 10 root root 4.0K Mar 3 2016 ..
-rw-r--r--. 1 qemu qemu 7.9G May 4 16:27 sle-12-SP2-Server-DVD-s390x-sdk+allpatterns.qcow2
drwx------. 2 root root 4.0K May 24 13:29 lost+found
-rw-r--r--. 1 qemu qemu 32M Jun 1 15:31 openQA-SUT-21.initrd
-rw-r--r--. 1 qemu qemu 11M Jun 1 15:31 openQA-SUT-21.kernel
-rw-r--r--. 1 root root 720 Jun 1 15:46 openQA-SUT-21.xml
-rw-r--r--. 1 root root 6.2G Jun 1 15:48 openQA-SUT-21a.img
-rw-r--r--. 1 qemu qemu 32M Jun 2 22:28 openQA-SUT-12.initrd
-rw-r--r--. 1 qemu qemu 11M Jun 2 22:28 openQA-SUT-12.kernel
-rw-r--r--. 1 root root 720 Jun 2 22:37 openQA-SUT-12.xml
-rw-r--r--. 1 root root 2.9G Jun 2 22:57 openQA-SUT-12a.img
-rw-r--r--. 1 qemu qemu 32M Jun 4 01:55 openQA-SUT-3.initrd
-rw-r--r--. 1 qemu qemu 11M Jun 4 01:55 openQA-SUT-3.kernel
-rw-r--r--. 1 qemu qemu 32M Jun 4 02:00 openQA-SUT-2.initrd
-rw-r--r--. 1 qemu qemu 11M Jun 4 02:00 openQA-SUT-2.kernel
-rw-r--r--. 1 root root 716 Jun 4 02:05 openQA-SUT-3.xml
-rw-r--r--. 1 root root 4.2G Jun 4 02:06 openQA-SUT-3a.img
-rw-r--r--. 1 root root 716 Jun 4 02:18 openQA-SUT-2.xml
-rw-r--r--. 1 root root 3.9G Jun 4 02:20 openQA-SUT-2a.img
-rw-r--r--. 1 qemu qemu 32M Jun 5 05:35 openQA-SUT-1.initrd
-rw-r--r--. 1 qemu qemu 11M Jun 5 05:35 openQA-SUT-1.kernel
-rw-r--r--. 1 root root 716 Jun 5 07:38 openQA-SUT-1.xml
-rw-r--r--. 1 qemu qemu 5.4G Jun 5 07:55 openQA-SUT-1a.img
drwxr-xr-x. 3 root root 4.0K Jun 5 09:06 .
@mgriessmeier: as you are registered as an email recipient for the cleanup cron job, did you receive an email from previous runs of the cleanup script? I wonder why it deleted so much now. Do we need to call it more often?
Updated by mgriessmeier over 7 years ago
okurz wrote:
> @mgriessmeier: as you are registered as an email recipient for the cleanup cron job, did you receive an email from previous runs of the cleanup script? I wonder why it deleted so much now. Do we need to call it more often?
Hmm... actually I did not receive an email - valid point.
I already set it to run more often (every 2 hours), but it still checks whether an asset is in use before deleting it,
so my assumption would be that we are using too many assets in parallel right now, which prevents the cron job from deleting them...
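The contents of /usr/local/bin/cleanup-openqa-assets are not shown in this ticket, so the following is only a hypothetical sketch of the "checks if an asset is in use" behaviour described above, built on plain virsh and find (the image directory comes from the listings above, the age threshold is an assumption):

#!/bin/bash
# hypothetical sketch, not the real cleanup-openqa-assets script
IMAGES_DIR=/var/lib/libvirt/images

# collect the disk paths referenced by all currently defined domains
in_use=$(for dom in $(virsh list --all --name); do
            virsh domblklist "$dom" 2>/dev/null | awk 'NR>2 && $2 != "" {print $2}'
         done)

# remove qcow2 assets older than one day that no domain references
find "$IMAGES_DIR" -maxdepth 1 -name '*.qcow2' -mtime +1 | while read -r img; do
    if ! grep -qxF "$img" <<<"$in_use"; then
        echo "deleting unused asset: $img"
        rm -f "$img"
    fi
done

If the real script works along these lines, the explanation above fits: an image stays on disk as long as any defined domain still references it, no matter how often the cron job runs.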
Updated by mgriessmeier over 7 years ago
- Status changed from Feedback to Resolved
It's good for now; reopen if it happens again.