action #162062
No space left on device causing GitLab pipelines to fail
0%
Description
Observation
Various pipelines fail, either when starting the container or during the script, because there is no space left on the device.
https://gitlab.suse.de/openqa/openqa-review/-/jobs/2706267
WARNING: Failed to pull image with policy "always": write /var/lib/docker/tmp/GetImageBlob2814730492: no space left on device (manager.go:250:0s)
ERROR: Job failed: failed to pull image "registry.opensuse.org/home/okurz/container/ca/containers/tumbleweed:openqa-review" with specified policies [always]: write /var/lib/docker/tmp/GetImageBlob2814730492: no space left on device (manager.go:250:0s)
https://gitlab.suse.de/openqa/os-autoinst-needles-opensuse-mirror/-/jobs/2706665
- Download (curl) error for 'http://download.opensuse.org/distribution/leap/15.5/repo/oss/repodata/d0aae74c050dca8d30fbccd949a136d8ed209eccf8fdf435ac8c1d739271d8e7-appdata.xml.gz':
Error code: Write error
Error message: Failure writing output to destination
[...]
Can't create metadata cache directory.
https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2706694
$ rpm --query qam-metadata-openqabot
rpmxdbOpen: No space left on device
error: cannot open Name index using unknown db - Operation not permitted (1)
rpmxdbOpen: No space left on device
error: cannot open Name index using unknown db - Operation not permitted (1)
package qam-metadata-openqabot is not installed
https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2706024
ERROR: Preparation failed: adding cache volume: set volume permissions: create permission container for volume "runner-25ldi6vv-project-11950-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70": Error response from daemon: mkdir /var/lib/docker/overlay2/3efbcea02b9b1d07ff7de63478a83f5fcd6dd3e36949db2b1766d2ac77c5f8ec-init: no space left on device (linux_set.go:95:0s)
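The failures above surface with different messages depending on which step hits the full disk. A minimal sketch for spotting them in a job log follows, assuming the log text is already available as a string. The function name `is_disk_full_failure` and the pattern list are hypothetical and only cover the messages quoted in this ticket. The curl message is a weaker signal, since a write failure can have other causes.

```python
import re

# Signatures taken from the job logs quoted in this ticket.
# Illustrative only, not an exhaustive list.
DISK_FULL_PATTERNS = [
    # docker, rpm and git all report this, with varying capitalization
    re.compile(r"no space left on device", re.IGNORECASE),
    # curl write error seen in the needles mirror job (weaker signal)
    re.compile(r"Failure writing output to destination"),
]


def is_disk_full_failure(log_text: str) -> bool:
    """Return True if the log contains one of the known disk-full messages."""
    return any(pattern.search(log_text) for pattern in DISK_FULL_PATTERNS)
```

A check like this could help decide whether a failed job is worth retrying or points at the runner itself.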
Acceptance criteria
- AC1:
Suggestions
- DONE File an infra SD ticket to get the GitLab runners checked
Updated by livdywan 5 months ago · Edited
- Priority changed from Urgent to High
Debugging with Steven Mallindine. Pipelines running for now.
Updated by livdywan 4 months ago
- Status changed from Resolved to Blocked
And apparently the issue has come back:
Pulling docker image registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest ...
WARNING: Failed to pull image with policy "always": failed to register layer: open /usr/lib/python3.11/site-packages/ansible_collections/junipernetworks/junos/plugins/modules/junos_ospfv2.py: no space left on device (manager.go:250:17s)
ERROR: Job failed: failed to pull image "registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest" with specified policies [always]: failed to register layer: open /usr/lib/python3.11/site-packages/ansible_collections/junipernetworks/junos/plugins/modules/junos_ospfv2.py: no space left on device (manager.go:250:17s)
I filed SD-162694
Updated by livdywan 3 months ago · Edited
- Status changed from Blocked to In Progress
Apparently pipelines are failing again:
https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2998807
Will be retried in 3s ...
ERROR: Job failed (system failure): adding cache volume: set volume permissions: create permission container for volume "runner-25ldi6vv-project-6096-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70": Error response from daemon: mkdir /var/lib/docker/overlay2/ae4a05093eca6240b6d28fbf9ad94e8355082166bf93a6c3c6562cc0554f89c4-init: no space left on device (linux_set.go:95:0s)
https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2998804
Will be retried in 3s ...
ERROR: Job failed (system failure): adding cache volume: set volume permissions: create permission container for volume "runner-25ldi6vv-project-11950-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70": Error response from daemon: mkdir /var/lib/docker/overlay2/43ea0d4662f8f8842379af0cf5cc9390f9ae58ad76fb6c062a78fa2697dd5570-init: no space left on device (linux_set.go:95:0s)
https://gitlab.suse.de/openqa/os-autoinst-needles-sles/-/jobs/2998789
Fetching changes with git depth set to 3...
Initialized empty Git repository in /builds/openqa/os-autoinst-needles-sles/.git/
Created fresh repository.
fatal: write error: No space left on device
fatal: fetch-pack: invalid index-pack output
Updated by livdywan 3 months ago · Edited
- Status changed from In Progress to Feedback
I added another comment on SD-162694. In the meantime it seems like the latest pipelines for bot-ng and Scripts CI did pass, so I'm not applying any mitigations for now.
I don't know how openqa-pusher works. It's not a regular schedule. So I am just retrying the job.
Updated by livdywan 3 months ago · Edited
tinita wrote in #note-12:
livdywan wrote in #note-11:
?
That was supposed to be SD-162694.
And quoting from the Slack conversation:
There were some issues this morning with some large jobs running that killed the disk space (as we have looked at in the past)...
If you remember, the system used to clear the cache nightly (which wasn't working, then we fixed), then we switched to hourly. This still wasn't enough for the jobs that ran earlier this morning...
I have since disabled the docker caching on gitlab-worker1, and am keeping an eye on it to see if the issue resolves...
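Since a fixed nightly or hourly cleanup was not enough for large jobs, one alternative would be to trigger cleanup based on free space instead of time. This is a rough sketch of such a check, not what is actually deployed on gitlab-worker1. The function names, the `/var/lib/docker` default (taken from the error messages in this ticket) and the 15% threshold are all assumptions.

```python
import shutil


def free_fraction(path: str) -> float:
    """Return the fraction of free space on the filesystem containing path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Treat a filesystem reporting no capacity as full.
        return 0.0
    return usage.free / usage.total


def needs_cleanup(path: str = "/var/lib/docker", min_free: float = 0.15) -> bool:
    """Return True if free space is below the (assumed) threshold."""
    return free_fraction(path) < min_free
```

Something like `needs_cleanup()` could be polled between jobs, with the actual cleanup command left to whoever maintains the runner.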
Updated by livdywan about 2 months ago
Happening again: https://gitlab.suse.de/openqa/osd-deployment/-/jobs/3134725
Commented on SD-162694