action #156226
closed
[bot-ng] Pipeline failed / failed to pull image / no space left on device
Added by livdywan 10 months ago.
Updated 9 months ago.
Description
Observation
https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2325569
WARNING: Failed to pull image with policy "always": failed to register layer: open /var/cache/zypp/solv/@System/solv.idx: no space left on device (manager.go:237:16s)
ERROR: Job failed: failed to pull image "registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest" with specified policies [always]: failed to register layer: open /var/cache/zypp/solv/@System/solv.idx: no space left on device (manager.go:237:16s)
WARNING: Failed to pull image with policy "always": failed to register layer: mkdir /var/cache/zypp/solv/obs_repository: no space left on device (manager.go:237:13s)
ERROR: Job failed: failed to pull image "registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest" with specified policies [always]: failed to register layer: mkdir /var/cache/zypp/solv/obs_repository: no space left on device (manager.go:237:13s)
Suggestions
- DONE Restart pipelines
- DONE Report an infra SD ticket
- DONE Add retries to the pipeline
- Status changed from New to In Progress
- Assignee set to livdywan
I'll attempt a retry and see if that helps; otherwise I'll file an SD ticket.
FYI We have access to https://sd.suse.com/servicedesk/customer/portal/1/SD-149509 now.
I don't think we can mitigate this ourselves. So for now I would wait to see if someone picks up the ticket, and check in again if that doesn't happen (blocking is not an option in this case).
Edit: Maybe we can add a retry on system failures (retry: on: system-failure) if we don't have that yet, depending on the failure rate. It seems like one job passed again. Let me take a look.
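For reference, GitLab CI expresses this kind of retry with the retry keyword and when: runner_system_failure. A minimal sketch of such a rule in a .gitlab-ci.yml, where the job name, script, and retry count are illustrative and not taken from the actual bot-ng configuration:

```yaml
# Hypothetical job definition; only the retry block is the point here.
qem-bot-example:
  image: registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest
  retry:
    max: 2                       # re-run the job up to two more times
    when: runner_system_failure  # intended to cover runner-side setup failures, e.g. a failed image pull
  script:
    - echo "placeholder for the real bot-ng commands"
```

With a rule like this, a job that dies while the runner is setting it up (such as an image pull hitting "no space left on device") gets re-scheduled instead of failing the pipeline on the first attempt.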
- Tags changed from qem-bot, reactive work to qem-bot, reactive work, infra
- Due date set to 2024-03-14
Setting due date based on mean cycle time of SUSE QE Tools
- Description updated
- Status changed from In Progress to Blocked
- Priority changed from Urgent to High
It would seem the mitigation worked, and pipelines have been passing again. The failure I still saw is handled as part of #156301. So we can block on SD-149509 now.
- Due date changed from 2024-03-14 to 2024-03-22
livdywan wrote in #note-11:
It would seem the mitigation worked, and pipelines have been passing again. The failure I still saw is handled as part of #156301. So we can block on SD-149509 now.
Asked for feedback on Slack. For now our mitigation still works.
- Due date changed from 2024-03-22 to 2024-03-29
Routine check on SLO alert. Apparently there is another SD ticket, which I asked about. No movement otherwise.
- Priority changed from High to Normal
Still no response. I guess we can lower the priority here since we have a workaround.
- Status changed from Blocked to Resolved
I'm assuming the issue was fixed in the meantime, hence closing.