action #156226
closed
[bot-ng] Pipeline failed / failed to pull image / no space left on device
Added by livdywan 10 months ago.
Updated 9 months ago.
Description
Observation
https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2325569
WARNING: Failed to pull image with policy "always": failed to register layer: open /var/cache/zypp/solv/@System/solv.idx: no space left on device (manager.go:237:16s)
ERROR: Job failed: failed to pull image "registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest" with specified policies [always]: failed to register layer: open /var/cache/zypp/solv/@System/solv.idx: no space left on device (manager.go:237:16s)
WARNING: Failed to pull image with policy "always": failed to register layer: mkdir /var/cache/zypp/solv/obs_repository: no space left on device (manager.go:237:13s)
ERROR: Job failed: failed to pull image "registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest" with specified policies [always]: failed to register layer: mkdir /var/cache/zypp/solv/obs_repository: no space left on device (manager.go:237:13s)
Suggestions
- DONE Restart pipelines
- DONE Report an infra SD ticket
- DONE Add retries to the pipeline
- Status changed from New to In Progress
- Assignee set to livdywan
I'll attempt a retry and see if that helps; otherwise I'll file an SD ticket.
FYI We have access to https://sd.suse.com/servicedesk/customer/portal/1/SD-149509 now.
I don't think we can mitigate this ourselves. So for now I would wait to see if someone picks up the ticket, and check in again if that doesn't happen (blocking is not an option in this case).
Edit: Maybe we can add a retry on system failures (retry: on: system-failure) if we don't have that yet, depending on the failure rate. It seems like one job passed again. Let me take a look.
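For reference, GitLab CI expresses this kind of retry with the retry keyword and when: runner_system_failure. A minimal sketch of such a rule in a .gitlab-ci.yml, where the job name, script, and retry count are illustrative and not taken from the actual bot-ng configuration:

```yaml
# Hypothetical job definition; only the retry block is the point here.
qem-bot-example:
  image: registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest
  retry:
    max: 2                       # re-run the job up to two more times
    when: runner_system_failure  # intended to cover runner-side setup failures, e.g. a failed image pull
  script:
    - echo "placeholder for the real bot-ng commands"
```

With a rule like this, a job that dies while the runner is setting it up (such as an image pull hitting "no space left on device") gets re-scheduled instead of failing the pipeline on the first attempt.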
- Tags changed from qem-bot, reactive work to qem-bot, reactive work, infra
- Due date set to 2024-03-14
Setting due date based on mean cycle time of SUSE QE Tools
- Description updated
- Status changed from In Progress to Blocked
- Priority changed from Urgent to High
It would seem the mitigation worked, and pipelines have been passing again. The failure I still saw is handled as part of #156301. So we can block on SD-149509 now.
- Due date changed from 2024-03-14 to 2024-03-22
livdywan wrote in #note-11:
It would seem the mitigation worked, and pipelines have been passing again. The failure I still saw is handled as part of #156301. So we can block on SD-149509 now.
Asked for feedback on Slack. For now our mitigation still works.
- Due date changed from 2024-03-22 to 2024-03-29
Routine check on SLO alert. Apparently there is another SD ticket, which I asked about. No movement otherwise.
- Priority changed from High to Normal
Still no response. I guess we can lower the priority here since we have a workaround.
- Status changed from Blocked to Resolved
I'm assuming the issue was fixed in the meantime, hence closing.