
action #96827

gitlab CI pipeline failed with Job failed: pod status is Failed size:S

Added by cdywan over 1 year ago. Updated 12 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
Start date:
2021-08-13
Due date:
% Done:

0%

Estimated time:

Description

Observation

I saw this at least twice in different pipelines (salt-states-openqa and openqa/openqa-review):

Cleaning up file based variables
00:00
ERROR: Job failed: pod "runner-25ldi6vv-project-5909-concurrent-0w2q6j" status is "Failed"
test-worker
Duration: 5 minutes 11 seconds
Timeout: 1h (from project)

Runner: #462 (25Ldi6Vv) gitlab-worker2:sle15.1
Commit cb53da6b in !548
Increase {flush_,}interval and jitter for web UI

 Pipeline #185243 for telegraf_webui_intervals
test-worker
test-general
test-general-test
test-monitor
test-webui

Affected jobs so far passed after retry.

This will likely happen again because we are running more jobs in GitLab CI pipelines.

Just from today: https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/557134

Suggestions

These failures are outside what we can control from our side. What we should be able to do is create an EngInfra ticket and escalate the problem.
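As a stop-gap on our side, GitLab CI jobs can be configured to retry automatically on runner/system failures instead of relying on manual retries. A minimal sketch for one of the affected jobs (the job name and script are illustrative, not from our actual pipeline config):

```yaml
test-worker:
  script:
    - make test-worker
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure
```

This would not fix the underlying pod failures, but it would reduce the manual retry churn while EngInfra investigates.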


Related issues

Has duplicate QA - action #99411: openqa-review report openqa_suse_de_status.html missing from https://openqa.io.suse.de/openqa-review/openqa_suse_de_status.html, page is 404 (Resolved, 2021-09-28)

Copied to QA - action #103762: gitlab CI pipeline failed with Error cleaning up pod: Delete ... connect: connection refused Job failed (system failure): prepare environment: waiting for pod running ... i/o timeout. Check ... for more information (Resolved, 2021-08-13)

History

#2 Updated by jbaier_cz over 1 year ago

Seems to me like a bug in the GitLab Kubernetes executor or a misconfiguration; in both cases there is nothing we can do apart from filing a ticket for EngInfra.

#3 Updated by jbaier_cz over 1 year ago

Adding a similar error from another project: https://gitlab.suse.de/qa-maintenance/openQABot/-/jobs/536484

#4 Updated by jbaier_cz over 1 year ago

Probably another occurrence here: https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/546059

In this case the job actually did succeed; I suspect an issue during artifact uploading caused the job to fail.

#5 Updated by okurz over 1 year ago

  • Subject changed from gitlab CI pipeline failed with Job failed: pod status is Failed to gitlab CI pipeline failed with Job failed: pod status is Failed size:S
  • Description updated (diff)
  • Status changed from New to Workable

#6 Updated by jbaier_cz about 1 year ago

  • Assignee set to jbaier_cz

#7 Updated by jbaier_cz about 1 year ago

  • Status changed from Workable to Resolved

Last week there was an update to GitLab 14.2 (which also included an update of gitlab-runner). There have been no new failures since last Thursday in our jobs (which still run pretty frequently), so we can hope for the best and consider this issue solved. We can reopen this ticket if any such error reappears.

#8 Updated by tinita about 1 year ago

  • Related to action #99411: openqa-review report openqa_suse_de_status.html missing from https://openqa.io.suse.de/openqa-review/openqa_suse_de_status.html, page is 404 added

#9 Updated by tinita about 1 year ago

  • Related to deleted (action #99411: openqa-review report openqa_suse_de_status.html missing from https://openqa.io.suse.de/openqa-review/openqa_suse_de_status.html, page is 404)

#10 Updated by tinita about 1 year ago

  • Has duplicate action #99411: openqa-review report openqa_suse_de_status.html missing from https://openqa.io.suse.de/openqa-review/openqa_suse_de_status.html, page is 404 added

#11 Updated by okurz about 1 year ago

  • Status changed from Resolved to Feedback

So it looks like this is not solved after all; see #99411

#12 Updated by cdywan about 1 year ago

  • Status changed from Feedback to Blocked
  • Assignee changed from jbaier_cz to cdywan

Seems like it happened again:

Cleaning up project directory and file based variables
ERROR: Job failed: pod "runner-jzszgdx-project-4884-concurrent-04kj77" status is "Failed"

I filed #SD-62248.

#13 Updated by cdywan about 1 year ago

Another instance last night. Followed up on a response from infra saying this is due to a disk space shortage on the container host, which causes runners to be "evicted". I pointed out that manual intervention is not good enough because of alert fatigue.
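The root cause (disk pressure on the container host evicting runner pods) is exactly the kind of condition that could be alerted on automatically instead of being discovered through failed pipelines. A minimal sketch of such a check, assuming a plain filesystem-usage threshold; the path and the 90% threshold are illustrative, not taken from any actual monitoring config:

```python
import shutil


def disk_usage_percent(path="/"):
    """Return used disk space for `path` as a percentage of its total size."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def should_alert(path="/", threshold=90.0):
    """True when disk usage on `path` has reached the alerting threshold."""
    return disk_usage_percent(path) >= threshold
```

A check like this run periodically on the container host would fire before Kubernetes starts evicting pods, turning the manual-intervention problem into a proactive alert.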

#14 Updated by cdywan 12 months ago

  • Status changed from Blocked to Feedback

So SD-62248 got closed, and the situation seems under control at this point, so I think we can resolve this. There's ENGINFRA-705 in Jira for a follow-up.

#15 Updated by cdywan 12 months ago

  • Status changed from Feedback to Resolved

cdywan wrote:

So SD-62248 got closed, and the situation seems under control at this point, so I think we can resolve this. There's ENGINFRA-705 in Jira for a follow-up.

Mentioned in the daily. Closing, since from our side the problem has been resolved and has not been observed for a couple of months.

#16 Updated by cdywan 12 months ago

  • Copied to action #103762: gitlab CI pipeline failed with Error cleaning up pod: Delete ... connect: connection refused Job failed (system failure): prepare environment: waiting for pod running ... i/o timeout. Check ... for more information added
