action #105145

closed

osd-deployment pipelines fail because ContainersNotInitialized size:M

Added by livdywan almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Organisational
Target version:
Start date: 2022-01-20
Due date: 2022-02-11
% Done: 0%
Estimated time:

Description

Observation

osd-deployment failed with the same error:

Waiting for pod gitlab/runner-ydlpfvpg-project-3530-concurrent-0h2jfn to be running, status is Pending
Waiting for pod gitlab/runner-ydlpfvpg-project-3530-concurrent-0h2jfn to be running, status is Pending
    ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
    ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-ydlpfvpg-project-3530-concurrent-0h2jfn to be running, status is Pending
    ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
    ContainersNotReady: "containers with unready status: [build helper]"
[...]
Waiting for pod gitlab/runner-ydlpfvpg-project-3530-concurrent-0h2jfn to be running, status is Pending
    ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
    ContainersNotReady: "containers with unready status: [build helper]"
ERROR: Job failed (system failure): prepare environment: waiting for pod running: timed out waiting for pod to start. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information

Acceptance criteria

  • AC1: osd-deployment pipeline doesn't alert about internal gitlab k8s busy loops
  • AC2: osd-deployment pipelines automatically restart on known internal errors

Suggestions


Related issues 1 (0 open, 1 closed)

Copied from openQA Project (public) - action #103527: osd-deployment pipelines fail and alerts are not handled size:M (Resolved, okurz)

Actions #1

Updated by livdywan almost 3 years ago

  • Copied from action #103527: osd-deployment pipelines fail and alerts are not handled size:M added
Actions #2

Updated by mkittler almost 3 years ago

I don't agree with AC1 unless the job could somehow be restarted automatically as well. Otherwise someone in the team will have to restart it (as a reaction to the alert which is therefore still needed). See #96551#note-22 for an example.

Actions #3

Updated by livdywan almost 3 years ago

  • Description updated (diff)

mkittler wrote:

I don't agree with AC1 unless the job could somehow be restarted automatically as well. Otherwise someone in the team will have to restart it (as a reaction to the alert which is therefore still needed). See #96551#note-22 for an example.

Good point. Let's have two ACs.

Actions #4

Updated by livdywan almost 3 years ago

  • Subject changed from osd-deployment pipelines fail because ContainersNotInitialized to osd-deployment pipelines fail because ContainersNotInitialized size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #5

Updated by jbaier_cz almost 3 years ago

And now a different (similar) problem can be found in https://gitlab.suse.de/qa-maintenance/openQABot/-/jobs/804138

Unschedulable: "0/10 nodes are available: 3 node(s) were not ready, 3 node(s) were out of disk space, 7 node(s) didn't match node selector."
Actions #6

Updated by jbaier_cz almost 3 years ago

jbaier_cz wrote:

And now a different (similar) problem can be found in https://gitlab.suse.de/qa-maintenance/openQABot/-/jobs/804138

Unschedulable: "0/10 nodes are available: 3 node(s) were not ready, 3 node(s) were out of disk space, 7 node(s) didn't match node selector."

Tracked (and resolved) in SD-74280

Actions #7

Updated by livdywan almost 3 years ago

  • Status changed from Workable to In Progress
  • Assignee set to livdywan

jbaier_cz wrote:

Tracked (and resolved) in SD-74280

So this has been solved for the moment. Since we'll eventually run into it again, though, I'm still proposing the trivial use of retry as suggested:
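
As a minimal sketch of what such a retry could look like in `.gitlab-ci.yml` (the job name `deploy` and its script are hypothetical, and the actual change may differ):

```yaml
# Hypothetical job; only the `retry` block illustrates the idea.
deploy:
  script:
    - ./deploy.sh  # placeholder command, not taken from this ticket
  retry:
    max: 2  # GitLab CI allows at most 2 retries per job
    when:
      - runner_system_failure     # e.g. "Job failed (system failure)"
      - stuck_or_timeout_failure  # job got stuck or timed out
```

Limiting `when` to runner-side failure types means a genuine script failure in the deployment would still fail the pipeline and raise an alert, while pod start-up timeouts like the one in the description would be retried automatically.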

Actions #8

Updated by openqa_review almost 3 years ago

  • Due date set to 2022-02-11

Setting due date based on mean cycle time of SUSE QE Tools

Actions #10

Updated by livdywan almost 3 years ago

No complaints. I'll assume we're good here.

Actions #11

Updated by livdywan almost 3 years ago

  • Status changed from Feedback to Resolved