action #105145

osd-deployment pipelines fail because ContainersNotInitialized size:M

Added by cdywan 4 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Organisational
Target version:
Start date:
2022-01-20
Due date:
2022-02-11
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

osd-deployment failed with the same error:

Waiting for pod gitlab/runner-ydlpfvpg-project-3530-concurrent-0h2jfn to be running, status is Pending
Waiting for pod gitlab/runner-ydlpfvpg-project-3530-concurrent-0h2jfn to be running, status is Pending
    ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
    ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-ydlpfvpg-project-3530-concurrent-0h2jfn to be running, status is Pending
    ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
    ContainersNotReady: "containers with unready status: [build helper]"
[...]
Waiting for pod gitlab/runner-ydlpfvpg-project-3530-concurrent-0h2jfn to be running, status is Pending
    ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
    ContainersNotReady: "containers with unready status: [build helper]"
ERROR: Job failed (system failure): prepare environment: waiting for pod running: timed out waiting for pod to start. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information

Acceptance criteria

  • AC1: osd-deployment pipeline doesn't alert about internal gitlab k8s busy loops
  • AC2: osd-deployment pipelines automatically restart on known internal errors

Suggestions


Related issues

Copied from openQA Project - action #103527: osd-deployment pipelines fail and alerts are not handled size:M (Resolved)

History

#1 Updated by cdywan 4 months ago

  • Copied from action #103527: osd-deployment pipelines fail and alerts are not handled size:M added

#2 Updated by mkittler 4 months ago

I don't agree with AC1 unless the job could somehow be restarted automatically as well. Otherwise someone in the team will have to restart it (as a reaction to the alert which is therefore still needed). See #96551#note-22 for an example.

#3 Updated by cdywan 4 months ago

  • Description updated (diff)

mkittler wrote:

I don't agree with AC1 unless the job could somehow be restarted automatically as well. Otherwise someone in the team will have to restart it (as a reaction to the alert which is therefore still needed). See #96551#note-22 for an example.

Good point. Let's have two ACs.

#4 Updated by cdywan 4 months ago

  • Subject changed from osd-deployment pipelines fail because ContainersNotInitialized to osd-deployment pipelines fail because ContainersNotInitialized size:M
  • Description updated (diff)
  • Status changed from New to Workable

#5 Updated by jbaier_cz 4 months ago

And now a different (similar) problem can be found in https://gitlab.suse.de/qa-maintenance/openQABot/-/jobs/804138

Unschedulable: "0/10 nodes are available: 3 node(s) were not ready, 3 node(s) were out of disk space, 7 node(s) didn't match node selector."

#6 Updated by jbaier_cz 4 months ago

jbaier_cz wrote:

And now a different (similar) problem can be found in https://gitlab.suse.de/qa-maintenance/openQABot/-/jobs/804138

Unschedulable: "0/10 nodes are available: 3 node(s) were not ready, 3 node(s) were out of disk space, 7 node(s) didn't match node selector."

Tracked (and resolved) in SD-74280

#7 Updated by cdywan 4 months ago

  • Status changed from Workable to In Progress
  • Assignee set to cdywan

jbaier_cz wrote:

Tracked (and resolved) in SD-74280

So this has been solved for the moment. Since we'll eventually run into it again, though, I'm still proposing the trivial use of retry as suggested:
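
The change referred to here is not preserved in this copy of the ticket. As a rough illustration only, a GitLab CI `retry` setting of this kind might look like the sketch below. The job name `deploy` and the script line are hypothetical placeholders, and which failure types to retry on is an assumption based on the "Job failed (system failure)" message in the description, not the configuration that was actually applied.

```yaml
# Minimal sketch, not the actual osd-deployment configuration.
# "deploy" and the script line are hypothetical placeholders.
deploy:
  script:
    - ./deploy.sh                 # placeholder for the real deployment command
  retry:
    max: 2                        # GitLab accepts 0, 1 or 2 automatic retries
    when:
      - runner_system_failure     # runner-side failures such as "Job failed (system failure)"
      - stuck_or_timeout_failure  # job got stuck or timed out
```

Restricting `when` to runner-side failure types keeps genuine script failures visible, so a failed deployment would still raise an alert while transient pod start-up problems are retried automatically.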

#8 Updated by openqa_review 4 months ago

  • Due date set to 2022-02-11

Setting due date based on mean cycle time of SUSE QE Tools

#10 Updated by cdywan 3 months ago

No complaints. I'll assume we're good here.

#11 Updated by cdywan 3 months ago

  • Status changed from Feedback to Resolved
