action #103527

osd-deployment pipelines fail and alerts are not handled size:M

Added by cdywan about 2 months ago. Updated 10 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Organisational
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

osd-deployment failed this morning. A re-trigger produced the same error:

Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
        ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
        ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
        ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-25ldi6vv-project-3731-concurrent-0cjhhg via gitlab-worker2...

Acceptance criteria

  • AC1: osd-deployment pipeline is run without errors

Suggestions

  • Re-trigger and see if it works
  • Investigate what ContainersNotReady means
  • File an infra ticket

Related issues

Related to openQA Infrastructure - action #103539: Update expired SSL certificate on monitor.qa.suse.de with dehydrated and salt, same as on OSD size:M (Resolved, 2021-12-06)

Copied to openQA Project - action #105145: osd-deployment pipelines fail because ContainersNotInitialized size:M (In Progress, 2022-01-20, 2022-02-11)

History

#1 Updated by okurz about 2 months ago

  • Status changed from New to In Progress
  • Assignee set to okurz

Observation

osd-deployment failed this morning. A re-trigger produced the same error:

Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
        ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
        ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
        ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-25ldi6vv-project-3731-concurrent-0cjhhg via gitlab-worker2...
[...]
Executing "step_script" stage of the job script
$ eval "$GRAFANA_ALERTS" > current_alerts
Cleaning up project directory and file based variables

Acceptance criteria

  • AC1: osd-deployment pipeline is run without errors
  • AC2: error is visible from the pipeline

Suggestions

  • Re-trigger and see if it works: done, a re-trigger reproduces the same error within seconds
  • Investigate what ContainersNotReady means
  • File an infra ticket
  • Improve the pipeline to show what $GRAFANA_ALERTS is and how it fails
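
The last suggestion could be approached roughly as sketched below. This is a minimal sketch, not the change that was actually merged. The function name fetch_alerts is hypothetical, and the sketch assumes that $GRAFANA_ALERTS holds a shell command containing no secrets, because the command is echoed to the job log.

```shell
#!/bin/sh
# Hypothetical helper: run the command stored in $GRAFANA_ALERTS, show what
# is being run, and report the exit code if it fails.
# Assumption: the variable contains no credentials, because it is echoed.
fetch_alerts() {
    out_file="${1:-current_alerts}"
    if [ -z "${GRAFANA_ALERTS:-}" ]; then
        echo "GRAFANA_ALERTS is empty or unset" >&2
        return 2
    fi
    echo "Running alert query: $GRAFANA_ALERTS" >&2
    rc=0
    # stdout goes to the output file; stderr stays visible in the job log.
    # "|| rc=$?" keeps the exit code even when the shell runs with "set -e".
    eval "$GRAFANA_ALERTS" > "$out_file" || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "Alert query failed with exit code $rc" >&2
    fi
    return "$rc"
}
```

In a job script this would stand in for the bare eval line, for example as `fetch_alerts current_alerts`, so that both the command and its exit code show up in the job output.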

#2 Updated by okurz about 2 months ago

  • Status changed from In Progress to Feedback

#3 Updated by okurz about 2 months ago

  • Related to action #103539: Update expired SSL certificate on monitor.qa.suse.de with dehydrated and salt, same as on OSD size:M added

#4 Updated by okurz about 2 months ago

  • Status changed from Feedback to Blocked

MR merged. New deployment triggered which fails quite obviously in https://gitlab.suse.de/openqa/osd-deployment/-/jobs/732508#L46 with

$ eval "$GRAFANA_ALERTS" > current_alerts
curl: (60) SSL certificate problem: certificate has expired
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: command terminated with exit code 1

so, I assume all good regarding the improvement of the error message? I will keep the ticket in "Blocked" to await resolution to the certificate problem in #103539 and then deployment on OSD.
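
The curl error 60 above is caused by an expired server certificate. As a hedged illustration of how such an expiry could be spotted before a deployment runs, the sketch below wraps `openssl x509 -checkend`, which exits non-zero when a certificate expires within the given number of seconds. The function names cert_valid_for and fetch_server_cert and the seven-day default are assumptions for illustration only. They are not part of the osd-deployment pipeline or of the monitoring set up in #103539.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the PEM certificate in file $1 is still
# valid for at least $2 seconds (default: 604800 seconds, i.e. 7 days).
cert_valid_for() {
    cert_file="$1"
    seconds="${2:-604800}"
    openssl x509 -in "$cert_file" -noout -checkend "$seconds" >/dev/null
}

# Hypothetical helper: print the certificate a server presents, as PEM.
# Assumption: host (and optionally port) are supplied by the caller.
fetch_server_cert() {
    host="$1"
    port="${2:-443}"
    openssl s_client -connect "$host:$port" -servername "$host" \
        </dev/null 2>/dev/null | openssl x509 -outform PEM
}
```

A caller could store the output of `fetch_server_cert monitor.qa.suse.de` in a file and pass that file to cert_valid_for, failing early with a clear message instead of waiting for curl to reject the connection.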

#5 Updated by cdywan about 2 months ago

okurz wrote:

so, I assume all good regarding the improvement of the error message? I will keep the ticket in "Blocked" to await resolution to the certificate problem in #103539 and then deployment on OSD.

Looks good to me.

cdywan wrote:

Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
        ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
        ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
        ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-25ldi6vv-project-3731-concurrent-0cjhhg via gitlab-worker2...

Upstream issue: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27367#note_701735040

#6 Updated by okurz 10 days ago

  • Status changed from Blocked to Resolved

We have improved the error reporting, and the certificates have been fixed in #103539 with monitoring and alerts.

#7 Updated by cdywan 9 days ago

  • Copied to action #105145: osd-deployment pipelines fail because ContainersNotInitialized size:M added
