action #103527

closed

osd-deployment pipelines fail and alerts are not handled size:M

Added by livdywan over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Organisational
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Description

Observation

osd-deployment failed this morning. A re-trigger produced the same error:

Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
        ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
        ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
        ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-25ldi6vv-project-3731-concurrent-0cjhhg via gitlab-worker2...

Acceptance criteria

  • AC1: osd-deployment pipeline runs without errors

Suggestions

  • Re-trigger and see if it works
  • Investigate what ContainersNotReady means
  • File an infra ticket

Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #103539: Update expired SSL certificate on monitor.qa.suse.de with dehydrated and salt, same as on OSD size:M (Resolved, nicksinger, 2021-12-06)

Copied to openQA Project - action #105145: osd-deployment pipelines fail because ContainersNotInitialized size:M (Resolved, livdywan, 2022-01-20, 2022-02-11)

Actions #1

Updated by okurz over 2 years ago

  • Status changed from New to In Progress
  • Assignee set to okurz

Observation

osd-deployment failed this morning. A re-trigger produced the same error:

Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
        ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
        ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
        ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-25ldi6vv-project-3731-concurrent-0cjhhg via gitlab-worker2...
[...]
Executing "step_script" stage of the job script
$ eval "$GRAFANA_ALERTS" > current_alerts
Cleaning up project directory and file based variables

Acceptance criteria

  • AC1: osd-deployment pipeline runs without errors
  • AC2: error is visible from the pipeline

Suggestions

  • Re-trigger and see if it works (done: the re-trigger reproduces the same error within seconds)
  • Investigate what ContainersNotReady means
  • File an infra ticket
  • Improve the pipeline to show what $GRAFANA_ALERTS is and how it fails
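
One way the last suggestion could look is sketched below. This is only an illustration, not the change that was actually merged: the helper name `run_visible` is hypothetical, and it assumes that `$GRAFANA_ALERTS` holds a shell command string, as the `eval "$GRAFANA_ALERTS" > current_alerts` line in the job log suggests. Printing the command could expose credentials if the variable contains any, so masking may be needed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the osd-deployment repository):
# print a command string, evaluate it with stdout redirected to a file,
# and report the exit code on stderr if the command fails.
run_visible() {
    cmd=$1
    out=$2
    # Show what is about to run so the job log explains later failures.
    echo "Running: $cmd" >&2
    if eval "$cmd" > "$out"; then
        return 0
    else
        rc=$?
        echo "Command failed with exit code $rc: $cmd" >&2
        return "$rc"
    fi
}

# Possible use in the job script (assumption):
#   run_visible "$GRAFANA_ALERTS" current_alerts
```

Diagnostics go to stderr so that they do not end up in `current_alerts`, and the original exit code is passed on so that the job still fails.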
Actions #2

Updated by okurz over 2 years ago

  • Status changed from In Progress to Feedback
Actions #3

Updated by okurz over 2 years ago

  • Related to action #103539: Update expired SSL certificate on monitor.qa.suse.de with dehydrated and salt, same as on OSD size:M added
Actions #4

Updated by okurz over 2 years ago

  • Status changed from Feedback to Blocked

MR merged. A new deployment was triggered, which now fails quite obviously in https://gitlab.suse.de/openqa/osd-deployment/-/jobs/732508#L46 with:

$ eval "$GRAFANA_ALERTS" > current_alerts
curl: (60) SSL certificate problem: certificate has expired
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: command terminated with exit code 1

so, I assume all good regarding the improvement of the error message? I will keep the ticket in "Blocked" to await resolution to the certificate problem in #103539 and then deployment on OSD.
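
To confirm an expiry like the one curl reports above, the server certificate could be inspected with `openssl`. The following is a minimal sketch under stated assumptions: the function names `fetch_cert` and `cert_valid_for` are hypothetical, `openssl` is assumed to be installed, and the host monitor.qa.suse.de is taken from the title of #103539 rather than from the actual content of `$GRAFANA_ALERTS`.

```shell
#!/bin/sh
# Hypothetical helper: print the PEM certificate presented by host $1
# on port $2 (default 443).
fetch_cert() {
    host=$1
    port=${2:-443}
    openssl s_client -connect "$host:$port" -servername "$host" \
        </dev/null 2>/dev/null | openssl x509 -outform PEM
}

# Hypothetical helper: succeed if the PEM certificate in file $1 is still
# valid $2 seconds from now (default 0, meaning "not expired right now").
cert_valid_for() {
    openssl x509 -checkend "${2:-0}" -noout -in "$1" >/dev/null
}

# Possible use (assumed host):
#   fetch_cert monitor.qa.suse.de > server.pem
#   cert_valid_for server.pem || echo "certificate has expired" >&2
#   cert_valid_for server.pem 604800 || echo "expires within 7 days" >&2
```

Checking with a margin of several days, as in the last line, is the kind of test an expiry alert could build on.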

Actions #5

Updated by livdywan over 2 years ago

okurz wrote:

so, I assume all good regarding the improvement of the error message? I will keep the ticket in "Blocked" to await resolution to the certificate problem in #103539 and then deployment on OSD.

Looks good to me.

cdywan wrote:

Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
        ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
        ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
        ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-25ldi6vv-project-3731-concurrent-0cjhhg via gitlab-worker2...

Upstream issue: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27367#note_701735040

Actions #6

Updated by okurz over 2 years ago

  • Status changed from Blocked to Resolved

We have improved the error reporting, and the certificates have been fixed in #103539 with monitoring and alerts.

Actions #7

Updated by livdywan over 2 years ago

  • Copied to action #105145: osd-deployment pipelines fail because ContainersNotInitialized size:M added