action #103527
closed
osd-deployment pipelines fail and alerts are not handled size:M
Description
Observation
osd-deployment failed this morning. A re-trigger produced the same error:
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-25ldi6vv-project-3731-concurrent-0cjhhg via gitlab-worker2...
Acceptance criteria
- AC1: osd-deployment pipeline is run without errors
Suggestions
- Re-trigger and see if it works
- Investigate what ContainersNotReady means
- File an infra ticket
Updated by okurz almost 3 years ago
- Status changed from New to In Progress
- Assignee set to okurz
Observation
osd-deployment failed this morning. A re-trigger produced the same error:
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-25ldi6vv-project-3731-concurrent-0cjhhg via gitlab-worker2...
[...]
Executing "step_script" stage of the job script
$ eval "$GRAFANA_ALERTS" > current_alerts
Cleaning up project directory and file based variables
Acceptance criteria
- AC1: osd-deployment pipeline is run without errors
- AC2: error is visible from the pipeline
Suggestions
- Re-trigger and see if it works: re-trigger reproduces the same error within seconds
- Investigate what ContainersNotReady means
- File an infra ticket
- Improve the pipeline to show what $GRAFANA_ALERTS is and how it fails
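The last suggestion could be approached roughly as follows. This is only a sketch, not the change that was actually merged: `run_verbose` is a hypothetical helper name, and it assumes the command stored in `$GRAFANA_ALERTS` contains no credentials that must stay out of the job log.

```shell
#!/bin/sh
# Hypothetical helper (not from the osd-deployment repository):
# print the command held in $1, run it with stdout redirected to the
# file $2, and report the exit code on stderr if it fails.
run_verbose() {
    cmd=$1
    out=$2
    echo "Running: $cmd" >&2
    if eval "$cmd" > "$out"; then
        return 0
    else
        # In the else branch $? still holds the exit status of the eval.
        rc=$?
        echo "Command failed with exit code $rc: $cmd" >&2
        return "$rc"
    fi
}

# Possible use in the job script, replacing the bare eval:
#   run_verbose "$GRAFANA_ALERTS" current_alerts
```

Writing the diagnostics to stderr keeps `current_alerts` free of anything but the command's own output, while the job log shows both the command and its exit code.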
Updated by okurz almost 3 years ago
- Status changed from In Progress to Feedback
Updated by okurz almost 3 years ago
- Related to action #103539: Update expired SSL certificate on monitor.qa.suse.de with dehydrated and salt, same as on OSD size:M added
Updated by okurz almost 3 years ago
- Status changed from Feedback to Blocked
MR merged. New deployment triggered which fails quite obviously in https://gitlab.suse.de/openqa/osd-deployment/-/jobs/732508#L46 with
$ eval "$GRAFANA_ALERTS" > current_alerts
curl: (60) SSL certificate problem: certificate has expired
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: command terminated with exit code 1
so, I assume all good regarding the improvement of the error message? I will keep the ticket in "Blocked" to await resolution to the certificate problem in #103539 and then deployment on OSD.
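For reference, an expired certificate like the one behind the curl error above can be checked ahead of time with `openssl x509 -checkend`. The following is a minimal sketch under assumptions: `cert_valid_for_days` is a hypothetical helper name, and the commented fetch command assumes the monitoring host serves HTTPS on port 443.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the certificate in PEM file $1
# remains valid for at least $2 more days, fail (exit 1) otherwise.
cert_valid_for_days() {
    pem=$1
    days=$2
    # -checkend N exits 0 if the certificate does not expire within N seconds.
    openssl x509 -in "$pem" -noout -checkend "$((days * 86400))" >/dev/null
}

# Possible use against a live host (port 443 is an assumption):
#   openssl s_client -connect monitor.qa.suse.de:443 \
#       -servername monitor.qa.suse.de </dev/null 2>/dev/null \
#       | openssl x509 -out /tmp/monitor.pem
#   cert_valid_for_days /tmp/monitor.pem 14 || echo "certificate expires soon" >&2
```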
Updated by livdywan almost 3 years ago
okurz wrote:
so, I assume all good regarding the improvement of the error message? I will keep the ticket in "Blocked" to await resolution to the certificate problem in #103539 and then deployment on OSD.
Looks good to me.
cdywan wrote:
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-25ldi6vv-project-3731-concurrent-0cjhhg via gitlab-worker2...
Upstream issue: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27367#note_701735040
Updated by okurz over 2 years ago
- Status changed from Blocked to Resolved
We have improved the error reporting and certificates have been fixed in #103539 with monitoring and alerts
Updated by livdywan over 2 years ago
- Copied to action #105145: osd-deployment pipelines fail because ContainersNotInitialized size:M added