action #103527
closed
osd-deployment pipelines fail and alerts are not handled size:M
Added by livdywan about 3 years ago.
Updated almost 3 years ago.
Description
Observation
osd-deployment failed this morning. A re-trigger produced the same error:
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-25ldi6vv-project-3731-concurrent-0cjhhg via gitlab-worker2...
Acceptance criteria
- AC1: osd-deployment pipeline is run without errors
Suggestions
- Re-trigger and see if it works
- Investigate what ContainersNotReady means
- File an infra ticket
- Status changed from New to In Progress
- Assignee set to okurz
Observation
osd-deployment failed this morning. A re-trigger produced the same error:
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-25ldi6vv-project-3731-concurrent-0cjhhg via gitlab-worker2...
[...]
Executing "step_script" stage of the job script
$ eval "$GRAFANA_ALERTS" > current_alerts
Cleaning up project directory and file based variables
Acceptance criteria
- AC1: osd-deployment pipeline is run without errors
- AC2: error is visible from the pipeline
Suggestions
- Re-trigger and see if it works: a re-trigger reproduces the same error within seconds
- Investigate what ContainersNotReady means
- File an infra ticket
- Improve the pipeline to show what $GRAFANA_ALERTS is and how it fails
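The last suggestion could be sketched as a small shell wrapper around the existing eval call. This is only an illustration under assumptions: the function name run_logged is hypothetical and not part of the osd-deployment pipeline, and printing the command is only safe if $GRAFANA_ALERTS contains no credentials.

```shell
#!/bin/sh
# Hypothetical helper (not from the actual pipeline): print the command
# stored in a variable, run it, and report its exit code on failure.
run_logged() {
    cmd=$1   # command string, e.g. the content of $GRAFANA_ALERTS
    out=$2   # file that receives the command's stdout
    echo "Running: $cmd" >&2
    # eval is kept because the pipeline stores the whole command in a variable;
    # stderr is left untouched so tool errors show up in the job log.
    if eval "$cmd" > "$out"; then
        return 0
    else
        rc=$?
        echo "Command failed with exit code $rc: $cmd" >&2
        return "$rc"
    fi
}
```

In the job script this would replace the bare call, for example run_logged "$GRAFANA_ALERTS" current_alerts, so that both the command and its exit code appear in the job log next to any error the command itself prints.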
- Status changed from In Progress to Feedback
- Related to action #103539: Update expired SSL certificate on monitor.qa.suse.de with dehydrated and salt, same as on OSD size:M added
- Status changed from Feedback to Blocked
MR merged. New deployment triggered which fails quite obviously in https://gitlab.suse.de/openqa/osd-deployment/-/jobs/732508#L46 with
$ eval "$GRAFANA_ALERTS" > current_alerts
curl: (60) SSL certificate problem: certificate has expired
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: command terminated with exit code 1
so, I assume all good regarding the improvement of the error message? I will keep the ticket in "Blocked" to await resolution to the certificate problem in #103539 and then deployment on OSD.
okurz wrote:
so, I assume all good regarding the improvement of the error message? I will keep the ticket in "Blocked" to await resolution to the certificate problem in #103539 and then deployment on OSD.
Looks good to me.
cdywan wrote:
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-25ldi6vv-project-3731-concurrent-0cjhhg via gitlab-worker2...
Upstream issue: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27367#note_701735040
- Status changed from Blocked to Resolved
We have improved the error reporting, and the certificates have been fixed in #103539 with monitoring and alerts.
- Copied to action #105145: osd-deployment pipelines fail because ContainersNotInitialized size:M added