<p>openSUSE Project Management Tool: <a href="https://progress.opensuse.org/">https://progress.opensuse.org/</a></p>
<p><a href="https://progress.opensuse.org/issues/103527?journal_id=470589">openQA Project - action #103527: osd-deployment pipelines fail and alerts are not handled size:M</a>, updated 2021-12-06T10:36:13Z by okurz (okurz@suse.com)</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>okurz</i></li></ul><a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><em>osd-deployment</em> failed this morning. A re-trigger produced <a href="https://gitlab.suse.de/openqa/osd-deployment/-/jobs/730432" class="external">the same error</a>:</p>
<pre><code>Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-25ldi6vv-project-3731-concurrent-0cjhhg via gitlab-worker2...
[...]
Executing "step_script" stage of the job script
$ eval "$GRAFANA_ALERTS" > current_alerts
Cleaning up project directory and file based variables
</code></pre>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> The osd-deployment pipeline runs without errors</li>
<li><strong>AC2:</strong> If the pipeline fails, the error is visible in the pipeline output</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li><del>Re-trigger and see if it works</del> Re-trigger reproduces the same error within seconds</li>
<li>Investigate what <code>ContainersNotReady</code> means</li>
<li>File an infra ticket</li>
<li>Improve the pipeline to show what <code>$GRAFANA_ALERTS</code> contains and how it fails</li>
</ul>
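<p>The last suggestion could be approached roughly as follows. This is only a minimal sketch and not the change that was actually merged: <code>run_and_report</code> is a hypothetical helper name, and the sketch assumes the job script is plain POSIX shell. It prints the command before running it and reports the exit code when it fails. Note that echoing the command may expose credentials if the variable embeds any, so that part may need masking in a real pipeline.</p>

```shell
#!/bin/sh
# Hypothetical helper (not part of the osd-deployment repository):
# print a command, run it with stdout redirected to a file, and
# report the exit code on failure instead of failing silently.
run_and_report() {
    cmd="$1"   # command string, e.g. the content of $GRAFANA_ALERTS
    out="$2"   # file that receives the command's standard output
    echo "Running: $cmd" >&2
    if eval "$cmd" > "$out"; then
        return 0
    else
        rc=$?  # exit status of the failed eval
        echo "Command failed with exit code $rc: $cmd" >&2
        return "$rc"
    fi
}
```

<p>In the job script this could replace the bare <code>eval</code> line, for example as <code>run_and_report "$GRAFANA_ALERTS" current_alerts</code>, so that the job log shows both the command and the reason for the failure.</p>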
<p><a href="https://progress.opensuse.org/issues/103527?journal_id=470604">openQA Project - action #103527: osd-deployment pipelines fail and alerts are not handled size:M</a>, updated 2021-12-06T10:42:49Z by okurz (okurz@suse.com)</p>
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li></ul><p><a href="https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/41" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/41</a></p>
<p><a href="https://progress.opensuse.org/issues/103527?journal_id=470811">openQA Project - action #103527: osd-deployment pipelines fail and alerts are not handled size:M</a>, updated 2021-12-07T09:16:18Z by okurz (okurz@suse.com)</p>
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-5 priority-high3 closed" href="/issues/103539">action #103539</a>: Update expired SSL certificate on monitor.qa.suse.de with dehydrated and salt, same as on OSD size:M</i> added</li></ul>
<p><a href="https://progress.opensuse.org/issues/103527?journal_id=470817">openQA Project - action #103527: osd-deployment pipelines fail and alerts are not handled size:M</a>, updated 2021-12-07T09:16:45Z by okurz (okurz@suse.com)</p>
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Blocked</i></li></ul><p>MR merged. A new deployment was triggered, which now fails visibly in <a href="https://gitlab.suse.de/openqa/osd-deployment/-/jobs/732508#L46" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/jobs/732508#L46</a> with</p>
<pre><code>$ eval "$GRAFANA_ALERTS" > current_alerts
curl: (60) SSL certificate problem: certificate has expired
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: command terminated with exit code 1
</code></pre>
<p>so, I assume all good regarding the improvement of the error message? I will keep the ticket in "Blocked" to await resolution to the certificate problem in <a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: Update expired SSL certificate on monitor.qa.suse.de with dehydrated and salt, same as on OSD size:M (Resolved)" href="https://progress.opensuse.org/issues/103539">#103539</a> and then deployment on OSD.</p>
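<p>For reference, an expired certificate like the one curl reports above can also be confirmed outside the pipeline. The following is a rough sketch under assumptions: <code>fetch_cert</code> and <code>cert_valid_for</code> are hypothetical helper names, the <code>openssl</code> command line tool is assumed to be installed, and the server is assumed to listen on port 443.</p>

```shell
#!/bin/sh
# Hypothetical helpers, assuming the openssl CLI is available.

# Save the leaf certificate presented by host $1 (port 443 assumed)
# as PEM into file $2.
fetch_cert() {
    host="$1"
    out="$2"
    openssl s_client -connect "$host:443" -servername "$host" \
        </dev/null 2>/dev/null | openssl x509 -outform PEM > "$out"
}

# Succeed if the PEM certificate in file $1 is still valid $2 seconds
# from now (default 0, i.e. "not expired right now").
cert_valid_for() {
    openssl x509 -in "$1" -noout -checkend "${2:-0}" >/dev/null
}
```

<p>A possible use would be <code>fetch_cert monitor.qa.suse.de cert.pem &amp;&amp; cert_valid_for cert.pem 604800</code>, which fails if the certificate expires within a week. Whether such a check belongs in the pipeline or in monitoring is a separate decision.</p>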
<p><a href="https://progress.opensuse.org/issues/103527?journal_id=470856">openQA Project - action #103527: osd-deployment pipelines fail and alerts are not handled size:M</a>, updated 2021-12-07T10:23:53Z by livdywan (liv.dywan@suse.com)</p>
<p>okurz wrote:</p>
<blockquote>
<p>so, I assume all good regarding the improvement of the error message? I will keep the ticket in "Blocked" to await resolution to the certificate problem in <a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: Update expired SSL certificate on monitor.qa.suse.de with dehydrated and salt, same as on OSD size:M (Resolved)" href="https://progress.opensuse.org/issues/103539">#103539</a> and then deployment on OSD.</p>
</blockquote>
<p>Looks good to me.</p>
<p>cdywan wrote:</p>
<blockquote>
<pre><code>Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-25ldi6vv-project-3731-concurrent-0cjhhg to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-25ldi6vv-project-3731-concurrent-0cjhhg via gitlab-worker2...
</code></pre></blockquote>
<p>Upstream issue: <a href="https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27367#note_701735040" class="external">https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27367#note_701735040</a></p>
<p><a href="https://progress.opensuse.org/issues/103527?journal_id=481300">openQA Project - action #103527: osd-deployment pipelines fail and alerts are not handled size:M</a>, updated 2022-01-19T07:56:10Z by okurz (okurz@suse.com)</p>
<ul><li><strong>Status</strong> changed from <i>Blocked</i> to <i>Resolved</i></li></ul><p>We have improved the error reporting, and the certificates have been fixed in <a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: Update expired SSL certificate on monitor.qa.suse.de with dehydrated and salt, same as on OSD size:M (Resolved)" href="https://progress.opensuse.org/issues/103539">#103539</a>, along with monitoring and alerts.</p>
<p><a href="https://progress.opensuse.org/issues/103527?journal_id=482095">openQA Project - action #103527: osd-deployment pipelines fail and alerts are not handled size:M</a>, updated 2022-01-20T11:35:09Z by livdywan (liv.dywan@suse.com)</p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-4 status-3 priority-4 priority-default closed behind-schedule" href="/issues/105145">action #105145</a>: osd-deployment pipelines fail because ContainersNotInitialized size:M</i> added</li></ul>