openSUSE Project Management Tool https://progress.opensuse.org/
openQA Infrastructure - action #97574: deployment failed in gitlab job with "ERROR: Job failed: execution took longer than 1h0m0s seconds"
https://progress.opensuse.org/issues/97574?journal_id=438989 2021-08-27T07:46:44Z okurz okurz@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/438989/diff?detail_id=416234">diff</a>)</li></ul>
https://progress.opensuse.org/issues/97574?journal_id=438992 2021-08-27T07:49:00Z okurz okurz@suse.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>okurz</i></li></ul><p>Retriggered the one mentioned job as <a href="https://openqa.suse.de/tests/6955090" class="external">https://openqa.suse.de/tests/6955090</a></p>
https://progress.opensuse.org/issues/97574?journal_id=438998 2021-08-27T07:58:44Z okurz okurz@suse.com
<p>There was a sudden rise in incomplete jobs on <a href="https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?viewPanel=17&orgId=1&from=1630028028398&to=1630050969088" class="external">https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?viewPanel=17&orgId=1&from=1630028028398&to=1630050969088</a>, so I triggered <code>env host=openqa.suse.de openqa-advanced-retrigger-jobs</code>.</p>
https://progress.opensuse.org/issues/97574?journal_id=439010 2021-08-27T08:17:59Z okurz okurz@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/439010/diff?detail_id=416267">diff</a>)</li></ul><p>Disabled OSD deployment until this is ruled out: <a href="https://gitlab.suse.de/openqa/osd-deployment/-/pipeline_schedules/36/edit" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/pipeline_schedules/36/edit</a></p>
https://progress.opensuse.org/issues/97574?journal_id=439037 2021-08-27T09:16:48Z okurz okurz@suse.com
<p>Mainly tinita and mkittler have identified <a href="https://github.com/os-autoinst/os-autoinst/pull/1748" class="external">https://github.com/os-autoinst/os-autoinst/pull/1748</a> as the culprit, and together we have prepared a change: <a href="https://github.com/os-autoinst/os-autoinst/pull/1757" class="external">https://github.com/os-autoinst/os-autoinst/pull/1757</a></p>
<p>Checking test coverage from the culprit PR <a href="https://app.codecov.io/gh/os-autoinst/os-autoinst/compare/1748/diff#diff-Ym13cWVtdS5wbQ==" class="external">https://app.codecov.io/gh/os-autoinst/os-autoinst/compare/1748/diff#diff-Ym13cWVtdS5wbQ==</a> shows that we do have good statement coverage in the changed areas. We do not check the types of return values, and I wonder whether we should at all in Perl.</p>
https://progress.opensuse.org/issues/97574?journal_id=439088 2021-08-27T10:31:58Z okurz okurz@suse.com
<ul><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>High</i></li></ul><p>Paused the deployment. The os-autoinst PR is merged, so we can trigger the deployment later today, e.g. in the evening.</p>
https://progress.opensuse.org/issues/97574?journal_id=439139 2021-08-27T11:01:36Z tinita tina.mueller+trick-redmine@suse.com
<p>okurz wrote:</p>
<blockquote>
<p>Checking test coverage from the culprit PR <a href="https://app.codecov.io/gh/os-autoinst/os-autoinst/compare/1748/diff#diff-Ym13cWVtdS5wbQ==" class="external">https://app.codecov.io/gh/os-autoinst/os-autoinst/compare/1748/diff#diff-Ym13cWVtdS5wbQ==</a> shows that we do have good statement coverage in the changed areas. We do not check types of return values and I wonder if we should at all in perl.</p>
</blockquote>
<p>Remember that codecov only shows line coverage, while Devel::Cover is able to do statement and branch coverage. We should at least upload the Devel::Cover report as an artifact (as we do already for openQA in CircleCI).</p>
<p>Since our tests did not fail, it's possible that the case where the variable in question is a file object is not covered by our tests.<br>
And it's possible to have 100% code coverage and still have the problem that certain scenarios are not covered.</p>
<p>So in this case we should check if the scenario happens in one of our tests.</p>
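<p>A minimal sketch of why a fully covered line can still hide an untested scenario. It is written in Python as an analogy and is not taken from os-autoinst; the helper <code>asset_name</code> and the sample values are hypothetical. Both outcomes of the conditional sit on one line, so a test that only passes a string marks the line as covered while the path-object case never runs:</p>

```python
from pathlib import PurePosixPath


def asset_name(value):
    # Hypothetical helper: both branches share one line, so line coverage
    # reports it as covered as soon as either branch has run.
    return value.name if isinstance(value, PurePosixPath) else str(value)


# A test suite containing only this check already reaches full line coverage ...
string_result = asset_name("disk.qcow2")

# ... yet the path-object scenario is only exercised by a second check.
path_result = asset_name(PurePosixPath("/var/lib/openqa/disk.qcow2"))
```

<p>Per-branch reporting is meant to expose the untested arm of such a conditional. Even so, it cannot tell whether a scenario that combines several code paths is exercised.</p>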
https://progress.opensuse.org/issues/97574?journal_id=439235 2021-08-27T15:36:16Z okurz okurz@suse.com
<ul><li><strong>Priority</strong> changed from <i>High</i> to <i>Urgent</i></li></ul><p>tinita wrote:</p>
<blockquote>
<p>And it's possible to have 100% code coverage and still having the problem that certain scenarios are not covered.</p>
</blockquote>
<p>of course</p>
<blockquote>
<p>So in this case we should check if the scenario happens in one of our tests.</p>
</blockquote>
<p>Well, we have the full stack test and a git clone test. I guess only a test scenario combining both could trigger the problem: a path object ends up in %bmwqemu::vars <em>and</em> we then try to serialize it when saving the variables.</p>
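<p>The failure mode can be sketched in Python as an analogy. This is not the os-autoinst Perl code; <code>save_vars</code>, the key <code>EXAMPLE_DIR</code> and the path are hypothetical. A settings dictionary serializes fine as long as it holds plain strings, but fails once a path object is stored in it:</p>

```python
import json
from pathlib import PurePosixPath


def save_vars(variables):
    # Hypothetical stand-in for "saving the variables": serialize to JSON text.
    return json.dumps(variables, sort_keys=True)


# Storing a path object alone is harmless; only serializing it fails.
vars_with_path = {"EXAMPLE_DIR": PurePosixPath("/var/lib/tests")}
try:
    save_vars(vars_with_path)
    failed = False
except TypeError:
    # json cannot encode path objects, so saving raises TypeError.
    failed = True

# Converting values to plain strings before saving avoids the error.
vars_as_text = {key: str(value) for key, value in vars_with_path.items()}
saved = save_vars(vars_as_text)
```

<p>Either step alone passes, which is why only a test that both stores the path object and saves the variables would catch this.</p>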
https://progress.opensuse.org/issues/97574?journal_id=439241 2021-08-27T15:45:54Z okurz okurz@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/439241/diff?detail_id=416471">diff</a>)</li><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>High</i></li></ul><p>The fix was merged and packages have been built. I triggered a new deployment pipeline: <a href="https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/192833" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/192833</a></p>
<p>Noted down "open points" to discuss as follow-ups.</p>
https://progress.opensuse.org/issues/97574?journal_id=439250 2021-08-27T17:12:33Z livdywan liv.dywan@suse.com
<p>I found a way to hit the problem in the git test: <a href="https://github.com/os-autoinst/os-autoinst/pull/1758" class="external">https://github.com/os-autoinst/os-autoinst/pull/1758</a></p>
https://progress.opensuse.org/issues/97574?journal_id=439271 2021-08-28T04:09:28Z openqa_review openqa-review@suse.de
<ul><li><strong>Due date</strong> set to <i>2021-09-11</i></li></ul><p>Setting due date based on mean cycle time of SUSE QE Tools</p>
https://progress.opensuse.org/issues/97574?journal_id=439358 2021-08-30T08:03:29Z okurz okurz@suse.com
<p>Merged <a href="https://github.com/os-autoinst/os-autoinst/pull/1758" class="external">https://github.com/os-autoinst/os-autoinst/pull/1758</a>, thanks cdywan.</p>
<p>Triggered a new deployment, currently running: <a href="https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/194658" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/194658</a></p>
https://progress.opensuse.org/issues/97574?journal_id=439400 2021-08-30T08:43:14Z okurz okurz@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/439400/diff?detail_id=416633">diff</a>)</li></ul><p>openqaworker-arm-2 seems to cause problems during deployment; it appears non-responsive. I removed it from salt and retriggered the deployment.</p>
https://progress.opensuse.org/issues/97574?journal_id=439484 2021-08-30T11:03:44Z okurz okurz@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/439484/diff?detail_id=416720">diff</a>)</li></ul><p>The deployment step passed, but post-monitor failed because openqaworker-arm-2 was not reporting as online. I fixed that manually and retriggered the pipeline step, which succeeded. I then re-added openqaworker-arm-2 to salt and ran <code>zypper dup</code> to bring it up to the same state as the others. I also increased the CI job timeout in <a href="https://gitlab.suse.de/openqa/osd-deployment/-/settings/ci_cd#js-general-pipeline-settings" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/settings/ci_cd#js-general-pipeline-settings</a> from 1h to 2h.</p>
https://progress.opensuse.org/issues/97574?journal_id=439493 2021-08-30T11:07:44Z okurz okurz@suse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li></ul><p>Finally, I created <a href="https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/33" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/33</a> so that the actual deployment step, at which point we can't go back, ignores individual salt nodes that do not respond in time.</p>
https://progress.opensuse.org/issues/97574?journal_id=441683 2021-09-02T13:17:28Z okurz okurz@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>Merged <a href="https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/33" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/33</a>.</p>