https://progress.opensuse.org/https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?15829177842020-01-09T21:48:04ZopenSUSE Project Management ToolopenQA Infrastructure - action #61844: auto_review:"download failed: 521 - Connect timeout" Network issues on openqaworker-arm-3 (and others)https://progress.opensuse.org/issues/61844?journal_id=2698792020-01-09T21:48:04Zokurzokurz@suse.com
<ul><li><strong>Subject</strong> changed from <i>Network issues on openqaworker-arm-3</i> to <i>auto_review:"download failed: 521 - Connect timeout" Network issues on openqaworker-arm-3</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>Feedback</i></li><li><strong>Assignee</strong> set to <i>okurz</i></li><li><strong>Target version</strong> set to <i>Current Sprint</i></li></ul><p>So I did two things so far:</p>
<ul>
<li>Change ticket to be picked up by <a href="https://gitlab.suse.de/openqa/auto-review/" class="external">https://gitlab.suse.de/openqa/auto-review/</a></li>
<li>Reduce number of worker instances on arm-3 to 4 for <a class="issue tracker-4 status-3 priority-3 priority-lowest closed" title="action: all arm worker die after some time (Resolved)" href="https://progress.opensuse.org/issues/41882">#41882</a> but also to see if this has an impact on stability</li>
</ul>
openQA Infrastructure - action #61844: auto_review:"download failed: 521 - Connect timeout" Network issues on openqaworker-arm-3 (and others)https://progress.opensuse.org/issues/61844?journal_id=2698852020-01-09T21:52:33Zokurzokurz@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-3 priority-lowest closed" href="/issues/55529">action #55529</a>: job incompletes when it can not reach the openqa webui host just for a single time aka. retry on 521 connect timeout in cache</i> added</li></ul> openQA Infrastructure - action #61844: auto_review:"download failed: 521 - Connect timeout" Network issues on openqaworker-arm-3 (and others)https://progress.opensuse.org/issues/61844?journal_id=2725162020-01-19T21:07:19Zokurzokurz@suse.com
<ul><li><strong>Subject</strong> changed from <i>auto_review:"download failed: 521 - Connect timeout" Network issues on openqaworker-arm-3</i> to <i>auto_review:"download failed: 521 - Connect timeout" Network issues on openqaworker-arm-3 (and others)</i></li></ul><p>This seems to be linked to <a class="issue tracker-4 status-3 priority-6 priority-high2 closed" title="action: many incompletes with just &quot;setup failure&quot; and no further information (Resolved)" href="https://progress.opensuse.org/issues/62237">#62237</a> and also happens on other machines, e.g. <a href="https://openqa.suse.de/tests/3796147" class="external">https://openqa.suse.de/tests/3796147</a> on arm-1.</p>
openQA Infrastructure - action #61844: auto_review:"download failed: 521 - Connect timeout" Network issues on openqaworker-arm-3 (and others)https://progress.opensuse.org/issues/61844?journal_id=2798892020-02-24T06:52:35Zokurzokurz@suse.com
<ul></ul><p>The SQL query <code>select id,reason,test from jobs where (result='incomplete' and t_finished >= (NOW() - interval '240 hour') and id in (select job_id from comments where text ~ 'poo#61844')) order by id desc;</code> returns no jobs for the past 10 days, so it seems the problem has not happened again, at least not in the same way or with the same message.</p>
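<p>For reuse with other tickets or time windows, the query above could be parameterized roughly as follows. This is only a sketch: the function name <code>build_incompletes_query</code> is hypothetical, and the <code>%s</code> placeholders assume a PostgreSQL driver that uses that parameter style (such as psycopg2). It only builds the statement and its parameters and does not connect to any database.</p>

```python
from typing import Tuple


def build_incompletes_query(ticket_ref: str, hours: int) -> Tuple[str, Tuple[int, str]]:
    """Build the incompletes query for a ticket reference and a look-back window.

    Hypothetical helper: returns the SQL text with %s placeholders plus the
    parameter tuple, so the values are passed to the driver, not interpolated.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    sql = (
        "select id, reason, test from jobs "
        "where result = 'incomplete' "
        # integer * interval yields the look-back window, e.g. 240 hours
        "and t_finished >= (now() - %s * interval '1 hour') "
        # '~' is a regular expression match against the comment text
        "and id in (select job_id from comments where text ~ %s) "
        "order by id desc;"
    )
    return sql, (hours, ticket_ref)


# Same window and ticket reference as in the query above
sql, params = build_incompletes_query("poo#61844", 240)
print(sql)
print(params)
```

<p>With a driver that accepts this placeholder style, the pair would be passed as <code>cursor.execute(sql, params)</code>.</p>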
<p>The latest check for incompletes on <a href="https://gitlab.suse.de/openqa/auto-review/pipelines" class="external">https://gitlab.suse.de/openqa/auto-review/pipelines</a> in <a href="https://gitlab.suse.de/openqa/auto-review/-/jobs/172723" class="external">https://gitlab.suse.de/openqa/auto-review/-/jobs/172723</a> also shows only other reasons for incompletes.</p>
<p>By now incomplete openQA jobs should also give a "reason" with the relevant information directly available in the info box and available over API (and of course DB). With this, identifying the issue should be easier next time.</p>
<p>We are still running with the reduced number of worker instances.</p>
openQA Infrastructure - action #61844: auto_review:"download failed: 521 - Connect timeout" Network issues on openqaworker-arm-3 (and others)https://progress.opensuse.org/issues/61844?journal_id=2884022020-03-27T09:58:49Zokurzokurz@suse.com
<ul><li><strong>Blocked by</strong> <i><a class="issue tracker-4 status-3 priority-5 priority-high3 closed" href="/issues/64737">action #64737</a>: openqaworker-arm-3 is down since 2020-03-16, also IPMI unresponsive</i> added</li></ul> openQA Infrastructure - action #61844: auto_review:"download failed: 521 - Connect timeout" Network issues on openqaworker-arm-3 (and others)https://progress.opensuse.org/issues/61844?journal_id=2884052020-03-27T09:59:23Zokurzokurz@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Blocked</i></li></ul><p>I would check again and also increase the number of worker instances again, but openqaworker-arm-3 is completely down, including the management interface. Blocked by <a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: openqaworker-arm-3 is down since 2020-03-16, also IPMI unresponsive (Resolved)" href="https://progress.opensuse.org/issues/64737">#64737</a></p>
openQA Infrastructure - action #61844: auto_review:"download failed: 521 - Connect timeout" Network issues on openqaworker-arm-3 (and others)https://progress.opensuse.org/issues/61844?journal_id=2928082020-04-14T17:55:43Zokurzokurz@suse.com
<ul><li><strong>Status</strong> changed from <i>Blocked</i> to <i>Resolved</i></li></ul><p>openqaworker-arm-3 is back up, and <a href="https://github.com/os-autoinst/openQA/pull/2895" class="external">https://github.com/os-autoinst/openQA/pull/2895</a> should help with retriable errors.</p>
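<p>As a generic illustration of retrying on retriable errors such as the "521 - Connect timeout" from this ticket, a minimal sketch could look like the following. It is not the implementation from the pull request above; <code>RetriableError</code> and <code>with_retries</code> are hypothetical names, and the linear back-off is an arbitrary choice.</p>

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetriableError(Exception):
    """Hypothetical marker for transient failures, e.g. a connect timeout."""


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(); on RetriableError retry up to `attempts` times in total.

    Waits delay * attempt_number seconds between tries (linear back-off).
    Any other exception propagates immediately without a retry.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[RetriableError] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except RetriableError as err:
            last_error = err
            if attempt < attempts:
                sleep(delay * attempt)
    # All attempts failed with a retriable error: re-raise the last one
    assert last_error is not None
    raise last_error
```

<p>A download step would then be wrapped as <code>with_retries(download)</code>, where <code>download</code> stands for any callable that raises <code>RetriableError</code> on a transient failure.</p>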