https://progress.opensuse.org/https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?15829177842021-04-28T10:27:26ZopenSUSE Project Management ToolopenQA Project - action #91902: Tests incomplete with reason "Failed modules: …"https://progress.opensuse.org/issues/91902?journal_id=4017152021-04-28T10:27:26Zmkittlermarius.kittler@suse.com
<ul></ul><p>Suggestion 1. would actually be fairly simple to implement: <a href="https://github.com/os-autoinst/openQA/pull/3868" class="external">https://github.com/os-autoinst/openQA/pull/3868</a></p>
openQA Project - action #91902: Tests incomplete with reason "Failed modules: …"https://progress.opensuse.org/issues/91902?journal_id=4018892021-04-28T17:47:54Zokurzokurz@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-4 priority-default closed child" href="/issues/90152">action #90152</a>: module results missing on quick job (on auto-restarting worker)</i> added</li></ul> openQA Project - action #91902: Tests incomplete with reason "Failed modules: …"https://progress.opensuse.org/issues/91902?journal_id=4018922021-04-28T17:51:13Zokurzokurz@suse.com
<ul><li><strong>Target version</strong> set to <i>Ready</i></li></ul><p>I think you could consider this still part of your work on <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: module results missing on quick job (on auto-restarting worker) (Resolved)" href="https://progress.opensuse.org/issues/90152">#90152</a> , isn't it?</p>
openQA Project - action #91902: Tests incomplete with reason "Failed modules: …"https://progress.opensuse.org/issues/91902?journal_id=4019912021-04-29T04:10:41Zopenqa_reviewopenqa-review@suse.de
<ul><li><strong>Due date</strong> set to <i>2021-05-13</i></li></ul><p>Setting due date based on mean cycle time of SUSE QE Tools</p>
openQA Project - action #91902: Tests incomplete with reason "Failed modules: …"https://progress.opensuse.org/issues/91902?journal_id=4030442021-04-30T10:52:14Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Feedback</i></li></ul><p>The PR implementing suggestion 1 has been merged. Let's whether such incompletes now disappear in production.</p>
<p>Since the ticket has been moved to ready I can also dig a little bit within the code to find out why it actually behaves like that and maybe we can find a better fix which actually tackles the root cause.</p>
openQA Project - action #91902: Tests incomplete with reason "Failed modules: …"https://progress.opensuse.org/issues/91902?journal_id=4033992021-05-03T13:35:36Zmkittlermarius.kittler@suse.com
<ul></ul><p>There are not further occurrences on OSD/o3. (There's actually one more occurrence on OSD but the PR has only been deployed on OSD since this morning and it happened before.)</p>
<p>Maybe the root cause is an error (which for some reason doesn't show up in the log) when posting the test order. The error could be on the client side or on the server side. Since we don't post the test order again and again on every single status update we might end up uploading results for test modules the web UI doesn't know about. Likely the re-try solution is nevertheless sufficient. I'll check OSD/o3 again in a few days to confirm.</p>
openQA Project - action #91902: Tests incomplete with reason "Failed modules: …"https://progress.opensuse.org/issues/91902?journal_id=4059162021-05-10T10:10:32Zmkittlermarius.kittler@suse.com
<ul></ul><p>There were 2 more jobs matching the query from the ticket description on o3.</p>
<p>Here it is pretty clear that posting the status didn't work. Since it is a 403 error we don't retry. The subsequent errors are only a consequence of the failing status post:</p>
<pre><code>Mai 06 19:18:21 openqaworker7 worker[5293]: [info] isotovideo has been started (PID: 5711)
Mai 06 19:18:21 openqaworker7 worker[5293]: [info] 5711: WORKING 1729840
Mai 06 19:26:42 openqaworker7 worker[5293]: [error] REST-API error (POST http://openqa1-opensuse/api/v1/jobs/1729840/status): 403 response: timestamp mismatch (remaining tries: 0)
Mai 06 19:26:42 openqaworker7 worker[5293]: [error] Unable to make final image uploads. Maybe the web UI considers this job already dead.
Mai 06 19:30:15 openqaworker7 worker[5293]: [info] Trying to stop job gracefully by announcing it to command server via http://localhost:20023/d_RCcPiPEhsSRaTG/broadcast
Mai 06 19:30:16 openqaworker7 worker[5293]: [info] Isotovideo exit status: 1
Mai 06 19:30:16 openqaworker7 worker[5293]: [info] +++ worker notes +++
Mai 06 19:30:16 openqaworker7 worker[5293]: [info] End time: 2021-05-06 17:30:16
Mai 06 19:30:16 openqaworker7 worker[5293]: [info] Result: api-failure
Mai 06 19:30:40 openqaworker7 worker[5293]: [error] REST-API error (POST http://openqa1-opensuse/api/v1/jobs/1729840/status): 490 response: Failed modules: bootloader, isosize (remaining tries: 59)
Mai 06 19:32:41 openqaworker7 worker[5293]: [error] REST-API error (POST http://openqa1-opensuse/api/v1/jobs/1729840/status): 490 response: Failed modules: bootloader, isosize (remaining tries: 58)
Mai 06 19:32:56 openqaworker7 worker[5293]: [error] REST-API error (POST http://openqa1-opensuse/api/v1/jobs/1729840/status): 490 response: Failed modules: bootloader, isosize (remaining tries: 57)
…
Mai 06 19:56:39 openqaworker7 worker[5293]: [error] REST-API error (POST http://openqa1-opensuse/api/v1/jobs/1729840/status): 490 response: Failed modules: bootloader, isosize (remaining tries: 2)
Mai 06 19:56:49 openqaworker7 worker[5293]: [error] REST-API error (POST http://openqa1-opensuse/api/v1/jobs/1729840/status): 490 response: Failed modules: bootloader, isosize (remaining tries: 1)
Mai 06 19:56:54 openqaworker7 worker[5293]: [error] REST-API error (POST http://openqa1-opensuse/api/v1/jobs/1729840/status): 490 response: Failed modules: bootloader, isosize (remaining tries: 0)
Mai 06 19:56:54 openqaworker7 worker[5293]: [error] Unable to make final image uploads. Maybe the web UI considers this job already dead.
Mai 06 19:56:54 openqaworker7 worker[5293]: [info] Registering with openQA http://openqa1-opensuse
Mai 06 19:56:55 openqaworker7 worker[5293]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/188
Mai 06 19:56:55 openqaworker7 worker[5293]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 188
Mai 06 19:56:55 openqaworker7 worker[5293]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/WebUIConnection.pm line 212.
Mai 06 19:56:55 openqaworker7 worker[5293]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/188 finished by remote side with code 1005, - trying again in 10 seconds
Mai 06 19:56:55 openqaworker7 systemd[1]: openqa-worker-auto-restart@2.service: Service RestartSec=100ms expired, scheduling restart.
</code></pre>
<p>In the other occurrence posting the status failed as well but for a different reason (which is actually rather strange):</p>
<pre><code>Mai 07 15:44:16 openqaworker7 worker[25509]: [info] isotovideo has been started (PID: 26578)
Mai 07 15:44:16 openqaworker7 worker[25509]: [info] 26578: WORKING 1730907
Mai 07 15:44:21 openqaworker7 worker[25509]: [error] REST-API error (POST http://openqa1-opensuse/api/v1/jobs/1730907/status): 400 response: can't mkdir /var/lib/openqa/testresults/01730/01730907-opensuse-Tumbleweed-JeOS-for-kvm-and-xen-x86_64-Build20210505-jeos-containers@64bit_virtio: File exists at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1260.
Mai 07 15:44:21 openqaworker7 worker[25509]: (remaining tries: 0)
Mai 07 15:44:21 openqaworker7 worker[25509]: [error] Unable to make final image uploads. Maybe the web UI considers this job already dead.
Mai 07 15:44:21 openqaworker7 worker[25509]: [error] REST-API error (POST http://openqa1-opensuse/api/v1/jobs/1730907/status): 490 response: Failed modules: boot_to_desktop (remaining tries: 59)
Mai 07 15:44:21 openqaworker7 worker[25509]: [info] Trying to stop job gracefully by announcing it to command server via http://localhost:20083/_sdtYjp_UeMfsyEp/broadcast
Mai 07 15:44:22 openqaworker7 worker[25509]: [info] Isotovideo exit status: 1
</code></pre>
<p>Note that these problems are not a regression of <a href="https://github.com/os-autoinst/openQA/pull/3813">https://github.com/os-autoinst/openQA/pull/3813</a> anymore. Not considering this an error wouldn't had prevented these jobs from being incompletes. However, it is problematic that the reason is overridden. At least in this particular situation it would make more sense if the first API error would be passed as reason.</p>
<p>There's also a Perl warning which is a recent regression of another change, see <a href="https://github.com/os-autoinst/openQA/pull/3893">https://github.com/os-autoinst/openQA/pull/3893</a>.</p>
openQA Project - action #91902: Tests incomplete with reason "Failed modules: …"https://progress.opensuse.org/issues/91902?journal_id=4077372021-05-17T07:39:09Zokurzokurz@suse.com
<ul></ul><blockquote>
<p>Note that these problems are not a regression of <a href="https://github.com/os-autoinst/openQA/pull/3813" class="external">https://github.com/os-autoinst/openQA/pull/3813</a> anymore. Not considering this an error wouldn't had prevented these jobs from being incompletes. However, it is problematic that the reason is overridden. At least in this particular situation it would make more sense if the first API error would be passed as reason.</p>
<p>There's also a Perl warning which is a recent regression of another change, see <a href="https://github.com/os-autoinst/openQA/pull/3893" class="external">https://github.com/os-autoinst/openQA/pull/3893</a>.</p>
</blockquote>
<p><a href="https://github.com/os-autoinst/openQA/pull/3893" class="external">https://github.com/os-autoinst/openQA/pull/3893</a> is merged. What are your further plans now?</p>
openQA Project - action #91902: Tests incomplete with reason "Failed modules: …"https://progress.opensuse.org/issues/91902?journal_id=4082832021-05-18T10:22:04Zokurzokurz@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>As discussed we consider ourselves done here. I crosschecked again that there are no new jobs with the original symptoms:</p>
<pre><code>$ psql -d postgresql://XXX:XXX@openqa.nue.suse.com/openqa -c "select id, t_finished, state, result, reason from jobs where reason like '%Failed modules:%' order by id desc limit 10;"
id | t_finished | state | result | reason
---------+---------------------+-------+------------+-------------------------------------------------------------------------
5924300 | 2021-04-30 07:56:29 | done | incomplete | api failure: 400 response: Failed modules: bootloader
5900633 | 2021-04-27 02:13:00 | done | incomplete | api failure: 400 response: Failed modules: bootloader_zkvm
5853177 | 2021-04-20 07:31:29 | done | incomplete | api failure: 400 response: Failed modules: boot_to_desktop
5853172 | 2021-04-20 07:31:18 | done | incomplete | api failure: 400 response: Failed modules: boot_to_desktop
5845784 | 2021-04-19 13:29:29 | done | incomplete | api failure: 400 response: Failed modules: boot_to_desktop
5825710 | 2021-04-15 07:58:48 | done | incomplete | api failure: 400 response: Failed modules: version_switch_origin_system
5791697 | 2021-04-08 10:42:19 | done | incomplete | api failure: 400 response: Failed modules: version_switch_origin_system
5789446 | 2021-04-07 18:58:41 | done | incomplete | api failure: 400 response: Failed modules: boot_windows
5789444 | 2021-04-07 18:58:38 | done | incomplete | api failure: 400 response: Failed modules: boot_windows
5781982 | 2021-04-08 07:32:30 | done | incomplete | api failure: 400 response: Failed modules: boot_to_desktop
(10 rows)
</code></pre>