action #91902
closed
Tests incomplete with reason "Failed modules: …"
Description
observation
Jobs end up incomplete with the reason "Failed modules: …". This is rare but has been observed a few times on o3 (openqa.opensuse.org) and OSD (openqa.suse.de), e.g.:
openqa=> select id, t_finished, state, result, reason from jobs where reason like '%Failed modules:%' order by id desc limit 10;
id | t_finished | state | result | reason
---------+---------------------+-------+------------+-------------------------------------------------------
1717042 | 2021-04-28 08:34:31 | done | incomplete | api failure: 400 response: Failed modules: bootloader
1716952 | 2021-04-28 03:38:04 | done | incomplete | api failure: 400 response: Failed modules: bootloader
1700877 | 2021-04-15 23:08:05 | done | incomplete | api failure: 400 response: Failed modules: bootloader
(3 rows)
openqa=> select id, t_finished, reason from jobs where reason like '%Failed modules:%' order by id desc limit 10;
id | t_finished | reason
---------+---------------------+-------------------------------------------------------------------------
5900633 | 2021-04-27 02:13:00 | api failure: 400 response: Failed modules: bootloader_zkvm
5853177 | 2021-04-20 07:31:29 | api failure: 400 response: Failed modules: boot_to_desktop
5853172 | 2021-04-20 07:31:18 | api failure: 400 response: Failed modules: boot_to_desktop
5845784 | 2021-04-19 13:29:29 | api failure: 400 response: Failed modules: boot_to_desktop
5825710 | 2021-04-15 07:58:48 | api failure: 400 response: Failed modules: version_switch_origin_system
5791697 | 2021-04-08 10:42:19 | api failure: 400 response: Failed modules: version_switch_origin_system
5789446 | 2021-04-07 18:58:41 | api failure: 400 response: Failed modules: boot_windows
5789444 | 2021-04-07 18:58:38 | api failure: 400 response: Failed modules: boot_windows
5781982 | 2021-04-08 07:32:30 | api failure: 400 response: Failed modules: boot_to_desktop
Note that the worker log isn't very verbose:
Apr 28 08:34:25 imagetester worker[11572]: [info] Finished to rsync tests
Apr 28 08:34:25 imagetester worker[11572]: [info] Preparing cgroup to start isotovideo
Apr 28 08:34:25 imagetester worker[11572]: [info] Using cgroup /sys/fs/cgroup/systemd/openqa.slice/openqa-worker.slice/openqa-worker-auto-restart@2.service/1717042
Apr 28 08:34:25 imagetester worker[11572]: [info] Starting isotovideo container
Apr 28 08:34:25 imagetester worker[11572]: [info] 14520: WORKING 1717042
Apr 28 08:34:25 imagetester worker[11572]: [info] isotovideo has been started (PID: 14520)
Apr 28 08:34:31 imagetester worker[11572]: [error] REST-API error (POST http://openqa1-opensuse/api/v1/jobs/1717042/status): 400 response: Failed modules: bootloader (remaining tries: 0)
Apr 28 08:34:31 imagetester worker[11572]: [error] Unable to make final image uploads. Maybe the web UI considers this job already dead.
Apr 28 08:34:31 imagetester worker[11572]: [info] Trying to stop job gracefully by announcing it to command server via http://localhost:20023/8fm_RpciitjX9Cjo/broadcast
Apr 28 08:34:31 imagetester worker[11572]: [info] Isotovideo exit status: 1
Apr 28 08:34:31 imagetester worker[11572]: [info] +++ worker notes +++
Apr 28 08:34:31 imagetester worker[11572]: [info] End time: 2021-04-28 08:34:31
Apr 28 08:34:31 imagetester worker[11572]: [info] Result: api-failure
The jobs themselves actually show the test modules including bootloader
(e.g. https://openqa.opensuse.org/tests/1717042). So uploading the test order did at least eventually work.
suggestions
- The exact error message was introduced intentionally by https://github.com/os-autoinst/openQA/pull/3813 to track down upload issues; previously the error condition was silently ignored. It may make sense to let the worker retry here, as a compromise between the new and the old behavior.
- There may also be something wrong in the worker (or isotovideo) code. The expectation is that the test order is always posted before, or together with, any results. Maybe that wasn't the case here, which would explain why posting the result fails. Even then, the retry suggested in the first point would likely still be an easy fix (considering that the test order could eventually be uploaded).
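To illustrate the retry idea from the first suggestion, here is a minimal sketch in Python. It is not the openQA worker code (which is written in Perl), and every name in it (`post_status_with_retry`, `is_failed_modules_error`, the `post` callable and its `(status, body)` return value) is hypothetical. It only assumes that the web UI answers with a 400 response whose body starts with "Failed modules:", as seen in the worker log above, and that a later attempt may succeed once the test order has arrived.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, response body); hypothetical shape


def is_failed_modules_error(status: int, body: str) -> bool:
    """True only for the specific 400 response discussed in this ticket."""
    return status == 400 and body.startswith("Failed modules:")


def post_status_with_retry(
    post: Callable[[], Response],
    tries: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call post() up to `tries` times, retrying only on "Failed modules" errors.

    `post` stands in for the status upload; it is an assumption of this sketch,
    not an existing openQA function.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    for attempt in range(1, tries + 1):
        status, body = post()
        # Success or any unrelated error is returned unchanged, without a retry.
        if not is_failed_modules_error(status, body):
            return status, body
        # Wait before the next attempt, but not after the last one.
        if attempt < tries:
            sleep(delay)
    # All attempts hit the same error; report the last response to the caller.
    return status, body
```

With this shape, the error is still reported if it persists (the new behavior), while a transient ordering problem between test order and result upload would be absorbed by the retries (closer to the old behavior).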