action #91902

closed

Tests incomplete with reason "Failed modules: …"

Added by mkittler about 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2021-04-28
Due date: 2021-05-13
% Done: 0%
Estimated time:

Description

observation

Jobs occasionally end up incomplete with the reason "Failed modules: …". This is rare but has been observed a few times on both o3 and OSD, e.g.:

openqa=> select id, t_finished, state, result, reason from jobs where reason like '%Failed modules:%' order by id desc limit 10;
   id    |     t_finished      | state |   result   |                        reason                         
---------+---------------------+-------+------------+-------------------------------------------------------
 1717042 | 2021-04-28 08:34:31 | done  | incomplete | api failure: 400 response: Failed modules: bootloader
 1716952 | 2021-04-28 03:38:04 | done  | incomplete | api failure: 400 response: Failed modules: bootloader
 1700877 | 2021-04-15 23:08:05 | done  | incomplete | api failure: 400 response: Failed modules: bootloader
(3 rows)
openqa=> select id, t_finished, reason from jobs where reason like '%Failed modules:%' order by id desc limit 10;
   id    |     t_finished      |                                 reason                                  
---------+---------------------+-------------------------------------------------------------------------
 5900633 | 2021-04-27 02:13:00 | api failure: 400 response: Failed modules: bootloader_zkvm
 5853177 | 2021-04-20 07:31:29 | api failure: 400 response: Failed modules: boot_to_desktop
 5853172 | 2021-04-20 07:31:18 | api failure: 400 response: Failed modules: boot_to_desktop
 5845784 | 2021-04-19 13:29:29 | api failure: 400 response: Failed modules: boot_to_desktop
 5825710 | 2021-04-15 07:58:48 | api failure: 400 response: Failed modules: version_switch_origin_system
 5791697 | 2021-04-08 10:42:19 | api failure: 400 response: Failed modules: version_switch_origin_system
 5789446 | 2021-04-07 18:58:41 | api failure: 400 response: Failed modules: boot_windows
 5789444 | 2021-04-07 18:58:38 | api failure: 400 response: Failed modules: boot_windows
 5781982 | 2021-04-08 07:32:30 | api failure: 400 response: Failed modules: boot_to_desktop
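To see which modules are affected most often, the reason strings from the queries above can be tallied. The following is only a minimal sketch: the helper name `failed_modules` is hypothetical, and the comma handling is an assumption, since every reason observed so far names exactly one module.

```python
import re
from collections import Counter
from typing import List

# Matches the module part of a reason such as
# "api failure: 400 response: Failed modules: bootloader".
REASON_RE = re.compile(r"Failed modules: (.+)$")


def failed_modules(reason: str) -> List[str]:
    """Return the module names found in a job's reason string.

    Assumption: if several modules are ever reported, they are
    separated by commas. The observed reasons contain one module only.
    """
    match = REASON_RE.search(reason.strip())
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


# A few reasons copied from the query output above.
reasons = [
    "api failure: 400 response: Failed modules: bootloader_zkvm",
    "api failure: 400 response: Failed modules: boot_to_desktop",
    "api failure: 400 response: Failed modules: boot_to_desktop",
    "api failure: 400 response: Failed modules: boot_windows",
]
counts = Counter(name for reason in reasons for name in failed_modules(reason))
```

With the sample above, `counts.most_common(1)` points at `boot_to_desktop`, which matches the impression from the query output that early boot modules are the ones affected.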

Note that the worker log isn't very verbose:

Apr 28 08:34:25 imagetester worker[11572]: [info] Finished to rsync tests
Apr 28 08:34:25 imagetester worker[11572]: [info] Preparing cgroup to start isotovideo
Apr 28 08:34:25 imagetester worker[11572]: [info] Using cgroup /sys/fs/cgroup/systemd/openqa.slice/openqa-worker.slice/openqa-worker-auto-restart@2.service/1717042
Apr 28 08:34:25 imagetester worker[11572]: [info] Starting isotovideo container
Apr 28 08:34:25 imagetester worker[11572]: [info] 14520: WORKING 1717042
Apr 28 08:34:25 imagetester worker[11572]: [info] isotovideo has been started (PID: 14520)
Apr 28 08:34:31 imagetester worker[11572]: [error] REST-API error (POST http://openqa1-opensuse/api/v1/jobs/1717042/status): 400 response: Failed modules: bootloader (remaining tries: 0)
Apr 28 08:34:31 imagetester worker[11572]: [error] Unable to make final image uploads. Maybe the web UI considers this job already dead.
Apr 28 08:34:31 imagetester worker[11572]: [info] Trying to stop job gracefully by announcing it to command server via http://localhost:20023/8fm_RpciitjX9Cjo/broadcast
Apr 28 08:34:31 imagetester worker[11572]: [info] Isotovideo exit status: 1
Apr 28 08:34:31 imagetester worker[11572]: [info] +++ worker notes +++
Apr 28 08:34:31 imagetester worker[11572]: [info] End time: 2021-04-28 08:34:31
Apr 28 08:34:31 imagetester worker[11572]: [info] Result: api-failure

The jobs themselves actually show the test modules including bootloader (e.g. https://openqa.opensuse.org/tests/1717042). So uploading the test order did at least eventually work.

suggestions

  1. The exact error message was introduced intentionally by https://github.com/os-autoinst/openQA/pull/3813 to track down upload issues; before that, the error condition was silently ignored. It may make sense to let the worker retry in this case, as a compromise between the new and the old behavior.
  2. There may also be a bug in the worker (or isotovideo) code. The expectation is that the test order is always posted before, or together with, any results. If that was not the case here, it would explain why posting the results failed. Even then, suggestion 1 would likely still be an easy fix, considering that the test order was eventually uploaded.
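The retry idea from suggestion 1 could look roughly like the sketch below. It is written in Python purely for illustration and is not the worker's actual code; the function name `post_status_with_retry`, its parameters, and the `post` callable (which is assumed to return a status code and a response body) are all hypothetical. Only the marker text "Failed modules:" is taken from the log above.

```python
import time
from typing import Callable, Tuple

# Marker taken from the error text in the worker log; everything else
# in this sketch is an assumption made for illustration.
RETRYABLE_MARKER = "Failed modules:"


def post_status_with_retry(
    post: Callable[[], Tuple[int, str]],
    tries: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call post() and repeat it while the "Failed modules" error persists.

    Only a 400 response whose body contains RETRYABLE_MARKER is retried.
    Any other response, and the last response once the tries are used
    up, is returned to the caller unchanged.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    status, body = post()
    for _ in range(tries - 1):
        if not (status == 400 and RETRYABLE_MARKER in body):
            break
        sleep(delay)  # give the web UI time to register the test order
        status, body = post()
    return status, body
```

Restricting the retry to this one error keeps the diagnostic value of the new message: if the test order never arrives, the job still ends with the same API failure, just a few attempts later.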

Related issues 1 (0 open, 1 closed)

Related to openQA Project - action #90152: module results missing on quick job (on auto-restarting worker) - Resolved - mkittler - 2021-03-16
