action #59273

Module result missing for incomplete job

Added by okurz over 1 year ago. Updated 8 months ago.

Status: Resolved
Priority: Low
Assignee: okurz
Category: Feature requests
Target version: Ready
Start date: 2019-11-10
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

The openQA test in scenario sle-15-SP1-Installer-DVD-QR-x86_64-btrfs@64bit-ipmi incompletes in installation_overview without a module result, so label carry-over does not work because no "module" can be identified.

Reproducible

Fails since (at least) Build 7.1 (current job)

https://openqa.suse.de/tests/3545493#next_previous shows the history: some "passed" jobs, then an "incomplete" that still reported the module "installation_overview", and then further incompletes without any failed module reported. A reported failed module is a prerequisite for label carry-over to work.
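To illustrate why a missing module result breaks carry-over, here is a minimal sketch. It assumes a simplified rule: a label is carried over only when the new job reports the same, non-empty set of failed modules as the previously labelled job. The `Job` type, `carry_over_label`, and the example label value are hypothetical and not openQA's actual implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical, simplified view of an openQA job (illustration only)."""

    job_id: int
    result: str  # e.g. "passed", "failed", "incomplete"
    failed_modules: FrozenSet[str] = frozenset()  # modules marked as failed
    label: Optional[str] = None  # e.g. a bug reference attached by a reviewer


def carry_over_label(previous: Job, current: Job) -> Optional[str]:
    """Return the label to carry over from `previous` to `current`, or None.

    Assumed rule for this sketch: carry over only when the current job
    reports a non-empty set of failed modules identical to the previous one.
    """
    if previous.label is None:
        return None
    if not current.failed_modules:
        # No failed module: nothing identifies where the job broke,
        # so there is nothing to match against the labelled job.
        return None
    if current.failed_modules != previous.failed_modules:
        return None
    return previous.label


if __name__ == "__main__":
    labelled = Job(1, "incomplete", frozenset({"installation_overview"}), "poo#59267")
    with_module = Job(2, "incomplete", frozenset({"installation_overview"}))
    without_module = Job(3, "incomplete")
    print(carry_over_label(labelled, with_module))  # poo#59267
    print(carry_over_label(labelled, without_module))  # None
```

Under this assumed rule, the later incompletes in the history above behave like `without_module`: there is no module to match, so every job needs a manual label.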

Expected result

Last good: https://openqa.suse.de/tests/3539985#step/installation_overview/1

Problem

  • Last good:
      ◦ os-autoinst: 4.5.1571474599.7d873cb5

  • First bad:
      ◦ os-autoinst: 4.6.1572372410.447dab86

Likely os-autoinst candidates for the regression:

$ git log --oneline --decorate --no-merges 7d873cb5..447dab86
447dab86 Allow consoles to persist over reset (#1232)
327be131 myjsonrpc: Go back to incremental parsing (#1248)
73261e5b Use python3 by default (#1247)
3d51864a (perlpunk/num-queues-uninitialized) Avoid warning in comparison; num_queues might be undef
87f895f3 Improve here tag handling in script_output()
7d482890 (perlpunk/new-status) Increase version numbers
87276ac7 Add new status file that worker can read from
c52303a0 Consider tests with `tools/tidy --only-changed`
ab0554c2 (okurz/feature/container) spec: Fix missing, additional runtime requirements
a45e17fa Allow tidy to run only over local changes
37d31fc7 Improve 'check_ssh_serial'
55db2e0d Make start_serial_grab blocking
5266e820 Fix svirt backend's 100 % CPU usage
6ba53926 (okurz/fix/codecov) codecov: Adjust to current coverage target

Further details

Always latest result in this scenario: latest

Related to #59267 about the problem of the incomplete itself.

Please handle this with urgency: when label carry-over is not working, human reviewers need to re-attach the label to every single job.


Related issues

Related to openQA Project - action #59267: test incompletes in installation_overview - when trying to switch to root-ssh? (Resolved, 2019-11-10)

Related to openQA Project - action #45062: Better visualization of incompletes - show module in which incomplete happens (Resolved, 2018-12-12)

History

#1 Updated by okurz over 1 year ago

  • Related to action #59267: test incompletes in installation_overview - when trying to switch to root-ssh? added

#2 Updated by okurz over 1 year ago

  • Description updated (diff)

#3 Updated by coolo over 1 year ago

The job doesn't fail; it crashes. The module in which it crashed is, generally speaking, not a good hint. We can't label installation_overview as the failed module IMO. We shouldn't incomplete but fail in this scenario.

#4 Updated by coolo over 1 year ago

The backend is most likely not supposed to crash here but to report a failure to select_console, and thus die in testapi:

[2019-11-10T09:33:56.307 CET] [debug] <<< testapi::check_screen(mustmatch='ssh-open', timeout=1)
[2019-11-10T09:33:56.794 CET] [debug] no match: 2.5s, best candidate: installation_overview-ssh-open-20170911 (0.00)
[2019-11-10T09:33:57.801 CET] [debug] no match: 1.5s, best candidate: ssh-open_firewall_disabled-20190221 (0.29)
[2019-11-10T09:33:59.344 CET] [debug] >>> testapi::_handle_found_needle: found ssh-open-20190207, similarity 1.00 @ 401/542
[2019-11-10T09:33:59.344 CET] [debug] /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/installation/installation_overview.pm:83 called testapi::select_console
[2019-11-10T09:33:59.344 CET] [debug] <<< testapi::select_console(testapi_console='install-shell')
/usr/lib/os-autoinst/consoles/vnc_base.pm:58:{
  'hostname' => 'localhost',
  'port' => 58291,
  'ikvm' => 0
}
[2019-11-10T09:34:00.445 CET] [debug] Connected to Xvnc - PID 30160
icewm PID is 30182
xterm PID is 30184
[2019-11-10T09:34:01.526 CET] [debug] led state 0 0 0 -261
[2019-11-10T09:34:01.537 CET] [debug] /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/installation/installation_overview.pm:83 called testapi::select_console
[2019-11-10T09:34:01.537 CET] [debug] <<< testapi::select_console(testapi_console='root-ssh')
/usr/lib/os-autoinst/consoles/vnc_base.pm:58:{
  'ikvm' => 0,
  'hostname' => 'localhost',
  'port' => 49909
}
[2019-11-10T09:34:02.658 CET] [debug] Connected to Xvnc - PID 30187
icewm PID is 30207
xterm PID is 30209
[2019-11-10T09:34:03.690 CET] [debug] Could not connect to root@scooter-1.qa.suse.de, Retrying after some seconds...
[2019-11-10T09:34:13.691 CET] [debug] Could not connect to root@scooter-1.qa.suse.de, Retrying after some seconds...
[2019-11-10T09:34:23.692 CET] [debug] Could not connect to root@scooter-1.qa.suse.de, Retrying after some seconds...
[2019-11-10T09:34:33.693 CET] [debug] Could not connect to root@scooter-1.qa.suse.de, Retrying after some seconds...
[2019-11-10T09:34:43.694 CET] [debug] Could not connect to root@scooter-1.qa.suse.de, Retrying after some seconds...
[2019-11-10T09:34:53.699 CET] [debug] Backend process died, backend errors are reported below in the following lines:
Error connecting to <root@scooter-1.qa.suse.de>: Connection refused
[2019-11-10T09:34:53.748 CET] [debug] IPMI: Chassis Power Control: Down/Off
[2019-11-10T09:34:53.748 CET] [debug] flushing frames
[2019-11-10T09:34:53.822 CET] [debug] sending magic and exit
[2019-11-10T09:34:53.822 CET] [debug] received magic close
[2019-11-10T09:34:53.823 CET] [debug] THERE IS NOTHING TO READ 15 4 3
[2019-11-10T09:34:53.823 CET] [debug] killing command server 28319 because test execution ended
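In the log the backend process itself dies after the SSH retries are exhausted, so no module result is written. The alternative suggested above (report the console failure so that the test module dies) can be sketched as follows. This is a hypothetical Python illustration, not os-autoinst code: `ConsoleActivationError`, `connect_with_retries`, and `run_modules` are invented names, and the retry count and delay are illustrative values.

```python
import time
from typing import Callable, List, Sequence, Tuple


class ConsoleActivationError(Exception):
    """Hypothetical error: a console (e.g. root-ssh) could not be activated."""


def connect_with_retries(
    connect: Callable[[], object],
    attempts: int = 5,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Call `connect()` up to `attempts` times, then raise instead of crashing.

    `connect` stands in for the SSH connection attempt from the log and is
    expected to raise ConnectionError (or a subclass) on failure.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as err:
            last_error = err
            if attempt < attempts:
                sleep(delay)  # cf. "Retrying after some seconds..."
    raise ConsoleActivationError(f"could not activate console: {last_error}")


def run_modules(
    modules: Sequence[Tuple[str, Callable[[], None]]]
) -> Tuple[str, List[str]]:
    """Run (name, body) pairs in order; return (job_result, failed_modules)."""
    for name, body in modules:
        try:
            body()
        except ConsoleActivationError:
            # The failure is attributed to the module that requested the console,
            # so the job ends as "failed" with a module that labels can match on.
            return "failed", [name]
    return "passed", []


if __name__ == "__main__":
    def refuse() -> object:
        raise ConnectionRefusedError("Connection refused")

    def installation_overview() -> None:
        connect_with_retries(refuse, sleep=lambda _: None)

    print(run_modules([("installation_overview", installation_overview)]))
    # ('failed', ['installation_overview'])
```

The design choice here is to turn the exhausted retries into an ordinary exception at the test-API level, so the module that requested the console gets the blame instead of the whole job becoming incomplete.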

#5 Updated by okurz over 1 year ago

I agree that a failed select_console should not fail the job. This is what #59267 is about. This ticket is about the regression that an incomplete triggered in a specific module no longer reports that module, as it did before.

#6 Updated by coolo over 1 year ago

  • Category changed from Concrete Bugs to Feature requests
  • Priority changed from Urgent to Normal

I don't believe that's the case.

#7 Updated by okurz over 1 year ago

  • Related to action #45062: Better visualization of incompletes - show module in which incomplete happens added

#8 Updated by okurz over 1 year ago

#45062 describes that this did work in the past.

#9 Updated by mkittler over 1 year ago

Whether the test module is marked as failed (and e.g. shown as the failed module in the various tables) depends on os-autoinst writing the test module result accordingly. Considering the inconsistent error handling in os-autoinst, it is likely not very consistent about marking the test module as failed either. So it might work in some cases, like in #45062, but not in general.

Note that openQA and the job result (e.g. incomplete) have little influence here. Next & Previous and other lists will even show failed modules for passed jobs (if you manage to get a passed job with failed modules into the system).
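A minimal sketch of the point above: the list of failed modules is derived only from per-module results, independent of the overall job result. The dictionary format, the result strings, and the module names are hypothetical and chosen for illustration; this is not openQA's actual data model.

```python
from typing import Dict, List


def failed_modules(module_results: Dict[str, str]) -> List[str]:
    """Derive the list of failed modules purely from per-module results.

    `module_results` maps module name -> result as recorded by the backend
    (hypothetical format). The overall job result is deliberately not an
    input: only what the backend wrote per module matters.
    """
    return [name for name, result in module_results.items() if result == "failed"]


if __name__ == "__main__":
    # Backend crashed before writing a result for the running module.
    print(failed_modules({"bootloader": "passed", "installation_overview": "none"}))  # []
    # A job whose overall result is "passed" could still list a failed module.
    print(failed_modules({"bootloader": "passed", "installation_overview": "failed"}))  # ['installation_overview']
```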

#10 Updated by okurz about 1 year ago

  • Priority changed from Normal to Low
  • Target version set to future

Meanwhile, we have much better reporting for incompletes via the "reason", which is accessible from the DB, the API, and the UI, so this change for incomplete jobs is less important but still valid.

#11 Updated by okurz 8 months ago

  • Status changed from New to Resolved
  • Assignee set to okurz

https://openqa.suse.de/tests/5148360#next_previous shows that we do have support for failed modules in incomplete jobs. So it could be that this only works rarely or in limited cases, e.g. when the test module fails first and then an incomplete happens in the same module. However, I think this is good enough.

#12 Updated by okurz 8 months ago

  • Target version changed from future to Ready
