action #59273: module result missing for incompleting job - openQA Project (public) - openSUSE Project Management Tool


action #59273

closed

module result missing for incompleting job

Added by okurz over 5 years ago. Updated over 4 years ago.

Status:

Resolved

Priority:

Low

Assignee:

okurz

Category:

Feature requests

Target version:

Ready

Start date:

2019-11-10


Description

Observation

The openQA test in scenario sle-15-SP1-Installer-DVD-QR-x86_64-btrfs@64bit-ipmi incompletes in installation_overview without a module result. As a consequence, label carry-over does not work because no "module" can be identified.

Reproducible

Fails since (at least) Build 7.1 (current job)

https://openqa.suse.de/tests/3545493#next_previous shows the history: first some "passed" jobs, then a job that is "incomplete" but still reports the module "installation_overview", and after that jobs that are just incomplete without any failed module reported. A reported failed module is a prerequisite for label carry-over to work.
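To illustrate why a reported failed module is a prerequisite, here is a minimal Python sketch of the carry-over idea. It is not openQA's actual implementation: the `Job` class, its fields, and `carry_over_label` are hypothetical names, and matching on an identical set of failed modules is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical, simplified view of one job in a scenario's history."""
    job_id: int
    result: str                             # e.g. "passed", "failed", "incomplete"
    failed_modules: frozenset = frozenset()
    label: Optional[str] = None             # e.g. a ticket reference left by a reviewer


def carry_over_label(current: Job, previous: Job) -> Optional[str]:
    """Return the label to carry over from `previous`, or None.

    Assumption: a label is only carried over when both jobs report the
    same non-empty set of failed modules.
    """
    if not current.failed_modules:
        # No module result was reported, so there is nothing to match on.
        return None
    if current.failed_modules == previous.failed_modules:
        return previous.label
    return None


# Job IDs below are made up for illustration.
labelled = Job(1, "incomplete", frozenset({"installation_overview"}), "#59267")
with_module = Job(2, "incomplete", frozenset({"installation_overview"}))
without_module = Job(3, "incomplete")

print(carry_over_label(with_module, labelled))     # label is carried over
print(carry_over_label(without_module, labelled))  # nothing to carry over
```

Under these assumptions, the incomplete job that still reports installation_overview inherits the label, while the incomplete job without any module result does not.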

Expected result

Last good: https://openqa.suse.de/tests/3539985#step/installation_overview/1

Problem

Last good: os-autoinst 4.5.1571474599.7d873cb5
First bad: os-autoinst 4.6.1572372410.447dab86

os-autoinst likely candidates for regression:

$ git log1 --no-merges 7d873cb5..447dab86
447dab86 Allow consoles to persist over reset (#1232)
327be131 myjsonrpc: Go back to incremental parsing (#1248)
73261e5b Use python3 by default (#1247)
3d51864a (perlpunk/num-queues-uninitialized) Avoid warning in comparison; num_queues might be undef
87f895f3 Improve here tag handling in script_output()
7d482890 (perlpunk/new-status) Increase version numbers
87276ac7 Add new status file that worker can read from
c52303a0 Consider tests with `tools/tidy --only-changed`
ab0554c2 (okurz/feature/container) spec: Fix missing, additional runtime requirements
a45e17fa Allow tidy to run only over local changes
37d31fc7 Improve 'check_ssh_serial'
55db2e0d Make start_serial_grab blocking
5266e820 Fix svirt backend's 100 % CPU usage
6ba53926 (okurz/fix/codecov) codecov: Adjust to current coverage target
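The revision range in the git command above matches the trailing components of the last good and first bad os-autoinst version strings. A small Python sketch of that derivation follows. The helper name `commit_range` is hypothetical, and it assumes that the last dot-separated component of the version string is the abbreviated commit hash.

```python
def commit_range(last_good: str, first_bad: str) -> str:
    """Build a git revision range from two os-autoinst version strings.

    Assumption: the last dot-separated component of each version string
    is the abbreviated commit hash the package was built from.
    """
    good_commit = last_good.rsplit(".", 1)[-1]
    bad_commit = first_bad.rsplit(".", 1)[-1]
    return f"{good_commit}..{bad_commit}"


print(commit_range("4.5.1571474599.7d873cb5", "4.6.1572372410.447dab86"))
# prints "7d873cb5..447dab86"
```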

Further details

Always latest result in this scenario: latest

Related to #59267 about the problem of the incomplete itself.

Please handle this with urgency: when label carry-over does not work, human reviewers need to re-attach the label to every single job.

Related issues 2 (0 open — 2 closed)


Updated by okurz over 5 years ago

Related to action #59267: test incompletes in installation_overview - when trying to switch to root-ssh? added


Updated by okurz over 5 years ago

Description updated (diff)


Updated by coolo over 5 years ago

The job doesn't fail - it crashes. The module in which it crashed is, generally speaking, not a good hint. We can't label installation_overview as the failed module IMO. We shouldn't incomplete but fail in this scenario.


Updated by coolo over 5 years ago

The backend is most likely not supposed to crash here but to report a failure to select_console, and thus die in testapi:

[2019-11-10T09:33:56.307 CET] [debug] <<< testapi::check_screen(mustmatch='ssh-open', timeout=1)
[2019-11-10T09:33:56.794 CET] [debug] no match: 2.5s, best candidate: installation_overview-ssh-open-20170911 (0.00)
[2019-11-10T09:33:57.801 CET] [debug] no match: 1.5s, best candidate: ssh-open_firewall_disabled-20190221 (0.29)
[2019-11-10T09:33:59.344 CET] [debug] >>> testapi::_handle_found_needle: found ssh-open-20190207, similarity 1.00 @ 401/542
[2019-11-10T09:33:59.344 CET] [debug] /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/installation/installation_overview.pm:83 called testapi::select_console
[2019-11-10T09:33:59.344 CET] [debug] <<< testapi::select_console(testapi_console='install-shell')
/usr/lib/os-autoinst/consoles/vnc_base.pm:58:{
  'hostname' => 'localhost',
  'port' => 58291,
  'ikvm' => 0
}
[2019-11-10T09:34:00.445 CET] [debug] Connected to Xvnc - PID 30160
icewm PID is 30182
xterm PID is 30184
[2019-11-10T09:34:01.526 CET] [debug] led state 0 0 0 -261
[2019-11-10T09:34:01.537 CET] [debug] /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/installation/installation_overview.pm:83 called testapi::select_console
[2019-11-10T09:34:01.537 CET] [debug] <<< testapi::select_console(testapi_console='root-ssh')
/usr/lib/os-autoinst/consoles/vnc_base.pm:58:{
  'ikvm' => 0,
  'hostname' => 'localhost',
  'port' => 49909
}
[2019-11-10T09:34:02.658 CET] [debug] Connected to Xvnc - PID 30187
icewm PID is 30207
xterm PID is 30209
[2019-11-10T09:34:03.690 CET] [debug] Could not connect to root@scooter-1.qa.suse.de, Retrying after some seconds...
[2019-11-10T09:34:13.691 CET] [debug] Could not connect to root@scooter-1.qa.suse.de, Retrying after some seconds...
[2019-11-10T09:34:23.692 CET] [debug] Could not connect to root@scooter-1.qa.suse.de, Retrying after some seconds...
[2019-11-10T09:34:33.693 CET] [debug] Could not connect to root@scooter-1.qa.suse.de, Retrying after some seconds...
[2019-11-10T09:34:43.694 CET] [debug] Could not connect to root@scooter-1.qa.suse.de, Retrying after some seconds...
[2019-11-10T09:34:53.699 CET] [debug] Backend process died, backend errors are reported below in the following lines:
Error connecting to <root@scooter-1.qa.suse.de>: Connection refused
[2019-11-10T09:34:53.748 CET] [debug] IPMI: Chassis Power Control: Down/Off
[2019-11-10T09:34:53.748 CET] [debug] flushing frames
[2019-11-10T09:34:53.822 CET] [debug] sending magic and exit
[2019-11-10T09:34:53.822 CET] [debug] received magic close
[2019-11-10T09:34:53.823 CET] [debug] THERE IS NOTHING TO READ 15 4 3
[2019-11-10T09:34:53.823 CET] [debug] killing command server 28319 because test execution ended
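The log above still shows which test module was calling testapi when the backend died. The following Python sketch extracts that module name from such a log. The function `last_module_before_backend_death` and the regular expression are assumptions made for illustration; they are not part of os-autoinst or openQA.

```python
import re
from typing import Iterable, Optional

# Matches lines such as
#   ".../tests/installation/installation_overview.pm:83 called testapi::select_console"
# and captures the module file name without the ".pm" suffix.
CALLER_RE = re.compile(r"/(?P<module>[^/\s]+)\.pm:\d+ called testapi::\w+")


def last_module_before_backend_death(lines: Iterable[str]) -> Optional[str]:
    """Return the last test module seen calling testapi before the backend died.

    Returns None if the log has no "Backend process died" line or if no
    module was seen before it.
    """
    module = None
    for line in lines:
        if "Backend process died" in line:
            return module
        match = CALLER_RE.search(line)
        if match:
            module = match.group("module")
    return None


log = [
    "[2019-11-10T09:34:01.537 CET] [debug] /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/installation/installation_overview.pm:83 called testapi::select_console",
    "[2019-11-10T09:34:43.694 CET] [debug] Could not connect to root@scooter-1.qa.suse.de, Retrying after some seconds...",
    "[2019-11-10T09:34:53.699 CET] [debug] Backend process died, backend errors are reported below in the following lines:",
]
print(last_module_before_backend_death(log))
```

Whether the module found this way should be reported as failed is a separate question, as discussed in this ticket.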


Updated by okurz over 5 years ago

I agree that a failed select_console should not fail the job. This is what #59267 is about. This ticket is about the regression that an incomplete triggered in a specific module no longer reports that module, which it did before.


Updated by coolo over 5 years ago

Category changed from Regressions/Crashes to Feature requests
Priority changed from Urgent to Normal

I don't believe that's the case.


Updated by okurz almost 5 years ago

Related to action #45062: Better visualization of incompletes - show module in which incomplete happens added


Updated by okurz almost 5 years ago

#45062 describes that this did work in the past.


Updated by mkittler almost 5 years ago

Whether the test module is marked as failed (and e.g. shown as failed module in the various tables) depends on os-autoinst writing the test module result accordingly. Considering the inconsistent error handling in os-autoinst, it is likely not very consistent about marking the test module as failed. So it might work in some cases like in #45062 but not generally.

Note that openQA and the job result (e.g. incomplete) do not have much influence here. Next & Previous and other lists will even show failed modules for passed jobs (if you manage to get a passed job with failed modules into the system).
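The point that failed modules are listed independently of the overall job result can be sketched in Python as follows. The dictionary layout, the module name `welcome`, and the function `failed_modules` are hypothetical; they only model the idea of per-module results and do not reflect the real result file format or database schema.

```python
from typing import Dict, List


def failed_modules(module_results: Dict[str, str]) -> List[str]:
    """Return the names of modules whose own result is "failed".

    The overall job result is deliberately not a parameter: in this
    sketch only the per-module results decide what is listed.
    """
    return [name for name, result in module_results.items() if result == "failed"]


# A job for which the module result was written as failed
# ("welcome" is a made-up module name).
print(failed_modules({"welcome": "passed", "installation_overview": "failed"}))

# A job for which no module result was written at all.
print(failed_modules({}))
```

In this model an incomplete job shows a failed module only if a failed module result was actually written before the job ended.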


Updated by okurz almost 5 years ago

Priority changed from Normal to Low
Target version set to future

Meanwhile we have much better reporting for incompletes with the "reason" that is accessible from the DB, API as well as UI, so this change for incomplete jobs is less important but still valid.


Updated by okurz over 4 years ago

Status changed from New to Resolved
Assignee set to okurz

https://openqa.suse.de/tests/5148360#next_previous shows that we do have support for failed modules in incomplete jobs. It could be that this only works rarely or in limited cases, e.g. when the test module fails first and then an incomplete happens in the same module. However, I think this is good enough.


Updated by okurz over 4 years ago

Target version changed from future to Ready
