action #51650

closed

[functional][u] test fails in test_results - expecting bootloader to be passed, still running but also no autoinst-log.txt found

Added by dimstar almost 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version: -
Start date: 2019-05-20
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-openqa_bootstrap@64bit fails in test_results

Test suite description

Maintainer: dheidler. Install openQA using openqa-bootstrap script.

Reproducible

Fails since (at least) Build 20190517

Expected result

Last good: 20190516 (or more recent)

Further details

Always latest result in this scenario: latest

The bootloader test module does not seem to progress, although the test of the product itself does not show any error.

#1

Updated by SLindoMansilla almost 5 years ago

  • Subject changed from test fails in test_results to [tools] test fails in test_results

I would say this is something for @okurz

#2

Updated by okurz almost 5 years ago

  • Subject changed from [tools] test fails in test_results to [functional][u] test fails in test_results - expecting bootloader to be passed, still running but also no autoinst-log.txt found
  • Status changed from New to Workable
  • Assignee set to dheidler

Hm, I checked the latest result. The test module "test_results" had already failed in some earlier runs, but in most of those cases it failed because the "isosize" module was not found to be passed. https://openqa.opensuse.org/tests/938149 shows this as well, but here the test fails because the bootloader module is still "running", not "passed". The post_fail_hook executes correctly; at least it could gather the system journal with information about the job and the worker. The job state is done, but no autoinst-log.txt file is found.

@dheidler as you are the maintainer of the test modules within os-autoinst-distri-opensuse, WDYT?

#3

Updated by dheidler almost 5 years ago

The test module would have uploaded the system journal via the post fail hook, but the post fail hook was aborted when wget failed to download the test log from the (nested) openQA web UI.
Maybe we should make the wget call non-fatal and add a soft-fail to warn about it. This would allow the post fail hook to continue and gather more logs.

We could make the wget call fatal only when it is executed in the post run hook, but not in the post fail hook.

#4

Updated by okurz almost 5 years ago

Sure, but a soft-fail should always refer to a known issue, and I guess the issue we should investigate here is the missing autoinst-log.txt itself, isn't it? In this case we should just record that the single upload step failed, without failing the module, e.g. as script_run does.
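A minimal sketch of the behavior discussed in #3 and #4, modeled in Python rather than in the Perl testapi the real test modules use. The names `Hook`, `run_step` and `StepFailed` are hypothetical. A failing download step aborts the module in the post run hook, while in the post fail hook it is only recorded (not as a soft-fail) and the hook goes on gathering logs.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List


class StepFailed(Exception):
    """Raised when a fatal step fails (hypothetical, for illustration only)."""


@dataclass
class Hook:
    """Hypothetical model of a test hook, named "post_run" or "post_fail"."""

    name: str
    recorded: List[str] = field(default_factory=list)

    def run_step(self, cmd: List[str]) -> int:
        """Run one command; a non-zero exit is fatal only in the post run hook."""
        rc = subprocess.run(cmd, capture_output=True).returncode
        if rc != 0:
            if self.name == "post_run":
                # Post run hook: a failed log download should fail the module.
                raise StepFailed(f"{' '.join(cmd)} exited with {rc}")
            # Post fail hook: only record the failed step (no soft-fail) and
            # hand back the exit code so the hook can keep gathering logs.
            self.recorded.append(f"{' '.join(cmd)} exited with {rc}")
        return rc


if __name__ == "__main__":
    # Stand-in for a failing wget call: a Python one-liner exiting with code 8.
    failing = [sys.executable, "-c", "raise SystemExit(8)"]
    fail_hook = Hook("post_fail")
    print(fail_hook.run_step(failing))  # 8: recorded, the hook continues
    print(fail_hook.recorded)
    try:
        Hook("post_run").run_step(failing)
    except StepFailed as err:
        print("post_run aborted:", err)
```

Returning the exit code instead of raising is the same distinction testapi draws between script_run and assert_script_run.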

#5

Updated by dheidler almost 5 years ago

I executed the test run, connected via VNC and waited for the test module to exit.
Or maybe it just crashed because I tried to open the live mode.
The log file is not very helpful: http://pastebin.nue.suse.com/28134/src

This is how the nested job dies:

[2019-05-20T10:20:27.390 EDT] [debug] /var/lib/openqa/share/tests/opensuse/tests/installation/welcome.pm:100 called testapi::assert_screen
[2019-05-20T10:20:27.393 EDT] [debug] <<< testapi::assert_screen(mustmatch=[
'inst-welcome-confirm-self-update-server',
'scc-invalid-url',
'inst-welcome',
'linuxrc-dhcp-question'
], timeout=500)
Unexpected end of data 0
[2019-05-20T10:20:30.128 EDT] [debug] backend process exited: 0
[2019-05-20T10:20:30.134 EDT] [debug] sysread failed:
[2019-05-20T10:20:30.181 EDT] [debug] killing command server 4332 because test execution ended
[2019-05-20T10:20:30.182 EDT] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/iu_APdyJYFWCc6PI/broadcast
[2019-05-20T10:20:32.553 EDT] [debug] Driver backend collected unknown process with pid 4337 and exit status: 1
[2019-05-20T10:20:45.377 EDT] [debug] isotovideo: unable to inform websocket clients about stopping command server: Request timeout at /usr/bin/isotovideo line 172.

#7

Updated by dheidler almost 5 years ago

  • Status changed from Workable to In Progress

As I wasn't able to reproduce the failure on o3, the problem seems to appear only when the job is executed as part of the normal test schedule, maybe due to higher load on the worker host.

#8

Updated by dheidler almost 5 years ago

  • Status changed from In Progress to Feedback

#9

Updated by dheidler almost 5 years ago

PR merged - let's see what will happen in the next TW snapshot.

#11

Updated by ggardet_arm almost 5 years ago

It does fail the same way on aarch64.

#12

Updated by dheidler almost 5 years ago

Today's new failure is due to https://build.opensuse.org/request/show/706752 not being in the Tumbleweed snapshot yet.

#13

Updated by okurz almost 5 years ago

What does this have to do with the submission of os-autoinst to Factory?

#14

Updated by dheidler almost 5 years ago

See https://openqa.opensuse.org/tests/947471/file/nested-autoinst-log.txt

Can't locate object method "set_expected_autoinst_failures" via package "Distribution::Opensuse::Tumbleweed" at /var/lib/openqa/share/tests/opensuse/products/opensuse/main.pm line 78.

So the nested openQA is using a new test distribution with an old os-autoinst.

#15

Updated by szarate almost 5 years ago

  • Status changed from Feedback to Resolved

The latest test runs are passing, so I guess this is done.

#16

Updated by okurz almost 5 years ago

#51650#note-14 is a nice example of something that has already happened multiple times on our production instances. I guess we could do better and at least detect whether the installed os-autoinst version matches what the test distribution requires.
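One possible form of such a detection, sketched in Python under the assumption that the test distribution can list the base-class methods it relies on: check the distribution object for every required method up front and fail early with an explicit message, instead of hitting Perl's "Can't locate object method" error from #14 later. In Perl the equivalent check would use the UNIVERSAL `can` method, e.g. `$distri->can('set_expected_autoinst_failures')`. The classes `OldDistribution` and `NewDistribution`, the constant `REQUIRED_METHODS` and the functions `missing_methods` and `check_compatibility` are all hypothetical.

```python
from typing import Iterable, List


class OldDistribution:
    """Hypothetical base class from an older os-autoinst: the method is missing."""


class NewDistribution:
    """Hypothetical base class from a newer os-autoinst that provides the method."""

    def set_expected_autoinst_failures(self, failures: list) -> None:
        self.expected_autoinst_failures = failures


# Methods the test distribution relies on (here only the one from note #14).
REQUIRED_METHODS = ("set_expected_autoinst_failures",)


def missing_methods(distri: object, required: Iterable[str]) -> List[str]:
    """Return the names of required methods the distribution object lacks."""
    return [name for name in required if not callable(getattr(distri, name, None))]


def check_compatibility(distri: object, required: Iterable[str] = REQUIRED_METHODS) -> None:
    """Fail early with an explicit message instead of a late method lookup error."""
    missing = missing_methods(distri, required)
    if missing:
        raise RuntimeError(
            "os-autoinst is too old for this test distribution; missing: "
            + ", ".join(missing)
        )


if __name__ == "__main__":
    check_compatibility(NewDistribution())  # passes silently
    try:
        check_compatibility(OldDistribution())
    except RuntimeError as err:
        print(err)
```

Checking for the method itself rather than comparing version numbers avoids maintaining a version table, at the cost of only catching missing methods, not changed behavior of existing ones.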

#17

Updated by okurz almost 5 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: openqa_bootstrap
https://openqa.opensuse.org/tests/973494

#18

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: openqa_bootstrap
https://openqa.opensuse.org/tests/1134714

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed