action #105429
closedcoordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
coordination #98952: [epic] t/full-stack.t sporadically fails "clickElement: element not interactable" and other errors
openQA's fullstack test fails in `shutdown` module
Description
Observation¶
See https://github.com/os-autoinst/openQA/pull/4468#issuecomment-1019201379, the error is:
# [2022-01-22T08:07:01.494359Z] [debug] [pid:488] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
# [2022-01-22T08:07:01.528597Z] [info] [pid:831] Uploading video.ogv
# [2022-01-22T08:07:01.529126Z] [debug] [pid:831] Uploading artefact video.ogv
# [2022-01-22T08:07:01.566264Z] [info] [pid:831] Uploading vars.json
# [2022-01-22T08:07:01.566529Z] [debug] [pid:831] Uploading artefact vars.json
# [2022-01-22T08:07:01.594899Z] [info] [pid:831] Uploading autoinst-log.txt
# [2022-01-22T08:07:01.595178Z] [debug] [pid:831] Uploading artefact autoinst-log.txt
# [2022-01-22T08:07:01.621786Z] [info] [pid:831] Uploading worker-log.txt
# [2022-01-22T08:07:01.622061Z] [debug] [pid:831] Uploading artefact worker-log.txt
Bailout called. Further testing stopped: Job 1 produced the wrong results
# Failed test 'no (unexpected) warnings (via END block)'
# at /usr/lib/perl5/5.26.1/Test/Builder.pm line 135.
# Got the following unexpected warnings:
# 1: Use of uninitialized value in subroutine entry at t/full-stack.t line 213.
The Use of uninitialized value…
is basically because the qcow image hasn't been uploaded. Those are the corresponding failing checks:
not ok 8 - local upload feature used
not ok 9 - image of hdd uploaded
not ok 10 - can stat /tmp/IWUDY9BiMA/full-stack.d/openqa/share/factory/hdd/core-hdd.qcow2
not ok 11 - exported image has correct permissions (420 -> 0644)
And that is simply because the test failed:
not ok 2 - Expected result not found
not ok 3 - job 1 passed
# job info: {
# 'assigned_worker_id' => 1,
# 'blocked_by_id' => undef,
# 'clone_id' => undef,
# 'group_id' => undef,
# 'id' => 1,
# 'name' => 'tinycore-1-flavor-i386-Build1-core@coolone',
# 'priority' => 50,
# 'result' => 'failed',
# 'settings' => {
# 'ARCH' => 'i386',
# 'BUILD' => '1',
# 'DISTRI' => 'tinycore',
# 'FLAVOR' => 'flavor',
# 'INTEGRATION_TESTS' => '1',
# 'ISO' => 'Core-7.2.iso',
# 'MACHINE' => 'coolone',
# 'NAME' => '00000001-tinycore-1-flavor-i386-Build1-core@coolone',
# 'PAUSE_AT' => 'shutdown',
# 'PUBLISH_HDD_1' => 'core-hdd.qcow2',
# 'QEMU' => 'i386',
# 'QEMU_NO_FDC_SET' => '1',
# 'QEMU_NO_KVM' => '1',
# 'QEMU_NO_TABLET' => '1',
# 'TEST' => 'core',
# 'UEFI_PFLASH_VARS' => '/usr/share/qemu/ovmf-x86_64.bin',
# 'VERSION' => '1',
# 'WORKER_CLASS' => 'qemu_i386'
# },
# 'state' => 'done',
# 't_finished' => '2022-01-22T08:07:02',
# 't_started' => '2022-01-22T08:05:09',
# 'test' => 'core'
# }
ok 4 - autoinst log file generated
ok 5 - worker log file generated
ok 6 - autoinst log uploaded
ok 7 - worker log uploaded
It fails in the shutdown
module:
[0m[1;34m[2022-01-22T08:05:26.126716Z] [debug] ||| finished boot tests (runtime: 12 s)
[0m[1;34m[2022-01-22T08:05:26.127797Z] [debug] ||| starting shutdown tests/shutdown.pm
[0m[37m[2022-01-22T08:05:26.128265Z] [debug] isotovideo: pausing test execution of shutdown because we're supposed to pause at this test module
[0m[2022-01-22T08:05:26.128630Z] [debug] tests/shutdown.pm:8 called testapi::type_string
[2022-01-22T08:05:26.128802Z] [debug] <<< testapi::type_string(string="sudo su\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[37m[2022-01-22T08:05:26.129110Z] [debug] isotovideo: paused, so not passing backend_type_string to backend
[0m[37m[2022-01-22T08:05:28.330185Z] [debug] isotovideo: test execution will be resumed
[0m[37m[2022-01-22T08:05:28.330596Z] [debug] isotovideo: resuming, continue passing backend_type_string to backend
[0m[2022-01-22T08:05:28.601320Z] [debug] tests/shutdown.pm:9 called testapi::type_string
[2022-01-22T08:05:28.601518Z] [debug] <<< testapi::type_string(string="poweroff\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-01-22T08:05:28.910444Z] [debug] tests/shutdown.pm:10 called testapi::assert_shutdown
[2022-01-22T08:05:28.910592Z] [debug] <<< testapi::check_shutdown(timeout=90)
[37m[2022-01-22T08:05:28.911472Z] [debug] QEMU status is not 'shutdown', it is 'running'
[0m[37m[2022-01-22T08:05:29.912649Z] [debug] QEMU status is not 'shutdown', it is 'running'
[0m[37m[2022-01-22T08:05:30.913806Z] [debug] EVENT {"data":{"path":"\/machine\/peripheral-anon\/device[2]\/virtio-backend"},"event":"NIC_RX_FILTER_CHANGED","timestamp":{"microseconds":996204,"seconds":1642838729}}
[0m[37m[2022-01-22T08:05:30.914058Z] [debug] QEMU status is not 'shutdown', it is 'running'
…
[0m[37m[2022-01-22T08:06:58.035515Z] [debug] QEMU status is not 'shutdown', it is 'running'
[0m[37m[2022-01-22T08:06:59.036839Z] [debug] QEMU status is not 'shutdown', it is 'running'
[0m[33m[2022-01-22T08:07:00.097072Z] [info] ::: basetest::runtest: # Test died: Machine didn't shut down! at tinycore/tests/shutdown.pm line 10.
[0m[1;34m[2022-01-22T08:07:00.099824Z] [debug] ||| finished shutdown tests (runtime: 94 s)
[0m[1;34m[2022-01-22T08:07:00.099889Z] [debug] ||| post fail hooks runtime: 0 s
[0m[37m[2022-01-22T08:07:00.100157Z] [debug] stopping overall test execution after a fatal test failure
Likely the SUT isn't aware that it should shutdown because one of the type_string
commands doesn't work as expected. That might have something to do with the fact that we pause/resume the test execution before the shutdown
module but could also be a general typing issue.
Acceptance criteria¶
- AC1: Retry for full stack test is removed again
Suggestion¶
- Fix #104971 first
- Have a look at the code (e.g. code for handling pausing/resuming in os-autoinst) for possible problems (e.g. lack of synchronization)
- Find a way to reproduce the problem in a unit test
- Workaround the issue by combining
type_string
commands in the test'sshutdown
module - Remove the retry again on Makefile