Project

General

Profile

action #105429

Updated by okurz about 2 years ago

## ### Observation 
 See https://github.com/os-autoinst/openQA/pull/4468#issuecomment-1019201379, the error is: 

 ``` 
 # [2022-01-22T08:07:01.494359Z] [debug] [pid:488] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status 
 # [2022-01-22T08:07:01.528597Z] [info] [pid:831] Uploading video.ogv 
 # [2022-01-22T08:07:01.529126Z] [debug] [pid:831] Uploading artefact video.ogv 
 # [2022-01-22T08:07:01.566264Z] [info] [pid:831] Uploading vars.json 
 # [2022-01-22T08:07:01.566529Z] [debug] [pid:831] Uploading artefact vars.json 
 # [2022-01-22T08:07:01.594899Z] [info] [pid:831] Uploading autoinst-log.txt 
 # [2022-01-22T08:07:01.595178Z] [debug] [pid:831] Uploading artefact autoinst-log.txt 
 # [2022-01-22T08:07:01.621786Z] [info] [pid:831] Uploading worker-log.txt 
 # [2022-01-22T08:07:01.622061Z] [debug] [pid:831] Uploading artefact worker-log.txt 
 Bailout called.    Further testing stopped:    Job 1 produced the wrong results 

 #     Failed test 'no (unexpected) warnings (via END block)' 
 #     at /usr/lib/perl5/5.26.1/Test/Builder.pm line 135. 
 # Got the following unexpected warnings: 
 #     1: Use of uninitialized value in subroutine entry at t/full-stack.t line 213. 
 ``` 

 The `Use of uninitialized value…` is basically because the qcow image hasn't been uploaded. Those are the corresponding failing checks: 

 ``` 
     not ok 8 - local upload feature used 
     not ok 9 - image of hdd uploaded 
     not ok 10 - can stat /tmp/IWUDY9BiMA/full-stack.d/openqa/share/factory/hdd/core-hdd.qcow2 
     not ok 11 - exported image has correct permissions (420 -> 0644) 
 ``` 

 And that is simply because the test failed: 

 ``` 
     not ok 2 - Expected result not found 
     not ok 3 - job 1 passed 
     # job info: { 
     #     'assigned_worker_id' => 1, 
     #     'blocked_by_id' => undef, 
     #     'clone_id' => undef, 
     #     'group_id' => undef, 
     #     'id' => 1, 
     #     'name' => 'tinycore-1-flavor-i386-Build1-core@coolone', 
     #     'priority' => 50, 
     #     'result' => 'failed', 
     #     'settings' => { 
     #       'ARCH' => 'i386', 
     #       'BUILD' => '1', 
     #       'DISTRI' => 'tinycore', 
     #       'FLAVOR' => 'flavor', 
     #       'INTEGRATION_TESTS' => '1', 
     #       'ISO' => 'Core-7.2.iso', 
     #       'MACHINE' => 'coolone', 
     #       'NAME' => '00000001-tinycore-1-flavor-i386-Build1-core@coolone', 
     #       'PAUSE_AT' => 'shutdown', 
     #       'PUBLISH_HDD_1' => 'core-hdd.qcow2', 
     #       'QEMU' => 'i386', 
     #       'QEMU_NO_FDC_SET' => '1', 
     #       'QEMU_NO_KVM' => '1', 
     #       'QEMU_NO_TABLET' => '1', 
     #       'TEST' => 'core', 
     #       'UEFI_PFLASH_VARS' => '/usr/share/qemu/ovmf-x86_64.bin', 
     #       'VERSION' => '1', 
     #       'WORKER_CLASS' => 'qemu_i386' 
     #     }, 
     #     'state' => 'done', 
     #     't_finished' => '2022-01-22T08:07:02', 
     #     't_started' => '2022-01-22T08:05:09', 
     #     'test' => 'core' 
     # } 
     ok 4 - autoinst log file generated 
     ok 5 - worker log file generated 
     ok 6 - autoinst log uploaded 
     ok 7 - worker log uploaded 
 ``` 

 It fails in the `shutdown` module: 

 ``` 
 [2022-01-22T08:05:26.126716Z] [debug] ||| finished boot tests (runtime: 12 s) 
 [2022-01-22T08:05:26.127797Z] [debug] ||| starting shutdown tests/shutdown.pm 
 [2022-01-22T08:05:26.128265Z] [debug] isotovideo: pausing test execution of shutdown because we're supposed to pause at this test module 
 [2022-01-22T08:05:26.128630Z] [debug] tests/shutdown.pm:8 called testapi::type_string 
 [2022-01-22T08:05:26.128802Z] [debug] <<< testapi::type_string(string="sudo su\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47) 
 [2022-01-22T08:05:26.129110Z] [debug] isotovideo: paused, so not passing backend_type_string to backend 
 [2022-01-22T08:05:28.330185Z] [debug] isotovideo: test execution will be resumed 
 [2022-01-22T08:05:28.330596Z] [debug] isotovideo: resuming, continue passing backend_type_string to backend 
 [2022-01-22T08:05:28.601320Z] [debug] tests/shutdown.pm:9 called testapi::type_string 
 [2022-01-22T08:05:28.601518Z] [debug] <<< testapi::type_string(string="poweroff\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47) 
 [2022-01-22T08:05:28.910444Z] [debug] tests/shutdown.pm:10 called testapi::assert_shutdown 
 [2022-01-22T08:05:28.910592Z] [debug] <<< testapi::check_shutdown(timeout=90) 
 [2022-01-22T08:05:28.911472Z] [debug] QEMU status is not 'shutdown', it is 'running' 
 [2022-01-22T08:05:29.912649Z] [debug] QEMU status is not 'shutdown', it is 'running' 
 [2022-01-22T08:05:30.913806Z] [debug] EVENT {"data":{"path":"\/machine\/peripheral-anon\/device[2]\/virtio-backend"},"event":"NIC_RX_FILTER_CHANGED","timestamp":{"microseconds":996204,"seconds":1642838729}} 
 [2022-01-22T08:05:30.914058Z] [debug] QEMU status is not 'shutdown', it is 'running' 
 … 
 [2022-01-22T08:06:58.035515Z] [debug] QEMU status is not 'shutdown', it is 'running' 
 [2022-01-22T08:06:59.036839Z] [debug] QEMU status is not 'shutdown', it is 'running' 
 [2022-01-22T08:07:00.097072Z] [info] ::: basetest::runtest: # Test died: Machine didn't shut down! at tinycore/tests/shutdown.pm line 10. 

 [2022-01-22T08:07:00.099824Z] [debug] ||| finished shutdown tests (runtime: 94 s) 
 [2022-01-22T08:07:00.099889Z] [debug] ||| post fail hooks runtime: 0 s 
 [2022-01-22T08:07:00.100157Z] [debug] stopping overall test execution after a fatal test failure 
 ``` 

 Likely the SUT isn't aware that it should shutdown because one of the `type_string` commands doesn't work as expected. That *might* have something to do with the fact that we pause/resume the test execution before the `shutdown` module but could also be a general typing issue. 

 ## Acceptance criteria 
 * **AC1:** Retry for full stack test is removed again 

 ## ### Suggestion 
 * Fix #104971 first 
 * Have a look at the code (e.g. code for handling pausing/resuming in os-autoinst) for possible problems (e.g. lack of synchronization) 
 * Find a way to reproduce the problem in a unit test 
 * Workaround the issue by combining `type_string` commands in the test's `shutdown` module 
 * Remove the retry again on Makefile

Back