Project

General

Profile

Actions

action #105429

closed

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #98952: [epic] t/full-stack.t sporadically fails "clickElement: element not interactable" and other errors

openQA's fullstack test fails in `shutdown` module

Added by mkittler over 2 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2022-01-25
Due date:
% Done:

0%

Estimated time:

Description

Observation

See https://github.com/os-autoinst/openQA/pull/4468#issuecomment-1019201379, the error is:

# [2022-01-22T08:07:01.494359Z] [debug] [pid:488] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
# [2022-01-22T08:07:01.528597Z] [info] [pid:831] Uploading video.ogv
# [2022-01-22T08:07:01.529126Z] [debug] [pid:831] Uploading artefact video.ogv
# [2022-01-22T08:07:01.566264Z] [info] [pid:831] Uploading vars.json
# [2022-01-22T08:07:01.566529Z] [debug] [pid:831] Uploading artefact vars.json
# [2022-01-22T08:07:01.594899Z] [info] [pid:831] Uploading autoinst-log.txt
# [2022-01-22T08:07:01.595178Z] [debug] [pid:831] Uploading artefact autoinst-log.txt
# [2022-01-22T08:07:01.621786Z] [info] [pid:831] Uploading worker-log.txt
# [2022-01-22T08:07:01.622061Z] [debug] [pid:831] Uploading artefact worker-log.txt
Bailout called.  Further testing stopped:  Job 1 produced the wrong results

#   Failed test 'no (unexpected) warnings (via END block)'
#   at /usr/lib/perl5/5.26.1/Test/Builder.pm line 135.
# Got the following unexpected warnings:
#   1: Use of uninitialized value in subroutine entry at t/full-stack.t line 213.

The Use of uninitialized value… is basically because the qcow image hasn't been uploaded. Those are the corresponding failing checks:

    not ok 8 - local upload feature used
    not ok 9 - image of hdd uploaded
    not ok 10 - can stat /tmp/IWUDY9BiMA/full-stack.d/openqa/share/factory/hdd/core-hdd.qcow2
    not ok 11 - exported image has correct permissions (420 -> 0644)

And that is simply because the test failed:

    not ok 2 - Expected result not found
    not ok 3 - job 1 passed
    # job info: {
    #   'assigned_worker_id' => 1,
    #   'blocked_by_id' => undef,
    #   'clone_id' => undef,
    #   'group_id' => undef,
    #   'id' => 1,
    #   'name' => 'tinycore-1-flavor-i386-Build1-core@coolone',
    #   'priority' => 50,
    #   'result' => 'failed',
    #   'settings' => {
    #     'ARCH' => 'i386',
    #     'BUILD' => '1',
    #     'DISTRI' => 'tinycore',
    #     'FLAVOR' => 'flavor',
    #     'INTEGRATION_TESTS' => '1',
    #     'ISO' => 'Core-7.2.iso',
    #     'MACHINE' => 'coolone',
    #     'NAME' => '00000001-tinycore-1-flavor-i386-Build1-core@coolone',
    #     'PAUSE_AT' => 'shutdown',
    #     'PUBLISH_HDD_1' => 'core-hdd.qcow2',
    #     'QEMU' => 'i386',
    #     'QEMU_NO_FDC_SET' => '1',
    #     'QEMU_NO_KVM' => '1',
    #     'QEMU_NO_TABLET' => '1',
    #     'TEST' => 'core',
    #     'UEFI_PFLASH_VARS' => '/usr/share/qemu/ovmf-x86_64.bin',
    #     'VERSION' => '1',
    #     'WORKER_CLASS' => 'qemu_i386'
    #   },
    #   'state' => 'done',
    #   't_finished' => '2022-01-22T08:07:02',
    #   't_started' => '2022-01-22T08:05:09',
    #   'test' => 'core'
    # }
    ok 4 - autoinst log file generated
    ok 5 - worker log file generated
    ok 6 - autoinst log uploaded
    ok 7 - worker log uploaded

It fails in the shutdown module:

[2022-01-22T08:05:26.126716Z] [debug] ||| finished boot tests (runtime: 12 s)
[2022-01-22T08:05:26.127797Z] [debug] ||| starting shutdown tests/shutdown.pm
[2022-01-22T08:05:26.128265Z] [debug] isotovideo: pausing test execution of shutdown because we're supposed to pause at this test module
[2022-01-22T08:05:26.128630Z] [debug] tests/shutdown.pm:8 called testapi::type_string
[2022-01-22T08:05:26.128802Z] [debug] <<< testapi::type_string(string="sudo su\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-01-22T08:05:26.129110Z] [debug] isotovideo: paused, so not passing backend_type_string to backend
[2022-01-22T08:05:28.330185Z] [debug] isotovideo: test execution will be resumed
[2022-01-22T08:05:28.330596Z] [debug] isotovideo: resuming, continue passing backend_type_string to backend
[2022-01-22T08:05:28.601320Z] [debug] tests/shutdown.pm:9 called testapi::type_string
[2022-01-22T08:05:28.601518Z] [debug] <<< testapi::type_string(string="poweroff\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-01-22T08:05:28.910444Z] [debug] tests/shutdown.pm:10 called testapi::assert_shutdown
[2022-01-22T08:05:28.910592Z] [debug] <<< testapi::check_shutdown(timeout=90)
[2022-01-22T08:05:28.911472Z] [debug] QEMU status is not 'shutdown', it is 'running'
[2022-01-22T08:05:29.912649Z] [debug] QEMU status is not 'shutdown', it is 'running'
[2022-01-22T08:05:30.913806Z] [debug] EVENT {"data":{"path":"\/machine\/peripheral-anon\/device[2]\/virtio-backend"},"event":"NIC_RX_FILTER_CHANGED","timestamp":{"microseconds":996204,"seconds":1642838729}}
[2022-01-22T08:05:30.914058Z] [debug] QEMU status is not 'shutdown', it is 'running'
…
[2022-01-22T08:06:58.035515Z] [debug] QEMU status is not 'shutdown', it is 'running'
[2022-01-22T08:06:59.036839Z] [debug] QEMU status is not 'shutdown', it is 'running'
[2022-01-22T08:07:00.097072Z] [info] ::: basetest::runtest: # Test died: Machine didn't shut down! at tinycore/tests/shutdown.pm line 10.

[2022-01-22T08:07:00.099824Z] [debug] ||| finished shutdown tests (runtime: 94 s)
[2022-01-22T08:07:00.099889Z] [debug] ||| post fail hooks runtime: 0 s
[2022-01-22T08:07:00.100157Z] [debug] stopping overall test execution after a fatal test failure

Likely the SUT isn't aware that it should shutdown because one of the type_string commands doesn't work as expected. That might have something to do with the fact that we pause/resume the test execution before the shutdown module but could also be a general typing issue.

Acceptance criteria

  • AC1: Retry for full stack test is removed again

Suggestion

  • Fix #104971 first
  • Have a look at the code (e.g. code for handling pausing/resuming in os-autoinst) for possible problems (e.g. lack of synchronization)
  • Find a way to reproduce the problem in a unit test
  • Workaround the issue by combining type_string commands in the test's shutdown module
  • Remove the retry again on Makefile

Related issues 3 (1 open2 closed)

Related to openQA Tests - action #105506: [sporadic][tools] openQA-in-openQA test sporadically fails in shutdownResolvedosukup2022-01-262022-02-18

Actions
Related to openQA Project - action #105515: Optionally keep pool dir content for debugging os-autoinst + openQA full stack testsNew2022-01-26

Actions
Related to openQA Project - action #104971: [sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" size:MResolvedmkittler2022-01-172022-02-09

Actions
Actions

Also available in: Atom PDF