Project

General

Profile

Actions

action #106912

closed

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #98952: [epic] t/full-stack.t sporadically fails "clickElement: element not interactable" and other errors

Fullstack test can still fail due to `shutdown` module size:M

Added by mkittler almost 3 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2022-02-16
Due date:
% Done:

0%

Estimated time:

Description

Observation

This is is a shortened version of the relevant log:

    # [2022-02-15T12:30:06.683729Z] [debug] ||| starting boot tests/boot.pm
    # [2022-02-15T12:30:06.685985Z] [debug] tests/boot.pm:9 called testapi::assert_screen
    # [2022-02-15T12:30:06.686361Z] [debug] <<< testapi::assert_screen(mustmatch="core", timeout=15, no_wait=1)
    # [2022-02-15T12:30:06.700419Z] [debug] no match: 14.9s, best candidate: boot-core (0.00)
…
    # [2022-02-15T12:30:07.125518Z] [debug] no change: 14.5s
    # [2022-02-15T12:30:07.134823Z] [debug] no match: 14.5s, best candidate: boot-core (0.00)
    # [2022-02-15T12:30:07.331434Z] [debug] >>> testapi::_handle_found_needle: found boot-core, similarity 0.97 @ 1/74
    # [2022-02-15T12:30:07.332261Z] [debug] tests/boot.pm:10 called testapi::send_key
    # [2022-02-15T12:30:07.332397Z] [debug] <<< testapi::send_key(key="ret", wait_screen_change=0, do_wait=0)
    # [2022-02-15T12:30:07.602261Z] [debug] tests/boot.pm:14 called testapi::assert_screen
    # [2022-02-15T12:30:07.602421Z] [debug] <<< testapi::assert_screen(mustmatch="on_prompt", timeout=90)
    # [2022-02-15T12:30:08.241080Z] [debug] no match: 89.3s, best candidate: boot-on_prompt (0.00)
…
    # [2022-02-15T12:30:21.249634Z] [debug] no match: 76.3s, best candidate: boot-on_prompt (0.00)
    # [2022-02-15T12:30:22.370453Z] [debug] >>> testapi::_handle_found_needle: found boot-on_prompt, similarity 1.00 @ 2/64
    # [2022-02-15T12:30:22.374020Z] [debug] ||| finished boot tests (runtime: 16 s)
    # [2022-02-15T12:30:22.375138Z] [debug] ||| starting assert_screen tests/assert_screen.pm
    # [2022-02-15T12:30:22.375777Z] [debug] tests/assert_screen.pm:9 called testapi::assert_screen
    # [2022-02-15T12:30:22.375889Z] [debug] <<< testapi::assert_screen(mustmatch="on_prompt", timeout=60)
    # [2022-02-15T12:30:23.323411Z] [debug] >>> testapi::_handle_found_needle: found boot-on_prompt, similarity 1.00 @ 2/64
    # [2022-02-15T12:30:23.323585Z] [debug] tests/assert_screen.pm:10 called testapi::assert_screen
    # [2022-02-15T12:30:23.323700Z] [debug] <<< testapi::assert_screen(mustmatch="on_prompt", timeout=30, no_wait=1)
    # [2022-02-15T12:30:23.397251Z] [debug] >>> testapi::_handle_found_needle: found boot-on_prompt, similarity 1.00 @ 2/64
    # [2022-02-15T12:30:23.398459Z] [debug] ||| finished assert_screen tests (runtime: 1 s)
    # [2022-02-15T12:30:23.399556Z] [debug] ||| starting shutdown tests/shutdown.pm
    # [2022-02-15T12:30:23.400299Z] [debug] tests/shutdown.pm:8 called testapi::type_string
    # [2022-02-15T12:30:23.400468Z] [debug] <<< testapi::type_string(string="sudo su\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
    # [2022-02-15T12:30:23.671645Z] [debug] tests/shutdown.pm:9 called testapi::type_string
    # [2022-02-15T12:30:23.671846Z] [debug] <<< testapi::type_string(string="poweroff\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
    # [2022-02-15T12:30:23.976613Z] [debug] tests/shutdown.pm:10 called testapi::assert_shutdown
    # [2022-02-15T12:30:23.976768Z] [debug] <<< testapi::check_shutdown(timeout=90)
    # [2022-02-15T12:30:23.977601Z] [debug] QEMU status is not 'shutdown', it is 'running'
    # [2022-02-15T12:30:24.978574Z] [debug] EVENT {"data":{"path":"\/machine\/peripheral-anon\/device[2]\/virtio-backend"},"event":"NIC_RX_FILTER_CHANGED","timestamp":{"microseconds":798568,"seconds":1644928224}}
    # [2022-02-15T12:30:24.978769Z] [debug] QEMU status is not 'shutdown', it is 'running'
…
    # [2022-02-15T12:31:54.110983Z] [debug] QEMU status is not 'shutdown', it is 'running'
    # [2022-02-15T12:31:55.174986Z] [info] ::: basetest::runtest: # Test died: Machine didn't shut down! at tinycore/tests/shutdown.pm line 10.
    #   
    # [2022-02-15T12:31:55.177594Z] [debug] ||| finished shutdown tests (runtime: 92 s)
    # [2022-02-15T12:31:55.177662Z] [debug] ||| post fail hooks runtime: 0 s

The test run was conducted as "job 7" as part of openQA's fullstack test (with caching enabled). I suppose it should also be reproducible in other "fullstack scenarios" where we expect the test to pass. Maybe os-autoinst's own fullstack test is not affected because it runs a few more test modules before shutdown. The change https://github.com/os-autoinst/os-autoinst/commit/a7103b16 was supposed to fix that and it definitely improved the situation. However, it seems in very rare cases¹ the test can still fail. (The fix was definitely used here as one can see in the logs that the assert_screen module is successfully executed.)

¹ I ran openQA's fullstack test about 75 times in CircleCI until I hit an occurrence.

Acceptance criteria

  • AC1: openQA fullstack test runs 100 times without failing

Suggestions

  • Not Ondrej's fault
  • Make it fail on purpose... but we don't know the exact cause, only the symptom
  • Reveal the video from the pool directory after reproducing locally

Out of scope

  • Reveal the video from the pool directory in CI

Files

fullstack-shutdown-again.txt (42.5 KB) fullstack-shutdown-again.txt mkittler, 2022-02-16 11:59
full-stack-shutdown-again-detailed-log.txt (222 KB) full-stack-shutdown-again-detailed-log.txt mkittler, 2022-02-16 11:59
fullstack-shutdown-problem-video.ogv (128 KB) fullstack-shutdown-problem-video.ogv mkittler, 2022-02-22 11:58
mpv-shot0011.jpg (41.6 KB) mpv-shot0011.jpg mkittler, 2022-02-22 12:11
mpv-shot0013.jpg (53.7 KB) mpv-shot0013.jpg mkittler, 2022-02-22 12:11
mpv-shot0015.jpg (35.1 KB) mpv-shot0015.jpg mkittler, 2022-02-22 12:11
mpv-shot0019.jpg (72.4 KB) mpv-shot0019.jpg mkittler, 2022-02-22 12:11
mpv-shot0020.jpg (44.5 KB) mpv-shot0020.jpg mkittler, 2022-02-22 12:11
mpv-shot0021.jpg (21.8 KB) mpv-shot0021.jpg mkittler, 2022-02-22 12:11
mpv-shot0022.jpg (31 KB) mpv-shot0022.jpg mkittler, 2022-02-22 12:11
mpv-shot0023.jpg (31.6 KB) mpv-shot0023.jpg mkittler, 2022-02-22 12:11
mpv-shot0024.jpg (35.1 KB) mpv-shot0024.jpg mkittler, 2022-02-22 12:11

Related issues 1 (0 open1 closed)

Related to openQA Project - action #107002: Expose fullstack test video from pool directory in CI size:MResolvedlivdywan2022-02-17

Actions
Actions

Also available in: Atom PDF