action #103791

closed

After module failure, the console is broken size:M

Added by jlausuch over 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2021-12-09
Due date:
% Done: 0%
Estimated time:

Description

Observation

I have observed situations where, after a module fails and openQA runs the next one, the very first command of that next module fails or times out.
Example:

In this job, after the docker_compose module failed, the subsequent modules fail right at their start.

More occurrences:
https://openqa.suse.de/tests/7806630#step/libvorbis/4
https://openqa.suse.de/tests/7810193#step/verify_default_target/4

Related Slack thread: https://suse.slack.com/archives/C02CANHLANP/p1639048127294800

Acceptance criteria

  • AC1: Better information exists about the state of the system after loading snapshots or in case of failures

Suggestions

  • Add a box to the test module stating that a snapshot was loaded because the previous module failed, and that this might affect the result, e.g. due to I/O or the system clock being skewed
  • After loading snapshots in os-autoinst, use QEMU monitor commands (see https://qemu-project.gitlab.io/qemu/system/monitor.html) to find out whether the system is just very busy/slow or responsive, e.g. "info status". okurz also recommends "info migrate": since we have just loaded a snapshot, maybe loading is not completely finished yet. "info dirty_rate" might show whether pending work needs to be handled before the system is properly responsive again.
  • The output of those commands could simply go into debug log lines; nothing fancier is required
  • Try to reproduce the problem with a synthetic setup, e.g. as part of the os-autoinst full-stack test
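The first three suggestions could be combined roughly as in the following sketch. It is written in Python and is not os-autoinst code. It assumes QEMU exposes a QMP monitor on a UNIX socket, and it runs the suggested HMP "info" commands through QMP's human-monitor-command wrapper. QMPClient, probe_vm_state, after_snapshot_load and the note/debug line format are hypothetical names chosen for illustration only.

```python
import json
import socket
from typing import Callable, Dict, List, Optional, Sequence

# HMP commands named in the ticket. "info dirty_rate" probably only shows useful
# numbers after a dirty-rate measurement was started (assumption, not verified here).
PROBE_COMMANDS = ("info status", "info migrate", "info dirty_rate")

Execute = Callable[..., Dict]


class QMPClient:
    """Hypothetical minimal QMP client over a UNIX socket (not os-autoinst code)."""

    def __init__(self, path: str) -> None:
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(path)
        # Assumes QEMU sends one compact JSON object per line (no pretty printing).
        self.reader = self.sock.makefile("r", encoding="utf-8")
        json.loads(self.reader.readline())  # greeting: {"QMP": {...}}
        self.execute("qmp_capabilities")  # leave capabilities negotiation mode

    def execute(self, command: str, arguments: Optional[Dict] = None) -> Dict:
        msg: Dict = {"execute": command}
        if arguments:
            msg["arguments"] = arguments
        self.sock.sendall((json.dumps(msg) + "\n").encode("utf-8"))
        while True:
            reply = json.loads(self.reader.readline())
            if "event" not in reply:  # skip asynchronous events such as RESUME
                return reply


def probe_vm_state(execute: Execute, commands: Sequence[str] = PROBE_COMMANDS) -> List[str]:
    """Run HMP 'info' commands through QMP and turn their output into debug lines."""
    lines = []
    for cmd in commands:
        reply = execute("human-monitor-command", {"command-line": cmd})
        if "error" in reply:
            text = "error: " + reply["error"].get("desc", "unknown")
        else:
            text = reply.get("return", "").strip()
        # One debug line per output line; an empty output still yields one line.
        for out in text.splitlines() or [""]:
            lines.append(f"DEBUG [{cmd}] {out}".rstrip())
    return lines


def after_snapshot_load(previous_failed: bool, execute: Execute) -> List[str]:
    """Lines to attach to the next module when a snapshot was loaded after a failure."""
    if not previous_failed:
        return []
    note = ("NOTE: snapshot loaded because the previous module failed; "
            "results may be affected, e.g. by I/O or a skewed system clock")
    return [note] + probe_vm_state(execute)
```

For example, after_snapshot_load(True, QMPClient(path).execute) would return the note followed by one debug line per monitor output line, where path is a hypothetical QMP socket path. Passing execute as a plain callable keeps the logging logic testable without a running QEMU.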

Related issues: 1 (0 open, 1 closed)

Related to qe-yam - action #101295: [timebox: 8h][sporadic] test fails in verify_default_target (Rejected, 2021-10-21)
