action #103791
Updated by livdywan almost 3 years ago
## Observation
I have observed situations where, after a module fails and openQA runs the next one, the very first command of that next module fails or times out.
Example:
In [this job](https://openqa.suse.de/tests/7806676#step/cifs/4), after the docker_compose failure, the following modules fail right at their beginning.
More occurrences:
https://openqa.suse.de/tests/7806630#step/libvorbis/4
https://openqa.suse.de/tests/7810193#step/verify_default_target/4
Related Slack thread: https://suse.slack.com/archives/C02CANHLANP/p1639048127294800
## Acceptance criteria
- **AC1**: Better information exists about the state of the system after loading snapshots or in case of failures
## Suggestions
* Add a box to the test module stating that a snapshot was loaded and that, since the previous module failed, this might affect the result, e.g. due to I/O load or the system clock being askew
* After loading snapshots in os-autoinst, use QEMU monitor commands (see https://qemu-project.gitlab.io/qemu/system/monitor.html) to find out whether the system is responsive or just very busy/slow, e.g. "info status". okurz also recommends "info migrate": as we just loaded a snapshot before, maybe the load is not completely finished? Maybe "info dirty_rate" shows whether stuff still needs to be handled before the system is properly responsive again?
* The output of those commands could be written to simple debug log lines, so nothing fancier is required
* Try to reproduce the problem with a synthetic setup, which could e.g. be part of the os-autoinst full-stack test
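
The monitor-command suggestion could look roughly like the following. This is only a minimal sketch in Python for illustration, not os-autoinst code: `send_hmp` is a hypothetical callable that sends one human monitor command and returns its text output, and `vm_is_running` assumes that "info status" prints a line of the form `VM status: running`. Both names and the logger name are made up for this example.

```python
import logging
import re
from typing import Callable, Dict, Optional

log = logging.getLogger("snapshot_debug")  # hypothetical logger name

# Monitor commands named in the suggestions above.
COMMANDS = ("info status", "info migrate", "info dirty_rate")


def collect_monitor_state(send_hmp: Callable[[str], str]) -> Dict[str, str]:
    """Run each monitor command once and write its output to a debug log line.

    `send_hmp` is a hypothetical transport: it takes a monitor command
    string and returns the monitor's text reply.
    """
    state: Dict[str, str] = {}
    for command in COMMANDS:
        try:
            output = send_hmp(command).strip()
        except Exception as err:  # a failing command must not abort the test
            output = f"<failed: {err}>"
        log.debug("after snapshot load: %r -> %s", command, output)
        state[command] = output
    return state


def vm_is_running(status_output: str) -> Optional[bool]:
    """Interpret "info status" output; return None if the format is unknown.

    Assumes a line like "VM status: running" or "VM status: paused (...)".
    """
    match = re.search(r"VM status:\s*(\w+)", status_output)
    if match is None:
        return None
    return match.group(1) == "running"
```

Calling `collect_monitor_state` right after a snapshot load would add three debug lines to the log, which is all the suggestion asks for. Whether the output of "info migrate" or "info dirty_rate" says anything useful after a snapshot load is exactly the open question above, so the sketch only records it and does not interpret it.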