action #103791

Updated by livdywan about 2 years ago

## Observation 

 I have observed situations where, after a module fails and openQA runs the next one, the very first command of that next module fails or times out. 
 Example: 

 In [this job](https://openqa.suse.de/tests/7806676#step/cifs/4), after the docker_compose failure, the following modules fail right at the beginning. 

 More occurrences: 
 https://openqa.suse.de/tests/7806630#step/libvorbis/4 
 https://openqa.suse.de/tests/7810193#step/verify_default_target/4 

 Related Slack thread: https://suse.slack.com/archives/C02CANHLANP/p1639048127294800 

 ## Acceptance criteria 
 - **AC1**: Better information exists about the state of the system after loading snapshots or in case of failures 

 ## Suggestions 
 * Add a box to the test module stating that a snapshot was loaded and that, since the previous module failed, the result might be affected, e.g. due to I/O or the system clock being askew 
 * After loading snapshots in os-autoinst, use QEMU monitor commands to find out whether the system is just busy or slow, see https://qemu-project.gitlab.io/qemu/system/monitor.html , e.g. run "info status" and check whether the system is responsive or just very busy. Another command okurz recommends is "info migrate": as we load a snapshot just before, maybe it has not completely finished? Maybe "info dirty_rate" shows whether stuff needs to be handled before the system is properly responsive again? 
 * The output of those commands could be used in simple debug log lines, so nothing fancier is required 
 * Try to reproduce with a synthetic setup, which could e.g. be part of the os-autoinst full-stack test
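
 To illustrate the monitoring suggestion, here is a minimal sketch of how the VM state could be queried and turned into plain debug log lines. It is written in Python purely for illustration and is not how os-autoinst does or should implement it. It assumes QEMU exposes a QMP socket at a path passed in by the caller (the path and the helper names `qmp_execute`, `format_state_lines` and `log_vm_state` are hypothetical), and it uses the QMP commands `query-status` and `query-migrate`, which correspond to the human monitor commands "info status" and "info migrate".

```python
import json
import socket


def qmp_execute(stream, command, arguments=None):
    """Send one QMP command and return its 'return' payload.

    Asynchronous events that arrive before the reply are skipped.
    """
    request = {"execute": command}
    if arguments:
        request["arguments"] = arguments
    stream.write(json.dumps(request).encode() + b"\r\n")
    stream.flush()
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        reply = json.loads(line)
        if "return" in reply:
            return reply["return"]
        if "error" in reply:
            raise RuntimeError(reply["error"].get("desc", "QMP error"))
        # Anything else is an asynchronous event, keep reading.


def format_state_lines(status, migrate):
    """Turn the two query results into simple debug log lines."""
    return [
        "qemu status: %s (running=%s)"
        % (status.get("status"), status.get("running")),
        # query-migrate returns an empty object if no migration was started.
        "qemu migration: %s" % migrate.get("status", "none reported"),
    ]


def log_vm_state(qmp_socket_path, log=print):
    """Connect to a QMP UNIX socket (path is an assumption) and log the state."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(qmp_socket_path)
        stream = sock.makefile("rwb")
        json.loads(stream.readline())  # QMP greeting banner
        qmp_execute(stream, "qmp_capabilities")  # leave negotiation mode
        status = qmp_execute(stream, "query-status")
        migrate = qmp_execute(stream, "query-migrate")
        for line in format_state_lines(status, migrate):
            log(line)
```

 Calling `log_vm_state("/path/to/qmp.sock")` right after a snapshot was loaded would emit two log lines that could be compared between passing and failing runs. The output of "info dirty_rate" could be added in the same way once it is clear which fields are useful.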
