action #154624

Updated by okurz 3 months ago

## Motivation 
 In cases like #154552, multi-machine issues (still) happen. While we monitor multi-machine test results, there are cases where users notify us about problems that we don't see in our monitoring. Because we now have a good, simple ping-check multi-machine test scenario created by dheidler (#138302), we can run that scenario periodically and very often, similar to the openQA-in-openQA tests. Because the scenario is so simple, a failure is likely caused by multi-machine infrastructure problems that we want to know about. Whenever the scenario fails, alert the tools team directly, e.g. email to Slack #team-qa-tools or similar, using openqa-label-known-issues.

 ## Acceptance criteria 
 * **AC1:** Simple ping-check multi-machine tests are executed periodically on x86_64 on OSD, covering multiple physical hosts 
 * **AC2:** Same as AC1 but for ppc64le 
 * **AC3:** The tools team is alerted directly if those tests fail 

 ## Suggestions 
 * Read #138302 where dheidler added the simple ping-check for openQA-in-openQA tests 
 * The scenario can be in a new job group or just groupless 
 * Think about how to trigger the tests periodically, possibly with a GitLab CI pipeline? 
 * Similar to wicked tests 
 * Ensure the alerting, possibly use https://github.com/os-autoinst/scripts/?tab=readme-ov-file#unreviewed-issues
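
 As a starting point for AC3, here is a minimal sketch of the "alert if the ping-check failed" decision. It is not an existing script: the function names, the `OK_RESULTS` set and the example base URL are assumptions for illustration, and the input is assumed to be a list of job dicts with `id`, `state` and `result` keys as returned by the openQA jobs API. Actually sending the message (email, Slack) is left out.

```python
# Hypothetical sketch: decide whether a periodic ping-check run needs an alert.
# Assumption: each job is a dict with "id", "state" and "result" keys.

# Assumption: these results are treated as "nothing to report".
OK_RESULTS = {"passed", "softfailed"}


def jobs_needing_alert(jobs):
    """Return finished jobs whose result is not in OK_RESULTS."""
    return [
        job for job in jobs
        if job.get("state") == "done" and job.get("result") not in OK_RESULTS
    ]


def format_alert(jobs, base_url):
    """Build a plain-text alert body, or return None if nothing failed."""
    bad = jobs_needing_alert(jobs)
    if not bad:
        return None
    lines = ["Multi-machine ping-check failed:"]
    # One line per failed job, linking to its test page on the openQA instance.
    lines += [f"- {base_url}/tests/{job['id']} ({job['result']})" for job in bad]
    return "\n".join(lines)
```

 Jobs that are still running are ignored on purpose, so a periodic caller only alerts on finished results. Whatever triggers the tests could call `format_alert` after the run and forward a non-`None` result to the tools team.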
