action #154624
closed
openQA Project (public) - coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens
openQA Project (public) - coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers
Periodically running simple ping-check multi-machine tests on x86_64 covering multiple physical hosts on OSD alerting tools team on failures size:M
Added by okurz 11 months ago.
Updated 10 months ago.
Description
Motivation
In cases like #154552, multi-machine issues still happen. Although we monitor multi-machine test results, users sometimes notify us about problems that our monitoring does not catch. Since #138302 we have a good, simple ping-check multi-machine test scenario, created by dheidler. We can run that scenario periodically and very often, similar to the openQA-in-openQA tests. Because the scenario is so simple, a failure most likely points to a problem in the multi-machine infrastructure that we want to know about. Whenever it fails, the tools team should be alerted directly, e.g. by an email to the Slack channel #team-qa-tools or similar, using openqa-label-known-issues.
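Because the scenario is so simple, the alerting rule can be correspondingly blunt: any non-passing job in the latest ping-check cluster is worth an alert. The sketch below only illustrates that decision and builds, but does not send, an email. It assumes job records are plain dicts with `id` and `result` keys, treats `passed` and `softfailed` as healthy, and uses a placeholder recipient address. None of this is taken from the actual setup, which according to the ticket relies on openqa-label-known-issues.

```python
from email.message import EmailMessage

# Assumption: these results count as healthy; anything else
# (failed, incomplete, parallel_failed, ...) triggers an alert.
OK_RESULTS = {"passed", "softfailed"}


def failed_jobs(jobs):
    """Return the jobs of a ping-check cluster whose result is not healthy."""
    return [job for job in jobs if job["result"] not in OK_RESULTS]


def build_alert(jobs, recipient):
    """Build an alert email if any job failed; return None otherwise."""
    bad = failed_jobs(jobs)
    if not bad:
        return None
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"Multi-machine ping-check failed ({len(bad)} job(s))"
    # One line per failing job so the recipient can look them up.
    msg.set_content("\n".join(f"job {job['id']}: {job['result']}" for job in bad))
    return msg


if __name__ == "__main__":
    # Hypothetical job IDs and results for a two-job cluster.
    cluster = [
        {"id": 1001, "result": "passed"},
        {"id": 1002, "result": "parallel_failed"},
    ]
    # Hypothetical recipient; the ticket suggests an email into Slack #team-qa-tools.
    alert = build_alert(cluster, "tools-team@example.com")
    print(alert["Subject"] if alert else "no alert")
```

Keeping the decision separate from delivery means the same check could feed an email, a Slack webhook or a labeling tool without change.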
Acceptance criteria
- AC1: simple ping-check multi-machine tests executed on x86_64 on OSD periodically covering multiple physical hosts
- AC2: The tools team is alerted directly if those tests fail
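AC1 only holds if a cluster's jobs actually land on different physical hosts. A cluster confined to one host would not exercise the network path between workers. A minimal check for that could look like the following sketch. It assumes each job record carries a hypothetical `worker_host` field; in practice that value would have to be derived from the worker each job ran on.

```python
def distinct_hosts(jobs):
    """Return the set of physical hosts the cluster's jobs ran on."""
    # Assumption: `worker_host` is a hypothetical field naming the physical host.
    return {job["worker_host"] for job in jobs}


def covers_multiple_hosts(jobs, minimum=2):
    """True if the cluster spans at least `minimum` distinct physical hosts."""
    return len(distinct_hosts(jobs)) >= minimum


if __name__ == "__main__":
    # Hypothetical host names.
    same_host = [
        {"id": 1, "worker_host": "worker-a"},
        {"id": 2, "worker_host": "worker-a"},
    ]
    cross_host = [
        {"id": 3, "worker_host": "worker-a"},
        {"id": 4, "worker_host": "worker-b"},
    ]
    print(covers_multiple_hosts(same_host))   # False
    print(covers_multiple_hosts(cross_host))  # True
```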
Suggestions
- Copied from action #154552: [ppc64le] test fails in iscsi_client - zypper reports Error Message: Could not resolve host: openqa.suse.de added
- Subject changed from Periodically running simple ping-check multi-machine tests on x86_64+ppc64le covering multiple physical hosts on OSD alerting tools team on failures to Periodically running simple ping-check multi-machine tests on x86_64+ppc64le covering multiple physical hosts on OSD alerting tools team on failures size:M
- Status changed from New to Workable
- Related to action #154021: [alert] Ratio of not restarted multi-machine tests by result added
- Target version changed from Tools - Next to Ready
- Copied to action #155200: Periodically running simple ping-check multi-machine tests on ppc64le covering multiple physical hosts on OSD alerting tools team on failures size:M added
- Subject changed from Periodically running simple ping-check multi-machine tests on x86_64+ppc64le covering multiple physical hosts on OSD alerting tools team on failures size:M to Periodically running simple ping-check multi-machine tests on x86_64 covering multiple physical hosts on OSD alerting tools team on failures size:M
- Description updated (diff)
- Related to action #155278: o3 aarch64 multi-machine tests on openqaworker-arm21 and 22 fail to resolve codecs.opensuse.org size:M added
- Assignee set to jbaier_cz
- Status changed from Workable to In Progress
- Status changed from In Progress to Workable
- Status changed from Workable to In Progress
- Status changed from In Progress to Resolved
The schedule is working (to start with, the test runs every hour); in case of failure, we are notified via email. If needed, the concrete worker assignment can be adjusted by editing the YAML schedule within the openqa-schedule-mm-ping-test script.
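The ticket does not show the layout of the YAML schedule in openqa-schedule-mm-ping-test, so the following is only a hedged illustration of what pinning the two halves of a ping-check cluster to different hosts could amount to. It uses openQA's `WORKER_CLASS` and `PARALLEL_WITH` job settings. The test names and worker class values are hypothetical, and the real script may structure its schedule differently.

```python
import json


def ping_check_cluster(server_class, client_class):
    """Return hypothetical job definitions for a two-job ping-check cluster.

    Each job is pinned to its own worker class so that the cluster is
    forced onto two different physical hosts.
    """
    if server_class == client_class:
        raise ValueError("server and client must use different worker classes")
    return [
        # Hypothetical test name for the side that answers pings.
        {"TEST": "ping_server", "WORKER_CLASS": server_class},
        {
            "TEST": "ping_client",
            "WORKER_CLASS": client_class,
            # Start this job in parallel with the server job.
            "PARALLEL_WITH": "ping_server",
        },
    ]


if __name__ == "__main__":
    # Hypothetical worker classes, one per physical host.
    print(json.dumps(ping_check_cluster("host_a_class", "host_b_class"), indent=2))
```

Rejecting identical worker classes up front keeps a misconfiguration from silently collapsing the cluster onto one host.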
You even added it for both OSD and O3. Nice!
- Copied to action #160628: periodic multi-machine OSD test in https://gitlab.suse.de/openqa/scripts-ci/ does not trigger any jobs size:S added