action #154624

closed

openQA Project (public) - coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

openQA Project (public) - coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers

Periodically running simple ping-check multi-machine tests on x86_64 covering multiple physical hosts on OSD alerting tools team on failures size:M

Added by okurz 11 months ago. Updated 10 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date: 2024-01-30
Due date:
% Done: 0%
Estimated time:
Description

Motivation

In cases like #154552, multi-machine issues (still) happen. While we monitor multi-machine test results, there are cases where users notify us about problems that we don't see in our monitoring. Because we now (#138302) have a good, simple ping-check multi-machine test scenario created by dheidler, we can run that scenario periodically and very often, similar to the openQA-in-openQA tests. Because the scenario is so simple, a failure is likely caused by multi-machine infrastructure problems that we want to know about. Whenever it fails, we should alert the tools team directly, e.g. by email to Slack #team-qa-tools or something similar, using openqa-label-known-issues.

Acceptance criteria

  • AC1: simple ping-check multi-machine tests executed on x86_64 on OSD periodically covering multiple physical hosts
  • AC2: The tools team is alerted directly if those tests fail

Suggestions


Related issues 5 (1 open, 4 closed)

Related to openQA Project (public) - action #154021: [alert] Ratio of not restarted multi-machine tests by result | Resolved | mkittler | 2024-01-22 | 2024-02-12

Related to openQA Project (public) - action #155278: o3 aarch64 multi-machine tests on openqaworker-arm21 and 22 fail to resolve codecs.opensuse.org size:M | Resolved | dheidler | 2024-02-09

Copied from openQA Project (public) - action #154552: [ppc64le] test fails in iscsi_client - zypper reports Error Message: Could not resolve host: openqa.suse.de | Resolved | mkittler | 2024-01-30

Copied to openQA Infrastructure (public) - action #155200: Periodically running simple ping-check multi-machine tests on ppc64le covering multiple physical hosts on OSD alerting tools team on failures size:M | Workable | 2024-01-30

Copied to openQA Project (public) - action #160628: periodic multi-machine OSD test in https://gitlab.suse.de/openqa/scripts-ci/ does not trigger any jobs size:S | Resolved | mkittler | 2024-01-30

Actions #1

Updated by okurz 11 months ago

  • Copied from action #154552: [ppc64le] test fails in iscsi_client - zypper reports Error Message: Could not resolve host: openqa.suse.de added
Actions #2

Updated by okurz 11 months ago

  • Subject changed from Periodically running simple ping-check multi-machine tests on x86_64+ppc64le covering multiple physical hosts on OSD alerting tools team on failures to Periodically running simple ping-check multi-machine tests on x86_64+ppc64le covering multiple physical hosts on OSD alerting tools team on failures size:M
  • Status changed from New to Workable
Actions #3

Updated by okurz 11 months ago

  • Related to action #154021: [alert] Ratio of not restarted multi-machine tests by result added
Actions #4

Updated by okurz 10 months ago

  • Target version changed from Tools - Next to Ready
Actions #5

Updated by okurz 10 months ago

  • Copied to action #155200: Periodically running simple ping-check multi-machine tests on ppc64le covering multiple physical hosts on OSD alerting tools team on failures size:M added
Actions #6

Updated by okurz 10 months ago

  • Subject changed from Periodically running simple ping-check multi-machine tests on x86_64+ppc64le covering multiple physical hosts on OSD alerting tools team on failures size:M to Periodically running simple ping-check multi-machine tests on x86_64 covering multiple physical hosts on OSD alerting tools team on failures size:M
  • Description updated (diff)
Actions #7

Updated by jbaier_cz 10 months ago

  • Related to action #155278: o3 aarch64 multi-machine tests on openqaworker-arm21 and 22 fail to resolve codecs.opensuse.org size:M added
Actions #8

Updated by jbaier_cz 10 months ago

  • Assignee set to jbaier_cz
Actions #9

Updated by jbaier_cz 10 months ago

  • Status changed from Workable to In Progress
Actions #10

Updated by jbaier_cz 10 months ago

  • Status changed from In Progress to Workable

I created a simple script, https://github.com/os-autoinst/scripts/pull/290, which can be executed without any parameters to schedule a multi-machine ping test on o3: https://openqa.opensuse.org/tests/3933553. With some non-default variables, for example openqa_url=openqa.suse.de distri=sle version=15-SP5 flavor=Server-DVD-Updates test_name=ovs-client, it can also be used on OSD: https://openqa.suse.de/tests/13503761. As it uses openqa-cli schedule --monitor, it should be usable within any other CI or script. My plan is to integrate it into https://gitlab.suse.de/openqa/scripts-ci, where we already have notifications on failed pipelines and where we can easily schedule or pause the execution.
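
To illustrate how such variables could map onto an openqa-cli schedule --monitor call, here is a minimal sketch in Python. It is not the script from the pull request: the default values other than the o3 host, the mapping of test_name to the TEST setting, and the function name build_command are assumptions made for illustration, and a real call would need further settings (for example architecture, build and the scenario definitions) that are omitted here.

```python
import os
import shlex

# Hypothetical defaults. Only the o3 host is taken from the ticket;
# the other values are placeholders for illustration.
DEFAULTS = {
    "openqa_url": "https://openqa.opensuse.org",
    "distri": "opensuse",
    "version": "Tumbleweed",
    "flavor": "DVD",
    "test_name": "ping_client",
}


def build_command(env=None):
    """Build an openqa-cli command line from lowercase variables.

    Variables missing from `env` fall back to DEFAULTS, so calling this
    without any overrides targets o3.
    """
    env = os.environ if env is None else env
    cfg = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return [
        "openqa-cli", "schedule", "--monitor",
        "--host", cfg["openqa_url"],
        # Assumed mapping of the lowercase variables to openQA settings.
        f"DISTRI={cfg['distri']}",
        f"VERSION={cfg['version']}",
        f"FLAVOR={cfg['flavor']}",
        f"TEST={cfg['test_name']}",
    ]


if __name__ == "__main__":
    # Only print the command; running it requires openqa-cli and API credentials.
    print(shlex.join(build_command()))
```

Passing the OSD values from the comment above as overrides, e.g. build_command({"openqa_url": "openqa.suse.de", "distri": "sle"}), yields the same command line with a different host and settings. Since --monitor makes openqa-cli wait for the scheduled jobs, a CI job wrapping such a call can rely on the exit status to flag a failed run.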

Actions #11

Updated by okurz 10 months ago

Sounds perfect

Actions #12

Updated by jbaier_cz 10 months ago

  • Status changed from Workable to In Progress
Actions #13

Updated by jbaier_cz 10 months ago

Support for using openqa-cli (creating config file) added in https://gitlab.suse.de/openqa/scripts-ci/-/merge_requests/3

Actions #14

Updated by okurz 10 months ago

Actions #15

Updated by jbaier_cz 10 months ago

I created new schedules in https://gitlab.suse.de/openqa/scripts-ci/-/pipeline_schedules. Now that the MR is merged, we should be able to see the first result for OSD: https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2290689 (and also for o3: https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2290690).

If I get a successful pipeline run, I can enable the periodic run for OSD and we are done.

Actions #16

Updated by jbaier_cz 10 months ago

  • Status changed from In Progress to Resolved

The schedule is working (to start with, a test runs every hour); in case of failure, we will be notified via e-mail. The concrete worker assignment can be adjusted, if needed, by editing the YAML schedule within the openqa-schedule-mm-ping-test script.

Actions #17

Updated by okurz 10 months ago

You even added it for both OSD and o3. Nice!

Actions #18

Updated by okurz 7 months ago

  • Copied to action #160628: periodic multi-machine OSD test in https://gitlab.suse.de/openqa/scripts-ci/ does not trigger any jobs size:S added