action #157855

closed

coordination #151816: [epic] Handle openQA fixes and job group setup

[Research:16h] Do the research about if we should change to use 'pvm-hmc' instead of 'spvm'

Added by tinawang123 7 months ago. Updated 6 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
Start date: 2024-03-25
Due date:
% Done: 0%
Estimated time:

Description

Motivation

Related topic: https://suse.slack.com/archives/C02CLB2LB7Z/p1711336690600979
Related ticket: https://progress.opensuse.org/issues/139199
We need to check whether pvm-hmc works well for our jobs.

Failures that appear to be related to the backend change should be discussed within the squad first. Some of these are old problems that were never resolved and that no squad took responsibility for in the past, so we should discuss them internally and try to find an existing ticket.

Acceptance

AC1: Select a subset of test suites covering installation and migration and run them with the hmc backend.
AC2: Run each test suite multiple times (for example, 10 times) to gather statistics.
AC3: Share the findings with the squad, including the failure rate observed.
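
One possible way to produce the failure rate mentioned in AC2 and AC3 is sketched below. It assumes the job results have already been collected by hand into plain lines of the form "<test-suite> <result>"; that input format and the function name `failure_rates` are assumptions made for this illustration, not an openQA export format or an existing tool.

```shell
# Summarize failure rates per test suite.
# Input (stdin): one line per job, "<test-suite> <result>" (assumed format).
# Output: "<test-suite> <failed>/<total> <percent>%", sorted by name.
failure_rates() {
    awk '
        { total[$1]++ }                    # count every run of the suite
        $2 != "passed" { failed[$1]++ }    # anything but "passed" counts as a failure
        END {
            for (name in total)
                printf "%s %d/%d %.0f%%\n", name, failed[name], total[name], 100 * failed[name] / total[name]
        }
    ' | sort
}
```

Results would be piped in, for instance `failure_rates < results.txt`, where `results.txt` is a hypothetical file holding the collected lines.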

Actions #1

Updated by JERiveraMoya 7 months ago

  • Subject changed from [Research] Do the research about if we should change to use 'pvm-hmc' instead of 'spvm' to [Research:8h] Do the research about if we should change to use 'pvm-hmc' instead of 'spvm'
  • Description updated (diff)
  • Status changed from New to Workable
  • Parent task set to #151816
Actions #2

Updated by JERiveraMoya 7 months ago

  • Tags set to qe-yam-mar-sprint
Actions #3

Updated by JERiveraMoya 6 months ago

  • Tags changed from qe-yam-mar-sprint to qe-yam-apr-sprint
Actions #4

Updated by zoecao 6 months ago

  • Status changed from Workable to In Progress
  • Assignee set to zoecao
Actions #5

Updated by zoecao 6 months ago

I triggered the jobs on the pvm_hmc machine and will check the results later.
https://openqa.suse.de/tests/overview?distri=sle&version=15-SP6&build=73.1&groupid=251

Actions #6

Updated by JERiveraMoya 6 months ago · Edited

I think we need many more verification runs across different test suites before we can consider it.
It would be better to run the verification in a loop, changing the test suite name on each iteration with a numeric postfix. That makes it clearer for the reviewer how many times you ran each test suite without navigating into each job, and it helps ensure that all runs used the same commit.

Actions #7

Updated by zoecao 6 months ago

Yes, it's better to run a larger number of test suites than to re-run a single test suite several times. I added some more test suites and triggered the jobs:
https://openqa.suse.de/tests/overview?distri=sle&version=15-SP6&build=73.1&groupid=251

Actions #8

Updated by JERiveraMoya 6 months ago

There is a way to run the jobs in a loop and change the test suite name on each iteration, so we can be sure that all runs use the same commit without checking visually. Something like:
for i in {1..10} ; do openqa-clone-custom-git-refspec https://github.com/<your-user>/os-autoinst-distri-opensuse/tree/<your-branch> https://<openqa-instance>/tests/<job-id> TEST=<test-name>_$i _SKIP_POST_FAIL_HOOKS=1 ; done
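
The same idea could be extended to several test suites at once. The following is only a sketch: the function name `print_clone_commands` and the "<test-name>:<job-id>" pair format are hypothetical, and the function only prints the commands (a dry run) so they can be reviewed before anything is sent to openQA.

```shell
# Print one openqa-clone-custom-git-refspec command per run, without executing it.
# Arguments: <branch-url> <openqa-tests-url> <runs> <test-name>:<job-id>...
print_clone_commands() {
    branch_url=$1     # e.g. https://github.com/<your-user>/os-autoinst-distri-opensuse/tree/<your-branch>
    tests_url=$2      # e.g. https://<openqa-instance>/tests
    runs=$3           # number of iterations per test suite
    shift 3
    for pair in "$@"; do
        name=${pair%%:*}      # part before the first colon
        job_id=${pair##*:}    # part after the last colon
        i=1
        while [ "$i" -le "$runs" ]; do
            echo "openqa-clone-custom-git-refspec $branch_url $tests_url/$job_id TEST=${name}_$i _SKIP_POST_FAIL_HOOKS=1"
            i=$((i + 1))
        done
    done
}
```

After checking the printed lines, they could be executed by piping the output to `sh`. Because every command carries the same branch URL, all runs should use the same commit as long as the branch is not updated in between.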

Actions #9

Updated by zoecao 6 months ago

These are the results of using 'pvm-hmc':
https://openqa.suse.de/tests/overview?distri=sle&version=15-SP6&build=73.1&groupid=251
In my opinion, both 'pvm-hmc' and 'spvm' are stable now.
Some of the failures in the migration daily group (using 'spvm') are caused by incorrect settings, and I'll submit an MR to fix them. These failures also happened when using 'pvm-hmc'; I have already corrected the settings in my VRs.

Actions #10

Updated by JERiveraMoya 6 months ago

  • Subject changed from [Research:8h] Do the research about if we should change to use 'pvm-hmc' instead of 'spvm' to [Research:16h] Do the research about if we should change to use 'pvm-hmc' instead of 'spvm'

Adding 1 more day to this spike ticket.
It would be interesting to know whether this also happens with hmc: https://openqa.suse.de/tests/13959269#step/bootloader_start/22

Actions #11

Updated by zoecao 6 months ago · Edited

This issue is caused by incorrect test suite settings and is not related to the spvm machine. I submitted an MR and verified it on the hmc machine:
MR: https://gitlab.suse.de/qe-yam/openqa-job-groups/-/merge_requests/143 (please help to review)
VRs on hmc: https://openqa.suse.de/tests/overview?distri=sle&version=15-SP6&build=73.1&groupid=251
The MR only fixes the settings issues; it does not switch spvm to pvm-hmc.
I believe this MR could also fix the issues on spvm in the daily group. When I was reviewing the RC1 milestone build, I hit the same error as in https://openqa.suse.de/tests/13959269#step/bootloader_start/22. I submitted an MR and re-triggered the milestone build, and it worked fine (the milestone build also uses spvm), so I think the same fix would work for the daily group jobs too.

Actions #12

Updated by zoecao 6 months ago

  • Status changed from In Progress to Resolved

My research result is that both 'pvm-hmc' and 'spvm' are stable now.
The failures in the daily job group (with spvm) are caused by settings issues.
I canceled my MR https://gitlab.suse.de/qe-yam/openqa-job-groups/-/merge_requests/143 because we will not use TESTSUITES: XXX any more, and Lemon is working on that in https://gitlab.suse.de/qe-yam/openqa-job-groups/-/merge_requests/125. Once his VRs pass, these ppc64le jobs should no longer fail.

I am resolving this ticket.
