action #159231
coordination #121720 (closed): [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability
coordination #123800: [epic] Provide SUSE QE Tools services running in PRG2 aka. Prg CoLo
Bring back worker class "hmc_ppc64le-4disk" on redcurrant or another machine size:M
0%
Description
Motivation
Most PowerPC machines are being set up in PRG2 within #132140 and are at least discoverable from the HMC. Now we can set up redcurrant as a production openQA PowerVM worker in OSD again.
https://suse.slack.com/archives/C02CANHLANP/p1713437618501069
Hello, Is there any plan/ticket to bring back ppc64le hmc worker with 4 disk?
https://openqa.suse.de/tests/overview?state=assigned&state=setup&state=running&state=uploading&state=scheduled&distri=sle&version=15-SP6&build=80.1&groupid=129
Acceptance criteria
- AC1: redcurrant openQA instances as referenced in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls as "hmc_ppc64le-4disk-poo139199" are able to pass openQA jobs after the move to PRG2
- AC2: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls does not mention #132140 or this ticket
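AC2 could be checked mechanically with something like the following minimal sketch. It assumes that ticket references in workerconf.sls appear as `#NNN`, `pooNNN` or `issues/NNN`; that convention, the function name `find_ticket_references` and the sample text are illustrative assumptions, not taken from the actual file.

```python
import re


def find_ticket_references(text, ticket_ids):
    """Return (line_number, stripped_line) pairs that mention any given ticket id.

    Assumption: references look like "#132140", "poo132140", "poo#132140"
    or ".../issues/132140". Other spellings would not be detected.
    """
    ids = "|".join(re.escape(t) for t in ticket_ids)
    pattern = re.compile(r"(?:#|poo|issues/)(?:" + ids + r")\b")
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


if __name__ == "__main__":
    # Hypothetical excerpt, NOT the real workerconf.sls content.
    sample = (
        "# hmc_ppc64le-4disk disabled, see poo#132140\n"
        "WORKER_CLASS: hmc_ppc64le-4disk-poo139199\n"
    )
    for number, line in find_ticket_references(sample, ["132140", "159231"]):
        print(f"line {number}: {line}")
```

An empty result for the ids 132140 and 159231 would indicate that AC2 holds, under the stated assumption about how tickets are referenced.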
Suggestions
- Read #139199 about the general setup of redcurrant
- See what is needed from old tests: https://openqa.suse.de/tests/overview?arch=&flavor=&machine=ppc64le-hmc-4disk&test=&modules=&module_re=&group_glob=&not_group_glob=&comment=&distri=sle&version=15-SP5&build=102.1&groupid=129#, e.g. https://openqa.suse.de/tests/11163027 with https://openqa.suse.de/tests/11163027/file/vars.json
- Ensure that we have those "4 disks"; this may need to be configured for redcurrant-1 and redcurrant-2 in powerhmc1.oqa.prg2.suse.org
- Then verify within https://openqa.suse.de/tests/latest?arch=ppc64le&machine=ppc64le-hmc-4disk&test=RAID0 and enable the worker classes again
Updated by nicksinger 8 months ago
- Subject changed from Bring back worker class "hmc_ppc64le-4disk" on redcurrant or another machine to Bring back worker class "hmc_ppc64le-4disk" on redcurrant or another machine size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by ybonatakis 7 months ago
- Status changed from Workable to In Progress
- Assignee set to ybonatakis
Updated by ybonatakis 7 months ago
- Status changed from In Progress to Workable
- Assignee deleted (ybonatakis)
I tried to run some jobs. The ones that ran were assigned to worker29, but they never booted.
I also can't SSH into redcurrant-1.oqa.prg2.suse.org.
From the other ticket, my understanding is that I have to add three disks to the machine, but I am really not sure how.
Updated by livdywan 7 months ago
Investigation conducted in https://suse.slack.com/archives/C02AJ1E568M/p1716291191446569
Updated by nicksinger 7 months ago
- Status changed from Workable to Feedback
So I added 3 additional disks of 20GB each via the HMC ( https://10.145.14.33 ) to redcurrant-1 and redcurrant-2. Then I tried to clone https://openqa.suse.de/tests/11163027 with a more recent build and SP. I downloaded a vars.json of a newer job and used:
cat vars.json | grep -v "\"BUILD\"" | grep -E "SP5|102\\.1" | tr -d "\"" | tr -d "," | tr ":" "=" | tr -d " " | tr "\n" " " | sed "s/102\\.1/90\\.1/g" | sed "s/-SP5/-SP6/g" | sed "s/http=/http:/g" | sed "s/ftp=/ftp:/g" | sed "s/https=/https:/g" | sed "s/nfs=/nfs:/g" | sed "s/smb=/smb:/g"
which is very hacky but produced updated variables needed to run a job. I also had to update SCC-codes and then appended the output of the above and SCC-codes to my clone-job call:
openqa-clone-job --within-instance https://openqa.suse.de/tests/11163027 YAML_SCHEDULE=schedule/yast/raid/raid0_sle_gpt_prep_boot_pvm.yaml YAML_SCHEDULE_DEFAULT=schedule/yast/sle/flows/default_ppc64le.yaml _GROUP=0 WORKER_CLASS=redcurrant-2 {TEST,BUILD}+=_poo139199 SCC_REGCODE=[…] [output of adjusted vars.json]
which produced https://openqa.suse.de/tests/14516901 and proves that the setup works as it did before the VIOS reinstallation. Created https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/828 to enable the class in production again.
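For reference, the grep/tr/sed pipeline in this comment could also be sketched in Python by parsing vars.json as JSON instead of editing it as text. This is only a rough equivalent under assumptions: it inspects setting values only (the pipeline matched whole lines), it skips non-string values, and the function name `updated_settings` and the sample input are made up for illustration.

```python
import json


def updated_settings(vars_json_text, old_build="102.1", new_build="90.1",
                     old_sp="-SP5", new_sp="-SP6"):
    """Return KEY=VALUE strings for settings that mention the old build or SP.

    BUILD itself is skipped, as in the original pipeline, because it is
    passed separately on the openqa-clone-job command line.
    """
    settings = json.loads(vars_json_text)
    result = []
    for key, value in settings.items():
        if key == "BUILD" or not isinstance(value, str):
            continue
        if old_build not in value and "SP5" not in value:
            continue
        new_value = value.replace(old_build, new_build).replace(old_sp, new_sp)
        result.append(f"{key}={new_value}")
    return result


if __name__ == "__main__":
    # Hypothetical vars.json content, not copied from a real job.
    sample = json.dumps({
        "BUILD": "102.1",
        "VERSION": "15-SP5",
        "ARCH": "ppc64le",
        "MIRROR_HTTP": "http://example.com/SLE-15-SP5-Build102.1",
    })
    # Space-separated, ready to append to an openqa-clone-job call.
    print(" ".join(updated_settings(sample)))
```

Working on parsed JSON avoids the need to repair `http=`, `ftp=` and similar prefixes afterwards, since colons inside values are never rewritten.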
Updated by okurz 7 months ago
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/828 merged. Technically both ACs are covered, but since production tests had to be disabled for those worker classes for such a long time, please ensure that production tests are using this worker class again before resolving.
Updated by nicksinger 7 months ago
- Status changed from Feedback to Resolved
I asked our testing squads to enable tests again and Joaquin followed up: https://suse.slack.com/archives/C02CANHLANP/p1717564405852019?thread_ts=1713437618.501069&cid=C02CANHLANP
There doesn't seem to be much interest otherwise, but we now have some production jobs again: https://openqa.suse.de/tests/14525632#dependencies