action #153718

coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

coordination #123800: [epic] Provide SUSE QE Tools services running in PRG2 aka. Prg CoLo

coordination #137630: [epic] QE (non-openQA) setup in PRG2

Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - haldir size:M

Added by okurz 3 months ago. Updated about 21 hours ago.

Status:
In Progress
Priority:
Normal
Assignee:
Target version:
Start date:
2024-01-16
Due date:
2024-05-01 (Due in 4 days)
% Done:

0%

Estimated time:

Description

Acceptance criteria

Suggestions


Related issues 4 (1 open, 3 closed)

Related to QA - action #139199: Ensure OSD openQA PowerPC machine redcurrant is operational from PRG2 size:M (Resolved, nicksinger, 2023-06-29)

Related to QA - action #157777: Provide more consistent PowerPC openQA resources by migrating all novalink instances to hmc size:M (Blocked, okurz)

Copied from QA - action #153715: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - whale (Resolved, okurz, 2024-01-16)

Copied to QA - action #153721: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - legolas (Resolved, okurz, 2024-01-16)
Actions #1

Updated by okurz 3 months ago

  • Copied from action #153715: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - whale added
Actions #2

Updated by okurz 3 months ago

  • Copied to action #153721: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - legolas added
Actions #3

Updated by okurz about 1 month ago

  • Due date set to 2024-04-09
  • Status changed from Blocked to Feedback
  • Target version changed from future to Ready

https://jira.suse.com/browse/ENGINFRA-3744 was set to "Done", but I have no confirmation yet that the machine is as usable as before.

@dawei_pang can you confirm that haldir.qe.prg2.suse.org is fully usable? The machine is managed by the HMC https://powerhmc1.oqa.prg2.suse.org/ and you have access there, so you could use that to check.
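
For reference, a minimal sketch of how such a check could look from the HMC command line, assuming the usual hscroot admin account and that the managed system is listed under the name "haldir" (both are assumptions, not confirmed in this ticket):

# list managed systems and their state (user and system name are assumptions)
ssh hscroot@powerhmc1.oqa.prg2.suse.org "lssyscfg -r sys -F name,state"
# list the LPARs on the haldir managed system and their state
ssh hscroot@powerhmc1.oqa.prg2.suse.org "lssyscfg -r lpar -m haldir -F name,state"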

Actions #4

Updated by dawei_pang about 1 month ago

Hello Oliver, haldir is not assigned to my squad.

I am not sure whether I can check the system or modify any configuration. Thanks!

Actions #5

Updated by okurz about 1 month ago

Asked in
https://suse.slack.com/archives/C02CANHLANP/p1711522254790089

@channel apparently nobody claims the PowerPC machine "haldir" for use, so unless there are objections in the next few days I plan to use the machine as an openQA worker.

Actions #7

Updated by okurz 24 days ago

  • Related to action #139199: Ensure OSD openQA PowerPC machine redcurrant is operational from PRG2 size:M added
Actions #8

Updated by okurz 24 days ago

  • Related to action #157777: Provide more consistent PowerPC openQA resources by migrating all novalink instances to hmc size:M added
Actions #9

Updated by okurz 24 days ago

  • Description updated (diff)
  • Due date deleted (2024-04-09)
  • Status changed from Feedback to New
  • Assignee deleted (okurz)
  • Priority changed from Low to Normal

Let's use the machine as an openQA worker, like redcurrant.

Actions #10

Updated by okurz 23 days ago

  • Subject changed from Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - haldir to Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - haldir size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #11

Updated by okurz 21 days ago

  • Status changed from Workable to Feedback
  • Assignee set to okurz

Will need to clarify with acarvajal who mentioned haldir on https://confluence.suse.com/display/qasle/QE-SAP+Power9+Infrastructure

Actions #12

Updated by okurz 18 days ago

  • Due date set to 2024-04-23

https://suse.slack.com/archives/C02CANHLANP/p1712665391961199

@Alvaro Carvajal in https://confluence.suse.com/display/qasle/QE-SAP+Power9+Infrastructure you mentioned haldir, but previously when I asked I got the answer that QE-SAP doesn't use haldir. So should we prepare haldir as a generic openQA PowerVM host, or do you use it within QE-SAP? Context: https://progress.opensuse.org/issues/153718

Actions #13

Updated by okurz 18 days ago

  • Due date deleted (2024-04-23)
  • Status changed from Feedback to Workable
  • Assignee deleted (okurz)

Got confirmation from acarvajal that haldir is free

Yes. That confluence page is WIP and the haldir part was taken (copy & paste) from the old document at https://gitlab.suse.de/hsehic/qa-css-docs/-/blob/master/infrastructure/power9-configuration.md. I asked internally and haldir should be free.

Actions #14

Updated by okurz 16 days ago

  • Assignee set to nicksinger
Actions #15

Updated by nicksinger 16 days ago

  • Status changed from Workable to In Progress

Recovered the password for the VIOS padmin user and reset it to the default. Created https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/4976 to give the VIOS a basic working network config. Alvaro made me aware of https://sd.suse.com/servicedesk/customer/portal/1/SD-153996. Next I will try to set up a working RMC connection to the HMC to be able to configure a virtual network and disks for the LPARs and connect them with each other.
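
For the record, a quick sketch of how the RMC state can be checked from the HMC CLI, assuming the managed system is named "haldir" (exact attribute names can vary between HMC versions):

# partitions that currently have an active RMC/DLPAR connection to this HMC
lspartition -dlpar
# per-LPAR state and RMC details as seen by the HMC (system name is an assumption)
lssyscfg -r lpar -m haldir -F name,state,rmc_state,rmc_ipaddr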

Actions #16

Updated by nicksinger 16 days ago

  • Status changed from In Progress to Blocked

Managed to figure out which interface is connected and statically configured haldir-vios4, which is now reachable:

workstation pillar/domain ‹add_haldir› » ping 10.145.0.103
PING 10.145.0.103 (10.145.0.103) 56(84) bytes of data.
64 bytes from 10.145.0.103: icmp_seq=1 ttl=253 time=218 ms
^C
--- 10.145.0.103 ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 1001ms
rtt min/avg/max/mdev = 217.517/217.517/217.517/0.000 ms

I now hit the same issue as described in https://sd.suse.com/servicedesk/customer/portal/1/SD-153996 and will follow that one. But currently we can't do anything more from our side.
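
Not strictly part of this ticket, but for reference a static address on a VIOS is typically set from the padmin restricted shell roughly like this; the interface, netmask and gateway below are placeholders, only the address matches the ping above:

# assign a static IP to the connected interface (device and network values are assumptions)
mktcpip -hostname haldir-vios4 -inetaddr 10.145.0.103 -interface en0 \
        -netmask 255.255.255.0 -gateway 10.145.0.254 -start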

Actions #17

Updated by nicksinger 15 days ago

  • Status changed from Blocked to Workable

The SD ticket is resolved and the RMC connection works now. I will continue with the network and disk configuration before defining the SUT LPARs.
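
As a rough sketch of what the upcoming VIOS network and disk configuration can look like (all device names below are placeholders, not the actual haldir devices):

# bridge the physical adapter to the virtual LPAR network via a Shared Ethernet Adapter
mkvdev -sea ent0 -vadapter ent4 -default ent4 -defaultid 1
# map a physical disk to the virtual SCSI host adapter of one LPAR
mkvdev -vdev hdisk2 -vadapter vhost0 -dev lpar1_rootdisk
# review the resulting mappings
lsmap -all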

Actions #18

Updated by nicksinger 11 days ago

  • Status changed from Workable to In Progress
Actions #19

Updated by openqa_review 10 days ago

  • Due date set to 2024-05-01

Setting due date based on mean cycle time of SUSE QE Tools

Actions #20

Updated by nicksinger 9 days ago

LPAR network configured, disks created, LPARs created, disks and network attached to each LPAR, and created https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/5009 to give them all a matching DHCP+DNS config. After the MR is merged and live we need to validate that the LPARs can reach and boot from the PXE server. If so, add them to our workerconf and let them run production jobs.
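
A possible way to validate this once the MR is live, using hypothetical LPAR hostnames and the HMC's lpar_netboot for a ping test plus network-boot attempt (partition, profile, system and IP values are placeholders):

# check that the new DNS entries are live (hostname is a placeholder)
dig +short haldir-lpar1.qe.prg2.suse.org
# trigger a ping test and network boot of one LPAR from the HMC shell
lpar_netboot -t ent -D -s auto -d auto -S <pxe-server-ip> -G <gateway-ip> -C <lpar-ip> \
             haldir-lpar1 default_profile haldir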

Actions #21

Updated by nicksinger 8 days ago

  • Status changed from In Progress to Feedback

Created https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/782, which can already be merged but can only be tested after the OPS request is merged as well.

Actions #23

Updated by nicksinger 5 days ago

  • Status changed from Feedback to In Progress
Actions #24

Updated by nicksinger 5 days ago

  • Status changed from In Progress to Feedback

Started to validate the instances but realized that the OPS request is still not merged. Asking in #dct-migration.
Also, for later, my command to create validation jobs:

for i in {1..10}; do echo openqa-clone-job --within-instance https://openqa.suse.de/tests/14092616 --skip-chained-deps --skip-download TEST+=-poo153718#${i} BUILD=nsinger_validate_poo153718 _GROUP=0 WORKER_CLASS=hmc_ppc64le_poo153718; done
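
Presumably the echo is intentional as a dry run that only prints the ten clone commands; piping the same loop to a shell would actually submit them:

for i in {1..10}; do echo openqa-clone-job --within-instance https://openqa.suse.de/tests/14092616 --skip-chained-deps --skip-download TEST+=-poo153718#${i} BUILD=nsinger_validate_poo153718 _GROUP=0 WORKER_CLASS=hmc_ppc64le_poo153718; done | sh
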
Actions #26

Updated by nicksinger 5 days ago

  • Status changed from Feedback to Workable

All jobs reached PXE and even loaded kernel+initrd, but then they silently fail and fall back into grub. Not sure yet what causes this; have to investigate.
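
One way to investigate could be to watch the boot on the LPAR console directly from the HMC; the managed-system and partition names below are placeholders:

# open a virtual terminal to the LPAR to watch grub and the kernel hand-off
mkvterm -m haldir -p haldir-lpar1
# force-close the console session from another HMC shell when done
rmvterm -m haldir -p haldir-lpar1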

Actions #27

Updated by nicksinger about 21 hours ago

  • Status changed from Workable to In Progress