action #153718: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - haldir size:M - QA - openSUSE Project Management Tool

Actions

Copy link

action #153718

open

coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

coordination #123800: [epic] Provide SUSE QE Tools services running in PRG2 aka. Prg CoLo

coordination #137630: [epic] QE (non-openQA) setup in PRG2

Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - haldir size:M

Added by okurz 3 months ago. Updated about 21 hours ago.

Status:

In Progress

Priority:

Normal

Assignee:

nicksinger

Target version:

openQA Project - Ready

Start date:

2024-01-16

Due date:

2024-05-01 (Due in 4 days)

% Done:

Estimated time:

Tags:

infra, maxtorhof, SRV2, nue1, prg2, prg2e

Description

Acceptance criteria¶

AC1: haldir is usable from PRG2
AC2: https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=15308 is up-to-date

Suggestions¶

DONE Follow https://jira.suse.com/browse/ENGINFRA-3744
Ensure machine can be reached
DONE Ensure machine is used as in before migration -> Apparently nobody claims needing this machine
Let's use the machine as openQA worker like redcurrant using the pvm_hmc backend
Add network configuration, e.g. based on #139199-22
Add partitions in https://powerhmc1.oqa.prg2.suse.org/
Add according DHCP/DNS entries in https://gitlab.suse.de/OPS-Service/salt/
Add according config to https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls
Verify
Update https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=15308

Related issues 4 (1 open — 3 closed)

Actions

Copy link

Updated by okurz 3 months ago

Copied from action #153715: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - whale added

Actions

Copy link

Updated by okurz 3 months ago

Copied to action #153721: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - legolas added

Actions

Copy link

Updated by okurz about 1 month ago

Due date set to 2024-04-09
Status changed from Blocked to Feedback
Target version changed from future to Ready

https://jira.suse.com/browse/ENGINFRA-3744 was set to "Done" but I have no full confirmation that the machine is usable as in before.

@dawei_pang can you confirm that haldir.qe.prg2.suse.org is fully usable? The machine is within the HMC https://powerhmc1.oqa.prg2.suse.org/ and you have access there so you could use that access.

Actions

Copy link

Updated by dawei_pang about 1 month ago

Hello Oliver, haldir is not assigned to my squad.

I am not sure if I can check the system or modify any configuration, thanks!

Actions

Copy link

Updated by okurz about 1 month ago

Asked in
https://suse.slack.com/archives/C02CANHLANP/p1711522254790089

@channel apparently nobody claims the PowerPC machine "haldir" for use so unless there are objections in the next days I will plan to use the machine as openQA worker then.

Actions

Copy link

Updated by okurz 24 days ago

Related to action #139199: Ensure OSD openQA PowerPC machine redcurrant is operational from PRG2 size:M added

Actions

Copy link

Updated by okurz 24 days ago

Related to action #157777: Provide more consistent PowerPC openQA ressources by migrating all novalink instances to hmc size:M added

Actions

Copy link

Updated by okurz 24 days ago

Description updated (diff)
Due date deleted (~~2024-04-09~~)
Status changed from Feedback to New
Assignee deleted (~~okurz~~)
Priority changed from Low to Normal

Let's use the machine as openQA worker like redcurrant

Actions

Copy link

#10

Updated by okurz 23 days ago

Subject changed from Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - haldir to Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - haldir size:M
Description updated (diff)
Status changed from New to Workable

Actions

Copy link

#11

Updated by okurz 21 days ago

Status changed from Workable to Feedback
Assignee set to okurz

Will need to clarify with acarvajal who mentioned haldir on https://confluence.suse.com/display/qasle/QE-SAP+Power9+Infrastructure

Actions

Copy link

#12

Updated by okurz 18 days ago

Due date set to 2024-04-23

https://suse.slack.com/archives/C02CANHLANP/p1712665391961199

@Alvaro Carvajal in https://confluence.suse.com/display/qasle/QE-SAP+Power9+Infrastructure you mentioned haldir but previously when I asked I got the answer that QE-SAP doesn't use haldir. So should we prepare haldir as a generic openQA PowerVM host or do you use haldir within QE-SAP? Context: https://progress.opensuse.org/issues/153718

Actions

Copy link

#13

Updated by okurz 18 days ago

Due date deleted (~~2024-04-23~~)
Status changed from Feedback to Workable
Assignee deleted (~~okurz~~)

Got confirmation from acarvajal that haldir is free

yes. that confluence page is WIP and the haldir part was taken (copy & paste) from the old document at https://gitlab.suse.de/hsehic/qa-css-docs/-/blob/master/infrastructure/power9-configuration.md. I asked internally and haldir should be free

Actions

Copy link

#14

Updated by okurz 16 days ago

Assignee set to nicksinger

Actions

Copy link

#15

Updated by nicksinger 16 days ago

Status changed from Workable to In Progress

Recovered the password for the VIOS padmin-user and reset it to the default. https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/4976 created to give the VIOS a basic working network config. Alvaro made me aware of https://sd.suse.com/servicedesk/customer/portal/1/SD-153996. Next I try to setup a working RMC connection to the HMC to be able to configure a virtual network+disk for the LPARs and connect them with each other.

Actions

Copy link

#16

Updated by nicksinger 16 days ago

Status changed from In Progress to Blocked

managed to figure out which interface is connected and already statically configured haldir-vios4 which is able to be reached:

workstation pillar/domain ‹add_haldir› » ping 10.145.0.103
PING 10.145.0.103 (10.145.0.103) 56(84) bytes of data.
64 bytes from 10.145.0.103: icmp_seq=1 ttl=253 time=218 ms
^C
--- 10.145.0.103 ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 1001ms
rtt min/avg/max/mdev = 217.517/217.517/217.517/0.000 ms

I now hit the same issue as described in https://sd.suse.com/servicedesk/customer/portal/1/SD-153996 and will follow that one. But currently we can't do anything more from our side.

Actions

Copy link

#17

Updated by nicksinger 15 days ago

Status changed from Blocked to Workable

SD ticket resolved and the RMC connection works now. I will continue with a network and disk configuration before defining the SUT LPARs.

Actions

Copy link

#18

Updated by nicksinger 11 days ago

Status changed from Workable to In Progress

Actions

Copy link

#19

Updated by openqa_review 10 days ago

Due date set to 2024-05-01

Setting due date based on mean cycle time of SUSE QE Tools

Actions

Copy link

#20

Updated by nicksinger 9 days ago

LPAR network configured, disks created, LPARs created, disks+network attached to each LPAR and created https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/5009 to give them all an according DHCP+DNS config. After the MR is merged and live we need to validate that the LPARs can reach and boot the PXE-server. If so, add them to our workerconf and let them run production jobs.

Actions

Copy link

#21

Updated by nicksinger 8 days ago

Status changed from In Progress to Feedback

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/782 which can already be merged but only tested after the OPS-request is merged as well.

Actions

Copy link

#22

Updated by okurz 6 days ago

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/782 merged

Actions

Copy link

#23

Updated by nicksinger 5 days ago

Status changed from Feedback to In Progress

Actions

Copy link

#24

Updated by nicksinger 5 days ago

Status changed from In Progress to Feedback

started to validate the instances but realized that the OPS request is still not merged. Asking in #dct-migration
Also for later my command to create validation jobs:

for i in {1..10}; do echo openqa-clone-job --within-instance https://openqa.suse.de/tests/14092616 --skip-chained-deps --skip-download TEST+=-poo153718#${i} BUILD=nsinger_validate_poo153718 _GROUP=0 WORKER_CLASS=hmc_ppc64le_poo153718; done

Actions

Copy link

#25

Updated by nicksinger 5 days ago

MR merged, validation jobs running in https://openqa.suse.de/tests/overview?build=nsinger_validate_poo153718 and production MR prepared in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/790

Actions

Copy link

#26

Updated by nicksinger 5 days ago

Status changed from Feedback to Workable

All jobs reached PXE and even loaded kernel+initrd silently fail and fall back into grub. Not sure yet what causes this, have to investigate.

Actions

Copy link

#27

Updated by nicksinger about 21 hours ago

Status changed from Workable to In Progress

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

QA

Tags

Custom queries

action #153718

Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - haldir size:M

Acceptance criteria¶

Suggestions¶

Updated by okurz 3 months ago

Updated by okurz 3 months ago

Updated by okurz about 1 month ago

Updated by dawei_pang about 1 month ago

Updated by okurz about 1 month ago

Updated by okurz 24 days ago

Updated by okurz 24 days ago

Updated by okurz 24 days ago

Updated by okurz 23 days ago

Updated by okurz 21 days ago

Updated by okurz 18 days ago

Updated by okurz 18 days ago

Updated by okurz 16 days ago

Updated by nicksinger 16 days ago

Updated by nicksinger 16 days ago

Updated by nicksinger 15 days ago

Updated by nicksinger 11 days ago

Updated by openqa_review 10 days ago

Updated by nicksinger 9 days ago

Updated by nicksinger 8 days ago

Updated by okurz 6 days ago

Updated by nicksinger 5 days ago

Updated by nicksinger 5 days ago

Updated by nicksinger 5 days ago

Updated by nicksinger 5 days ago

Updated by nicksinger about 21 hours ago