action #153733

closed

coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

coordination #123800: [epic] Provide SUSE QE Tools services running in PRG2 aka. Prg CoLo

coordination #137630: [epic] QE (non-openQA) setup in PRG2

Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - soapberry size:S

Added by okurz 3 months ago. Updated 5 days ago.

Status:
Resolved
Priority:
Low
Assignee:
Target version:
Start date:
2024-01-16
Due date:
% Done:

0%

Estimated time:

Description

Acceptance criteria

  • AC1: soapberry is usable from PRG2

Suggestions


Related issues: 2 (1 open, 1 closed)

Copied from QA - action #153730: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - huckleberry (Resolved, okurz, 2024-01-16)
Copied to QA - action #153736: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - nessberry (Blocked, okurz, 2024-01-16)

Actions #1

Updated by okurz 3 months ago

  • Copied from action #153730: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - huckleberry added
Actions #2

Updated by okurz 3 months ago

  • Copied to action #153736: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - nessberry added
Actions #3

Updated by okurz about 1 month ago

  • Due date set to 2024-04-09
  • Status changed from Blocked to Feedback
  • Target version changed from future to Ready

https://jira.suse.com/browse/ENGINFRA-3748 was set to "Done", but I am still missing final confirmation that we are able to use the machine; comment https://jira.suse.com/browse/ENGINFRA-3748?focusedId=1334763&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1334763 is not yet answered.

@acarvajal: according to https://confluence.suse.com/dosearchsite.action?cql=siteSearch ~ "soapberry", I suspect QE SAP is the owner of the machine soapberry. Can you confirm the machine is fully usable by you?

Actions #4

Updated by acarvajal about 1 month ago

Confirming. soapberry, huckleberry, blackcurrant and legolas are QE-SAP and Hana Perf machines.

See https://gitlab.suse.de/hsehic/qa-css-docs/-/blob/master/infrastructure/power9-configuration.md?ref_type=heads for a list of the Power 9 machines that used to be assigned to the old QA-CSS (former QE-SAP) team.

I checked soapberry & huckleberry yesterday and both are reachable from the HMC and LPARs are running. Currently all resources are assigned to a huge LPAR, but I want to change this to cover for the missing nessberry.

I have not yet checked VIOS, networking or installation in either system.

Is there a way to connect to the HMC via SSH? Or is this only possible from a host in oqa.prg2.suse.org?

Actions #5

Updated by okurz about 1 month ago

acarvajal wrote in #note-4:

Confirming. soapberry, huckleberry, blackcurrant and legolas are QE-SAP and Hana Perf machines.

See https://gitlab.suse.de/hsehic/qa-css-docs/-/blob/master/infrastructure/power9-configuration.md?ref_type=heads for a list of the Power 9 machines that used to be assigned to the old QA-CSS (former QE-SAP) team.

ok, I updated all the racktable entries accordingly.

I checked soapberry & huckleberry yesterday and both are reachable from the HMC and LPARs are running. Currently all resources are assigned to a huge LPAR, but I want to change this to cover for the missing nessberry.

I have not yet checked VIOS, networking or installation in either system.

Is there a way to connect to the HMC via SSH? Or is this only possible from a host in oqa.prg2.suse.org?

Yes, same credentials as for https://powerhmc1.oqa.prg2.suse.org. After you log in over SSH with the password, you can also add your SSH key to the HMC with the command "mkauthkeys".

Actions #6

Updated by acarvajal about 1 month ago

okurz wrote in #note-5:

Yes, same credentials as for https://powerhmc1.oqa.prg2.suse.org. After you log in over SSH with the password, you can also add your SSH key to the HMC with the command "mkauthkeys".

Thanks! Yes, it works from the VPN, but not from Franken Campus, which is what I tried yesterday.

Actions #7

Updated by okurz 30 days ago

acarvajal wrote in #note-6:

okurz wrote in #note-5:

Yes, same credentials as for https://powerhmc1.oqa.prg2.suse.org. After you log in over SSH with the password, you can also add your SSH key to the HMC with the command "mkauthkeys".

Thanks! Yes, it works from the VPN, but not from Franken Campus, which is what I tried yesterday.

Works for me from the "Frankencampus Wifi"; maybe it's different for the "Frankencampus Office" wired network? If so, please record an SD ticket to fix that and include me, so I can approve it as "owner" of the target system and IT doesn't get confused.

Actions #8

Updated by acarvajal 30 days ago

okurz wrote in #note-7:

Works for me from the "Frankencampus Wifi"; maybe it's different for the "Frankencampus Office" wired network? If so, please record an SD ticket to fix that and include me, so I can approve it as "owner" of the target system and IT doesn't get confused.

Got it. Will do that next week.

Actions #9

Updated by acarvajal 30 days ago

Checked the VIOS server on soapberry. It's running, but not reachable via the network; in fact, it still has the qa.suse.de IP configured:

# ifconfig -a
en5: flags=1e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.162.8.40 netmask 0xffffc000 broadcast 10.162.63.255
         tcp_sendspace 262144 tcp_recvspace 131072 rfc1323 1
lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1%1/64
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1

The LPAR's network configuration seems to have been updated for PRG2:

Interpartition Logical LAN: U9008.22L.7888E1A-V2-C2-T1
 1.   Client IP Address                    [10.145.0.70]
 2.   Server IP Address                    [10.146.5.254]
 3.   Gateway IP Address                   [10.146.5.254]
 4.   Subnet Mask                          [255.255.254.0]

Is this configuration correct?
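A quick on-link sanity check of the values above can be scripted with Python's ipaddress module (a sketch; the addresses are copied from the listing, and flagging the gateway as off-link is my reading, not a confirmed diagnosis):

```python
import ipaddress

# Values from the SMS listing above
client = ipaddress.ip_address("10.145.0.70")
gateway = ipaddress.ip_address("10.146.5.254")
netmask = "255.255.254.0"

# Network the client believes it is in, given the mask
net = ipaddress.ip_network(f"{client}/{netmask}", strict=False)
print(net)             # 10.145.0.0/23
print(gateway in net)  # False -> with this mask the gateway is not on-link
```

With mask 255.255.254.0 the client 10.145.0.70 sits in 10.145.0.0/23, which does not contain 10.146.5.254, so either the mask or the gateway/server addresses would need to change for direct reachability.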

However, PXE boot failed, and the installed OS is configuring an old IP address from qa.suse.de.

I noticed that only redcurrant has LPARs defined for DHCP in https://gitlab.suse.de/OPS-Service/salt, which may explain why PXE boot failed here, but I want to confirm the information above is correct before submitting an MR to add soapberry's LPARs to DHCP.

Questions:

  • Is there a document or related ticket with information how redcurrant was configured?
  • If I configure LPARs in soapberry to be used for openqa.suse.de, does soapberry's network need to be moved from qe.prg2.suse.org to oqa.prg2.suse.org, as redcurrant's was?
Actions #10

Updated by okurz 30 days ago

  • Due date changed from 2024-04-09 to 2024-04-23

@nicksinger can you please answer the above questions?

Actions #11

Updated by acarvajal 30 days ago

Another question @nicksinger: by any chance did you have to recreate VIOS and LPARs for redcurrant after the migration?

It seems I cannot manage virtual storage on soapberry, huckleberry and blackcurrant via the HMC webUI. I initially thought it was due to some role assigned to my account, but since I can see logical volumes assigned to LPARs from redcurrant, perhaps I need to destroy everything and start from scratch.

Actions #12

Updated by nicksinger 25 days ago

acarvajal wrote in #note-9:

Checked the VIOS server on soapberry. It's running, but not reachable via the network; in fact, it still has the qa.suse.de IP configured:

# ifconfig -a
en5: flags=1e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.162.8.40 netmask 0xffffc000 broadcast 10.162.63.255
         tcp_sendspace 262144 tcp_recvspace 131072 rfc1323 1
lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1%1/64
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1

I had the same issue with other VIOS instances and resolved it by using smitty after an oem_setup_env on the VIOS terminal. There you can configure DHCP for the en5 interface (which is a pretty good first test of whether the network for the machine is working).

The LPAR's network configuration seems to have been updated for PRG2:

Interpartition Logical LAN: U9008.22L.7888E1A-V2-C2-T1
 1.   Client IP Address                    [10.145.0.70]
 2.   Server IP Address                    [10.146.5.254]
 3.   Gateway IP Address                   [10.146.5.254]
 4.   Subnet Mask                          [255.255.254.0]

Is this configuration correct?

Client IP: https://gitlab.suse.de/OPS-Service/salt/-/blob/production/salt/profile/dns/files/prg2_suse_org/dns-qe.prg2.suse.org?plain=1#L126
Server IP == dhcp for this network, so: https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/domain/qe_prg2_suse_org/init.sls#L41
Gateway IP: https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/domain/qe_prg2_suse_org/init.sls#L41
Subnet Mask: https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/domain/qe_prg2_suse_org/init.sls#L42

However, PXE boot failed, and the installed OS is configuring an old IP address from qa.suse.de.

I noticed that only redcurrant has LPARs defined for DHCP in https://gitlab.suse.de/OPS-Service/salt, which may explain why PXE boot failed here, but I want to confirm the information above is correct before submitting an MR to add soapberry's LPARs to DHCP.

LPARs are defined here: https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/domain/qe_prg2_suse_org/hosts.yaml#L519-568 - is this the correct network? If not the config needs to be moved of course.

Questions:

  • Is there a document or related ticket with information how redcurrant was configured?

https://progress.opensuse.org/issues/139199#note-22 and following should cover everything I did with redcurrant. I also documented in https://progress.opensuse.org/issues/155521#note-17 that I set up a new VIOS on redcurrant because the old one was broken. You should not need to set up a new one if you can still access the old one.

  • If I configure LPARs in soapberry to be used for openqa.suse.de, does soapberry's network need to be moved from qe.prg2.suse.org to oqa.prg2.suse.org, as redcurrant's was?

No, the network the LPARs reside in shouldn't matter to openQA. You just need a working PXE which can access dist.

Actions #13

Updated by nicksinger 25 days ago

acarvajal wrote in #note-11:

Another question @nicksinger: by any chance did you have to recreate VIOS and LPARs for redcurrant after the migration?

It seems I cannot manage virtual storage on soapberry, huckleberry and blackcurrant via the HMC webUI. I initially thought it was due to some role assigned to my account, but since I can see logical volumes assigned to LPARs from redcurrant, perhaps I need to destroy everything and start from scratch.

Yes, I did have to recreate the VIOS of redcurrant because it wasn't able to boot (something about a missing network-based root filesystem). I suspected some FC connection was no longer present, but I was not aware that these machines share their storage with it.

Actions #14

Updated by acarvajal 23 days ago

okurz wrote in #note-7:

Works for me from "Frankencampus Wifi", maybe different for "Frankencampus Office" wired network? If yes then please record an SD ticket to fix that and include me in so I will approve that as "owner" of the target system so that IT doesn't get confused.

https://sd.suse.com/servicedesk/customer/portal/1/SD-153352

Actions #15

Updated by acarvajal 23 days ago

nicksinger wrote in #note-13:

Yes, I did have to recreate the VIOS of redcurrant because it wasn't able to boot (something about a missing network-based root filesystem). I suspected some FC connection was no longer present, but I was not aware that these machines share their storage with it.

@nicksinger @okurz I'm having issues managing storage devices on soapberry, and I also see that the HMC is not able to determine the running version of the current soapberry-vios (IIRC it was VIOS_3.1.0.21), so I decided to start from scratch and reinstall everything.

While attempting to install the VIOS, I get the following error when I try to install the VIOS version currently available in the Management Console (VIOS_SP_3.1.0):

acarvajal@powerhmc1:~> installios
Logging session output to /tmp/installios.209507.log.
ERROR installios: You must be a superadmin to run this command.

Is it possible that my user lacks the permissions to do this? Where can I request this to be changed?

P.S.: I also attempted installing with the same method via the webUI (same error) and using 10.255.255.1 as a NIM server (it failed bringing up the network interfaces). At least I know Nick could successfully install a VIOS using installios, so I guess that if I have the same role on powerhmc1 as his account, I could get further.

Actions #16

Updated by nicksinger 23 days ago

I made you hmcsuperadmin, which should give you sufficient permissions. To install the VIOS you need to (temporarily) connect the network interfaces of soapberry to the same network as the ASM/HMC (10.255.255.0/24). We're currently still looking for a solution to do this installation across two networks.

Actions #17

Updated by acarvajal 22 days ago

nicksinger wrote in #note-16:

I made you hmcsuperadmin, which should give you sufficient permissions. To install the VIOS you need to (temporarily) connect the network interfaces of soapberry to the same network as the ASM/HMC (10.255.255.0/24). We're currently still looking for a solution to do this installation across two networks.

Thanks a lot. As expected, the VIOS installation fails (it gets a bit further than in https://progress.opensuse.org/issues/155521#note-18, but it fails at lpar_netboot).

As the VIOS boots with what's already installed there, and with the hmcsuperadmin role, I tried to see if anything else was possible, but I think the current issue with the soapberry VIOS is that its RMC cannot reach the HMC. The HMC cannot detect the VIOS version, its IP, or its status:

acarvajal@powerhmc1:~> lssyscfg -r lpar -m soapberry -F rmc_ipaddr,lpar_id,name,state,rmc_state
,1,soapberry-vios,Running,inactive

I think this is what's blocking me at the moment. I expect a full VIOS installation would fix the issue, but I will need to look into connecting the interfaces to the 10.255.255.0/24 network.
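The empty rmc_ipaddr field in that lssyscfg output can also be spotted programmatically; a small sketch (the sample line and the -F field order are copied from above):

```python
# A sketch: flag LPARs whose RMC connection is inactive, parsing the CSV output of
#   lssyscfg -r lpar -m soapberry -F rmc_ipaddr,lpar_id,name,state,rmc_state
# The sample line is copied from the output above.
sample = ",1,soapberry-vios,Running,inactive"

rmc_ip, lpar_id, name, state, rmc_state = sample.split(",")
if rmc_state == "inactive":
    detail = " and the HMC has no RMC IP for it" if not rmc_ip else ""
    print(f"{name} (lpar {lpar_id}): RMC state {rmc_state}{detail}")
```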

Just out of curiosity, can you share with me the output of the following commands on the VIOS from redcurrant?

# lslpp -l rsct.core.rmc
  Fileset                      Level  State      Description         
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  rsct.core.rmc              3.2.4.2  COMMITTED  RSCT Resource Monitoring and
                                                 Control

Path: /etc/objrepos
  rsct.core.rmc              3.2.4.2  COMMITTED  RSCT Resource Monitoring and
                                                 Control
# /usr/sbin/rsct/bin/ctsvhbac
------------------------------------------------------------------------
Host Based Authentication Mechanism Verification Check

Private and Public Key Verifications

      Configuration file:  /opt/rsct/cfg/ctcasd.cfg
                  Status:  Available
                Key Type:  rsa512
                           RSA key generation method, 512-bit key.

        Private Key file:  /var/ct/cfg/ct_has.qkf
                  Source:  Configuration file.
                  Status:  Available
                Key Type:  rsa512
                           RSA key generation method, 512-bit key.

         Public Key file:  /var/ct/cfg/ct_has.pkf
                  Source:  Configuration file.
                  Status:  Attention - Permissions not as expected,
                           Expected -r--r--r--
                Key Type:  rsa512
                           RSA key generation method, 512-bit key.

              Key Parity:  Public and private keys are in pair.

Trusted Host List File Verifications

  Trusted Host List file:  /var/ct/cfg/ct_has.thl
                  Source:  Configuration file.
                  Status:  Attention - Permissions not as expected,
                           Expected -r--r--r--

                Identity:  soapberry-vios.qe.prg2.suse.org
                  Status:  Trusted host.

                Identity:  10.145.0.79
                  Status:  Trusted host.

                Identity:  127.0.0.1
                  Status:  Trusted host.

                Identity:  localhost
                  Status:  Trusted host.

                Identity:  ::1
                  Status:  Trusted host.

                Identity:  ::1%1
                  Status:  Trusted host.

Host Based Authentication Mechanism Verification Check completed.
------------------------------------------------------------------------
# lsrsrc IBM.MCP
Resource Persistent Attributes for IBM.MCP
# telnet 10.145.14.33 657
Trying...
^C# telnet 10.255.255.1 657
Trying...
^C# 

I suspect the last 3 commands hold the key to why this VIOS is not able to establish an RMC connection with the HMC.

Perhaps we need to move these systems to 10.145.8/21, as redcurrant was.

Actions #18

Updated by nicksinger 22 days ago

Did you try to use oem_setup_env and smitty to configure DHCP for your en5 interface? Once you can ping the HMC from the VIOS LPAR, you should be fine with the RMC connection.

Actions #19

Updated by acarvajal 19 days ago · Edited

nicksinger wrote in #note-18:

Did you try to use oem_setup_env and smitty to configure DHCP for your en5 interface? Once you can ping the HMC from the VIOS LPAR, you should be fine with the RMC connection.

Network connectivity is there, but an RMC connection from the VIOS to the HMC (10.145.14.33, port 657) is not possible:

acarvajal@linux-mkji:~> ssh padmin@soapberry-vios.qe.prg2.suse.org
padmin@soapberry-vios.qe.prg2.suse.org's password: 
Last unsuccessful login: Tue Mar  5 14:40:39 NFT 2024 on /dev/vty0 from soapberry-vios.qa.suse.de
Last login: Sat Apr  6 01:26:00 DFT 2024 on ssh from 10.149.242.58
$ oem_setup_env  
# ping 10.145.14.33
PING 10.145.14.33: (10.145.14.33): 56 data bytes
64 bytes from 10.145.14.33: icmp_seq=0 ttl=63 time=0 ms
64 bytes from 10.145.14.33: icmp_seq=1 ttl=63 time=0 ms
64 bytes from 10.145.14.33: icmp_seq=2 ttl=63 time=0 ms
64 bytes from 10.145.14.33: icmp_seq=3 ttl=63 time=0 ms
^C
--- 10.145.14.33 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 0/0/0 ms
# telnet 10.145.14.33 657
Trying...
telnet: connect: A remote host did not respond within the timeout period.
# ^D
$ ^D

That's why I was asking for the output of those commands on redcurrant's VIOS. If it's working from there, I think we would need to either move these systems to the same network redcurrant is in at the moment (10.145.8/21), or allow connections to port 657 from 10.145.0/21 to 10.145.8/21 in the firewall.
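The ping/telnet sequence above can be condensed into a small TCP probe; a generic sketch (the HMC address 10.145.14.33 and RMC port 657 come from the comments in this ticket):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From the VIOS, RMC registration needs the HMC reachable on TCP 657, e.g.:
#   port_open("10.145.14.33", 657)
```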

Actions #20

Updated by okurz 19 days ago

I connected to the VIOS instance with ssh powerhmc1.oqa.prg2.suse.org and mkvterm -m redcurrant -p redcurrant-vios, but I don't know how to become "root" on the VIOS, so I couldn't successfully execute those commands.

Regarding the network problem: is soapberry intended to run as part of openQA? Then it should be moved to the same network as redcurrant. We can also move the machine to the openQA network even if it's not directly intended to be used for openQA; however, the correct way for a non-openQA machine would be to stay within qe.prg2.suse.org and then possibly wait for https://jira.suse.com/browse/ENGINFRA-3764 "Ensure a PRG2 based QE PowerPC HMC is reachable over proper FQDN and reverse PTR" to be implemented first.

Actions #21

Updated by acarvajal 19 days ago

okurz wrote in #note-20:

I connected to the VIOS instance with ssh powerhmc1.oqa.prg2.suse.org and mkvterm -m redcurrant -p redcurrant-vios, but I don't know how to become "root" on the VIOS, so I couldn't successfully execute those commands.

Thanks Oliver. You use oem_setup_env to become "root" on the VIOS.

Regarding the network problem: is soapberry intended to run as part of openQA? Then it should be moved to the same network as redcurrant. We can also move the machine to the openQA network even if it's not directly intended to be used for openQA; however, the correct way for a non-openQA machine would be to stay within qe.prg2.suse.org and then possibly wait for https://jira.suse.com/browse/ENGINFRA-3764 "Ensure a PRG2 based QE PowerPC HMC is reachable over proper FQDN and reverse PTR" to be implemented first.

One of soapberry, huckleberry or nessberry should supply 4 to 6 SAP HANA-capable LPARs for osd. Before the datacenter migration this was covered by nessberry, but due to the ongoing issues with nessberry I planned to replace it with soapberry, as I think huckleberry does not have enough storage for this.

Regarding https://jira.suse.com/browse/ENGINFRA-3764: is the intention to have one HMC for osd and another for general QE, is powerhmc1.oqa.prg2.suse.org going to be moved to qe.prg2.suse.org, or is there going to be one HMC connected to both networks?

Actions #22

Updated by okurz 18 days ago

acarvajal wrote in #note-21:

Regarding https://jira.suse.com/browse/ENGINFRA-3764: is the intention to have one HMC for osd and another for general QE, is powerhmc1.oqa.prg2.suse.org going to be moved to qe.prg2.suse.org, or is there going to be one HMC connected to both networks?

Undecided. Maybe we will just stick with the current one, as I don't see a security benefit in separating, but it might be necessary based on what IT or Cybersecurity want to achieve. Such a decision also depends on the outcomes of requests you might bring up regarding making connections work to more machines and such. So regarding network access, maybe the easiest is if you create an SD ticket to IT and ask for the according access from the two networks you mentioned and/or the additional access to DHCP, switches, firewall, etc., to be able to debug what's going on.

Actions #23

Updated by acarvajal 16 days ago

Created an SD ticket to request access to port 657 on the HMC from 10.145.0/21: https://sd.suse.com/servicedesk/customer/portal/1/SD-153996

Actions #24

Updated by acarvajal 11 days ago

okurz wrote in #note-3:

@acarvajal: according to https://confluence.suse.com/dosearchsite.action?cql=siteSearch ~ "soapberry", I suspect QE SAP is the owner of the machine soapberry. Can you confirm the machine is fully usable by you?

Going back to this note to give a current status.

  • With https://sd.suse.com/servicedesk/customer/portal/1/SD-153996 processed and the hmcsuperadmin permission granted (see #note-16), I can fully manage the system.
  • VIOS server is up and available at the intended address: soapberry-vios.qe.prg2.suse.org.
  • Regarding usability, it seems something is still missing, but I am not sure whether the issue is the network or DHCP. Details below:

I am trying to set up a new LPAR, soapberry-1.qe.prg2.suse.org. Its DHCP entry seems to be defined in https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/domain/qe_prg2_suse_org/hosts.yaml?ref_type=heads#L519-523

And its corresponding IP address is in https://gitlab.suse.de/OPS-Service/salt/-/raw/production/salt/profile/dns/files/prg2_suse_org/dns-qe.prg2.suse.org?ref_type=heads.

Information is:

hosts.yaml:

soapberry-1:
  mac: '5a:9a:cd:db:63:02'
  ip4: soapberry-1.qe.prg2.suse.org
  dhcp_next_server: 10.168.192.10
  dhcp_filename: 'ppc64le/grub2'

DNS zone:

soapberry-1             300  A     10.145.0.70
soapberry-1             300  AAAA  2a07:de40:b203:8:10:145:0:70

On the LPAR itself, the following addresses were configured:

Interpartition Logical LAN: U9008.22L.7888E1A-V2-C2-T1
 1.   Client IP Address                    [10.145.0.70]
 2.   Server IP Address                    [10.145.7.254]
 3.   Gateway IP Address                   [10.145.7.254]
 4.   Subnet Mask                          [255.255.248.0]

This configuration seems to work, in the sense that it is possible to reach the server IP with a ping test:

 SMS (c) Copyright IBM Corp. 2000,2019 All rights reserved.
-------------------------------------------------------------------------------
 Ping Test
 Interpartition Logical LAN: U9008.22L.7888E1A-V2-C2-T1
 Speed, Duplex: auto,auto
 Client IP Address: 10.145.0.70
 Server IP Address: 10.145.7.254
 Gateway IP Address: 10.145.7.254
 Subnet Mask: 255.255.248.0
 Protocol: Standard
 Spanning Tree Enabled: 1
 Connector Type: 
 VLAN Priority: 0
 VLAN ID: 0
 VLAN Tag: 
 UDP checksum validation: Enabled
 1. Execute Ping Test

And:

10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=3  ttl=? time=20  ms
10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=4  ttl=? time=10  ms
10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=5  ttl=? time=10  ms
10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=6  ttl=? time=10  ms
10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=7  ttl=? time=11  ms
10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=8  ttl=? time=20  ms
10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=9  ttl=? time=20  ms
10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=10  ttl=? time=21  ms

                              .-----------------.
                              |  Ping  Success. |
                              `-----------------'

However, attempts to boot from PXE fail:

BOOTP Parameters: 
----------------  
chosen-network-type = ethernet,auto,rj45,auto
server IP           = 10.145.7.254
client IP           = 10.145.0.70
gateway IP          = 10.145.7.254
device              = /vdevice/l-lan@30000002
MAC address         = 5a 9a cd db 63 02 
loc-code            = U9008.22L.7888E1A-V2-C2-T1

BOOTP request retry attempt: 1 
BOOTP request retry attempt: 2 
BOOTP request retry attempt: 3 
BOOTP request retry attempt: 4 
    !BA01B015 !

    !BA010003 !

                      .----------------------------------.
                      |  No Operating Systems Installed  |
                      `----------------------------------'

redcurrant's DHCP configuration is different from soapberry's. See for example redcurrant-1's entry in pillar/domain/oqa_prg2_suse_org/hosts.yaml:

redcurrant-1:
  ip4: 10.145.10.222
  mac: 'f6:6b:46:d3:fd:03'
  hostname: redcurrant-1
  dhcp_next_server: 10.168.192.10
  dhcp_filename: 'ppc64le/grub2'

However, as I said, I can't be sure whether the current issue is DHCP or the network, as soapberry is also located in a different network segment than redcurrant.
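For comparison, a soapberry-1 entry written in the same explicit style as redcurrant-1's might look like this (a hypothetical sketch combining the MAC from hosts.yaml and the IP from the DNS zone above; not a submitted MR):

```yaml
soapberry-1:
  ip4: 10.145.0.70
  mac: '5a:9a:cd:db:63:02'
  hostname: soapberry-1
  dhcp_next_server: 10.168.192.10
  dhcp_filename: 'ppc64le/grub2'
```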

Actions #25

Updated by acarvajal 10 days ago

Solved the issue by changing the BOOTP server IP from 10.145.7.254 to 10.145.0.1 (fozziebear). I got the information from https://sd.suse.com/servicedesk/customer/portal/1/SD-154008?sda_source=notification-email, so thanks @okurz :)

Actions #26

Updated by nicksinger 10 days ago

@acarvajal can you please share the ticket with the osd-admins group (the icon with 3 people on it in Jira SD)? It sounds like it contains valuable information :)

Actions #27

Updated by acarvajal 10 days ago

nicksinger wrote in #note-26:

@acarvajal can you please share the ticket with the osd-admins group (the icon with 3 people on it in Jira SD)? It sounds like it contains valuable information :)

Done.

I've collected this information in https://confluence.suse.com/display/qasle/QE-SAP+Power9+Infrastructure#QESAPPower9Infrastructure-IPAddresssetupviaSMS as well.

Actions #28

Updated by acarvajal 8 days ago

acarvajal wrote in #note-25:

Solved the issue by changing the BOOTP server IP from 10.145.7.254 to 10.145.0.1 (fozziebear). I got the information from https://sd.suse.com/servicedesk/customer/portal/1/SD-154008?sda_source=notification-email, so thanks @okurz :)

Well, this worked for one day. :(

PXE boot is not possible again: http://mango.qe.nue2.suse.org/tests/5744#step/bootloader/20

Any ideas what may be happening, or what to do?

Actions #29

Updated by okurz 8 days ago

  • Subject changed from Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - soapberry to Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - soapberry size:S
Actions #30

Updated by okurz 5 days ago

  • Due date deleted (2024-04-23)
  • Status changed from Feedback to Resolved

@acarvajal I assume that, except for general problems which are already handled over other communication channels, there is no more support we need to provide from the tools team in this ticket. You can reach out to us in general anyway :) So I'm resolving, assuming that soapberry is generally fully usable again from PRG2.
