action #153733

closed

coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

coordination #123800: [epic] Provide SUSE QE Tools services running in PRG2 aka. Prg CoLo

coordination #137630: [epic] QE (non-openQA) setup in PRG2

Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - soapberry size:S

Added by okurz 3 months ago. Updated 5 days ago.

Status:
Resolved
Priority:
Low
Assignee:
Target version:
Start date:
2024-01-16
Due date:
% Done:

0%

Estimated time:

Description

Acceptance criteria

  • AC1: soapberry is usable from PRG2

Suggestions


Related issues: 2 (1 open, 1 closed)

Copied from QA - action #153730: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - huckleberry (Resolved, okurz, 2024-01-16)
Copied to QA - action #153736: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - nessberry (Blocked, okurz, 2024-01-16)

Actions #1

Updated by okurz 3 months ago

  • Copied from action #153730: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - huckleberry added
Actions #2

Updated by okurz 3 months ago

  • Copied to action #153736: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - nessberry added
Actions #3

Updated by okurz about 1 month ago

  • Due date set to 2024-04-09
  • Status changed from Blocked to Feedback
  • Target version changed from future to Ready

https://jira.suse.com/browse/ENGINFRA-3748 was set to "Done", but I am still missing final confirmation that we are able to use the machine; comment https://jira.suse.com/browse/ENGINFRA-3748?focusedId=1334763&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1334763 is not yet answered.

@acarvajal: according to https://confluence.suse.com/dosearchsite.action?cql=siteSearch ~ "soapberry", I suspect QE SAP is the owner of the machine soapberry. Can you confirm the machine is fully usable by you?

Actions #4

Updated by acarvajal about 1 month ago

Confirming. soapberry, huckleberry, blackcurrant and legolas are QE-SAP and Hana Perf machines.

See https://gitlab.suse.de/hsehic/qa-css-docs/-/blob/master/infrastructure/power9-configuration.md?ref_type=heads for a list of the Power 9 machines that used to be assigned to the old QA-CSS (former QE-SAP) team.

I checked soapberry & huckleberry yesterday and both are reachable from the HMC and LPARs are running. Currently all resources are assigned to a huge LPAR, but I want to change this to cover for the missing nessberry.

I have not yet checked VIOS, networking or installation in either system.

Is there a way to connect to the HMC via SSH? Or is this only possible from a host in oqa.prg2.suse.org?

Actions #5

Updated by okurz about 1 month ago

acarvajal wrote in #note-4:

Confirming. soapberry, huckleberry, blackcurrant and legolas are QE-SAP and Hana Perf machines.

See https://gitlab.suse.de/hsehic/qa-css-docs/-/blob/master/infrastructure/power9-configuration.md?ref_type=heads for a list of the Power 9 machines that used to be assigned to the old QA-CSS (former QE-SAP) team.

ok, I updated all the racktable entries accordingly.

I checked soapberry & huckleberry yesterday and both are reachable from the HMC and LPARs are running. Currently all resources are assigned to a huge LPAR, but I want to change this to cover for the missing nessberry.

I have not yet checked VIOS, networking or installation in either system.

Is there a way to connect to the HMC via SSH? Or is this only possible from a host in oqa.prg2.suse.org?

Yes, same credentials as for https://powerhmc1.oqa.prg2.suse.org. After you log in over SSH with the password, you can also add your SSH key to the HMC with the command "mkauthkeys".

Actions #6

Updated by acarvajal about 1 month ago

okurz wrote in #note-5:

Yes, same credentials as for https://powerhmc1.oqa.prg2.suse.org. After you log in over SSH with the password, you can also add your SSH key to the HMC with the command "mkauthkeys".

Thanks! Yes, it works from the VPN, but not from Franken Campus, which is what I tried yesterday.

Actions #7

Updated by okurz 30 days ago

acarvajal wrote in #note-6:

okurz wrote in #note-5:

Yes, same credentials as for https://powerhmc1.oqa.prg2.suse.org. After you log in over SSH with the password, you can also add your SSH key to the HMC with the command "mkauthkeys".

Thanks! Yes, it works from the VPN, but not from Franken Campus, which is what I tried yesterday.

Works for me from the "Frankencampus Wifi"; maybe it's different for the "Frankencampus Office" wired network? If so, please record an SD ticket to fix that and include me, so I can approve it as "owner" of the target system and IT doesn't get confused.

Actions #8

Updated by acarvajal 30 days ago

okurz wrote in #note-7:

Works for me from the "Frankencampus Wifi"; maybe it's different for the "Frankencampus Office" wired network? If so, please record an SD ticket to fix that and include me, so I can approve it as "owner" of the target system and IT doesn't get confused.

Got it. Will do that next week.

Actions #9

Updated by acarvajal 30 days ago

Checked the VIOS server on soapberry. It's running, but not reachable via the network; in fact, it still has the qa.suse.de IP configured:

# ifconfig -a
en5: flags=1e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.162.8.40 netmask 0xffffc000 broadcast 10.162.63.255
         tcp_sendspace 262144 tcp_recvspace 131072 rfc1323 1
lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1%1/64
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1

The LPAR's network configuration seems to have been updated for PRG2:

Interpartition Logical LAN: U9008.22L.7888E1A-V2-C2-T1
 1.   Client IP Address                    [10.145.0.70]
 2.   Server IP Address                    [10.146.5.254]
 3.   Gateway IP Address                   [10.146.5.254]
 4.   Subnet Mask                          [255.255.254.0]

Is this configuration correct?
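A quick on-link sanity check of the values above can be scripted with Python's ipaddress module (a sketch; the addresses are copied from the listing, and flagging the gateway as off-link is my reading, not a confirmed diagnosis):

```python
import ipaddress

# Values from the SMS listing above
client = ipaddress.ip_address("10.145.0.70")
gateway = ipaddress.ip_address("10.146.5.254")
netmask = "255.255.254.0"

# Network the client believes it is in, given the mask
net = ipaddress.ip_network(f"{client}/{netmask}", strict=False)
print(net)             # 10.145.0.0/23
print(gateway in net)  # False -> with this mask the gateway is not on-link
```

With mask 255.255.254.0 the client 10.145.0.70 sits in 10.145.0.0/23, which does not contain 10.146.5.254, so either the mask or the gateway/server addresses would need to change for direct reachability.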

However, PXE boot failed, and the installed OS is configuring an old IP address from qa.suse.de.

I noticed that only redcurrant has LPARs defined for DHCP in https://gitlab.suse.de/OPS-Service/salt, which may explain why PXE boot failed here, but I want to confirm the information above is correct before submitting an MR to add soapberry's LPARs to DHCP.

Questions:

  • Is there a document or related ticket with information how redcurrant was configured?
  • If I configure LPARs in soapberry to be used for openqa.suse.de, does soapberry's network need to be moved from qe.prg2.suse.org to oqa.prg2.suse.org, as redcurrant's was?
Actions #10

Updated by okurz 30 days ago

  • Due date changed from 2024-04-09 to 2024-04-23

@nicksinger can you please answer the above questions?

Actions #11

Updated by acarvajal 30 days ago

Another question @nicksinger: by any chance did you have to recreate VIOS and LPARs for redcurrant after the migration?

It seems I cannot manage virtual storage on soapberry, huckleberry and blackcurrant via the HMC webUI. I initially thought it was due to some role assigned to my account, but since I can see logical volumes assigned to LPARs from redcurrant, perhaps I need to destroy everything and start from scratch.

Actions #12

Updated by nicksinger 25 days ago

acarvajal wrote in #note-9:

Checked the VIOS server on soapberry. It's running, but not reachable via the network; in fact, it still has the qa.suse.de IP configured:

# ifconfig -a
en5: flags=1e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.162.8.40 netmask 0xffffc000 broadcast 10.162.63.255
         tcp_sendspace 262144 tcp_recvspace 131072 rfc1323 1
lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1%1/64
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1

I had the same issue with other VIOS instances and resolved it by using smitty after an oem_setup_env on the VIOS terminal. There you can configure DHCP for the en5 interface (which is a pretty good first test of whether the network for the machine is working).

The LPAR's network configuration seems to have been updated for PRG2:

Interpartition Logical LAN: U9008.22L.7888E1A-V2-C2-T1
 1.   Client IP Address                    [10.145.0.70]
 2.   Server IP Address                    [10.146.5.254]
 3.   Gateway IP Address                   [10.146.5.254]
 4.   Subnet Mask                          [255.255.254.0]

Is this configuration correct?

Client IP: https://gitlab.suse.de/OPS-Service/salt/-/blob/production/salt/profile/dns/files/prg2_suse_org/dns-qe.prg2.suse.org?plain=1#L126
Server IP == dhcp for this network, so: https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/domain/qe_prg2_suse_org/init.sls#L41
Gateway IP: https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/domain/qe_prg2_suse_org/init.sls#L41
Subnet Mask: https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/domain/qe_prg2_suse_org/init.sls#L42

However, PXE boot failed, and the installed OS is configuring an old IP address from qa.suse.de.

I noticed that only redcurrant has LPARs defined for DHCP in https://gitlab.suse.de/OPS-Service/salt, which may explain why PXE boot failed here, but I want to confirm the information above is correct before submitting an MR to add soapberry's LPARs to DHCP.

LPARs are defined here: https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/domain/qe_prg2_suse_org/hosts.yaml#L519-568 - is this the correct network? If not the config needs to be moved of course.

Questions:

  • Is there a document or related ticket with information how redcurrant was configured?

https://progress.opensuse.org/issues/139199#note-22 and following should cover everything I did with redcurrant. I also documented in https://progress.opensuse.org/issues/155521#note-17 that I set up a new VIOS on redcurrant because the old one was broken. You should not need to set up a new one if you can still access the old one.

  • If I configure LPARs in soapberry to be used for openqa.suse.de, does soapberry's network need to be moved from qe.prg2.suse.org to oqa.prg2.suse.org, as redcurrant's was?

No, the network the LPARs reside in shouldn't matter to openQA. You just need a working PXE which can access dist.

Actions #13

Updated by nicksinger 25 days ago

acarvajal wrote in #note-11:

Another question @nicksinger: by any chance did you have to recreate VIOS and LPARs for redcurrant after the migration?

It seems I cannot manage virtual storage on soapberry, huckleberry and blackcurrant via the HMC webUI. I initially thought it was due to some role assigned to my account, but since I can see logical volumes assigned to LPARs from redcurrant, perhaps I need to destroy everything and start from scratch.

Yes, I did have to recreate the VIOS of redcurrant because it wasn't able to boot (something about a missing network-based root filesystem). I suspected some FC connection was no longer present, but I was not aware that these machines share their storage with it.

Actions #14

Updated by acarvajal 23 days ago

okurz wrote in #note-7:

Works for me from "Frankencampus Wifi", maybe different for "Frankencampus Office" wired network? If yes then please record an SD ticket to fix that and include me in so I will approve that as "owner" of the target system so that IT doesn't get confused.

https://sd.suse.com/servicedesk/customer/portal/1/SD-153352

Actions #15

Updated by acarvajal 23 days ago

nicksinger wrote in #note-13:

Yes, I did have to recreate the VIOS of redcurrant because it wasn't able to boot (something about a missing network-based root filesystem). I suspected some FC connection was no longer present, but I was not aware that these machines share their storage with it.

@nicksinger @okurz I'm having issues managing storage devices on soapberry, and I also see that the HMC is not able to determine the running version of the current soapberry-vios (IIRC it was VIOS_3.1.0.21), so I decided to start from scratch and reinstall everything.

While attempting to install the VIOS, I get the following error when I try to install the VIOS version currently available in the Management Console (VIOS_SP_3.1.0):

acarvajal@powerhmc1:~> installios
Logging session output to /tmp/installios.209507.log.
ERROR installios: You must be a superadmin to run this command.

Is it possible that my user lacks the permissions to do this? Where can I request this to be changed?

P.S.: I also attempted installing with the same method via the webUI (same error) and using 10.255.255.1 as a NIM server (it failed bringing up the network interfaces). At least I know Nick could successfully install a VIOS using installios, so I guess that if I have the same role on powerhmc1 as his account, I could get further.

Actions #16

Updated by nicksinger 23 days ago

I made you hmcsuperadmin, which should give you sufficient permissions. To install the VIOS you need to (temporarily) connect the network interfaces of soapberry to the same network as the ASM/HMC (10.255.255.0/24). We're currently still looking for a solution to do this installation across two networks.

Actions #17

Updated by acarvajal 22 days ago

nicksinger wrote in #note-16:

I made you hmcsuperadmin, which should give you sufficient permissions. To install the VIOS you need to (temporarily) connect the network interfaces of soapberry to the same network as the ASM/HMC (10.255.255.0/24). We're currently still looking for a solution to do this installation across two networks.

Thanks a lot. As expected, the VIOS installation fails (it gets a bit further than in https://progress.opensuse.org/issues/155521#note-18, but it fails at lpar_netboot).

As the VIOS boots with what's already installed there, and with the hmcsuperadmin role, I tried to see if anything else was possible, but I think the current issue with the soapberry VIOS is that its RMC cannot reach the HMC. The HMC cannot detect the VIOS version, its IP, or its status:

acarvajal@powerhmc1:~> lssyscfg -r lpar -m soapberry -F rmc_ipaddr,lpar_id,name,state,rmc_state
,1,soapberry-vios,Running,inactive

I think this is what's blocking me at the moment. I expect a full VIOS installation would fix the issue, but I will need to look into connecting the interfaces to the 10.255.255.0/24 network.
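The empty rmc_ipaddr field in that lssyscfg output can also be spotted programmatically; a small sketch (the sample line and the -F field order are copied from above):

```python
# A sketch: flag LPARs whose RMC connection is inactive, parsing the CSV output of
#   lssyscfg -r lpar -m soapberry -F rmc_ipaddr,lpar_id,name,state,rmc_state
# The sample line is copied from the output above.
sample = ",1,soapberry-vios,Running,inactive"

rmc_ip, lpar_id, name, state, rmc_state = sample.split(",")
if rmc_state == "inactive":
    detail = " and the HMC has no RMC IP for it" if not rmc_ip else ""
    print(f"{name} (lpar {lpar_id}): RMC state {rmc_state}{detail}")
```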

Just out of curiosity, can you share with me the output of the following commands on the VIOS from redcurrant?

# lslpp -l rsct.core.rmc
  Fileset                      Level  State      Description         
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  rsct.core.rmc              3.2.4.2  COMMITTED  RSCT Resource Monitoring and
                                                 Control

Path: /etc/objrepos
  rsct.core.rmc              3.2.4.2  COMMITTED  RSCT Resource Monitoring and
                                                 Control
# /usr/sbin/rsct/bin/ctsvhbac
------------------------------------------------------------------------
Host Based Authentication Mechanism Verification Check

Private and Public Key Verifications

      Configuration file:  /opt/rsct/cfg/ctcasd.cfg
                  Status:  Available
                Key Type:  rsa512
                           RSA key generation method, 512-bit key.

        Private Key file:  /var/ct/cfg/ct_has.qkf
                  Source:  Configuration file.
                  Status:  Available
                Key Type:  rsa512
                           RSA key generation method, 512-bit key.

         Public Key file:  /var/ct/cfg/ct_has.pkf
                  Source:  Configuration file.
                  Status:  Attention - Permissions not as expected,
                           Expected -r--r--r--
                Key Type:  rsa512
                           RSA key generation method, 512-bit key.

              Key Parity:  Public and private keys are in pair.

Trusted Host List File Verifications

  Trusted Host List file:  /var/ct/cfg/ct_has.thl
                  Source:  Configuration file.
                  Status:  Attention - Permissions not as expected,
                           Expected -r--r--r--

                Identity:  soapberry-vios.qe.prg2.suse.org
                  Status:  Trusted host.

                Identity:  10.145.0.79
                  Status:  Trusted host.

                Identity:  127.0.0.1
                  Status:  Trusted host.

                Identity:  localhost
                  Status:  Trusted host.

                Identity:  ::1
                  Status:  Trusted host.

                Identity:  ::1%1
                  Status:  Trusted host.

Host Based Authentication Mechanism Verification Check completed.
------------------------------------------------------------------------
# lsrsrc IBM.MCP
Resource Persistent Attributes for IBM.MCP
# telnet 10.145.14.33 657
Trying...
^C# telnet 10.255.255.1 657
Trying...
^C# 

I suspect the last 3 commands hold the key to why this VIOS is not able to establish an RMC connection with the HMC.

Perhaps we need to move these systems to 10.145.8/21, as redcurrant was.

Actions #18

Updated by nicksinger 22 days ago

Did you try to use oem_setup_env and smitty to configure DHCP for your en5 interface? Once you can ping the HMC from the VIOS LPAR, you should be fine with the RMC connection.

Actions #19

Updated by acarvajal 19 days ago · Edited

nicksinger wrote in #note-18:

Did you try to use oem_setup_env and smitty to configure DHCP for your en5 interface? Once you can ping the HMC from the VIOS LPAR, you should be fine with the RMC connection.

Network connectivity is there, but an RMC connection from the VIOS to the HMC (10.145.14.33, port 657) is not possible:

acarvajal@linux-mkji:~> ssh padmin@soapberry-vios.qe.prg2.suse.org
padmin@soapberry-vios.qe.prg2.suse.org's password: 
Last unsuccessful login: Tue Mar  5 14:40:39 NFT 2024 on /dev/vty0 from soapberry-vios.qa.suse.de
Last login: Sat Apr  6 01:26:00 DFT 2024 on ssh from 10.149.242.58
$ oem_setup_env  
# ping 10.145.14.33
PING 10.145.14.33: (10.145.14.33): 56 data bytes
64 bytes from 10.145.14.33: icmp_seq=0 ttl=63 time=0 ms
64 bytes from 10.145.14.33: icmp_seq=1 ttl=63 time=0 ms
64 bytes from 10.145.14.33: icmp_seq=2 ttl=63 time=0 ms
64 bytes from 10.145.14.33: icmp_seq=3 ttl=63 time=0 ms
^C
--- 10.145.14.33 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 0/0/0 ms
# telnet 10.145.14.33 657
Trying...
telnet: connect: A remote host did not respond within the timeout period.
# ^D
$ ^D

That's why I was asking for the output of those commands on redcurrant's VIOS. If it's working from there, I think we would need to either move these systems to the same network redcurrant is in at the moment (10.145.8/21), or allow connections to port 657 from 10.145.0/21 to 10.145.8/21 in the firewall.
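The ping/telnet sequence above can be condensed into a small TCP probe; a generic sketch (the HMC address 10.145.14.33 and RMC port 657 come from the comments in this ticket):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From the VIOS, RMC registration needs the HMC reachable on TCP 657, e.g.:
#   port_open("10.145.14.33", 657)
```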

Actions #20

Updated by okurz 19 days ago

I connected to the VIOS instance with ssh powerhmc1.oqa.prg2.suse.org and mkvterm -m redcurrant -p redcurrant-vios, but I don't know how to become "root" on the VIOS, so I couldn't successfully execute those commands.

Regarding the network problem: is soapberry intended to run as part of openQA? Then it should be moved to the same network as redcurrant. We can also move the machine to the openQA network even if it's not directly intended to be used for openQA; however, the correct way for a non-openQA machine would be to stay within qe.prg2.suse.org and then possibly wait for https://jira.suse.com/browse/ENGINFRA-3764 "Ensure a PRG2 based QE PowerPC HMC is reachable over proper FQDN and reverse PTR" to be implemented first.

Actions #21

Updated by acarvajal 19 days ago

okurz wrote in #note-20:

I connected to the VIOS instance with ssh powerhmc1.oqa.prg2.suse.org and mkvterm -m redcurrant -p redcurrant-vios, but I don't know how to become "root" on the VIOS, so I couldn't successfully execute those commands.

Thanks Oliver. You use oem_setup_env to become "root" on the VIOS.

Regarding the network problem: is soapberry intended to run as part of openQA? Then it should be moved to the same network as redcurrant. We can also move the machine to the openQA network even if it's not directly intended to be used for openQA; however, the correct way for a non-openQA machine would be to stay within qe.prg2.suse.org and then possibly wait for https://jira.suse.com/browse/ENGINFRA-3764 "Ensure a PRG2 based QE PowerPC HMC is reachable over proper FQDN and reverse PTR" to be implemented first.

One of soapberry, huckleberry or nessberry should supply 4 to 6 SAP HANA-capable LPARs for osd. Before the datacenter migration this was covered by nessberry, but due to the ongoing issues with nessberry I planned to replace it with soapberry, as I think huckleberry does not have enough storage for this.

Regarding https://jira.suse.com/browse/ENGINFRA-3764: is the intention to have one HMC for osd and another for general QE, is powerhmc1.oqa.prg2.suse.org going to be moved to qe.prg2.suse.org, or is there going to be one HMC connected to both networks?

Actions #22

Updated by okurz 18 days ago

acarvajal wrote in #note-21:

Regarding https://jira.suse.com/browse/ENGINFRA-3764: is the intention to have one HMC for osd and another for general QE, is powerhmc1.oqa.prg2.suse.org going to be moved to qe.prg2.suse.org, or is there going to be one HMC connected to both networks?

Undecided. Maybe we will just stick with the current one, as I don't see a security benefit in separating, but it might be necessary based on what IT or Cybersecurity want to achieve. Such a decision also depends on the outcomes of requests you might bring up regarding making connections work to more machines and such. So regarding network access, maybe the easiest is if you create an SD ticket to IT and ask for the according access from the two networks you mentioned and/or the additional access to DHCP, switches, firewall, etc., to be able to debug what's going on.

Actions #23

Updated by acarvajal 16 days ago

Created an SD ticket to request access to port 657 on the HMC from 10.145.0/21: https://sd.suse.com/servicedesk/customer/portal/1/SD-153996

Actions #24

Updated by acarvajal 11 days ago

okurz wrote in #note-3:

@acarvajal: according to https://confluence.suse.com/dosearchsite.action?cql=siteSearch ~ "soapberry", I suspect QE SAP is the owner of the machine soapberry. Can you confirm the machine is fully usable by you?

Going back to this note to give a current status.

  • With https://sd.suse.com/servicedesk/customer/portal/1/SD-153996 processed and the hmcsuperadmin permission granted (see #note-16), I can fully manage the system.
  • VIOS server is up and available at the intended address: soapberry-vios.qe.prg2.suse.org.
  • Regarding usability, it seems something is still missing, but I am not sure whether the issue is the network or DHCP. Details below:

I am trying to set up a new LPAR, soapberry-1.qe.prg2.suse.org. Its DHCP entry seems to be defined in https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/domain/qe_prg2_suse_org/hosts.yaml?ref_type=heads#L519-523

And its corresponding IP address is in https://gitlab.suse.de/OPS-Service/salt/-/raw/production/salt/profile/dns/files/prg2_suse_org/dns-qe.prg2.suse.org?ref_type=heads.

Information is:

hosts.yaml:

soapberry-1:
  mac: '5a:9a:cd:db:63:02'
  ip4: soapberry-1.qe.prg2.suse.org
  dhcp_next_server: 10.168.192.10
  dhcp_filename: 'ppc64le/grub2'

DNS zone:

soapberry-1             300  A     10.145.0.70
soapberry-1             300  AAAA  2a07:de40:b203:8:10:145:0:70

On the LPAR itself, the following addresses were configured:

Interpartition Logical LAN: U9008.22L.7888E1A-V2-C2-T1
 1.   Client IP Address                    [10.145.0.70]
 2.   Server IP Address                    [10.145.7.254]
 3.   Gateway IP Address                   [10.145.7.254]
 4.   Subnet Mask                          [255.255.248.0]

This configuration seems to work, in the sense that it is possible to reach the server IP with a ping test:

 SMS (c) Copyright IBM Corp. 2000,2019 All rights reserved.
-------------------------------------------------------------------------------
 Ping Test
 Interpartition Logical LAN: U9008.22L.7888E1A-V2-C2-T1
 Speed, Duplex: auto,auto
 Client IP Address: 10.145.0.70
 Server IP Address: 10.145.7.254
 Gateway IP Address: 10.145.7.254
 Subnet Mask: 255.255.248.0
 Protocol: Standard
 Spanning Tree Enabled: 1
 Connector Type: 
 VLAN Priority: 0
 VLAN ID: 0
 VLAN Tag: 
 UDP checksum validation: Enabled
 1. Execute Ping Test

And:

10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=3  ttl=? time=20  ms
10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=4  ttl=? time=10  ms
10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=5  ttl=? time=10  ms
10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=6  ttl=? time=10  ms
10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=7  ttl=? time=11  ms
10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=8  ttl=? time=20  ms
10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=9  ttl=? time=20  ms
10.145.0.70:    24  bytes from 10.145.7.254:  icmp_seq=10  ttl=? time=21  ms

                              .-----------------.
                              |  Ping  Success. |
                              `-----------------'

However, attempts to boot from PXE fail:

BOOTP Parameters: 
----------------  
chosen-network-type = ethernet,auto,rj45,auto
server IP           = 10.145.7.254
client IP           = 10.145.0.70
gateway IP          = 10.145.7.254
device              = /vdevice/l-lan@30000002
MAC address         = 5a 9a cd db 63 02 
loc-code            = U9008.22L.7888E1A-V2-C2-T1

BOOTP request retry attempt: 1 
BOOTP request retry attempt: 2 
BOOTP request retry attempt: 3 
BOOTP request retry attempt: 4 
    !BA01B015 !

    !BA010003 !

                      .----------------------------------.
                      |  No Operating Systems Installed  |
                      `----------------------------------'

redcurrant's DHCP configuration is different from soapberry's. See for example redcurrant-1's entry in pillar/domain/oqa_prg2_suse_org/hosts.yaml:

redcurrant-1:
  ip4: 10.145.10.222
  mac: 'f6:6b:46:d3:fd:03'
  hostname: redcurrant-1
  dhcp_next_server: 10.168.192.10
  dhcp_filename: 'ppc64le/grub2'

However, as I said, I can't be sure whether the current issue is DHCP or the network, as soapberry is also located in a different network segment than redcurrant.
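For comparison, a soapberry-1 entry written in the same explicit style as redcurrant-1's might look like this (a hypothetical sketch combining the MAC from hosts.yaml and the IP from the DNS zone above; not a submitted MR):

```yaml
soapberry-1:
  ip4: 10.145.0.70
  mac: '5a:9a:cd:db:63:02'
  hostname: soapberry-1
  dhcp_next_server: 10.168.192.10
  dhcp_filename: 'ppc64le/grub2'
```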

Actions #25

Updated by acarvajal 10 days ago

Solved the issue by changing the BOOTP server IP from 10.145.7.254 to 10.145.0.1 (fozziebear). I got the information from https://sd.suse.com/servicedesk/customer/portal/1/SD-154008?sda_source=notification-email, so thanks @okurz :)

Actions #26

Updated by nicksinger 10 days ago

@acarvajal can you please share the ticket with the osd-admins group (the icon with 3 people on it in Jira SD)? It sounds like it contains valuable information :)

Actions #27

Updated by acarvajal 10 days ago

nicksinger wrote in #note-26:

@acarvajal can you please share the ticket with the osd-admins group (the icon with 3 people on it in Jira SD)? It sounds like it contains valuable information :)

Done.

I've collected this information in https://confluence.suse.com/display/qasle/QE-SAP+Power9+Infrastructure#QESAPPower9Infrastructure-IPAddresssetupviaSMS as well.

Actions #28

Updated by acarvajal 8 days ago

acarvajal wrote in #note-25:

Solved the issue by changing the BOOTP server IP from 10.145.7.254 to 10.145.0.1 (fozziebear). I got the information from https://sd.suse.com/servicedesk/customer/portal/1/SD-154008?sda_source=notification-email, so thanks @okurz :)

Well, this worked for one day. :(

PXE boot is not possible again: http://mango.qe.nue2.suse.org/tests/5744#step/bootloader/20

Any ideas what may be happening, or what to do?

Actions #29

Updated by okurz 8 days ago

  • Subject changed from Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - soapberry to Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - soapberry size:S
Actions #30

Updated by okurz 5 days ago

  • Due date deleted (2024-04-23)
  • Status changed from Feedback to Resolved

@acarvajal I assume that, except for general problems which are already handled over other communication channels, there is no more support we need to provide from the tools team in this ticket. You can reach out to us in general anyway :) So I'm resolving, assuming that soapberry is generally fully usable again from PRG2.
