action #119059

Use qa-power8 for ppc tests in o3 - network connected? size:M

Added by okurz 3 months ago. Updated 3 days ago.

Status: Workable
Priority: High
Assignee: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

The machine "power8" is currently not available for o3, meaning there is currently no ppc testing in o3 at all. Use the existing machine qa-power8 for o3 ppc testing.

Acceptance criteria

  • AC1: ppc openQA tests on openqa.opensuse.org passed after running on qa-power8

Suggestions

  • DONE: Find out BMC details -> covered with https://gitlab.suse.de/qa-sle/qanet-configs/-/merge_requests/43
  • Power on the machine over BMC https://qa-power8.qa.suse.de
  • Ensure there is a physical network connection, because according to https://progress.opensuse.org/issues/119059 there is currently no ethernet connection for the guest system
  • Try to log in over ssh or virt-manager, or reinstall in case there is no OS or no usable OS on it
  • Install openQA-worker on the machine
  • Create SUSE-IT EngInfra ticket to move the machine's ethernet interface based on racktables information to the o3 network (VLAN 662)
  • Add the corresponding networking information on o3 to /etc/dnsmasq.d/openqa.conf
  • Configure the machine OS for dhcp client mode and ensure the machine gets an address from o3 dnsmasq
  • Ensure the system is able to execute openQA tests from o3
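
For the dnsmasq step above, a minimal sketch of what such an entry could look like. This assumes a static lease via dnsmasq's dhcp-host option. The MAC and IPv4 address are documentation-range placeholders, not the real values of qa-power8, and the helper name dnsmasq_host_entry is made up for illustration.

```shell
#!/bin/sh
# Print a dnsmasq static-lease line (dhcp-host=<mac>,<ip>,<name>) for one host.
# The values passed in below are placeholders, not the real qa-power8 data.
dnsmasq_host_entry() {
    mac=$1 ip=$2 name=$3
    printf 'dhcp-host=%s,%s,%s\n' "$mac" "$ip" "$name"
}

# Example with hypothetical documentation-range values:
dnsmasq_host_entry 00:00:5e:00:53:01 192.0.2.10 qa-power8
```

The printed line would go into /etc/dnsmasq.d/openqa.conf on o3, and dnsmasq would likely need a restart afterwards to pick it up.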

Related issues

Related to openQA Infrastructure - action #122302: Support SD-105827 "PowerPC often fails to boot from network with 'error: time out opening'" (Feedback, 2022-12-21 to 2023-01-19)

Copied to openQA Infrastructure - action #119755: Use a PowerVM machine to serve both PowerVM LPARs for testing as well as one VM running qemu tests (New, 2022-11-02)

Copied to openQA Infrastructure - action #123712: PowerVM HMC within o3, e.g. on VM (New, 2023-01-26)

History

#3 Updated by okurz 3 months ago

  • Description updated (diff)

#4 Updated by okurz 3 months ago

  • Tags set to next-office-day

According to the BMC the machine is on. According to racktables https://racktables.nue.suse.com/index.php?page=object&object_id=2350 there should be an IPv4 address 10.162.6.170 (ps64mm1070.qa.suse.de) but there is no response to ping. According to racktables there might not be any ethernet connection. The BMC menu "network access" mentions an IPv4 address 10.163.41.238 which does respond to ping and also serves ssh. I did not manage to log in over ssh with any password known to me.

nmap -O for OS detection says:

$ sudo nmap -AO 10.163.41.238
Starting Nmap 7.92 ( https://nmap.org ) at 2022-10-19 12:11 CEST
Nmap scan report for 10.163.41.238
Host is up (0.000075s latency).
Not shown: 997 closed tcp ports (reset)
PORT   STATE SERVICE    VERSION
22/tcp open  ssh        OpenSSH 8.4 (protocol 2.0)
| ssh-hostkey: 
|   3072 8f:8f:9c:6d:ae:30:0d:e1:65:1a:c5:0f:2e:21:65:16 (RSA)
|   256 9c:d3:c1:7e:ce:e7:8c:93:0c:af:b6:24:60:95:59:88 (ECDSA)
|_  256 d0:31:38:c4:41:f7:78:14:da:be:a8:f3:b0:ea:ea:55 (ED25519)
25/tcp open  smtp       Postfix smtpd
|_smtp-commands: localhost, PIPELINING, SIZE, ETRN, ENHANCEDSTATUSCODES, 8BITMIME, DSN, SMTPUTF8, CHUNKING
53/tcp open  tcpwrapped
Device type: general purpose
Running: Linux 2.6.X
OS CPE: cpe:/o:linux:linux_kernel:2.6.32
OS details: Linux 2.6.32
Network Distance: 0 hops
Service Info: Host: localhost

OS and Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 21.56 seconds

nsinger said that the Linux kernel is likely 4.X

#5 Updated by okurz 3 months ago

  • Assignee deleted (okurz)

So next step could be to learn how to install something, e.g. PXE boot or so.

#6 Updated by okurz 3 months ago

ok, we learned that the reported IPv4 address 10.163.41.238 is actually my own so not helping.
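
A minimal sketch of how such a mix-up could be caught before scanning, assuming the iproute2 "ip" tool is available on the workstation. The helper name is_local_ipv4 is made up for illustration. The "Network Distance: 0 hops" line in the nmap output above points in the same direction.

```shell
#!/bin/sh
# Succeeds if the address given as $1 appears in `ip -o -4 addr show`
# style output read from stdin, i.e. if it belongs to the local machine.
is_local_ipv4() {
    # Field 4 of each line is "<address>/<prefix>"; keep the address part only.
    awk '{ split($4, a, "/"); print a[1] }' | grep -Fxq -- "$1"
}

# Usage on a real host (not run here):
#   ip -o -4 addr show | is_local_ipv4 10.163.41.238 && echo "that is my own address"
```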

#7 Updated by tinita 3 months ago

  • Subject changed from Use qa-power8 for ppc tests in o3 to Use qa-power8 for ppc tests in o3 size:M
  • Description updated (diff)
  • Status changed from New to Workable

#8 Updated by okurz 3 months ago

I asked in #eng-testing as well as #help-it-ama

(Oliver Kurz) anyone being in or going to Nbg Maxtorhof SRV2 now/soon who could ensure that there is a network connection for QA-Power8 ? Or do you know of a better channel where to coordinate such work?

If somebody stops by in the meantime, or if there is no answer, one can still check that there is a cable connection.

#9 Updated by okurz 3 months ago

acarvajal found that there is a physical connection but he did not remember on which switch port the cable is connected.

#10 Updated by okurz 3 months ago

  • Tags deleted (next-office-day)

no, sorry. IIRC it was the switch at the top of the rack, the row of ports below and either the 5th or 6th port from the left when seeing it from the back

With that and IPMI SoL access that should be enough to make the machine useable again without needing physical access

#11 Updated by mkittler 3 months ago

  • Description updated (diff)

This comment is actually irrelevant, it is about the o3 worker (which #116078 is about).


With that

Ok, so the 3rd suggestion is not relevant after all.

and IPMI SoL access

I still cannot connect via IPMI:

ipmitool -I lanplus -C 3 -H openqaworker-power8-ipmi.suse.de -U … -P … power status
Error in open session response message : insufficient resources for session

Error: Unable to establish IPMI v2 / RMCP+ session

Broken IPMI access is the big problem here. Powering the machine on and off and many more things I've already tried via the BMC didn't help.

#12 Updated by okurz 3 months ago

We can also try to make use of the HMC control servers, e.g. "powerhmc1.arch.suse.de" or something

#13 Updated by okurz 3 months ago

We added the machine to both powerhmc1.arch.suse.de and powerhmc3.arch.suse.de and in both cases we can configure the "partitions", i.e. virtual machines, on the PowerVM managed machines. The mode can be switched between OPAL (for bare-metal installations) and PowerVM in ASM, in this case https://qa-power8.qa.suse.de in the menu "Firmware Configuration". For this, the machine must not be connected to any HMC. If necessary, remove the connection in the HMC and in the ASM menu "Hardware Management Consoles". The problem is that IPMI does not seem to be configured to allow remote access, so we would first need an OS installation on the host to call ipmitool locally and allow remote access. As an alternative we should try to use one of the VMs, or create a new VM, and try to enable remote access with ipmitool from there.

So the suggestion is: (Re-)install one of the LPAR VMs and try to call ipmitool to allow remote access.
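
A rough sketch of what "call ipmitool to allow remote access" could look like once a local OS is available. The channel number 1 and user ID 2 are assumptions to be checked with "ipmitool lan print" and "ipmitool user list" first, and whether the firmware of qa-power8 accepts these calls is untested. The helper name enable_ipmi_lan is made up for illustration, and by default it only prints the commands.

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) ipmitool calls that open up LAN access.
# Channel and user ID are assumptions, not values verified on qa-power8.
enable_ipmi_lan() {
    channel=${1:-1} user_id=${2:-2}
    for cmd in \
        "ipmitool lan set $channel access on" \
        "ipmitool user enable $user_id" \
        "ipmitool channel setaccess $channel $user_id link=on ipmi=on callin=on privilege=4"
    do
        # Unquoted $cmd is split into words on purpose when executing.
        if [ "${RUN:-0}" = 1 ]; then $cmd; else echo "$cmd"; fi
    done
}

enable_ipmi_lan 1 2   # dry run: only prints the commands
```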

#14 Updated by okurz 3 months ago

  • Copied to action #119755: Use a PowerVM machine to serve both PowerVM LPARs for testing as well as one VM running qemu tests added

#15 Updated by mkittler 3 months ago

What are the ASM credentials for https://qa-power8.qa.suse.de? I tried multiple combinations of standard ones we use elsewhere but nothing worked.

#16 Updated by okurz 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz

mkittler wrote:

What are the ASM credentials for https://qa-power8.qa.suse.de? I tried multiple combinations of standard ones we use elsewhere but nothing worked.

provided in chat.

I am now trying myself. First, over https://powerhmc3.arch.suse.de/ I deleted "doc_vm" and "kdump" to free some resources, then created a new LPAR.

Following a "template" I created an LPAR named "aixlinux1" using a wizard and selected "Network boot" pointing to 10.162.0.1, qanet.qa.suse.de, using an IPv4 address 10.162.6.80 as from https://gitlab.suse.de/qa-sle/qanet-configs/-/blob/master/etc/dhcpd.conf#L439 assuming that this one is currently unused. The wizard eventually showed an error "REST0293 Error while trying to retrieve Network Adapters: lpar_netboot: The network boot ended in an error."

Then I asked around and got a hint in https://suse.slack.com/archives/C02CANHLANP/p1668603577713149?thread_ts=1668602055.990499&cid=C02CANHLANP from szarate to check instructions from
https://confluence.suse.com/display/qasle/Power8+and+Power9+%28PowerVM%29+on+SLES
From there I also learned that one can connect to an HMC over ssh. I could log in over ssh as okurz@powerhmc3.arch.suse.de but I could not add my ssh key. Anyway, I considered the terminal access over this path more convenient, with just mkvterm -m qa-power8 -p aixlinux1 on powerhmc3.arch.suse.de. With that, szarate and I tried multiple approaches, reconfiguring the network devices and such, but did not see any DHCP or BOOTP requests on qanet.qa.suse.de. So maybe the ethernet ports are really not connected to a switch.
We also tried to hardcode network settings including gateway 10.162.63.254 based on the template we have seen on "redcurrant".

szarate and I came to the conclusion that the network is likely really not connected: not the BMC interfaces but the ethernet for the system. So this needs to be checked on "next-office-day" at Maxtorhof.

#17 Updated by okurz 2 months ago

  • Tags set to next-office-day
  • Status changed from In Progress to Workable
  • Assignee deleted (okurz)

#18 Updated by okurz 2 months ago

  • Project changed from SUSE QA to openQA Infrastructure

#19 Updated by okurz 2 months ago

  • Project changed from openQA Infrastructure to SUSE QA
  • Subject changed from Use qa-power8 for ppc tests in o3 size:M to [next-office-day] Use qa-power8 for ppc tests in o3 size:M

#20 Updated by okurz 2 months ago

  • Subject changed from [next-office-day] Use qa-power8 for ppc tests in o3 size:M to [next-office-day] Use qa-power8 for ppc tests in o3 - network connected? size:M

#21 Updated by okurz 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz

#22 Updated by okurz 2 months ago

I checked the physical network connections. Updated racktables with entries for the two quad-ethernet access cards. There was already one connection to C10-1 (top) connector, gi30. Updated racktables accordingly. gi30 on qanet15nue.qa.suse.de is in VLAN12 so should be reachable in QA network. Connected a second connection C12-2 (second card on the right, second connector from top) to gi8, added to VLAN12 as well. And connected C10-4 (bottom) to gi2, VLAN12.

#23 Updated by okurz 2 months ago

  • Tags deleted (next-office-day)
  • Subject changed from [next-office-day] Use qa-power8 for ppc tests in o3 - network connected? size:M to Use qa-power8 for ppc tests in o3 - network connected? size:M
  • Due date set to 2022-12-02
  • Status changed from In Progress to Feedback

https://suse.slack.com/archives/C02CANHLANP/p1669026762157029?thread_ts=1668602055.990499&cid=C02CANHLANP

(Oliver Kurz) I have checked the machine QA-Power8 aka ps64mm1070 physically and I can confirm that it has ethernet connections. Moreover I connected two additional network cables now, see https://racktables.nue.suse.com/index.php?page=object&tab=ports&object_id=2350 . @Santiago Zarate @Zaoliang Luo can you help with setting up the network for LPARs?

#24 Updated by okurz 2 months ago

no response yet, asked again

#25 Updated by okurz 2 months ago

  • Status changed from Feedback to In Progress

I will look into setting up LPARs with network on qa-power8

#26 Updated by okurz 2 months ago

https://www.youtube.com/@nigelargriffiths could be helpful as a reference for "training". In the meantime mgriessmeier and I have experimented and compared the setup on qa-power8 with other machines, e.g. "redcurrant" and "blackcurrant". In the "virtual network topology" views it looked like on other machines there is a proper network connection from the physical network all the way down to the LPARs; on qa-power8 there isn't.

We managed to install the VIOS instance. First we tried the internal network connection (192.168.X.X) and a VIOS version which failed to deploy, then we selected the external network connection (10.X.X.X) and a VIOS version with "SP" in the name. This eventually got stuck. Then we connected over ssh to powerhmc1 and opened a text console to the VIOS instance. That showed a progress of 0% and an error message that no volume group could be found, with a waiting time of 105 minutes. We just tried to do something useful in the system, typed "q", "quit", "exit", and then suddenly the installation continued and progressed successfully till the end.

Then in HMC the VIOS instance had an "RMC" connection. Then in the LPAR we could select the "bridged" network, configure an entry in the dhcp config on qanet including PXE, and boot over network from a manual grub prompt, using the existing outdated PXE boot entries as a template with linux+initrd specified from relative paths to what we could find on qanet. We verified that we could access the YaST installer over ssh but did not progress with the installation.

I suggest to redo all the above steps, which could even be done on power8.openqanet.opensuse.org, and document them again with all knowledge holes filled.
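
As a starting point for that documentation, a minimal sketch of the "entry in the dhcp config on qanet", assuming ISC dhcpd host stanza syntax as in the qanet dhcpd.conf. The MAC is a documentation-range placeholder. The address 10.162.6.80 and the boot server 10.162.0.1 are taken from note 16 and are not confirmed to be what was used here. The boot file name needed for PXE is not recorded in this ticket, so it is left out. The helper name dhcpd_host_stanza is made up for illustration.

```shell
#!/bin/sh
# Print an ISC dhcpd host stanza for one LPAR.
# A "filename" option for PXE is deliberately omitted: its value is not
# known from this ticket.
dhcpd_host_stanza() {
    name=$1 mac=$2 ip=$3 next_server=$4
    cat <<EOF
host $name {
  hardware ethernet $mac;
  fixed-address $ip;
  next-server $next_server;
}
EOF
}

# Hypothetical MAC; name and addresses reused from note 16:
dhcpd_host_stanza aixlinux1 00:00:5e:00:53:01 10.162.6.80 10.162.0.1
```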

#27 Updated by okurz 2 months ago

  • Status changed from In Progress to Workable
  • Assignee deleted (okurz)

#28 Updated by cdywan 2 months ago

  • Due date deleted (2022-12-02)

No due date for unassigned tickets

#29 Updated by okurz about 2 months ago

  • Project changed from SUSE QA to openQA Infrastructure

#30 Updated by okurz about 1 month ago

  • Related to action #122302: Support SD-105827 "PowerPC often fails to boot from network with 'error: time out opening'" added

#31 Updated by cdywan 20 days ago

Next steps:

  • Connect via SSH
  • Install an operating system
  • Make the machine usable as a worker

We could do that in our mob session on Thursday since this should be a good learning opportunity for those not familiar with the setup, or another ad-hoc slot where there's more people involved.

We'll probably need a follow-up after that to get the network set up correctly after the machine is installed. See #119008 for reference.

#32 Updated by mkittler 11 days ago

I looked into this ticket again, so here is a short summary:

  • I was able to login on https://qa-power8.qa.suse.de. Looks like the machine is configured to be in PowerVM mode (which should also explain why IPMI access is not working).
  • The system also appears to be already powered on. Not sure how to connect via SSH, though (and thus also not sure how to proceed with this ticket).
  • I have access to https://powerhmc3.arch.suse.de (not https://powerhmc1.arch.suse.de). It shows the host qa-power8 and I can even open the terminal for the test VM that's present. I'm still not sure how to proceed from there to install an OS. Maybe it helps to connect to the HMC via SSH. Is that what was meant by "Connect via SSH" in the previous comment?
  • I've read #119059#note-16. So this has already been attempted but the network didn't work. After #119059#note-23 and #119059#note-26 this shouldn't be a problem anymore.
  • I could theoretically try to follow up on #119059#note-26 but the details (which are left to be documented) aren't clear to me. The comment also mentions hmc1 but I only have access to hmc3.

We could do that in our mob session on Thursday since this should be a good learning opportunity for those not familiar with the setup, or another ad-hoc slot where there's more people involved.

This would be a good idea. I feel a bit lost here.

#33 Updated by okurz 11 days ago

So first: It shouldn't technically matter whether we use hmc1 or hmc3, but in the future hmc3 is to be used as we have full control over it and everybody has their own account. I recommend you connect to the serial terminal by connecting to the HMC over ssh and then calling the mkvterm command. Regarding "ssh" to the final system: what I did was start an installation with the Linuxrc parameter ssh=1 and then connect to the SUT, not the HMC.
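
A small sketch of the terminal access described above, using the user, HMC, system and LPAR names from note 16 as example values. Replace the user with your own HMC account. The helper name hmc_vterm_cmd is made up for illustration, and it only prints the command so it can be reviewed before running it.

```shell
#!/bin/sh
# Build the ssh command line for an HMC virtual terminal.
# "ssh -t" requests a terminal on the HMC, which the interactive console needs.
hmc_vterm_cmd() {
    user=$1 hmc=$2 system=$3 lpar=$4
    printf 'ssh -t %s@%s mkvterm -m %s -p %s\n' "$user" "$hmc" "$system" "$lpar"
}

# Example values taken from note 16:
hmc_vterm_cmd okurz powerhmc3.arch.suse.de qa-power8 aixlinux1
```

If a previous session is still holding the console, closing it on the HMC with rmvterm and the same -m and -p arguments may help.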

#34 Updated by okurz 3 days ago

#35 Updated by okurz 3 days ago

We took another go at this story: we connected with ssh to powerhmc3.arch.suse.de, rebooted the machine into SMS and selected network boot, ending up in the grub menu served by qanet. We could continue from here, install something like a current Leap 15.4 and, similar to grenache-1, try to use nested kvm on that. But to be able to use the machine within o3 we also need an HMC there. Consider this out of scope for the current ticket, see #123712 for that.

I disconnected qa-power8 from HMC within https://powerhmc3.arch.suse.de/dashboard/#resources/systems so that we can switch to OPAL mode in the ASM and try to have IPMI and then continue with an installation of bare-metal Leap and then use as openQA kvm worker and then move to o3 network.

#36 Updated by mkittler 3 days ago

I disconnected qa-power8 from HMC within https://powerhmc3.arch.suse.de/dashboard/#resources/systems

Somehow that's not recognized by the ASM. It still says "Changing the firmware type is only allowed when the system is in powered off state and not managed by a management console." despite the system being powered off for quite a while now. That also means I could not enable IPMI so far. I hope I've been using the right ASM. (I have been using https://qa-power8.qa.suse.de as mentioned in the ticket description.)

#37 Updated by okurz 3 days ago

Well, then I suggest to reconnect with the HMC, ensure it's powered off and then disconnect again
