action #181004 (open)
Try to install and connect a SUSE managed Power 10 LPAR to o3 and use it as (qemu) worker size:S
Added by nicksinger about 2 months ago. Updated 2 days ago.
0%
Description
Motivation
There is an ongoing discussion within SUSE about how to properly connect our currently available ppc Power 10 resources to o3 and test within it.
In a private Slack discussion we agreed to test a Power 10 LPAR with the "normal" openQA qemu instances as workers.
Michal Suchanek provided us with a SLE 15 SP7 test LPAR called "lingonberry-15". We should test whether we can install openqa-worker instances on it and connect them to o3.
Acceptance Criteria
- AC1: Tests can be scheduled on lingonberry
- AC2: lingonberry is mentioned in workerconf
- AC3: https://progress.opensuse.org/projects/openqav3/wiki/#SSH-configuration lists the machine, similar to kerosene
Suggestions
- Use our common resources to install the required packages to make the LPAR act as a worker
- Connect it to o3 and check whether it can execute tests
- Look up lingonberry-15 on Slack or ask Michael or Nick for more details
- cf. kerosene for a similar setup which is also in the SUSE network but connects to o3 over https
Out of scope
The machine should be ready to use as-is. Setting up the LPAR is not needed. We also knowingly try to avoid an openSUSE/o3-operated HMC, so the existing resources within SUSE should suffice to conduct this experiment.
Files
- Screenshot from 2025-05-29 16-04-32.png (383 KB), gpathak, 2025-05-29 10:42
- Screenshot from 2025-05-29 16-12-50.png (205 KB), gpathak, 2025-05-29 10:42
Updated by livdywan about 2 months ago · Edited
- Category set to Feature requests
- Target version set to Tools - Next
@nicksinger As you marked it as High, and the machine is already available, I take it we should look into it quite soon? Hence putting this in Next.
Updated by livdywan about 2 months ago
- Tags set to infra
And I guess this is o3 production infrastructure, hence adding the tag.
Updated by okurz about 1 month ago
- Target version changed from Tools - Next to Ready
Updated by livdywan about 1 month ago
- Subject changed from Try to install and connect a SUSE managed Power 10 LPAR to o3 and use it as (qemu) worker to Try to install and connect a SUSE managed Power 10 LPAR to o3 and use it as (qemu) worker size:S
- Description updated
Updated by gpathak 13 days ago
- Status changed from Workable to Blocked
- Assignee set to gpathak
Blocking this because o3 is down: https://suse.slack.com/archives/C02AET1AAAD/p1747431091299049?thread_ts=1747407687.687679&cid=C02AET1AAAD
Updated by gpathak 7 days ago
Installed openqa-worker on lingonberry, using https://download.opensuse.org/repositories/devel:/openQA/openSUSE_Factory_PowerPC/devel:openQA.repo for the openQA packages.
Also enabled these modules:
SUSEConnect --product sle-module-transactional-server/15.7/ppc64le
SUSEConnect --product sle-module-development-tools/15.7/ppc64le
Tried running two tests as well, but lingonberry can neither ping nor ssh into ariel.dmz-prg2.suse.org.
Updated by gpathak 6 days ago
- Status changed from In Progress to Workable
- Assignee deleted (gpathak)
SSH from lingonberry-15 to openqa.opensuse.org isn't working, and because of this the two tests above failed.
SSH gets stuck at "debug3: set_sock_tos: set socket 3 IP_TOS 0x10". I also tried using -o IPQoS=none, but it didn't help.
I am unassigning this; maybe someone else can take a look to check whether there is any issue with the network.
I will focus on other tasks at hand for now.
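To tell whether the TCP connection itself is blocked or whether the problem lies within SSH, a plain TCP connect test with a timeout can help. A minimal sketch, assuming bash (for its /dev/tcp redirection) and coreutils timeout are available on lingonberry; the function name port_open is hypothetical:

```shell
# port_open HOST PORT [SECONDS]
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT can be
# opened within SECONDS (default 5), fails otherwise. Relies on bash's
# /dev/tcp pseudo-device and on coreutils' timeout.
port_open() {
    local host=$1 port=$2 secs=${3:-5}
    # Inside "bash -c", $0 is the host and $1 is the port.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, running port_open openqa.opensuse.org 22 and port_open openqa.opensuse.org 443 from lingonberry would show whether ssh and https towards o3 are treated differently by the network.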
Updated by okurz 6 days ago
gpathak wrote in #note-16:
[…]
I am unassigning this; maybe someone else can take a look to check whether there is any issue with the network.
As I wrote in #181004-15, I actually expected that hosts in SUSE networks generally cannot access ariel. I guess that NUE2 QE machines as well as any workstations or notebooks can reach ariel via ssh, but lingonberry, being in a separate network, can't. As a workaround, don't configure TESTPOOLSERVER and set up a systemd timer to execute https://github.com/os-autoinst/openQA/blob/master/script/fetchneedles on the machine every minute.
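A minimal sketch of a matching worker configuration, assuming the usual /etc/openqa/workers.ini location and that the worker talks to o3 over https like kerosene does. The WORKER_CLASS and CACHEDIRECTORY values are illustrative assumptions, not lingonberry's actual settings, and write_workers_ini is a hypothetical helper. TESTPOOLSERVER is deliberately absent, so, as the workaround implies, tests and needles come from the locally updated checkout:

```shell
# write_workers_ini FILE
# Hypothetical helper: writes a minimal workers.ini for an o3 worker that
# cannot reach ariel via ssh/rsync. All values are illustrative.
write_workers_ini() {
    cat > "$1" <<'EOF'
[global]
HOST = https://openqa.opensuse.org
CACHEDIRECTORY = /var/lib/openqa/cache
WORKER_CLASS = qemu_ppc64le
# No TESTPOOLSERVER here: tests and needles are kept up to date locally
# by a periodically running fetchneedles.
EOF
}
```

The API key and secret for o3 would go separately into /etc/openqa/client.conf under an [openqa.opensuse.org] section.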
Updated by gpathak 5 days ago
okurz wrote in #note-17:
As workaround don't configure TESTPOOLSERVER and setup a systemd timer every minute to execute https://github.com/os-autoinst/openQA/blob/master/script/fetchneedles on the machine.
Does this script https://github.com/os-autoinst/openQA/blob/master/script/fetchneedles get installed with the openQA-worker package?
It seems to be part of the openQA web UI package: https://github.com/os-autoinst/openQA/blob/master/dist/rpm/openQA.spec#L637
Updated by okurz 5 days ago
gpathak wrote in #note-18:
okurz wrote in #note-17:
As workaround don't configure TESTPOOLSERVER and setup a systemd timer every minute to execute https://github.com/os-autoinst/openQA/blob/master/script/fetchneedles on the machine.
Does this script https://github.com/os-autoinst/openQA/blob/master/script/fetchneedles get installed with the openQA-worker package?
It seems to be part of the openQA web UI package: https://github.com/os-autoinst/openQA/blob/master/dist/rpm/openQA.spec#L637
Yes. Better to just download that one script.
Updated by gpathak 5 days ago
Performed the following manual steps to run the fetchneedles script via a systemd timer:
- Created two directories:
  sudo mkdir -p /var/lib/openqa/share/{tests,factory}
- Created a new user geekotest:
  sudo useradd -N -d /var/lib/openqa -u 1001 -g 65534 geekotest -s /bin/bash -c "openQA user"
- Changed its password:
  sudo passwd geekotest
- Created the file /var/lib/openqa/.gitconfig with the contents:
  [safe]
  directory = /var/lib/openqa/share/tests/opensuse
- Changed its ownership:
  sudo chown -v geekotest:nogroup /var/lib/openqa/.gitconfig
- Changed ownership of the tests directory:
  sudo chown -Rv geekotest:nogroup /var/lib/openqa/share/tests
- Copied fetchneedles to /var/lib/openqa/script
- Created fetch-needles.timer and fetch-needles.service in /usr/lib/systemd/system/:
  fetch-needles.timer:
    [Unit]
    Description=Fetch openQA needles every 60s from remote git repository
    [Timer]
    OnCalendar=*:*:00
    Persistent=true
    [Install]
    WantedBy=timers.target
  fetch-needles.service:
    [Unit]
    Description=openQA needles fetcher task
    ConditionPathIsReadWrite=/var/lib/openqa/share/tests
    [Service]
    Type=exec
    User=_openqa-worker
    ExecStart=/usr/share/openqa/script/fetchneedles
- Started and enabled fetch-needles.timer:
  sudo systemctl enable --now fetch-needles.timer
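As written in this comment, the service runs as _openqa-worker and starts /usr/share/openqa/script/fetchneedles, while the script was copied to /var/lib/openqa/script and the tests directory was chowned to geekotest. It may be worth confirming on the machine that these agree. A minimal consistency check, sketched as a hypothetical shell function (not part of openQA), assuming GNU coreutils stat and a unit file with plain User= and ExecStart= lines:

```shell
# check_fetchneedles_unit UNIT_FILE TESTS_DIR
# Hypothetical helper: prints a message and returns non-zero if the program
# in ExecStart= is not executable or if TESTS_DIR is not owned by User=.
# Assumes GNU stat and no ExecStart= prefixes such as "-" or "@".
check_fetchneedles_unit() {
    unit=$1
    tests_dir=$2
    rc=0
    svc_user=$(sed -n 's/^User=//p' "$unit")
    # The first word of ExecStart= is the program, the rest are arguments.
    svc_exec=$(sed -n 's/^ExecStart=//p' "$unit" | cut -d' ' -f1)
    if [ ! -x "$svc_exec" ]; then
        echo "ExecStart program '$svc_exec' is missing or not executable"
        rc=1
    fi
    owner=$(stat -c %U "$tests_dir")
    if [ "$owner" != "$svc_user" ]; then
        echo "$tests_dir is owned by '$owner' but the service runs as '$svc_user'"
        rc=1
    fi
    return $rc
}
```

On lingonberry this would be called as check_fetchneedles_unit /usr/lib/systemd/system/fetch-needles.service /var/lib/openqa/share/tests; systemctl list-timers fetch-needles.timer and journalctl -u fetch-needles.service show whether the timer fires and what the runs log.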
Updated by gpathak 4 days ago
Created an MR for workerconf.sls: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/1035
Updated by okurz 4 days ago
- Status changed from Feedback to Workable
- Priority changed from Normal to High
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/1035 merged. Looks good. I also added the worker class cpu-power10 and wrote a message in Slack #eng-testing https://suse.slack.com/archives/C02CANHLANP/p1748449440749439
To everyone interested in PowerPC testing: Thanks to @Gaurav Pathak’s great work with support by the QE Tools team we have a Power10 qemu worker in o3. https://openqa.opensuse.org/tests/5072114# as a passed test run is proof. Details are in https://progress.opensuse.org/issues/181004 . You are welcome to try by scheduling more tests on ppc64le. If you need to run on PowerPC+qemu+Power10 in particular, then schedule with the additional test setting WORKER_CLASS=cpu-power10 to restrict to only those worker instances.
Also in Slack #discuss-powerpc-architecture https://suse.slack.com/archives/C04K6388YUX/p1748449139260319
@Michal Suchanek you provided lingonberry-15 to us and we managed to set this LPAR up as openqa.opensuse.org worker. I wanted to report that thanks to @Gaurav Pathak’s great work https://openqa.opensuse.org/tests/5072114# as a passed test run is proof that the machine is now in operation. Details are in https://progress.opensuse.org/issues/181004
Right now https://openqa.opensuse.org/admin/workers/1454 shows "Unavailable" with "[warn] Worker cache not available via http://127.0.0.1:9530: Cache service queue already full (5)", and a previous job was incomplete due to "out of space". There are also multiple incompletes in https://openqa.opensuse.org/tests/overview?distri=microos&distri=opensuse&version=Tumbleweed&build=20250527&groupid=4 . Please look into that.
Updated by gpathak 3 days ago
- File Screenshot from 2025-05-29 16-04-32.png added
- File Screenshot from 2025-05-29 16-12-50.png added
Thanks to @michals for pointing out an unused 420G volume.
I have used it for /var/lib/openqa.
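Since the earlier incompletes were caused by running out of space, a quick check of the free space backing /var/lib/openqa can be handy, for example in a monitoring script. A minimal sketch based on POSIX df output; the function name enough_space and any threshold passed to it are hypothetical:

```shell
# enough_space DIR MIN_GIB
# Hypothetical helper: succeeds if the filesystem holding DIR has at least
# MIN_GIB GiB available. "df -Pk" reports sizes in 1024-byte blocks, and
# column 4 of the second line is the available space.
enough_space() {
    avail_kib=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge $(( $2 * 1024 * 1024 )) ]
}
```

For example, enough_space /var/lib/openqa 50 || echo "less than 50 GiB left" would warn before the cache fills the volume again.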
Updated by openqa_review 3 days ago
- Due date set to 2025-06-13
Setting due date based on mean cycle time of SUSE QE Tools
Updated by livdywan 3 days ago
A couple of points raised in the daily:
- Please try and see if you can start apparmor
- Consider checking why the uploading is so slow
- https://openqa.opensuse.org/tests/5074209#dependencies looks to have a very slow bitrate
- Does this affect any other jobs? Machines?
Updated by gpathak 2 days ago
livdywan wrote in #note-34:
A couple points raised in the daily
- Please try and see if you can start apparmor
Starting apparmor didn't help; the upload speed stayed the same. I also tried disabling firewalld completely, with no improvement.
- Consider checking why the uploading is so slow
- https://openqa.opensuse.org/tests/5074209#dependencies looks to have a very slow bitrate
- Does this affect any other jobs? Machines?
Normally, the upload speed is up to ~950 KiB/s, and an upload completes in approximately 26 to 30 minutes.
Updated by gpathak 2 days ago
It seems that not much can be done from within Linux to change the LPAR's Ethernet settings:
lingonberry-15:~/:[0]# ethtool eth0
Settings for eth0:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: off
Port: Other
PHYAD: 0
Transceiver: internal
Link detected: yes
lingonberry-15:~/:[0]#
Here "Auto-negotiation" and "Advertised auto-negotiation" are off, in contrast to the other Power LPAR, kerosene:
kerosene-8:~ # ethtool eth7
Settings for eth7:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: on
Port: Twisted Pair
PHYAD: 4
Transceiver: internal
MDI-X: on
Supports Wake-on: g
Wake-on: g
Current message level: 0x000000ff (255)
drv probe link timer ifdown ifup rx_err tx_err
Link detected: yes
kerosene-8:~ #
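To compare just the fields discussed here without reading the full ethtool output on each machine, the relevant lines can be filtered. A small sketch; link_summary is a hypothetical helper that reads ethtool output on stdin:

```shell
# link_summary
# Hypothetical helper: reads `ethtool IFACE` output on stdin and prints the
# Speed, Duplex and Auto-negotiation lines as key=value pairs. Lines such as
# "Supports auto-negotiation:" are not matched because of the ^ anchor.
link_summary() {
    sed -nE 's/^[[:space:]]*(Speed|Duplex|Auto-negotiation): (.*)$/\1=\2/p'
}
```

For example, ethtool eth0 | link_summary on lingonberry-15 and ethtool eth7 | link_summary on kerosene-8 would print Auto-negotiation=off and Auto-negotiation=on respectively, next to the identical Speed=1000Mb/s and Duplex=Full.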
Updated by michals 2 days ago
The network speed looks normal, so this is probably not inherent to the LPAR networking:
scp /scratch/SL-Micro.ppc64le-6.2-Default-4096-SelfInstall-Beta4.install.iso lingonberry-15.arch.suse.de:
(root@lingonberry-15.arch.suse.de) Password:
SL-Micro.ppc64le-6.2-Default-4096-SelfInstall-Beta4.install.iso 100% 1327MB 110.2MB/s 00:12
scp lingonberry-15.arch.suse.de:SL-Micro.ppc64le-6.2-Default-4096-SelfInstall-Beta4.install.iso /scratch/SL-Micro.ppc64le-6.2-Default-4096-SelfInstall-Beta4.install.iso.1
(root@lingonberry-15.arch.suse.de) Password:
SL-Micro.ppc64le-6.2-Default-4096-SelfInstall-Beta4.install.iso 100% 1327MB 88.8MB/s 00:14
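Putting the numbers from this thread side by side: at the ~950 KiB/s observed for uploads, 26 to 30 minutes corresponds to roughly 1.4 to 1.6 GiB (an estimate from rate times duration, not a measured upload size), whereas the scp runs above moved 1327 MB in 12 to 14 s. Assuming scp's "MB" are binary megabytes (MiB), the upload path is about two orders of magnitude slower than what the link itself achieves:

```latex
\begin{aligned}
950\ \tfrac{\text{KiB}}{\text{s}} \times 26 \times 60\ \text{s} &= 1\,482\,000\ \text{KiB} \approx 1.41\ \text{GiB},\\
950\ \tfrac{\text{KiB}}{\text{s}} \times 30 \times 60\ \text{s} &= 1\,710\,000\ \text{KiB} \approx 1.63\ \text{GiB},\\
\frac{110.2\ \text{MiB/s}}{950\ \text{KiB/s}} &= \frac{110.2 \times 1024}{950} \approx 119,
\qquad
\frac{88.8 \times 1024}{950} \approx 96.
\end{aligned}
```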