action #169564
closed
Configure wireguard tunnels on OSD production hosts needed for openQA located in the NUE2 server room size:S
0%
Description
Acceptance criteria
- AC1: All OSD production hosts needed for openQA in the NUE2 server room that are managed via Salt have WireGuard set up via Salt so they can reach the CC area
- AC2: The setup is reproducible
Suggestions
- Follow steps on https://confluence.suse.com/display/~dawei_pang/VMs+on+vm-server.qa2.suse.asia+accessing+CC+area#VMsonvmserver.qa2.suse.asiaaccessingCCarea-HowtoprepareWGonyourVMs on one host and prepare a Salt change to apply this to other relevant hosts.
- Introduce a special role or add a condition based on worker classes to set up WireGuard only on hosts in the NUE2 server room.
- Take https://confluence.suse.com/display/enginfra/Wireguard+gateway+-+auto+configuration+tool as inspiration for the Salt change.
- This involves letting IT do the final configuration manually. Supposedly that's also where the keypair is generated and the public key copied over to the WG gateway.
- Have a look at https://sd.suse.com/servicedesk/customer/portal/1/SD-171369 in case we get a response from IT after all.
- Talk to the Beijing colleagues who have already been through this.
- Put into Salt or documentation what needs to be done to reproduce the setup, e.g. put private keys into the Salt pillar repo
- When done, add affected workers back to Salt, e.g. via
for key in petrol.qe.nue2.suse.org sapworker1.qe.nue2.suse.org diesel.qe.nue2.suse.org mania.qe.nue2.suse.org; do salt-key --accept="$key" --include-rejected --yes; done
Updated by mkittler 4 months ago
- Blocks action #169159: Allow variable expansion incorporating worker settings size:S added
Updated by mkittler 4 months ago · Edited
MR: https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/5779
Updated SD ticket: https://sd.suse.com/servicedesk/customer/portal/1/SD-171369
Slack thread: https://suse.slack.com/archives/C029APBKLGK/p1731323391495109
I also added the monitoring host even though we probably have different long-term plans for this host. The config might be useful until we have moved the host.
I also added the powered-off arm worker because it might be useful if we decide to use it again.
I also installed wireguard-tools on all relevant hosts and added the authorized key as mentioned on the Confluence page. This needs to be done manually because Salt is also affected. I nevertheless created a draft to still have the setup "documented" in Salt: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1304
Waiting for feedback from IT.
Updated by okurz 4 months ago
- Related to action #169348: Custom, non-IT-provided wireguard tunnels to connect NUE2 OSD openQA workers to OSD added
Updated by szarate 4 months ago
- Blocks openqa-force-result #169834: [qe-core] Unschedule PowerKVM tests for Maintenance updates while keeping ppc64le architecture still running for PowerVM - auto_review:".*_EXIT_AFTER_SCHEDULE. Only evaluating test schedule":force_result:softfailed added
Updated by okurz 4 months ago
- Copied to action #170041: Configure wireguard tunnels on hosts located in the NUE2 server room - at least one KVM@PowerNV host size:S added
Updated by okurz 4 months ago
mkittler wrote:
Acceptance criteria
- AC1: All hosts in the NUE2 server room that are managed via Salt have WireGuard setup via Salt so they can reach the CC area […]
- When done, add affected workers back to Salt, e.g. via
for key in petrol.qe.nue2.suse.org sapworker1.qe.nue2.suse.org diesel.qe.nue2.suse.org mania.qe.nue2.suse.org; do salt-key --accept="$key" --include-rejected --yes; done
Hi mkittler, AC1 says "All hosts in NUE2 […] managed via Salt" but the last suggestion only mentions openQA workers. That is a discrepancy, as there are more Salt-controlled hosts which are not OSD openQA workers, e.g. monitor, backup, etc. I suggest you create separate tickets for the respective groups. I just created #170041 for the KVM@PowerNV hosts diesel, petrol, mania. You could look into the other groups of hosts depending on whether it turns out that we actually need wireguard tunnels there.
Updated by okurz 3 months ago
Nikolay, Lazaros and I had a call today about the last comments and open points:
- Priority is to have https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/5779 merged. Nikolay already provided comments. We will react.
- As the older Linux 5.3 on ppc64le does not support the current wireguard package, we should focus on sapworker1.
- For bare-metal test hosts we should try to get them working using sapworker1 as openQA control host as before. As plan B we could follow up with setting up wireguard for those, but that would need test maintainers to adapt test code to install and set up wireguard as part of the tests.
- Certain problems regarding DNS resolution are expected which are likely less of a concern for openQA workers as they establish the connection to the openQA webUI.
Updated by okurz 3 months ago
I did
for i in backup-qam.qe.nue2.suse.org backup-vm.qe.nue2.suse.org baremetal-support.qe.nue2.suse.org jenkins.qe.nue2.suse.org monitor.qe.nue2.suse.org openqa-piworker.qe.nue2.suse.org osiris-1.qe.nue2.suse.org qamaster.qe.nue2.suse.org schort-server.qe.nue2.suse.org tumblesle.qe.nue2.suse.org unreal6.qe.nue2.suse.org openqaworker1.qe.nue2.suse.org diesel.qe.nue2.suse.org mania.qe.nue2.suse.org petrol.qe.nue2.suse.org sapworker1.qe.nue2.suse.org ; do echo "### $i" && ssh $i "sudo grep -q 'root@atlas$' /root/.ssh/authorized_keys || echo 'ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBOdQtABW5WPNpAtV0shvOTQi05M6SEUGrXLGuMByWApgwQpWEM41vjWeVIoKim7Y7x62rX99UvC5CiKvG4Do9CI= root@atlas' | sudo tee -a /root/.ssh/authorized_keys" ; done
to deploy the ssh key https://confluence.suse.com/download/attachments/1593344189/wg-prg2-nue2.pub?version=1&modificationDate=1731513804592&api=v2 as suggested in https://sd.suse.com/servicedesk/customer/portal/1/SD-171369
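The "append only if missing" pattern used in the loop above can be sketched as a small helper. This is a minimal sketch and not what was actually run: `ensure_line` is a hypothetical name, and it matches the whole line exactly, whereas the command above only matched on the `root@atlas` comment at the end of a line.

```shell
#!/bin/sh
# ensure_line FILE LINE (hypothetical helper): append LINE to FILE unless an
# identical line is already present, so repeated runs do not add duplicates.
ensure_line() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: fixed string instead of regex
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

On a host this could be called as root with `/root/.ssh/authorized_keys` as the file and the full public key line as the second argument. Matching the full line also catches the case where a different key carries the same comment.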
Updated by mkittler 3 months ago
It looks like the setup works on sapworker1. I can reach OSD and download.suse.de via HTTP. The worker also appears as online and is picking up jobs.
So far test results don't look good, though: https://openqa.suse.de/tests/15987565
So we'll have to keep an eye on that.
An additional problem is that the salt-minion still cannot connect to OSD:
Nov 25 10:55:55 sapworker1 salt-minion[76940]: [ERROR ] Failed to send msg SaltReqTimeoutError('Message timed out',)
Nov 25 10:55:55 sapworker1 salt-minion[76940]: [ERROR ] Error while bringing up minion for multi-master. Is master at openqa.suse.de responding?
Of course I accepted the key on OSD and I have also restarted salt-minion.service.
I replied on the SD ticket to have the config applied on all hosts where it is possible.
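The manual checks described in this comment could be collected into a small script for the remaining hosts. This is only a sketch under assumptions: the interface name `prg2wg` is inferred from the config file name `/etc/wireguard/prg2wg.conf`, ports 4505 and 4506 are the Salt master defaults and may not match the OSD configuration, and `run_check`, `tcp_open` and `check_nue2_host` are hypothetical helper names.

```shell
#!/bin/bash
# run_check LABEL COMMAND...: run COMMAND quietly and print OK or FAIL with LABEL.
run_check() {
    local label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf 'OK   %s\n' "$label"
    else
        printf 'FAIL %s\n' "$label"
        return 1
    fi
}

# tcp_open HOST PORT: succeed if a TCP connection opens within 5 seconds.
# Uses bash's /dev/tcp pseudo-device, so no extra tools besides timeout are needed.
tcp_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2"
}

# check_nue2_host: run all checks and return non-zero if any of them failed.
check_nue2_host() {
    local failed=0
    # Assumption: the tunnel interface is named after the config file (prg2wg)
    run_check "wireguard interface prg2wg exists" ip link show prg2wg || failed=1
    run_check "OSD reachable via HTTP" \
        curl -sf -o /dev/null --max-time 10 http://openqa.suse.de || failed=1
    run_check "download.suse.de reachable via HTTP" \
        curl -sf -o /dev/null --max-time 10 http://download.suse.de || failed=1
    # Assumption: the Salt master on OSD listens on the default ports
    run_check "salt publish port 4505 reachable" tcp_open openqa.suse.de 4505 || failed=1
    run_check "salt request port 4506 reachable" tcp_open openqa.suse.de 4506 || failed=1
    run_check "salt-minion.service active" systemctl is-active salt-minion.service || failed=1
    return "$failed"
}
```

Calling `check_nue2_host` on a worker prints one line per check. A host where HTTP works but the Salt ports fail would match the situation described above, where the worker picks up jobs while the salt-minion cannot reach the master.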
Updated by okurz 3 months ago
Please put openqaworker-arm-1 out of production again and power it off. https://racktables.nue.suse.com/index.php?page=object&object_id=9886 has the machine correctly marked as "unused" with a link to #167057. Priority should be machines that are currently in production use.
Updated by openqa_review 3 months ago
- Due date set to 2024-12-10
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz 3 months ago
- Copied to action #170260: Help others (or ourselves) to configure wireguard tunnels on other hosts needing wireguard to PRG2 in the NUE2 server room size:M added
Updated by mkittler 3 months ago
I created https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/941 to avoid further test failures due to bare-metal hosts not reaching OSD for assets. We might need to create a follow-up ticket; so far I tracked it via #168097#note-29.
Updated by mkittler 3 months ago
- Blocks deleted (action #169159: Allow variable expansion incorporating worker settings size:S)
Updated by mkittler 3 months ago
I updated https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1304. It now also contains a README section to explain the Wireguard setup so we can continue with other hosts more easily in the follow-up ticket.
Not sure whether it makes sense to add /etc/wireguard/prg2wg.conf to Salt. It contains the private key, so we would need to add that to the pillars first. It also contains a list of allowed IPs which differs between hosts, and I'm not sure how it is generated. Maybe we should skip this file considering it is configured by Eng-Infra. We could salt the configured systemd units, but they depend on the config file, so that doesn't make much sense on its own. So I only added this information to the README (for the sake of troubleshooting).
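Since the config file holds the private key, comparing the allowed IPs between hosts or pasting the config into a ticket is safer with the secrets masked. A minimal sketch, assuming the file uses the usual `Key = value` WireGuard config syntax with the key names spelled as `PrivateKey` and `PresharedKey`; `redact_wg_conf` is a hypothetical helper name.

```shell
#!/bin/sh
# redact_wg_conf (hypothetical helper): read a WireGuard config on stdin and
# replace the values of PrivateKey and PresharedKey with a placeholder.
# All other lines, e.g. AllowedIPs and Endpoint, pass through unchanged.
redact_wg_conf() {
    sed -E 's/^([[:space:]]*(PrivateKey|PresharedKey)[[:space:]]*=[[:space:]]*).*/\1(redacted)/'
}
```

For example, `sudo cat /etc/wireguard/prg2wg.conf | redact_wg_conf` would print the config with the key material masked, which should be enough to compare the per-host allowed IPs.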
Updated by openqa_review 3 months ago
- Due date set to 2024-12-11
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz 3 months ago
- Related to action #170338: No monitoring data from OSD since 2024-11-25 1449Z size:M added
Updated by mkittler 3 months ago
- Status changed from In Progress to Feedback
I hope this simple MR suffices as a backup: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/949
The backup and the added documentation are hopefully enough to call the setup "reproducible" as per AC2.
Updated by okurz 3 months ago
- Due date deleted (2024-12-11)
- Status changed from Feedback to Resolved
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/949 merged. I agree that with this we should consider this ticket resolved. I guess we will find out in #170260 if it's clear enough where to follow up for other hosts :)