action #169564
closed · Configure wireguard tunnels on OSD production hosts needed for openQA located in the NUE2 server room size:S
Added by mkittler about 1 month ago. Updated 20 days ago.
Description
Acceptance criteria
- AC1: All OSD production hosts needed for openQA in the NUE2 server room that are managed via Salt have WireGuard set up via Salt so they can reach the CC area
- AC2: The setup is reproducible
Suggestions
- Follow steps on https://confluence.suse.com/display/~dawei_pang/VMs+on+vm-server.qa2.suse.asia+accessing+CC+area#VMsonvmserver.qa2.suse.asiaaccessingCCarea-HowtoprepareWGonyourVMs on one host and prepare a Salt change to apply this to other relevant hosts.
- Introduce a special role or add a condition based on worker classes to set up WireGuard only on hosts in the NUE2 server room (see the targeting sketch after this list).
- Take https://confluence.suse.com/display/enginfra/Wireguard+gateway+-+auto+configuration+tool as inspiration for the Salt change.
- This involves letting IT do the final configuration manually. Supposedly that is also where the keypair is generated and the public key is copied over to the WG gateway.
- Have a look at https://sd.suse.com/servicedesk/customer/portal/1/SD-171369 in case we get a response from IT after all.
- Talk to the Beijing colleagues who have already been through this.
- Put into Salt or the documentation what needs to be done to reproduce the setup, e.g. put private keys into the Salt pillar repo.
- When done, add affected workers back to Salt, e.g. via
for key in petrol.qe.nue2.suse.org sapworker1.qe.nue2.suse.org diesel.qe.nue2.suse.org mania.qe.nue2.suse.org; do
  salt-key --accept="$key" --include-rejected --yes
done
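As a minimal sketch of the grain-based targeting idea from the suggestion above, runnable from the Salt master; the grain match 'domain:qe.nue2.suse.org' and the state name "wireguard" are illustrative assumptions, not the actual names in our Salt repos:
# Dry check: which minions would match the NUE2 domain grain?
salt -G 'domain:qe.nue2.suse.org' test.ping
# Apply a hypothetical "wireguard" state only to those hosts:
salt -G 'domain:qe.nue2.suse.org' state.apply wireguard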
Updated by okurz about 1 month ago
- Priority changed from Normal to High
- Target version set to Ready
Updated by mkittler about 1 month ago
- Blocks action #169159: Allow variable expansion incorporating worker settings size:S added
Updated by mkittler about 1 month ago
- Status changed from New to In Progress
- Assignee set to mkittler
Updated by mkittler about 1 month ago · Edited
MR: https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/5779
Updated SD ticket: https://sd.suse.com/servicedesk/customer/portal/1/SD-171369
Slack thread: https://suse.slack.com/archives/C029APBKLGK/p1731323391495109
I also added the monitoring host even though we probably have different long-term plans for this host. The config might be useful until we have moved it.
I also added the powered-off ARM worker because it might be useful if we decide to use it again.
I also installed wireguard-tools on all relevant hosts and added the authorized key as mentioned on the Confluence page. This needs to be done manually because Salt itself is also affected by the missing connectivity. I nevertheless created a draft to still have the setup "documented" in Salt: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1304
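For reference, the manual part boiled down to something like the following sketch per host; the host list here is abbreviated and illustrative, and the actual key deployment command is quoted verbatim in a later comment:
# Install the userspace tooling on each affected host (sketch):
for host in sapworker1.qe.nue2.suse.org petrol.qe.nue2.suse.org; do
  ssh "$host" 'sudo zypper --non-interactive install wireguard-tools'
done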
Waiting for feedback from IT.
Updated by mkittler about 1 month ago
- Status changed from In Progress to Feedback
We'll get feedback next week at the earliest.
Updated by okurz about 1 month ago
- Subject changed from Configure wireguard tunnels on hosts located in the NUE2 server room to Configure wireguard tunnels on hosts located in the NUE2 server room size:S
- Description updated (diff)
Updated by okurz about 1 month ago
- Related to action #169348: Custom, non-IT-provided wireguard tunnels to connect NUE2 OSD openQA workers to OSD added
Updated by szarate about 1 month ago
- Blocks openqa-force-result #169834: [qe-core] Unschedule PowerKVM tests for Maintenance updates while keeping ppc64le architecture still running for PowerVM - auto_review:".*_EXIT_AFTER_SCHEDULE. Only evaluating test schedule":force_result:softfailed added
Updated by mkittler about 1 month ago
- Status changed from Feedback to Blocked
Blocked on feedback for the MR and the SD ticket.
Updated by okurz about 1 month ago
- Copied to action #170041: Configure wireguard tunnels on hosts located in the NUE2 server room - at least one KVM@PowerNV host size:S added
Updated by okurz about 1 month ago
mkittler wrote:
Acceptance criteria
- AC1: All hosts in the NUE2 server room that are managed via Salt have WireGuard set up via Salt so they can reach the CC area […]
- When done, add affected workers back to Salt, e.g. via
for key in petrol.qe.nue2.suse.org sapworker1.qe.nue2.suse.org diesel.qe.nue2.suse.org mania.qe.nue2.suse.org; do
  salt-key --accept="$key" --include-rejected --yes
done
Hi mkittler, in AC1 there is "All hosts in NUE2 […] managed via Salt" but the last suggestion only mentions openQA workers, which is a discrepancy as there are more Salt-controlled hosts which are not OSD openQA workers, e.g. monitor, backup, etc. I suggest you create separate tickets for the corresponding groups. I just created #170041 for the KVM@PowerNV hosts diesel, petrol, mania. You could look into the other groups of hosts depending on whether it turns out we actually need wireguard tunnels for them.
Updated by okurz 29 days ago
Nikolay, Lazaros and I had a call today about the last comments and open points:
- Priority is to have https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/5779 merged. Nikolay already provided comments. We will react.
- As the older ppc64le hosts run Linux 5.3, which does not support the current wireguard package, we should focus on sapworker1.
- For bare-metal test hosts we should try to get them working using sapworker1 as the openQA control host, as before. As plan B we could follow up with setting up wireguard on those hosts, but that would require test maintainers to adapt test code to install and set up wireguard as part of the tests.
- Certain problems regarding DNS resolution are expected; these are likely less of a concern for openQA workers as they establish the connection to the openQA webUI themselves.
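As a quick check whether DNS resolution is the problem on a given host (a sketch, with openqa.suse.de only as an example name):
# Does the name resolve at all on this host?
getent hosts openqa.suse.de
# Compare with a direct HTTP reachability check through the tunnel:
curl -sI https://openqa.suse.de | head -n1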
Updated by okurz 28 days ago
I did
for i in backup-qam.qe.nue2.suse.org backup-vm.qe.nue2.suse.org baremetal-support.qe.nue2.suse.org \
    jenkins.qe.nue2.suse.org monitor.qe.nue2.suse.org openqa-piworker.qe.nue2.suse.org \
    osiris-1.qe.nue2.suse.org qamaster.qe.nue2.suse.org schort-server.qe.nue2.suse.org \
    tumblesle.qe.nue2.suse.org unreal6.qe.nue2.suse.org openqaworker1.qe.nue2.suse.org \
    diesel.qe.nue2.suse.org mania.qe.nue2.suse.org petrol.qe.nue2.suse.org sapworker1.qe.nue2.suse.org; do
  echo "### $i" && ssh $i "sudo grep -q 'root@atlas$' /root/.ssh/authorized_keys || echo 'ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBOdQtABW5WPNpAtV0shvOTQi05M6SEUGrXLGuMByWApgwQpWEM41vjWeVIoKim7Y7x62rX99UvC5CiKvG4Do9CI= root@atlas' | sudo tee -a /root/.ssh/authorized_keys"
done
to deploy the ssh key https://confluence.suse.com/download/attachments/1593344189/wg-prg2-nue2.pub?version=1&modificationDate=1731513804592&api=v2 as suggested in https://sd.suse.com/servicedesk/customer/portal/1/SD-171369
Updated by mkittler 24 days ago
It looks like the setup works on sapworker1. I can reach OSD and download.suse.de via HTTP. The worker also appears as online and is picking up jobs.
So far test results don't look good, though: https://openqa.suse.de/tests/15987565
So we'll have to keep an eye on that.
An additional problem is that the salt-minion still cannot connect to OSD:
Nov 25 10:55:55 sapworker1 salt-minion[76940]: [ERROR ] Failed to send msg SaltReqTimeoutError('Message timed out',)
Nov 25 10:55:55 sapworker1 salt-minion[76940]: [ERROR ] Error while bringing up minion for multi-master. Is master at openqa.suse.de responding?
Of course I accepted the key on OSD and I have also restarted salt-minion.service.
I replied on the SD ticket asking to have the config applied on all hosts where it is possible.
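Some checks that could help narrow the salt-minion issue down, as a sketch assuming the tunnel interface is named prg2wg after its config file (4505/4506 are the standard Salt master ports):
# Is the tunnel up and passing traffic?
sudo wg show prg2wg
# Does traffic towards the master actually take the tunnel route?
ip route get "$(getent hosts openqa.suse.de | awk '{print $1; exit}')"
# Are the Salt master ports reachable (4505 publish, 4506 return)?
nc -zv openqa.suse.de 4505 && nc -zv openqa.suse.de 4506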
Updated by okurz 24 days ago
Please put openqaworker-arm-1 out of production again and power it off. https://racktables.nue.suse.com/index.php?page=object&object_id=9886 correctly marks the machine as "unused" with a link to #167057. Priority should be on machines that are currently in production use.
Updated by openqa_review 23 days ago
- Due date set to 2024-12-10
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz 23 days ago
- Copied to action #170260: Help others (or ourselves) to configure wireguard tunnels on other hosts needing wireguard to PRG2 in the NUE2 server room size:M added
Updated by mkittler 23 days ago
I created https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/941 to avoid further test failures due to baremetal hosts not reaching OSD for assets. We might need to create a follow-up ticket; so far I tracked it via #168097#note-29.
Updated by mkittler 23 days ago
- Blocks deleted (action #169159: Allow variable expansion incorporating worker settings size:S)
Updated by mkittler 23 days ago
I updated https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1304. It now also contains a README section to explain the Wireguard setup so we can continue with other hosts more easily in the follow-up ticket.
Not sure whether it makes sense to add /etc/wireguard/prg2wg.conf to Salt. It contains the private key, so we would need to add that to the pillars first. It also contains a list of allowed IPs which differs between hosts, and I'm not sure how it is generated. Maybe we should skip this file considering it is configured by Eng-Infra. We could salt the configured systemd units, but they depend on the config file, so that alone doesn't make much sense. So I only added this information to the README (for the sake of troubleshooting).
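For context, a sketch of the standard wireguard-tools keypair workflow and the general shape of a wg-quick config; all values below are placeholders, the real ones are provisioned by Eng-Infra. It shows why salting the file would pull the private key into the pillars and need a per-host AllowedIPs list:
# Standard keypair generation with wireguard-tools:
umask 077
wg genkey | sudo tee /etc/wireguard/prg2wg.key | wg pubkey
# Hypothetical shape of /etc/wireguard/prg2wg.conf (placeholders only):
cat <<'EOF' | sudo tee /etc/wireguard/prg2wg.conf >/dev/null
[Interface]
PrivateKey = <host-private-key>    # would have to live in the pillar repo
Address = <tunnel-address>/32

[Peer]
PublicKey = <gateway-public-key>
Endpoint = <wg-gateway>:51820      # 51820 is merely the WireGuard default port
AllowedIPs = <cc-area-subnets>     # differs between hosts
PersistentKeepalive = 25
EOF
sudo systemctl enable --now wg-quick@prg2wg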
Updated by openqa_review 22 days ago
- Due date set to 2024-12-11
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz 22 days ago
- Related to action #170338: No monitoring data from OSD since 2024-11-25 1449Z size:M added
Updated by mkittler 20 days ago
- Status changed from In Progress to Feedback
I hope this simple MR suffices as a backup: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/949
The backup and the added documentation are hopefully enough to call the setup "reproducible" as per AC2.
Updated by okurz 20 days ago
- Due date deleted (2024-12-11)
- Status changed from Feedback to Resolved
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/949 merged. I agree that with this we should consider this ticket resolved. I guess we will find out in #170260 if it's clear enough where to follow up for other hosts :)