action #120270
closed
Conduct the migration of SUSE openQA systems IPMI from Nbg SRV1 to new security zones size:M
0%
Description
Motivation
See parent #116623
Acceptance criteria
- AC1: All IPMI interfaces of openQA machines in Nbg SRV1 are in new security zones
- AC2: All IPMI interfaces of openQA machines in Nbg SRV1 are fully usable in production
- AC3: All documentation referencing O3+OSD IPMI interfaces is up to date
- AC4: Our automated tools using O3+OSD IPMI interfaces are up to date, e.g. GitLab pipelines and salt states
Suggestions
- Monitor Slack #discuss-qe-new-security-zones
- Ensure access over the new way is possible
- Document changes in our infrastructure documentation, e.g. progress.opensuse.org/projects/openqav3/wiki/, https://wiki.suse.net/index.php/OpenQA, https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls
- Ensure https://gitlab.suse.de/openqa/salt-pillars-openqa#get-ipmi-definition-aliases works the new way
- Update https://gitlab.suse.de/openqa/grafana-webhook-actions
Open points
- Where is the documentation by SUSE-IT?
- Where is the git repo handling ssh keys?
- Fix the multi-second login time over ssh (workaround: use ssh -4)
Updated by okurz about 2 years ago
- Description updated (diff)
Preliminary instructions in https://suse.slack.com/archives/C0488BZNA5S/p1668011380114319
(Martin Caj) you do not need to be root there to do ssh / ipmi. Try this: ssh jumpy@qe-jumpy.suse.de
(Nick Singer) I'm able to log onto the machine. Let me try ipmi access to one of the migrated hosts
(Oliver Kurz) can you tell me the git repo where you manage the keys so that I can check myself and add an ed25519 key
(Oliver Kurz) it takes rather long to login: time ssh jumpy@qe-jumpy.suse.de true takes 3.8s!
(Nick Singer) yes the long connection time I also realized. might get annoying in the future and might even break automated pipelines or require special, dirty hacks to increase timeouts
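A minimal sketch of a wrapper for running commands on the jump host with the ssh -4 workaround applied, so that the login time can be measured or a pipeline can reuse it. The function name via_jumpy and the variables JUMP_HOST and SSH are hypothetical and not part of any existing tooling; the jump host address is the one quoted in the chat above.

```shell
# Assumption: the jump host address is the one quoted in the chat above.
JUMP_HOST=${JUMP_HOST:-jumpy@qe-jumpy.suse.de}
# SSH can be overridden (e.g. SSH=echo) for a dry run that only prints
# the arguments instead of opening a connection.
SSH=${SSH:-ssh}

# Run a command on the jump host, forcing IPv4 as in the ssh -4 workaround.
via_jumpy() {
    "$SSH" -4 "$JUMP_HOST" -- "$@"
}
```

With this in place, "time via_jumpy true" would repeat the measurement from the chat with the workaround enabled.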
Updated by okurz about 2 years ago
So far the conversion rule seems to be:
sed 's/ipmitool/ssh -4 jumpy@qe-jumpy.suse.de -- &/;s/-ipmi\.suse\.de/-ipmi.qe-ipmi-ur/' openqa/workerconf.sls
This only covers hostnames ending in "-ipmi". For the others we will have to find out what mcaj comes up with :)
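To illustrate the conversion rule, here is a small sketch that applies the same sed expression to a single sample line instead of the whole pillar file. The function name convert_ipmi_cmds is hypothetical, and the sample host name and credentials are placeholders, not real entries from workerconf.sls.

```shell
# Rewrite direct ipmitool calls so that they run through the jump host.
# Reads lines on stdin and writes the converted lines to stdout.
convert_ipmi_cmds() {
    sed 's/ipmitool/ssh -4 jumpy@qe-jumpy.suse.de -- &/;s/-ipmi\.suse\.de/-ipmi.qe-ipmi-ur/'
}

# Hypothetical sample line; host name and credentials are placeholders.
echo 'ipmitool -I lanplus -C 3 -H example-ipmi.suse.de -U user -P pass power status' \
    | convert_ipmi_cmds
```

Note that the first substitution applies to every line containing ipmitool, so a line for a host whose name does not end in "-ipmi.suse.de" would get the ssh prefix without any change to its hostname.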
Updated by okurz about 2 years ago
- Copied to action #120288: [tools] cloud based tests fail due to traffic to cloud blocked auto_review:"2022-11-0.*Test died: (Waiting for Godot.*ssh|Cannot find image after upload)":retry added
Updated by livdywan about 2 years ago
- Subject changed from Conduct the migration of SUSE openQA systems IPMI from Nbg SRV1 to new security zones to Conduct the migration of SUSE openQA systems IPMI from Nbg SRV1 to new security zones size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz about 2 years ago
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/458 for the conversion of salt-pillars
Updated by mkittler about 2 years ago
- Blocked by action #120651: [openQA][infra][ipmi][worker][api] The expected pattern CMD_FINISHED-xxxxx returned but did not show up in serial log (wait_serial timed out) size:M added
Updated by mkittler about 2 years ago
- Status changed from Workable to Blocked
We should not move grenache-1 to the new security zone, as it is the last machine that can still run the tests that are broken on worker2 in the new security zone (see #120270).
Updated by okurz almost 2 years ago
- Project changed from 46 to openQA Infrastructure
- Assignee set to mkittler
Updated by livdywan almost 2 years ago
- Status changed from Blocked to Workable
okurz wrote:
I guess you meant to block on #120261, right? Also please only use "Blocked" with assignee.
Looks like this is unblocked now
Updated by mkittler almost 2 years ago
- Status changed from Workable to In Progress
True. Tests using the IPMI backend are impaired (https://progress.opensuse.org/issues/120651) but that's not due to the IPMI interface itself.
I was told that AC1 and AC2 are now implemented so I'm checking what's left for AC3 and AC4.
Updated by mkittler almost 2 years ago
I ran test-ipmi-access to verify AC3 (with updated regex, see https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/477). Some ipmi commands using jumpy@qe-jumpy still fail. I logged in on the jump host and ran the failing commands for relevant hosts (those in SRV1) manually:
jumpy@qe-jumpy:~> ipmitool -I lanplus -C 3 -H openqa-aarch64-ipmi.qe-ipmi-ur -U … -P …
Address lookup for openqa-aarch64-ipmi.qe-ipmi-ur failed
Could not open socket!
Error: Unable to establish IPMI v2 / RMCP+ session
jumpy@qe-jumpy:~> ipmitool -I lanplus -C 3 -H openqaworker-power8-ipmi.qe-ipmi-ur -U … -P …
Address lookup for openqaworker-power8-ipmi.qe-ipmi-ur failed
Could not open socket!
Error: Unable to establish IPMI v2 / RMCP+ session
jumpy@qe-jumpy:~> ipmitool -I lanplus -C 3 -H rebel-ipmi.qe-ipmi-ur -U … -P …
Address lookup for rebel-ipmi.qe-ipmi-ur failed
Could not open socket!
Error: Unable to establish IPMI v2 / RMCP+ session
Note that not all such hosts are broken (e.g. jumpy@qe-jumpy:~> ipmitool -I lanplus -C 3 -H openqaworker4-ipmi.qe-ipmi-ur … works fine).
Looks like the IPs for the hosts are hardcoded in /etc/hosts on jumpy@qe-jumpy. The failing hosts are missing from that list, so presumably someone needs to update it. I cannot do it because I don't know the IPs, so I've been asking in the chat.
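A minimal sketch of how the missing entries could be listed in one go, assuming the names really are resolved from a hosts(5)-format file as observed above. The function name missing_hosts is hypothetical.

```shell
# Print every given hostname that has no entry in a hosts(5)-format file.
# Usage: missing_hosts HOSTS_FILE NAME...
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name in any name/alias column.
        if ! awk -v n="$name" '
                { sub(/#.*/, "") }
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }
            ' "$hosts_file"; then
            printf '%s\n' "$name"
        fi
    done
}
```

On the jump host this could be called as "missing_hosts /etc/hosts openqa-aarch64-ipmi.qe-ipmi-ur openqaworker-power8-ipmi.qe-ipmi-ur rebel-ipmi.qe-ipmi-ur" to print the names that still need an entry.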
Updated by mkittler almost 2 years ago
Looks like https://gitlab.suse.de/openqa/grafana-webhook-actions/-/blob/master/.gitlab-ci.yml is only about workers not in SRV1.
https://gitlab.suse.de/openqa/monitor-o3/-/blob/master/.gitlab-ci.yml needs to be adjusted, but first the issues mentioned in my previous comment need to be resolved. To be able to use the jump host, the script also needs to be changed slightly.
Updated by openqa_review almost 2 years ago
- Due date set to 2023-01-26
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler almost 2 years ago
- power8 is broken anyways so we can exclude it
- aarch64 and rebel have simply not been migrated yet; the old IPMI commands still work (so supposedly the documentation should be reverted for now: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/479)
Updated by mkittler almost 2 years ago
- Status changed from In Progress to Feedback
Updated by mkittler almost 2 years ago
SR for updating monitor-o3: https://gitlab.suse.de/openqa/monitor-o3/-/merge_requests/7
Updated by okurz almost 2 years ago
- Due date deleted (2023-01-26)
- Status changed from Feedback to Blocked
Updated by okurz almost 2 years ago
- Status changed from Blocked to Feedback
However, the GitLab CI pipelines seem to have problems reaching the jumpy host, see e.g. https://gitlab.suse.de/openqa/monitor-o3/-/jobs/1346163#L37
Updated by mkittler almost 2 years ago
This SR should fix it: https://gitlab.suse.de/openqa/monitor-o3/-/merge_requests/8
Updated by mkittler almost 2 years ago
- Status changed from Feedback to Blocked
The SR has been merged and it works. So this leaves me waiting for https://sd.suse.com/servicedesk/customer/portal/1/SD-109299.
Updated by mkittler almost 2 years ago
- Status changed from Blocked to Feedback
The remaining IPMI interfaces have been migrated. Now only the documentation/automation changes are pending:
- https://gitlab.suse.de/openqa/monitor-o3/-/merge_requests/10
- https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/489
I've also gone through the list of IPMI commands in the pillars and only the following hosts have not been migrated:
- not used anymore anyways or broken: openqaworker1, imagetester, power8
- special: sp.openqaw5-xen.qa.suse.de
- Prague located: openqaworker14-ipmi.qa.suse.cz, openqaworker15-ipmi.qa.suse.cz, openqaworker16-ipmi.qa.suse.cz, openqaworker17-ipmi.qa.suse.cz, openqaworker18-ipmi.qa.suse.cz
- power: qa-power8-4.qa.suse.de, qa-power8-5.qa.suse.de, fsp1-powerqaworker-qam.qa.suse.de, malbec
- arm: openqaworker-arm-1-ipmi.suse.de, openqaworker-arm-2-ipmi.suse.de, openqaworker-arm-4-ipmi.suse.de, openqaworker-arm-5-ipmi.suse.de
I suppose we can ignore category 1. I'm not sure about the others. Should I request migrating them, too? If I remember correctly, we talked about this ticket in Jitsi and concluded that only the hosts I requested in https://sd.suse.com/servicedesk/customer/portal/1/SD-109299 would be missing.
EDIT: Created #124119 as follow-up for remaining hosts.
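A minimal sketch of how such a list could be gathered, assuming that every migrated command in the pillars mentions the jump host name "qe-jumpy" and every IPMI command contains "ipmitool". The function name unmigrated_ipmi_cmds is hypothetical, and the actual layout of workerconf.sls is not shown in this ticket.

```shell
# List lines with IPMI commands that do not yet go through the jump host.
# Usage: unmigrated_ipmi_cmds FILE
unmigrated_ipmi_cmds() {
    # "|| true" keeps the exit status at 0 when nothing is left to migrate.
    grep 'ipmitool' "$1" | grep -v 'qe-jumpy' || true
}
```

It could be run as "unmigrated_ipmi_cmds openqa/workerconf.sls" from a checkout of salt-pillars-openqa.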
Updated by okurz almost 2 years ago
- Status changed from Feedback to In Progress
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/489 merged. As this ticket should be about SRV1 in particular, please make sure the list you provided is covered in a follow-up ticket: check the parent and other subtasks and create a new ticket as necessary.
Updated by okurz almost 2 years ago
Updated by mkittler almost 2 years ago
- Copied to action #124119: Conduct the migration of remaining SUSE openQA systems IPMI to new security zones added
Updated by okurz almost 2 years ago
- Blocked by deleted (action #120651: [openQA][infra][ipmi][worker][api] The expected pattern CMD_FINISHED-xxxxx returned but did not show up in serial log (wait_serial timed out) size:M)
Updated by okurz almost 2 years ago
- Related to action #120651: [openQA][infra][ipmi][worker][api] The expected pattern CMD_FINISHED-xxxxx returned but did not show up in serial log (wait_serial timed out) size:M added
Updated by livdywan almost 2 years ago
- Status changed from In Progress to Resolved
okurz wrote:
https://gitlab.suse.de/openqa/monitor-o3/-/merge_requests/10 merged.
Merged. We're good here.