action #120270
Conduct the migration of SUSE openQA systems IPMI from Nbg SRV1 to new security zones size:M
0%
Description
Motivation
See parent #116623
Acceptance criteria
- AC1: All IPMI interfaces of openQA machines in Nbg SRV1 are in new security zones
- AC2: All IPMI interfaces of openQA machines in Nbg SRV1 are fully usable in production
- AC3: All documentation referencing O3+OSD IPMI interfaces is up to date
- AC4: Our automated tools using O3+OSD IPMI interfaces are up to date, e.g. GitLab pipelines and salt states
Suggestions
- Monitor Slack #discuss-qe-new-security-zones
- Ensure access over the new way is possible
- Document changes in our infrastructure documentation, e.g. progress.opensuse.org/projects/openqav3/wiki/, https://wiki.suse.net/index.php/OpenQA, https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls
- Ensure https://gitlab.suse.de/openqa/salt-pillars-openqa#get-ipmi-definition-aliases works the new way
- Update https://gitlab.suse.de/openqa/grafana-webhook-actions
Open points
- Where is the documentation by SUSE-IT
- Where is the git repo handling ssh keys
- Fix the multi-second login time over ssh (workaround: use ssh -4)
Related issues
History
#2
Updated by okurz 3 months ago
- Description updated (diff)
Preliminary instructions in https://suse.slack.com/archives/C0488BZNA5S/p1668011380114319
(Martin Caj) you do not need to be root there to do ssh / ipmi. Try this: ssh jumpy@qe-jumpy.suse.de
(Nick Singer) I'm able to log onto the machine. Let me try ipmi access to one of the migrated hosts
(Oliver Kurz) can you tell me the git repo where you manage the keys so that I can check myself and add an ed25519 key
(Oliver Kurz) it takes rather long to login: time ssh jumpy@qe-jumpy.suse.de true takes 3.8s!
(Nick Singer) yes, I also noticed the long connection time. It might get annoying in the future and might even break automated pipelines or require special, dirty hacks to increase timeouts
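To compare the login time with and without the IPv4 workaround, a small timing helper could look like the sketch below. The function name time_cmd is hypothetical, and the sketch assumes GNU date (for %N) and awk are available on the client.

```shell
# Hypothetical helper: print the wall-clock seconds a command takes.
# Assumes GNU date, which supports %N (nanoseconds).
time_cmd() {
    tc_start=$(date +%s.%N)
    # Run the given command; its output is not interesting here.
    "$@" >/dev/null 2>&1
    tc_end=$(date +%s.%N)
    awk -v s="$tc_start" -v e="$tc_end" 'BEGIN { printf "%.1f\n", e - s }'
}
```

Calling time_cmd ssh jumpy@qe-jumpy.suse.de true and then time_cmd ssh -4 jumpy@qe-jumpy.suse.de true prints one duration each, which makes the difference easy to compare.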
#5
Updated by okurz 3 months ago
- Copied to action #120288: [tools] cloud based tests fail due to traffic to cloud blocked auto_review:"2022-11-0.*Test died: (Waiting for Godot.*ssh|Cannot find image after upload)":retry added
#7
Updated by okurz 3 months ago
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/458 for the conversion of salt-pillars
#8
Updated by mkittler 2 months ago
- Blocked by action #120651: [openQA][infra][ipmi][worker][api] The expected pattern CMD_FINISHED-xxxxx returned but did not show up in serial log (wait_serial timed out) size:M added
#10
Updated by okurz about 2 months ago
- Project changed from SUSE QA to openQA Infrastructure
- Assignee set to mkittler
#12
Updated by mkittler 24 days ago
- Status changed from Workable to In Progress
True. Tests using the IPMI backend are impaired (https://progress.opensuse.org/issues/120651) but that's not due to the IPMI interface itself.
I was told that AC1 and AC2 are now implemented so I'm checking what's left for AC3 and AC4.
#13
Updated by mkittler 24 days ago
I ran test-ipmi-access to verify AC3 (with updated regex, see https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/477). Some ipmi commands using jumpy@qe-jumpy still fail. I logged in on the jump host and ran the failing commands for relevant hosts (those in SRV1) manually:
jumpy@qe-jumpy:~> ipmitool -I lanplus -C 3 -H openqa-aarch64-ipmi.qe-ipmi-ur -U … -P …
Address lookup for openqa-aarch64-ipmi.qe-ipmi-ur failed
Could not open socket!
Error: Unable to establish IPMI v2 / RMCP+ session
jumpy@qe-jumpy:~> ipmitool -I lanplus -C 3 -H openqaworker-power8-ipmi.qe-ipmi-ur -U … -P …
Address lookup for openqaworker-power8-ipmi.qe-ipmi-ur failed
Could not open socket!
Error: Unable to establish IPMI v2 / RMCP+ session
jumpy@qe-jumpy:~> ipmitool -I lanplus -C 3 -H rebel-ipmi.qe-ipmi-ur -U … -P …
Address lookup for rebel-ipmi.qe-ipmi-ur failed
Could not open socket!
Error: Unable to establish IPMI v2 / RMCP+ session
Note that not all such hosts are broken (e.g. jumpy@qe-jumpy:~> ipmitool -I lanplus -C 3 -H openqaworker4-ipmi.qe-ipmi-ur … works fine).
Looks like IPs for the hosts are hardcoded in /etc/hosts on jumpy@qe-jumpy. The failing hosts are missing from that list, so presumably someone needs to update it. I cannot do it because I don't know the IPs, so I've asked in the chat.
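A quick way to list which IPMI host names have no entry in a hosts file is sketched below. The function name missing_from_hosts is hypothetical, and the sketch assumes the usual hosts file layout (address first, names after it, # starting a comment).

```shell
# Hypothetical helper: print every given name that has no entry in a hosts file.
# Usage: missing_from_hosts FILE NAME...
missing_from_hosts() {
    mfh_file=$1
    shift
    for mfh_name in "$@"; do
        # Strip comments, then compare the name against every name column
        # (field 2 onward; field 1 is the address).
        if ! awk -v n="$mfh_name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$mfh_file"; then
            printf '%s\n' "$mfh_name"
        fi
    done
}
```

Run on the jump host as missing_from_hosts /etc/hosts followed by the host names used in the commands above, it prints only the names that still need an entry.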
#14
Updated by mkittler 24 days ago
Looks like https://gitlab.suse.de/openqa/grafana-webhook-actions/-/blob/master/.gitlab-ci.yml is only about workers not in SRV1.
https://gitlab.suse.de/openqa/monitor-o3/-/blob/master/.gitlab-ci.yml needs to be adjusted, but first the issues mentioned in my previous comment need to be resolved. To be able to use the jump host, the script also needs to be changed a little.
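A minimal sketch of how a script could run ipmitool through the jump host is shown below. It is not the actual monitor-o3 change. The function name ipmi_via_jumpy and the variables IPMI_USER, IPMI_PASSWORD and DRY_RUN are assumptions made for illustration. The ipmitool options are the ones used in the commands above, and -4 is the login-time workaround from the open points.

```shell
# Hypothetical wrapper: run an ipmitool command on the jump host.
# IPMI_USER, IPMI_PASSWORD and DRY_RUN are assumed names, not part of any existing script.
# Note: ssh joins the arguments into one remote command line, so credentials
# containing shell metacharacters would need extra quoting.
ipmi_via_jumpy() {
    : "${IPMI_USER:?set IPMI_USER}" "${IPMI_PASSWORD:?set IPMI_PASSWORD}"
    ivj_host=$1
    shift
    # With DRY_RUN set, only print the command instead of running it.
    ${DRY_RUN:+echo} ssh -4 jumpy@qe-jumpy.suse.de \
        ipmitool -I lanplus -C 3 -H "$ivj_host" \
        -U "$IPMI_USER" -P "$IPMI_PASSWORD" "$@"
}
```

For example, ipmi_via_jumpy openqaworker4-ipmi.qe-ipmi-ur power status would query the power state of that machine through the jump host. Setting DRY_RUN=1 first shows the command without opening a connection.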
#15
Updated by openqa_review 24 days ago
- Due date set to 2023-01-26
Setting due date based on mean cycle time of SUSE QE Tools
#16
Updated by mkittler 23 days ago
- power8 is broken anyway, so we can exclude it
- aarch64 and rebel have simply not been migrated yet; the old IPMI commands still work (so presumably the documentation should be reverted for now: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/479)
#22
Updated by mkittler 22 days ago
SR for updating monitor-o3: https://gitlab.suse.de/openqa/monitor-o3/-/merge_requests/7
#24
Updated by okurz 18 days ago
- Status changed from Blocked to Feedback
But the GitLab CI pipelines seem to have problems reaching the jumpy host, see e.g. https://gitlab.suse.de/openqa/monitor-o3/-/jobs/1346163#L37
#25
Updated by mkittler 18 days ago
This SR should fix it: https://gitlab.suse.de/openqa/monitor-o3/-/merge_requests/8
#26
Updated by mkittler 17 days ago
- Status changed from Feedback to Blocked
The SR has been merged and it works. So this leaves me waiting for https://sd.suse.com/servicedesk/customer/portal/1/SD-109299.