action #120267
coordination #121720 (closed): [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability
coordination #116623: [epic] Migration of SUSE Nbg based openQA+QA+QAM systems to new security zones
Conduct the migration of openqa-ses aka. "storage.qa.suse.de" size:M
Description
Motivation¶
See parent #116623
Acceptance criteria¶
- AC1: openqa-ses is either migrated to the new network zone or decommissioned
Suggestions¶
- Coordinate the move among SUSE-IT and machine owners in Slack #discuss-qe-new-security-zones
- Ensure racktables is up-to-date
Updated by okurz almost 2 years ago
- Copied from action #120264: Conduct the migration of SUSE QA systems (non-tools-team maintained) from Nbg SRV1 to new security zones size:M added
Updated by okurz almost 2 years ago
- Status changed from New to Blocked
Regarding "openqa-ses" chat was in https://suse.slack.com/archives/C02CANHLANP/p1667993899105129:
(Oliver Kurz) ok. I wrote to qa-team@suse.de. If I receive no response then I think this is a topic for QE mgmt, shouldn't happen that we have machines which nobody wants to know about :slightly_smiling_face:
(Matthias Griessmeier) I agree. Ses is storage, I don't know/remember why it has osd-admins as contact. Probably because openqa is in the name. I'd say if the machine is not pingable, nor reachable over ipmi and no one responses, let's unmount it and move it to cold storage. This seems to be historical leftover. […] Gerhard is aware and next time he is in srv1. He will disconnect it. I cannot even find it in https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/domain/suse_de.sls
that's probably the reason why dns does not resolve - so I think it is safe to remove it. to make it official, created https://sd.suse.com/servicedesk/customer/portal/1/SD-103880
Also I asked Lazaros Haleplidis in https://suse.slack.com/archives/C0488BZNA5S/p1668081298115499:
(Lazaros Haleplidis) also I have a question about machines like openqa-ses.suse.de https://racktables.nue.suse.com/index.php?page=object&object_id=13558 . nobody seems to know the machine and it's not reachable. But racktable lists switch port connections so by powering off/on machines or disconnecting/connecting the switch ports you and others from EngInfra could identify the MAC addresses and just go ahead with the migration and nobody from our side can contribute more.
Updated by okurz almost 2 years ago
- Status changed from Blocked to Feedback
No response in Slack. I asked mgriessmeier to share access to https://sd.suse.com/servicedesk/customer/portal/1/SD-103880 . The machine is still in racktables untouched.
Updated by okurz almost 2 years ago
- Status changed from Feedback to Blocked
OSD-Admins is now participant on https://sd.suse.com/servicedesk/customer/portal/1/SD-103880, we can track this
Updated by okurz almost 2 years ago
- Status changed from Blocked to Feedback
SD ticket was resolved with: "Has been unmounted and placed in the cold storage." racktables wasn't updated yet. Need to check with mgriessmeier if he created a new cold storage entry at FC location
Updated by okurz almost 2 years ago
- Related to action #121282: Recover storage.qa.suse.de size:S added
Updated by okurz almost 2 years ago
- Subject changed from Conduct the migration/decommissioning of openqa-ses to Conduct the migration of openqa-ses aka. "storage.qa.suse.de"
- Category set to Infrastructure
- Status changed from Feedback to Blocked
blocked by #121282. After that we need to ensure that we actually have the system migrated to the new network security zone(s)
Updated by okurz almost 2 years ago
- Status changed from Blocked to New
- Assignee deleted (okurz)
With #121282 resolved we can now unblock and continue here.
Updated by okurz almost 2 years ago
- Parent task changed from #120264 to #116623
Updated by livdywan almost 2 years ago
- Subject changed from Conduct the migration of openqa-ses aka. "storage.qa.suse.de" to Conduct the migration of openqa-ses aka. "storage.qa.suse.de" size:M
- Status changed from New to Workable
Updated by mkittler almost 2 years ago
- Related to action #120270: Conduct the migration of SUSE openQA systems IPMI from Nbg SRV1 to new security zones size:M added
Updated by mkittler almost 2 years ago
- Status changed from Workable to Feedback
Updated by okurz almost 2 years ago
- Status changed from Feedback to Blocked
We can declare that as blocked by https://sd.suse.com/servicedesk/customer/portal/1/SD-109299
Updated by mkittler over 1 year ago
The IPMI interface has been moved (which is hopefully actually wanted, this ticket makes no mention of it but supposedly the IPMI interface is part of #120270). Now we only need to wait for the migration of "storage.qa.suse.de" itself.
Updated by okurz over 1 year ago
I commented in https://sd.suse.com/servicedesk/customer/portal/1/SD-109299
Currently we use storage.qa.suse.de solely within the scope of openQA so please move it into .oqa.suse.de., not .qe.suse.de.
Updated by mkittler over 1 year ago
- Status changed from Blocked to Feedback
The migration is supposedly concluded. I cannot log in on qe-jumpy.suse.de to verify it at the moment, though. `hostname --fqdn` only returns `hostname: Name or service not known` on storage.qa.suse.de.
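A quick way to cross-check the DNS side of such a migration from any host with working name resolution; a hedged sketch, where only the two host names are taken from this ticket and the helper name is made up:

```shell
# check_host: report whether a name resolves via the system resolver
check_host() {
    if getent hosts "$1" > /dev/null; then
        echo "$1: resolves"
    else
        echo "$1: no DNS entry"
    fi
}

check_host storage.qa.suse.de    # old record, expected to disappear
check_host storage.oqa.suse.de   # new record, expected to resolve
```

Using `getent` rather than `dig` exercises the full NSS stack, i.e. the same resolution path `hostname --fqdn` relies on.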
Updated by mkittler over 1 year ago
I would have checked this again today but the VPN is offline.
Updated by mkittler over 1 year ago
- Status changed from Feedback to Blocked
The VPN is online again. However, it doesn't look like the host has actually been migrated. So I've left a comment on https://sd.suse.com/servicedesk/customer/portal/1/SD-109299. (It is not possible to re-open the ticket. I've mentioned it in the chat to get some attention.)
Updated by okurz over 1 year ago
- Status changed from Blocked to In Progress
mkittler to check latest changes after mcaj mentioned that the DNS change for the device storage.qa.suse.de (https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=13558) is in https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3232/ and already merged. Maybe we need to reboot or check the host's status.
Updated by mkittler over 1 year ago
- Status changed from In Progress to Blocked
Still blocked by https://sd.suse.com/servicedesk/customer/portal/1/SD-109299.
Updated by mkittler over 1 year ago
- Status changed from Blocked to In Progress
The host has been migrated and I've updated racktables. Now I only need to check our salt/alerting (as currently the host-up alert for the old domain is firing).
Updated by mkittler over 1 year ago
I deleted the old host storage.qa.suse.de and added storage.oqa.suse.de instead. This fixed the alert, and the host seems to be almost properly back in salt. There are no failing systemd services. The only problem I found so far is telegraf:
Feb 28 16:07:37 storage telegraf[14403]: 2023-02-28T15:07:37Z W! [outputs.influxdb] When writing to [http://openqa-monitor.qa.suse.de:8086]: database "telegraf" creation failed: Post "http://openqa-monitor.qa.suse.de:8086/query": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Feb 28 16:07:52 storage telegraf[14403]: 2023-02-28T15:07:52Z E! [outputs.influxdb] When writing to [http://openqa-monitor.qa.suse.de:8086]: failed doing req: Post "http://openqa-monitor.qa.suse.de:8086/write?db=telegraf": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Feb 28 16:07:52 storage telegraf[14403]: 2023-02-28T15:07:52Z E! [agent] Error writing to outputs.influxdb: could not write any address
Note that the host is generally reachable and e.g. `curl --verbose -X POST http://openqa-monitor.qa.suse.de:8086/write?db=telegraf` returns fast (with a "204 No Content" response).
I couldn't find anything useful in the InfluxDB logs.
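The "context deadline exceeded" messages mean telegraf's HTTP client hit its write timeout (5s by default for the influxdb output plugin), while a single manual curl returned fast. One way to look for intermittent slowness is to time the same request with the same cap; a hedged sketch, where the URL comes from the log excerpt above and the helper name is made up:

```shell
measure_post() {
    # Print how long a POST to the given URL takes, capped at 5 seconds
    # (the same order of magnitude as telegraf's default write timeout).
    local t
    if t=$(curl -o /dev/null -s -w '%{time_total}' -X POST --max-time 5 "$1"); then
        echo "POST to $1 took ${t}s"
    else
        echo "POST to $1 failed or exceeded 5s"
    fi
}

measure_post 'http://openqa-monitor.qa.suse.de:8086/write?db=telegraf'
```

Running this in a loop during the affected time window would show whether the endpoint is occasionally slow or whether the problem sits inside telegraf itself.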
Updated by okurz over 1 year ago
deployed https://gitlab.suse.de/qa-sle/qanet-configs/-/merge_requests/51 for the updated DNS entry
Updated by openqa_review over 1 year ago
- Due date set to 2023-03-15
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler over 1 year ago
The InfluxDB issue is gone (error not logged anymore and data is visible on https://stats.openqa-monitor.qa.suse.de/d/GDstorage/dashboard-for-storage). I don't know what has changed but I suppose it is good enough that it works now.
Looks like I still need to replace some references of the old domain:
- https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/801
- https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/497
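Left-over references like the ones fixed in the two MRs above can be found mechanically; a hedged sketch, assuming local checkouts of the two repositories under the given (hypothetical) directory names:

```shell
find_stale_refs() {
    # List all lines in the given checkout that still mention the old
    # domain; print a note if the repository is already clean.
    grep -rn 'storage\.qa\.suse\.de' "$1" || echo "$1: no stale references"
}

find_stale_refs salt-states-openqa   # hypothetical local checkout
find_stale_refs salt-pillars-openqa  # hypothetical local checkout
```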
Annoyingly, the host-up alert (and possibly other alerts for storage) still used the old domain, even after applying salt states. For now I have changed the domain of the host-up alert by editing it manually; I could simply save the changes on that page without being prompted to copy & paste JSON.
All of this means that our alerting is currently not covered by the JSON files in salt, and generic alert-related code (e.g. `"value": "{{ host_interface }}"` in generic.json.template) is not effective¹.
¹ Despite the old alert config actually still being visibly stored - e.g. if I save the JSON of that dashboard I get `"value": "storage.oqa.suse.de"`. However, it has no effect on the migrated alert.
Updated by mkittler over 1 year ago
- Status changed from In Progress to Resolved
I've also edited the other alerts manually (actually just the ping alert used the full domain name and had to be changed). So this should cover everything.
Updated by livdywan over 1 year ago
- Status changed from Resolved to Feedback
I'm seeing alerts because salt-states-openqa is failing like so, hence re-opening:
```
          ID: /root/.ssh/id_ed25519.backup_osd
    Function: file.managed
      Result: False
     Comment: Pillar id_ed25519.backup_osd does not exist
     Started: 13:25:13.881664
    Duration: 2.935 ms
     Changes:
```
And this looks to be the fix for it: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/498
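For context, Salt reports "Pillar &lt;key&gt; does not exist" when a `file.managed` state references a pillar key via `contents_pillar` that no pillar provides. A minimal sketch of the state involved, with the path and key name taken from the error above (the exact state in salt-states-openqa may differ):

```yaml
# state (sketch): deploy the SSH key from pillar data; fails exactly as
# shown above for as long as the pillar key is missing
/root/.ssh/id_ed25519.backup_osd:
  file.managed:
    - contents_pillar: id_ed25519.backup_osd
    - user: root
    - mode: '0600'
```

The linked salt-pillars-openqa MR presumably adds the matching `id_ed25519.backup_osd` pillar entry, which is why it fixes the state.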
Updated by mkittler over 1 year ago
- Status changed from Feedback to Resolved
Yes, and the pipeline has already passed.