action #153742
Status: closed
coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability
coordination #123800: [epic] Provide SUSE QE Tools services running in PRG2 aka. Prg CoLo
coordination #137630: [epic] QE (non-openQA) setup in PRG2
Move of OSD machine NUE1 to PRG2 - storage.qe.prg2.suse.org
0%
Description
Acceptance criteria
- AC1: storage.qe.prg2.suse.org is usable from PRG2 as part of o3
Suggestions
- DONE Follow https://jira.suse.com/browse/ENGINFRA-3469
- Ensure machine can be reached
- Ensure machine is used as before the migration
Updated by okurz 7 months ago
- Copied from action #153739: Move of openqa.opensuse.org machine NUE1 to PRG2 - blackbauhinia added
Updated by nicksinger 4 months ago
- Related to action #159186: [alert] Systemd-services alert failing due to unit "rsnapshot@alpha" on host "storage" added
Updated by nicksinger 4 months ago
machine is now reachable over IPv4: https://gitlab.suse.de/OPS-Service/salt/-/commit/f8313232418249cbfd8832e5df8f7bf64a6b1a21 (AAAA entries are present as well but they don't match the host's IPv6 address)
Updated by okurz 4 months ago
Apparently https://gitlab.suse.de/OPS-Service/salt/-/blob/production/salt/profile/dns/files/prg2_suse_org/dns-qe.prg2.suse.org has wrong AAAA records. Asked in
https://suse.slack.com/archives/C04MDKHQE20/p1713438519404599
@Martin Caj you recently added entries for "storage" in https://gitlab.suse.de/OPS-Service/salt/-/blob/production/salt/profile/dns/files/prg2_suse_org/dns-qe.prg2.suse.org?ref_type=heads&plain=1 with AAAA records. Those AAAA records are addresses that would match DHCPv6 leases, but AFAIK there is no DHCPv6 in qe.prg2.suse.org, only SLAAC. So did you add wrong entries without checking them? If yes, what's the quickest way to update all AAAA records to match the SLAAC addresses?
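With SLAAC the interface identifier is derived from the NIC's MAC address (modified EUI-64), so an AAAA record can be cross-checked against the hardware. A small helper sketching that derivation (hypothetical, not part of the ticket; the MAC below is inferred from the address 2a07:de40:b203:8:3eec:efff:fe6d:96bc shown later):

```shell
# Derive the SLAAC/EUI-64 interface identifier from a MAC address so an
# AAAA record can be checked against what the host autoconfigures.
mac_to_eui64() {
  IFS=: read -r o1 o2 o3 o4 o5 o6 <<EOF
$1
EOF
  # flip the universal/local bit of the first octet and insert ff:fe
  # between the OUI half and the NIC-specific half
  printf '%02x%s:%sff:fe%s:%s%s\n' $(( 0x$o1 ^ 0x02 )) "$o2" "$o3" "$o4" "$o5" "$o6"
}

# MAC 3c:ec:ef:6d:96:bc yields the identifier 3eec:efff:fe6d:96bc,
# i.e. the tail of a matching SLAAC address
mac_to_eui64 3c:ec:ef:6d:96:bc
```

Any AAAA record whose last 64 bits differ from this derived identifier cannot match a SLAAC-configured host (privacy extensions aside).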
Updated by okurz 4 months ago
- Copied to action #159306: Fix AAAA records in qe.prg2.suse.org size:S added
Updated by okurz 4 months ago
- Due date set to 2024-05-08
- Status changed from Blocked to In Progress
$ host storage.qe.prg2.suse.org
storage.qe.prg2.suse.org has address 10.145.0.6
storage.qe.prg2.suse.org has IPv6 address 2a07:de40:b203:8:3eec:efff:fe6d:96bc
$ ping storage.qe.prg2.suse.org
PING storage.qe.prg2.suse.org(2a07:de40:b203:8:3eec:efff:fe6d:96bc (2a07:de40:b203:8:3eec:efff:fe6d:96bc)) 56 data bytes
64 bytes from 2a07:de40:b203:8:3eec:efff:fe6d:96bc (2a07:de40:b203:8:3eec:efff:fe6d:96bc): icmp_seq=1 ttl=62 time=22.1 ms
Unblocked.
Updated by okurz 4 months ago
- Due date changed from 2024-05-08 to 2024-06-07
- Status changed from In Progress to Blocked
https://sd.suse.com/servicedesk/customer/portal/1/SD-155096
Observation
After the migration of storage.qe.prg2.suse.org to PRG2 it cannot access openqa.opensuse.org and openqa.suse.de over SSH anymore to conduct backups.
Please allow access from
storage.qe.prg2.suse.org has address 10.145.0.6
storage.qe.prg2.suse.org has IPv6 address 2a07:de40:b203:8:3eec:efff:fe6d:96bc
to
openqa.oqa.prg2.suse.org has address 10.145.10.207
openqa.oqa.prg2.suse.org has IPv6 address 2a07:de40:b203:12:0:ff:fe4f:7c2b
ariel.dmz-prg2.suse.org has address 10.150.2.10
ariel.dmz-prg2.suse.org has IPv6 address 2a07:de40:b281:2:10:150:2:10
on port 22/tcp (ssh) so that we can run backups again.
Steps to reproduce
storage:~ # nmap -p 22 openqa.suse.de ariel.dmz-prg2.suse.org
Nmap scan report for openqa.suse.de (10.145.10.207)
Host is up (0.00018s latency).
Other addresses for openqa.suse.de (not scanned): 2a07:de40:b203:12:0:ff:fe4f:7c2b
rDNS record for 10.145.10.207: openqa.oqa.prg2.suse.org
PORT STATE SERVICE
22/tcp filtered ssh
Nmap scan report for ariel.dmz-prg2.suse.org (10.150.2.10)
Host is up (0.00033s latency).
Other addresses for ariel.dmz-prg2.suse.org (not scanned): 2a07:de40:b281:2:10:150:2:10
PORT STATE SERVICE
22/tcp filtered ssh
Nmap done: 2 IP addresses (2 hosts up) scanned in 0.37 seconds
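The "filtered" state means a firewall silently drops the packets rather than the service being down (a closed port would show "closed", a reachable sshd "open"). A small hypothetical helper to pull the affected hosts out of such nmap output, run here against a trimmed sample:

```shell
# Print hosts whose port 22 nmap reports as "filtered" (firewall drop)
filtered_hosts() {
  awk '/^Nmap scan report for/ {host=$5}
       $1 == "22/tcp" && $2 == "filtered" {print host}'
}

# sample input modeled on the scan above; on the real host one would pipe
# live output: nmap -p 22 <hosts> | filtered_hosts
filtered_hosts <<'EOF'
Nmap scan report for openqa.suse.de (10.145.10.207)
PORT   STATE    SERVICE
22/tcp filtered ssh
Nmap scan report for ariel.dmz-prg2.suse.org (10.150.2.10)
PORT   STATE SERVICE
22/tcp open  ssh
EOF
```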
Expected result
SSH access from storage.qe.prg2.suse.org to both hosts openqa.suse.de and ariel.dmz-prg2.suse.org
Impact
Increased risk over time, as no additional backup of openQA assets and test results is created for either openqa.suse.de or openqa.opensuse.org
Further details
Internal tracking issue: https://progress.opensuse.org/issues/153742
Feel welcome to comment in the progress ticket; it can be shared with more people by default, helps communication, and lets us edit texts and see who is assigned.
Impact - Please see notes below!
Medium
Urgency
Low
Updated by okurz 4 months ago · Edited
Plus https://sd.suse.com/servicedesk/customer/portal/1/SD-155109 for IPMI
So blocked on
https://sd.suse.com/servicedesk/customer/portal/1/SD-155096
https://sd.suse.com/servicedesk/customer/portal/1/SD-155109
EDIT: https://sd.suse.com/servicedesk/customer/portal/1/SD-155109 was resolved as duplicate of https://jira.suse.com/browse/ENGINFRA-4144 in https://jira.suse.com/browse/ENGINFRA-3972
EDIT: ssh access to both o3+osd restored. Restarted "rsnapshot@alpha" on storage. Back to blocked on https://jira.suse.com/browse/ENGINFRA-4144
Updated by okurz 4 months ago · Edited
Done in coordination with @Viktor Karpovych:
storage:~ # ipmitool lan set 1 ipsrc static
storage:~ # ipmitool lan set 1 ipaddr 192.168.153.126
Setting LAN IP Address to 192.168.153.253
storage:~ # ipmitool lan set 1 netmask 255.255.255.0
Setting LAN Subnet Mask to 255.255.255.0
storage:~ # ipmitool lan set 1 defgw ipaddr 192.168.153.254
Setting LAN Default Gateway IP to 192.168.153.254
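The resulting BMC configuration can be verified with ipmitool lan print 1. A minimal filter for the relevant fields, run here against sample output hardcoded for illustration (on the real host one would pipe the live command output instead):

```shell
# Keep only the address-related lines of `ipmitool lan print 1`;
# "IP Address +:" deliberately skips the "IP Address Source" line.
grep -E '^(IP Address +:|Subnet Mask|Default Gateway IP)' <<'EOF'
Set in Progress         : Set Complete
IP Address Source       : Static Address
IP Address              : 192.168.153.253
Subnet Mask             : 255.255.255.0
Default Gateway IP      : 192.168.153.254
EOF
```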
IPMI works fine but our hostname in salt pillars was wrong https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/795
Updated by okurz 4 months ago
- Status changed from Blocked to In Progress
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/795 merged. https://jira.suse.com/browse/ENGINFRA-4144 still open though technically all seems to work.
As rsnapshot@alpha recently failed due to host key verification errors, I manually ran rsnapshot alpha as root.
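Such host key verification failures typically come from a stale known_hosts entry after a host's SSH key changes with a reinstall or migration. A sketch of the usual cleanup, demonstrated here against a throwaway known_hosts file (hostname and paths only illustrative):

```shell
# build a throwaway known_hosts containing a stale entry for the target host
kh=$(mktemp)
keyfile=$(mktemp -u)
ssh-keygen -q -t ed25519 -N '' -f "$keyfile"
printf 'openqa.suse.de %s\n' "$(cut -d' ' -f1-2 "$keyfile.pub")" > "$kh"

# remove the stale entry; the next ssh run can then accept the new key
ssh-keygen -R openqa.suse.de -f "$kh" > /dev/null

grep -c '^openqa.suse.de' "$kh" || true   # stale entry is gone

# on the real host one would then re-learn the current key, e.g.:
# ssh-keyscan openqa.suse.de >> /root/.ssh/known_hosts
```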
Updated by okurz 4 months ago
- Due date deleted (2024-06-07)
- Status changed from In Progress to Resolved
https://jira.suse.com/browse/ENGINFRA-4144 now at least has a comment that it is "done". I guess that should suffice. rsnapshot@alpha I will cover in #159186.