action #121282

Recover storage.qa.suse.de size:S

Added by mkittler 2 months ago. Updated 26 days ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2022-12-01
Due date:
% Done: 0%
Estimated time:

Description

Observation

The server cannot be reached via SSH and the host-up alert in our monitoring fired (I paused it for now). I also wasn't able to reach the IPMI host (via jumpy@qe-jumpy.suse.de as documented in pillars).

Acceptance criteria

  • AC1: Machine is racked again
  • AC2: Racktables is updated including mac/connections/rack
  • AC3: No related alerts for that machine are firing

Rollback steps


Related issues

Related to openQA Infrastructure - action #88546: Make use of the new "Storage Server", e.g. complete OSD backup (Resolved)

Related to openQA Infrastructure - action #69577: Handle installation of the new "Storage Server" (Resolved, 2020-08-04)

Related to openQA Infrastructure - action #123082: backup of o3 to storage.qa.suse.de was not conducted by rsnapshot since 2021-12 size:M (Feedback, 2023-01-13, 2023-02-10)

History

#1 Updated by okurz 2 months ago

  • Tags changed from alert to alert, reactive work

#2 Updated by dheidler 2 months ago

Unable to find that hostname in racktables:
https://racktables.nue.suse.com/?page=search&last_page=search&last_tab=default&q=storage.qa.suse.de

Does anyone know where this machine is located?

#3 Updated by okurz 2 months ago

  • Related to action #88546: Make use of the new "Storage Server", e.g. complete OSD backup added

#5 Updated by okurz 2 months ago

funny story: https://infra.nue.suse.com/SelfService/Display.html?id=175645#txn-2575010

Guess where the racktables link leads to: https://racktables.suse.de/index.php?page=object&tab=default&object_id=13558 "openqa-ses", the seemingly "orphaned" machine whose purpose we didn't know :D

So next task: Decide if we should move openqa-ses back to SRV1 or find a new home at FC

#6 Updated by mkittler 2 months ago

I suppose having it in FC will be fine. That means we'd likely mount it somewhere in SRV2 or the lab in the 2nd floor until FC is ready.

Where's the machine now, btw?

Next time we could at least check what services run on a machine before pulling the plug. In this case it would have been very obvious that it's just the storage server.

#7 Updated by nicksinger 2 months ago

  • Tags changed from alert, reactive work to alert, next-office-day

I've updated https://racktables.suse.de/index.php?page=object&tab=edit&object_id=13558 to reflect the actual FQDN under which we know the host. I also added the serial number from the attached picture, so we can avoid this issue in the future by just looking up the serial number.

#8 Updated by nicksinger 2 months ago

  • Subject changed from Recover storage.qa.suse.de to Recover storage.qa.suse.de size:S
  • Description updated (diff)
  • Status changed from New to Workable

#9 Updated by nicksinger 2 months ago

  • Related to action #69577: Handle installation of the new "Storage Server" added

#10 Updated by okurz 2 months ago

  • Description updated (diff)

#11 Updated by okurz 2 months ago

  • Assignee set to dheidler

At the next opportunity, someone near Nuremberg please create an EngInfra ticket over sd.suse.com/ with the Jira SD group "OSD Admins" added, make an appointment to take the machine from the 2.2.14 (TAM) QA lab, where it is currently located, bring it back to SRV1 and mount it where it was, update racktables and make sure the machine is reachable. Migrating the machine into the new network zone should be done in #120267.

As discussed in daily 2022-12-07 dheidler will pick this up.

#12 Updated by dheidler 2 months ago

  • Status changed from Workable to Blocked

#13 Updated by okurz about 2 months ago

  • Priority changed from Urgent to Normal

In the ticket, gschlotter informed us that they likely won't be at Maxtorhof anymore this year, so we will have to wait.

#14 Updated by okurz 29 days ago

  • Tags changed from alert, next-office-day, infra to alert, next-office-day, infra, reactive work

#15 Updated by nicksinger 26 days ago

  • Status changed from Blocked to In Progress

We plugged the machine back in where power8 was previously sitting. Gerhard configured the switch ports to be in VLAN2 again. IPMI and the OS are running and pingable again, and racktables is updated.
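
A quick way to double-check the "reachable again" part from a workstation is to test whether the SSH port accepts TCP connections. The following is only a minimal sketch, not the check our monitoring uses: the helper name `is_tcp_port_open` is made up for illustration, and it assumes SSH listens on the default port 22. It says nothing about IPMI or the alert state.

```python
import socket


def is_tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; not part of any existing tooling.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

For example, `is_tcp_port_open("storage.qa.suse.de", 22)` should return `True` once the machine is back on the network and sshd is running, assuming the host is resolvable and routable from where the check runs.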

#16 Updated by nicksinger 26 days ago

Alerts are enabled again. Checking in 5 minutes whether they come up green.

#17 Updated by nicksinger 26 days ago

  • Status changed from In Progress to Resolved

alerts are good again.

#18 Updated by nicksinger 26 days ago

  • Assignee changed from dheidler to nicksinger

#19 Updated by okurz 26 days ago

Great work on resolving this and taking care of all the mentioned alerts, appreciated. Now we can look into the previously blocked #120267.

#20 Updated by okurz 23 days ago

  • Related to action #123082: backup of o3 to storage.qa.suse.de was not conducted by rsnapshot since 2021-12 size:M added
