action #135515
closed
malbec.arch.suse.de not reachable anymore
Description
Observation
The host up alert has been firing since this morning: https://monitor.qa.suse.de/d/WDmalbec/worker-dashboard-malbec?viewPanel=65105&orgId=1&from=1694419433265&to=1694422652024
Rollback steps
- Unsilence alertname=malbec: host up alert
- Add back malbec to salt when back up (if ever)
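Before carrying out the rollback steps above, it can help to confirm that the host actually answers again. The following is only a minimal sketch, not part of the team's tooling: the helper name is_reachable is hypothetical, and probing TCP port 22 (SSH) is an assumption about which service the machine exposes.

```python
import socket


def is_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 22 is an assumed default; adjust to a service the host really runs.
    """
    try:
        # create_connection resolves the name and opens a TCP connection;
        # DNS failures, refusals and timeouts are all subclasses of OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host = "malbec.arch.suse.de"
    state = "up" if is_reachable(host) else "not reachable"
    print(f"{host}: {state}")
```

A successful TCP connection only shows that the port answers; it does not show that the worker is healthy, so the monitoring dashboard remains the authoritative check.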
Updated by okurz about 1 year ago
- Description updated (diff)
- Status changed from New to Blocked
From https://suse.slack.com/archives/C04MDKHQE20/p1694437602943579
(Oliver Kurz) @Steven Quinata you mentioned a power down of SRV2-EXT - A11 & SRV2-EXT - A10 and SRV2-EXT - A09. We haven't been able to reach the machine malbec.arch.suse.de in A1 since around the time you mentioned. Related?
(Steven Quinata) This could be, as the connections to all other racks were coming from FEX103 & FEX104. I will be back in the server room tomorrow; it's possible to move to SRV2 with the uplinks still coming from SRV1.
(Oliver Kurz) I see. Thanks for the information. We will be OK living without it for the time being, regardless of the plans to bring it back up or not.
(Steven Quinata) With this being in the power realm, I would rather let someone from their department give advice.
Updated by okurz about 1 year ago
- Description updated (diff)
- Status changed from Blocked to Feedback
Updated by okurz about 1 year ago
With the awesome help from squinata we were able to bring malbec up again temporarily at a new location, NUE1-SRV2-D:5: https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=3052. The machine picked up ppc64le jobs again. In general, though, tests will have to be adapted to rely less on Power8-based qemu-kvm tests in the future.
Updated by okurz about 1 year ago
- Status changed from Feedback to Resolved
Added back malbec to salt and removed alert silence.