action #122158
closed openQA Project - coordination #111860: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.4
action #114565: recover qa-power8-4+qa-power8-5 size:M
[alert] qa-power8-4-kvm host up alert - machine not up, nothing obvious on SoL but IPMI works size:M
Start date: 2022-12-19
Due date:
% Done: 0%
Estimated time:
Tags:
Description
Observation
The alert https://monitor.qa.suse.de/d/WDQA-Power8-4-kvm/worker-dashboard-qa-power8-4-kvm?editPanel=65105&tab=alert has been failing since 2022-12-18 03:00, so it was apparently the weekly reboot that triggered this; see https://monitor.qa.suse.de/d/WDQA-Power8-4-kvm/worker-dashboard-qa-power8-4-kvm?editPanel=65105&tab=alert&orgId=1&from=1671320482076&to=1671348249406 for details. The machine is reachable over IPMI, but the serial-over-LAN (SoL) console shows nothing obvious.
Suggestions
- Pause related alert(s)
- Remove the machine from salt
- Re-trigger the OSD deployment, which failed because of this
- Trigger reboot
- Look into other recent related tickets, e.g. those about the worker cache SQLite lookup
- #114565 and other related tickets
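The IPMI-related steps above (checking the machine, looking at SoL, triggering a reboot) could be sketched as follows. This is only a minimal sketch: the BMC hostname and user name are hypothetical placeholders, not the real values for qa-power8-4, and the script only prints the `ipmitool` command lines rather than running them. The `-E` option makes `ipmitool` read the password from the `IPMI_PASSWORD` environment variable.

```python
import shlex


def ipmitool_cmd(bmc_host, user, *action):
    """Build an ipmitool argv list for a remote BMC over the lanplus interface.

    -E tells ipmitool to take the password from the IPMI_PASSWORD
    environment variable, so it never appears on the command line.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E", *action]


# Hypothetical placeholders: replace with the real BMC address and user.
BMC = "qa-power8-4-bmc.example.org"
USER = "ADMIN"

steps = [
    # 1. Confirm that the BMC answers and report the power state.
    ipmitool_cmd(BMC, USER, "chassis", "power", "status"),
    # 2. Attach to the serial-over-LAN console to watch boot output.
    ipmitool_cmd(BMC, USER, "sol", "activate"),
    # 3. Power-cycle the machine to trigger a reboot.
    ipmitool_cmd(BMC, USER, "chassis", "power", "cycle"),
]

if __name__ == "__main__":
    for argv in steps:
        # Print only; run manually or via subprocess.run(argv, check=True).
        print(shlex.join(argv))
```

Printing instead of executing keeps the power cycle a deliberate manual step; in practice the SoL console would be attached in a separate terminal before the power cycle so that the boot output is visible.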
Rollback steps
- Add the machine back to salt
- Unpause alert after resolution
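Removing the machine from salt and later bringing it back can be sketched with `salt-key` on the salt master. This is a sketch under assumptions: the minion ID below is a guess at the naming scheme and may differ from the real one, and the helper `salt_key_cmd` is hypothetical. It only builds and prints the command lines.

```python
import shlex

# Assumed minion ID; verify with `salt-key -L` on the salt master.
MINION_ID = "qa-power8-4-kvm.qa.suse.de"


def salt_key_cmd(action, minion_id):
    """Build a salt-key argv list that removes or accepts a minion key.

    -y answers the confirmation prompt, -d deletes a key, -a accepts one.
    """
    flags = {"remove": "-d", "accept": "-a"}
    return ["salt-key", "-y", flags[action], minion_id]


if __name__ == "__main__":
    # Take the broken machine out of salt so deployments skip it.
    print(shlex.join(salt_key_cmd("remove", MINION_ID)))
    # Rollback: accept the key again once the machine is back up.
    print(shlex.join(salt_key_cmd("accept", MINION_ID)))
```

Accepting the key only works once the minion on the recovered machine is running again and has re-submitted its key to the master.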