action #80540
closed
openQA Project (public) - coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
coordination #78206: [epic] 2020-11-18 nbg power outage aftermath
idea: Conduct "power outage drills", e.g. once every half-year?
Description
Motivation
During the last "power outage" in the Nbg data center we fared kinda ok, but some machines and services did not come up by themselves due to simple things, e.g. not being configured to automatically power on after power is restored. See #78206 for details.
Acceptance criteria
- AC1: We have a schedule within the team SUSE QE Tools for when we forcefully shut down all or selected systems to test our recovery plans
- AC2: A simple requirements catalog for our services lists what we need to ensure in service designs
Suggestion
- Decide on a useful cadence, date and procedure
- Add a corresponding calendar appointment with reminder
Updated by okurz over 3 years ago
- Status changed from Workable to Rejected
- Assignee set to okurz
We often reboot our machines, which also tests this, and we managed to configure all openQA worker hosts to automatically power up after power is back. Because we also do not have direct physical control over all machines, I think we cannot reasonably do more here, hence rejecting.