action #80540

openQA Project - coordination #80142: [saga][epic] Scale out openQA: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #78206: [epic] 2020-11-18 nbg power outage aftermath

idea: Conduct "power outage drills", e.g. once every half-year?

Added by okurz 5 months ago.

Status: Workable
Priority: Low
Assignee: -
Target version:
Start date: 2020-11-27
Due date:
% Done: 0%
Estimated time:

Description

Motivation

During the last "power outage" in the Nbg data center we fared reasonably well, but some machines and services did not come up by themselves due to simple things, e.g. not being configured to automatically power on after power is restored. See #78206 for details.

Acceptance criteria

  • AC1: We have a schedule within the SUSE QE Tools team for when we forcefully shut down all or selected systems to test our recovery plans
  • AC2: A simple requirements catalog for our services lists what we need to ensure in service designs

Suggestion

  • Decide on a useful cadence, date and procedure
  • Add a corresponding calendar appointment with a reminder
