action #77089
Updated by okurz over 4 years ago
## Observation
On 2020-11-06 I found multiple unattended alerts, unattended GitLab CI pipeline failures, and all osd aarch64 workers offline. What happened?
What I have seen failing:
* Minion Jobs alert for more than one day
* openqaworker-arm-1, openqaworker-arm-2, openqaworker-arm-3: offline alert and also the long-time alert, for all three
* An increased backlog of 600 scheduled aarch64 jobs that is not decreasing, see https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&fullscreen&panelId=12&from=1604651865692&to=1604718259586
* Multiple email alerts from failed GitLab CI pipelines, e.g. for grafana-webhook-actions, openqa-review, and auto-review
* No message in Rocket.Chat or by email from anyone handling any of the above alerts until Friday, 2020-11-06, 22:00 UTC
## Acceptance criteria
* **AC1:** Alerts handled
* **AC2:** gitlab CI jobs can find shared runners again
* **AC3:** The issue has been discussed with the team, e.g. in a retrospective