action #178015

Updated by livdywan 3 months ago

## Observation 
 It often starts innocently, as in https://suse.slack.com/archives/C02CANHLANP/p1740668762857669, where José Fernández asked why a change in os-autoinst-distri-opensuse did not seem to work on aarch64. Digging further down the rabbit hole, I found that many systemd services have failed on various hosts. The dashboard at https://monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services happily shows these failures along with green hearts, yet no related alerts are firing, although they should be. 

 ## Acceptance Criteria 
 * **AC1:** It is understood why aarch64 revealed issues with systemd services and follow-up tickets are filed 

 ## Suggestions 
 * Check the current alert definitions in Grafana 
 * Check our git history in https://gitlab.suse.de/openqa/salt-states-openqa or the ticket history for changes that may have introduced a regression 
 * Identify the problem, fix it, and let the team learn how it came to this 

 ## Rollback steps 
 * Reset the failed state of `openqa-reload-worker-auto-restart@999` on worker33 and run `systemctl unmask openqa-worker-auto-restart@999`.
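
 The rollback above could look roughly like the following sketch, assuming it is run as root on worker33. `systemctl reset-failed` is the standard command for clearing a unit's failed state. The `rollback` function name and the `SYSTEMCTL` variable are hypothetical and only there to allow a dry run; they are not part of the original steps.

```shell
#!/bin/sh
# Sketch of the rollback steps; intended to be run as root on worker33.
# SYSTEMCTL is a hypothetical override for a dry run,
# e.g. SYSTEMCTL="echo systemctl" prints the commands instead of running them.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

rollback() {
    # Clear the failed state of the reload unit for worker instance 999.
    $SYSTEMCTL reset-failed 'openqa-reload-worker-auto-restart@999' || return 1
    # Remove the mask from the worker unit so it can be started again.
    $SYSTEMCTL unmask 'openqa-worker-auto-restart@999'
}
```

 Calling `rollback` after sourcing the snippet performs both steps in order and stops if the first one fails.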
