action #106832

Updated by okurz about 2 years ago

## Motivation 

 #106666#note-6 (https://progress.opensuse.org/issues/106666#note-6) raised the valid question of how we become aware of units that have been masked for too long.

 ## Suggestions 
 * Use e.g. `systemctl list-unit-files --state=masked --no-legend` to find out which units are currently masked.
 * Feed this information into our monitoring/grafana. One way to do this would be to extend https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/monitoring/telegraf/scripts/systemd_failed.sh
 * Create an appropriate dashboard in grafana with reasonable thresholds for alerting, e.g. don't alert if a service has been masked for less than one week.
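
The first two suggestions could be sketched roughly as follows. This is only an illustration under several assumptions: the measurement name `systemd_masked`, the field `masked_seconds` and the function name `masked_units_to_influx` are made up here, the output is InfluxDB line protocol as accepted by telegraf's `exec` input (the format used by the existing `systemd_failed.sh` may differ), and the time of masking is approximated by the mtime of the mask symlink in `/etc/systemd/system`, which needs GNU `stat`.

```shell
#!/bin/sh
# Sketch only: report masked systemd units in InfluxDB line protocol.
# Measurement, tag and field names are hypothetical, not taken from
# the existing telegraf scripts.

# Reads `systemctl list-unit-files --no-legend` output from stdin and
# prints one record per unit.
#   $1: directory holding the mask symlinks (assumed /etc/systemd/system)
#   $2: current time in epoch seconds
masked_units_to_influx() {
    dir=$1
    now=$2
    while read -r unit rest; do
        [ -n "$unit" ] || continue
        link="$dir/$unit"
        if [ -L "$link" ]; then
            # Assumption: the symlink mtime approximates when the unit was masked.
            since=$(stat -c %Y "$link")
            age=$((now - since))
        else
            # No symlink in $dir, e.g. a runtime mask under /run: age unknown.
            age=-1
        fi
        echo "systemd_masked,unit=$unit masked_seconds=${age}i"
    done
}

# Query systemd only where systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
    systemctl list-unit-files --state=masked --no-legend \
        | masked_units_to_influx /etc/systemd/system "$(date +%s)"
fi
```

With a per-unit age like this, the "masked <1w" rule from the last suggestion would become a grafana alert threshold of 604800 seconds on `masked_seconds`, ignoring the `-1` placeholder for units whose masking time is unknown.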
