action #166739
Updated by tinita 6 months ago
## Motivation

There is no consistent monitoring of system services on o3. Most errors are ignored or only acted upon when they have a visible impact. An example of this is the errors in *openqa-continuous-update*:

```
Sep 11 03:21:01 ariel openqa-continuous-update[9321]: /usr/share/openqa/script/openqa-check-devel-repo: line 39: echo: write error: Broken pipe
Sep 10 10:44:13 ariel openqa-continuous-update[26983]: Could not refresh the repositories because of errors.
Sep 10 10:44:13 ariel openqa-continuous-update[26983]: Skipping repository 'openQA' because of the above error.
Sep 10 10:39:12 ariel openqa-continuous-update[23326]: Could not refresh the repositories because of errors.
Sep 10 10:39:12 ariel openqa-continuous-update[23326]: Skipping repository 'openQA' because of the above error.
Sep 04 05:16:11 ariel openqa-continuous-update[8892]: /usr/share/openqa/script/openqa-check-devel-repo: line 39: echo: write error: Broken pipe
Sep 02 19:52:54 ariel openqa-continuous-update[21123]: /usr/share/openqa/script/openqa-check-devel-repo: line 39: echo: write error: Broken pipe
Sep 02 00:00:02 ariel openqa-continuous-update[19069]: Could not refresh the repositories because of errors.
Sep 02 00:00:02 ariel openqa-continuous-update[19069]: Skipping repository 'openQA' because of the above error.
```

My guess is that nobody looked into those errors. I couldn't find any relevant tickets or Slack conversations about them.

## Suggestions

* Use Munin's [systemd_status plugin](https://gallery.munin-monitoring.org/plugins/munin-contrib/systemd_status/)
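As a rough illustration of the kind of data such monitoring looks at, here is a minimal Python sketch that counts systemd units per ACTIVE state and lists the failed ones. It assumes `systemctl list-units --all --plain --no-legend` prints one unit per line with the columns UNIT, LOAD, ACTIVE, SUB, DESCRIPTION. The function names are hypothetical and this is not the plugin's actual code. It also assumes that a run with errors like the ones above leaves the unit in the `failed` state, which only holds if the service exits non-zero.

```python
import subprocess
from collections import Counter


def count_unit_states(listing: str) -> Counter:
    """Count units per ACTIVE state (hypothetical helper).

    Expects `systemctl list-units --plain --no-legend` output, i.e. the
    columns UNIT LOAD ACTIVE SUB DESCRIPTION separated by whitespace.
    """
    states = Counter()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 4:  # skip blank or malformed lines
            states[fields[2]] += 1
    return states


def failed_units(listing: str) -> list[str]:
    """Return the names of units whose ACTIVE state is 'failed'."""
    names = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "failed":
            names.append(fields[0])
    return names


def read_listing() -> str:
    """Query systemd on the local host (requires systemctl)."""
    return subprocess.run(
        ["systemctl", "list-units", "--all", "--plain", "--no-legend"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
```

On a host, `failed_units(read_listing())` would give the list of units to alert on, and the counts from `count_unit_states` could be printed as Munin `fieldname.value N` lines for graphing.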