action #166739
Updated by tinita 6 months ago
## Motivation
There is no consistent monitoring of system services on o3. Most errors are ignored or only acted upon when there is a visible impact.
An example of this is the errors logged by *openqa-continuous-update*:
```
Sep 11 03:21:01 ariel openqa-continuous-update[9321]: /usr/share/openqa/script/openqa-check-devel-repo: line 39: echo: write error: Broken pipe
Sep 10 10:44:13 ariel openqa-continuous-update[26983]: Could not refresh the repositories because of errors.
Sep 10 10:44:13 ariel openqa-continuous-update[26983]: Skipping repository 'openQA' because of the above error.
Sep 10 10:39:12 ariel openqa-continuous-update[23326]: Could not refresh the repositories because of errors.
Sep 10 10:39:12 ariel openqa-continuous-update[23326]: Skipping repository 'openQA' because of the above error.
Sep 04 05:16:11 ariel openqa-continuous-update[8892]: /usr/share/openqa/script/openqa-check-devel-repo: line 39: echo: write error: Broken pipe
Sep 02 19:52:54 ariel openqa-continuous-update[21123]: /usr/share/openqa/script/openqa-check-devel-repo: line 39: echo: write error: Broken pipe
Sep 02 00:00:02 ariel openqa-continuous-update[19069]: Could not refresh the repositories because of errors.
Sep 02 00:00:02 ariel openqa-continuous-update[19069]: Skipping repository 'openQA' because of the above error.
```
My guess is that nobody looked into these errors. I couldn't find any related tickets or Slack conversations.
## Suggestions
* Use Munin's [systemd_status plugin](https://gallery.munin-monitoring.org/plugins/munin-contrib/systemd_status/) ([source](https://github.com/munin-monitoring/contrib/blob/master/plugins/systemd/systemd_status))
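
To illustrate the kind of check such monitoring could be based on, here is a minimal sketch that lists failed systemd units. It is not the plugin's actual code. The function names `parse_failed_units` and `list_failed_units` are made up for this example, and it assumes `systemctl` is available on the host.

```python
import subprocess


def parse_failed_units(systemctl_output: str) -> list[str]:
    """Extract unit names from `systemctl list-units --no-legend` output.

    Each non-empty line is expected to start with the unit name. A leading
    bullet marker (printed when --plain is not used) is skipped.
    """
    units = []
    for line in systemctl_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "●":
            fields = fields[1:]
        if fields:
            units.append(fields[0])
    return units


def list_failed_units() -> list[str]:
    """Ask systemd for all units currently in the 'failed' state."""
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--no-legend", "--plain"],
        capture_output=True,
        text=True,
        check=False,  # do not raise on a non-zero exit code; just parse stdout
    )
    return parse_failed_units(result.stdout)
```

Calling `list_failed_units()` on ariel would return the names of the failed units, which could then be graphed or alerted on. Note that this only catches units that end up in the failed state. Errors that a service logs while still exiting successfully would need a separate check on the journal.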