action #166739

Updated by tinita 6 months ago

## Motivation 

 There is no consistent monitoring of system services on o3. Most errors are ignored or only acted upon when there is a visible impact. 

 An example of this is the errors logged by *openqa-continuous-update*: 
 
 ``` 
 Sep 11 03:21:01 ariel openqa-continuous-update[9321]: /usr/share/openqa/script/openqa-check-devel-repo: line 39: echo: write error: Broken pipe                        
 Sep 10 10:44:13 ariel openqa-continuous-update[26983]: Could not refresh the repositories because of errors.                                                           
 Sep 10 10:44:13 ariel openqa-continuous-update[26983]: Skipping repository 'openQA' because of the above error.                                                        
 Sep 10 10:39:12 ariel openqa-continuous-update[23326]: Could not refresh the repositories because of errors.                                                           
 Sep 10 10:39:12 ariel openqa-continuous-update[23326]: Skipping repository 'openQA' because of the above error.                                                        
 Sep 04 05:16:11 ariel openqa-continuous-update[8892]: /usr/share/openqa/script/openqa-check-devel-repo: line 39: echo: write error: Broken pipe                        
 Sep 02 19:52:54 ariel openqa-continuous-update[21123]: /usr/share/openqa/script/openqa-check-devel-repo: line 39: echo: write error: Broken pipe                       
 Sep 02 00:00:02 ariel openqa-continuous-update[19069]: Could not refresh the repositories because of errors.                                                           
 Sep 02 00:00:02 ariel openqa-continuous-update[19069]: Skipping repository 'openQA' because of the above error. 
 ``` 

 My guess is that nobody looked into those errors: I couldn't find any relevant tickets or Slack conversations about them. 

 ## Suggestions 
 * Use Munin's [systemd_status plugin](https://gallery.munin-monitoring.org/plugins/munin-contrib/systemd_status/)
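
To illustrate the kind of check such a plugin performs, here is a minimal sketch of a Munin-style plugin in Python. It is not the code of the linked systemd_status plugin. The function names and the `failed` field name are hypothetical, and it assumes that `systemctl --failed --no-legend --plain` prints one line per failed unit. Note that it only counts units in the failed state, so errors that are logged without failing the unit would still need separate handling.

```python
import subprocess
import sys


def count_failed_units(systemctl_output: str) -> int:
    """Count non-empty lines; each one is assumed to describe one failed unit."""
    return sum(1 for line in systemctl_output.splitlines() if line.strip())


def munin_config() -> str:
    """Graph description that Munin requests by calling the plugin with 'config'."""
    return "\n".join([
        "graph_title Failed systemd units",
        "graph_vlabel units",
        "graph_category system",
        "failed.label failed units",
    ])


def munin_values(failed: int) -> str:
    """Value line that Munin reads on a normal (non-config) run."""
    return f"failed.value {failed}"


def main(argv: list) -> None:
    if len(argv) > 1 and argv[1] == "config":
        print(munin_config())
        return
    try:
        result = subprocess.run(
            ["systemctl", "--failed", "--no-legend", "--plain", "--no-pager"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        # systemctl is not available: report the value as unknown ("U").
        print("failed.value U")
        return
    if result.returncode != 0:
        print("failed.value U")
        return
    print(munin_values(count_failed_units(result.stdout)))


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the parsing in `count_failed_units` separate from the `systemctl` call means the counting logic can be checked against captured output without running on o3 itself.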
