action #163340

Updated by livdywan 5 months ago

## Observation 

 We receive emails with the subject **Munin - minion Minion Jobs** and content like this: 
 ``` 
 opensuse.org :: openqa.opensuse.org :: Minion Jobs - see https://openqa.opensuse.org/minion/jobs?state=failed 
         
	 CRITICALs: failed is 501.00 (outside range [:500]). 
 ``` 

 We also see the same on OSD; see https://progress.opensuse.org/issues/163340#note-6 for some examples. 
 Looking at https://openqa.opensuse.org/minion/jobs?state=failed, a lot of [obs_rsync_run jobs fail](https://openqa.opensuse.org/minion/jobs?id=4069130): 

 ``` 
 --- 
 args: 
 - project: openSUSE:Slowroll:Build:2 
   url: https://api.opensuse.org/public/build/openSUSE:Slowroll:Build:2/_result?package=000product 
 attempts: 1 
 children: [] 
 created: 2024-07-04T08:33:32.788609Z 
 delayed: 2024-07-04T08:33:32.788609Z 
 expires: ~ 
 finished: 2024-07-04T08:33:33.044735Z 
 id: 4068028 
 lax: 0 
 notes: 
   gru_id: 20378045 
   project_lock: 1 
 parents: [] 
 priority: 100 
 queue: default 
 result: 
   code: 256 
   message: |- 
     rsync: [sender] change_dir "/openSUSE:Slowroll:Build:2/images/x86_64/kiwi-templates-Minimal:kvm-and-xen" (in openqa) failed: No such file or directory (2) 
     rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1835) [Receiver=3.2.3] 
     read_files.sh failed for openSUSE:Slowroll:Build:2 in enviroment openSUSE:Slowroll:Build:2 
 retried: ~ 
 retries: 0 
 started: 2024-07-04T08:33:32.792054Z 
 state: failed 
 task: obs_rsync_run 
 time: 2024-07-04T11:25:28.671226Z 
 worker: 2253 
 ``` 

 ## Acceptance criteria 
 * **AC1**: We don't receive those e-mails anymore (unless there is really an actionable problem) 
 * **AC2**: Errors are visible on the web UI pages under "OBS Sync" 

 ## Suggestions 
 * Why do URLs get configured when they are not present yet? 
   * Can we find out who added this? (*what* was added at all?) 
   * Interview that person and find out the use-case and why this was done 
 * Make the issue non-critical so we don't receive mails about failing minion jobs 
 * See #112871 for a ticket about changing the error handling and suppressing these errors 
 * Check https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/openqa-trigger-from-ibs.sls?ref_type=heads 
 * See also https://gitlab.suse.de/openqa/openqa-trigger-from-ibs-plugin/-/tree/master/xml 
 * Show errors in the OBSRsync UI instead of the Minion dashboard (maybe that's already the case; what is the "Last failure" column on https://openqa.opensuse.org/admin/obs_rsync for?) 
     * So maybe don't consider those Minion jobs failed? But that would mean they'd be cleaned up quite quickly, so the links in the "Last failure" column would not be very useful. So maybe the relevant error message/logs should be saved elsewhere? 
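
 To make the "non-critical" idea from the suggestions above more concrete, here is a minimal sketch of how failed jobs could be split into "source not there yet" and "actionable" failures. It is not openQA code: the function names, the category labels and the regular expression are assumptions for illustration, modeled only on the job dump in the observation. The job is assumed to be available as a plain dict with the same keys as that dump.

 ```python
 import re

 # Assumed pattern, derived from the rsync message quoted in this ticket:
 # the sender could not change into a directory that does not exist (yet).
 MISSING_SOURCE = re.compile(
     r'rsync: \[sender\] change_dir ".*" \(in \w+\) failed: '
     r"No such file or directory \(2\)"
 )


 def classify_failure(job: dict) -> str:
     """Return 'ok', 'missing_source' or 'actionable' (hypothetical labels)."""
     if job.get("state") != "failed":
         return "ok"
     result = job.get("result")
     # A job result may be a mapping (as in the dump above) or a plain value.
     if isinstance(result, dict):
         message = str(result.get("message", ""))
     else:
         message = str(result or "")
     if job.get("task") == "obs_rsync_run" and MISSING_SOURCE.search(message):
         return "missing_source"
     return "actionable"


 def actionable_failures(jobs: list) -> list:
     """Keep only the failed jobs that a human should look at."""
     return [job for job in jobs if classify_failure(job) == "actionable"]
 ```

 With a split like this, only the count of actionable failures would need to be compared against the alert threshold, while the "missing source" cases could be shown under "OBS Sync" instead.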

 ## Rollback steps 
 * Delete alert silence: https://stats.openqa-monitor.qa.suse.de/alerting/silence/03d05348-7f76-4cd6-8ee3-66026fe4adb0/edit
