action #163340
Updated by livdywan 5 months ago
## Observation

We receive emails with the subject **Munin - minion Minion Jobs** and content like this:

```
opensuse.org :: openqa.opensuse.org :: Minion Jobs - see https://openqa.opensuse.org/minion/jobs?state=failed
	CRITICALs: failed is 501.00 (outside range [:500]).
```

We also see the same on OSD, see https://progress.opensuse.org/issues/163340#note-6 for some examples.

Looking at https://openqa.opensuse.org/minion/jobs?state=failed, a lot of [obs_rsync_run jobs fail](https://openqa.opensuse.org/minion/jobs?id=4069130):

```
---
args:
- project: openSUSE:Slowroll:Build:2
  url: https://api.opensuse.org/public/build/openSUSE:Slowroll:Build:2/_result?package=000product
attempts: 1
children: []
created: 2024-07-04T08:33:32.788609Z
delayed: 2024-07-04T08:33:32.788609Z
expires: ~
finished: 2024-07-04T08:33:33.044735Z
id: 4068028
lax: 0
notes:
  gru_id: 20378045
  project_lock: 1
parents: []
priority: 100
queue: default
result:
  code: 256
  message: |-
    rsync: [sender] change_dir "/openSUSE:Slowroll:Build:2/images/x86_64/kiwi-templates-Minimal:kvm-and-xen" (in openqa) failed: No such file or directory (2)
    rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1835) [Receiver=3.2.3]
    read_files.sh failed for openSUSE:Slowroll:Build:2 in enviroment openSUSE:Slowroll:Build:2
retried: ~
retries: 0
started: 2024-07-04T08:33:32.792054Z
state: failed
task: obs_rsync_run
time: 2024-07-04T11:25:28.671226Z
worker: 2253
```

## Acceptance criteria

* **AC1**: We don't receive those emails anymore (unless there is really an actionable problem)
* **AC2**: Errors are visible on the web UI pages under "OBS Sync"

## Suggestions

* Why do URLs get configured when they are not present yet?
* Can we find out who added this? (*what* was added at all?)
* Interview that person and find out the use case and why this was done
* Make the issue non-critical so we don't receive mails about failing Minion jobs
* See #112871 for a ticket about changing the error handling and suppressing these errors
* Check https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/openqa-trigger-from-ibs.sls?ref_type=heads
* See also https://gitlab.suse.de/openqa/openqa-trigger-from-ibs-plugin/-/tree/master/xml
* Show errors in the OBSRsync UI instead of the Minion dashboard (maybe that's already the case; what is the "Last failure" column on https://openqa.opensuse.org/admin/obs_rsync for?)
  * So maybe don't consider those Minion jobs failed? But that would mean they'd be cleaned up quite quickly, so the links in the "Last failure" column would not be very useful. So maybe the relevant error message/logs should be saved elsewhere?

## Rollback steps

* Delete alert silence: https://stats.openqa-monitor.qa.suse.de/alerting/silence/03d05348-7f76-4cd6-8ee3-66026fe4adb0/edit
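
Regarding the suggestion to treat these failures as non-critical: a minimal sketch of how failed `obs_rsync_run` jobs caused by a source directory that does not exist could be told apart from other failures. This is only an illustration and not how openQA or the Munin plugin currently work. The function names and the regular expression are hypothetical. The job dictionaries are assumed to have the same fields as the Minion job dump in the observation.

```python
import re

# Hypothetical pattern, derived only from the rsync message quoted in this ticket.
MISSING_SOURCE = re.compile(
    r'rsync: \[sender\] change_dir ".*" \(in \S+\) failed: '
    r"No such file or directory \(2\)"
)


def is_missing_source_failure(job: dict) -> bool:
    """Return True for a failed obs_rsync_run job whose result message
    contains the rsync "No such file or directory" change_dir error."""
    if job.get("task") != "obs_rsync_run" or job.get("state") != "failed":
        return False
    result = job.get("result")
    # The result may be a plain string or missing for other kinds of failures.
    message = result.get("message", "") if isinstance(result, dict) else ""
    return bool(MISSING_SOURCE.search(message))


def count_actionable_failures(jobs: list) -> int:
    """Count failed jobs that are not missing-source rsync failures."""
    return sum(
        1
        for job in jobs
        if job.get("state") == "failed" and not is_missing_source_failure(job)
    )
```

A count like this, rather than the raw number of failed jobs, could be what gets compared against the alert threshold, while the filtered-out messages would still need to be shown somewhere such as the "OBS Sync" pages (AC2).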