action #163340
openOBSRSync regularly fails minion jobs - nobody cares, tools gets alerted (e.g. "Munin - minion Minion Jobs") size:M
Status: Feedback
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Tags:
Description
Observation
We receive emails with the subject "Munin - minion Minion Jobs" and content like this:
opensuse.org :: openqa.opensuse.org :: Minion Jobs - see https://openqa.opensuse.org/minion/jobs?state=failed
CRITICALs: failed is 501.00 (outside range [:500]).
We also see the same on OSD, see https://progress.opensuse.org/issues/163340#note-6 for some examples.
https://openqa.opensuse.org/minion/jobs?state=failed shows that many obs_rsync_run jobs fail, for example:
---
args:
- project: openSUSE:Slowroll:Build:2
  url: https://api.opensuse.org/public/build/openSUSE:Slowroll:Build:2/_result?package=000product
attempts: 1
children: []
created: 2024-07-04T08:33:32.788609Z
delayed: 2024-07-04T08:33:32.788609Z
expires: ~
finished: 2024-07-04T08:33:33.044735Z
id: 4068028
lax: 0
notes:
  gru_id: 20378045
  project_lock: 1
parents: []
priority: 100
queue: default
result:
  code: 256
  message: |-
    rsync: [sender] change_dir "/openSUSE:Slowroll:Build:2/images/x86_64/kiwi-templates-Minimal:kvm-and-xen" (in openqa) failed: No such file or directory (2)
    rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1835) [Receiver=3.2.3]
    read_files.sh failed for openSUSE:Slowroll:Build:2 in enviroment openSUSE:Slowroll:Build:2
retried: ~
retries: 0
started: 2024-07-04T08:33:32.792054Z
state: failed
task: obs_rsync_run
time: 2024-07-04T11:25:28.671226Z
worker: 2253
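The job above fails because rsync cannot enter the source directory on the sender side. This suggests the configured project has nothing published yet, rather than a broken sync. A minimal sketch of how such a result could be told apart from other failures, assuming this message pattern is representative; `is_missing_source` and `MISSING_SOURCE` are hypothetical names, not part of openQA:

```python
import re

# Assumption: a "not yet published" source shows up as a sender-side
# change_dir failure with errno 2, as in the job result quoted above.
MISSING_SOURCE = re.compile(
    r'rsync: \[sender\] change_dir "[^"]+" .*failed: No such file or directory \(2\)'
)


def is_missing_source(result: dict) -> bool:
    """Return True if a failed job result looks like a missing source directory."""
    message = result.get("message") or ""
    return bool(MISSING_SOURCE.search(message))


# Result taken from the failed job shown above (message shortened to two lines).
example = {
    "code": 256,
    "message": (
        'rsync: [sender] change_dir "/openSUSE:Slowroll:Build:2/images/x86_64/'
        'kiwi-templates-Minimal:kvm-and-xen" (in openqa) failed: '
        "No such file or directory (2)\n"
        "rsync error: some files/attrs were not transferred "
        "(see previous errors) (code 23) at main.c(1835) [Receiver=3.2.3]"
    ),
}
```

A check like this would only separate the two cases. Whether matching jobs should still count as failed is a separate decision.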
Acceptance criteria
- AC1: We don't receive those e-mails anymore (unless there is really an actionable problem)
- AC2: Errors are visible on the web UI pages under "OBS Sync"
Suggestions
- Why do URLs get configured when they are not present yet?
- Can we find out who added this? (what was added at all?)
- Interview that person and find out the use-case and why this was done
- Make the issue non-critical so we don't receive mails about failing minion jobs
- See #112871 for a ticket about changing the error handling and suppressing these errors
- Check https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/openqa-trigger-from-ibs.sls?ref_type=heads
- See also https://gitlab.suse.de/openqa/openqa-trigger-from-ibs-plugin/-/tree/master/xml
- Show errors in the OBSRsync UI instead of the minion dashboard (maybe that's already the case; what is the "Last failure" column on https://openqa.opensuse.org/admin/obs_rsync for?)
- So maybe don't consider those Minion jobs failed? But that would mean they'd be cleaned up quite quickly, so the links in the "Last failure" column would not be very useful. So maybe the relevant error message/logs should be saved elsewhere?
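The last suggestion, saving the relevant error elsewhere so the Minion job no longer has to stay in the failed state, could look roughly like the following. This is only an illustrative sketch: `SyncFailureLog` is a hypothetical in-memory store, and a real implementation would have to persist the data and hook into the existing OBS Sync pages.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SyncFailureLog:
    """Hypothetical store keeping only the latest sync error per project."""

    last_failure: dict = field(default_factory=dict)

    def record(self, project: str, job_id: int, message: str) -> None:
        # Overwrite any older entry: only the most recent failure is kept.
        self.last_failure[project] = {
            "job_id": job_id,
            "message": message,
            "recorded": datetime.now(timezone.utc).isoformat(),
        }

    def clear(self, project: str) -> None:
        # A later successful sync removes the stale error.
        self.last_failure.pop(project, None)


log = SyncFailureLog()
log.record(
    "openSUSE:Slowroll:Build:2",
    4068028,
    "read_files.sh failed for openSUSE:Slowroll:Build:2",
)
```

With the message stored per project, the "Last failure" column would not depend on the Minion job still existing.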
Rollback steps