action #128945
closed[alert][grafana] web UI: Too many Minion job failures alert Salt (liA25iB4k)
0%
Description
Observation
Multiple alert emails were received on 2023-05-05 and the following days,
see https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&viewPanel=19&from=1683045374714&to=1683504896385
Updated by okurz over 1 year ago
- Copied from action #128942: [alert][grafana][openqa-piworker] NTP offset alert Generic (openqa-piworker ntp_offset_alert_openqa-piworker generic) size:M added
Updated by livdywan over 1 year ago
- Status changed from New to In Progress
- Assignee set to livdywan
Looking at the Minion dashboard, it seems these are all obs_rsync_run jobs:
Unexpected local arg: /opt/openqa-trigger-from-ibs/SUSE:SLE-15-SP3:Update:BCI/Media1_SLE-BCI-*.lst
If arg is a remote file/dir, prefix it with a colon (:).
rsync error: syntax or usage error (code 1) at main.c(1528) [Receiver=3.2.3]
Updated by mkittler over 1 year ago
So a recent change in https://github.com/os-autoinst/openqa-trigger-from-obs likely caused this (this public repo is what we actually use on OSD at this point as well:
martchus@openqa:/opt/openqa-trigger-from-ibs> sudo -u geekotest git remote -v
old https://gitlab.suse.de/openqa/openqa-trigger-from-ibs.git (fetch)
old https://gitlab.suse.de/openqa/openqa-trigger-from-ibs.git (push)
origin https://github.com/os-autoinst/openqa-trigger-from-obs (fetch)
origin https://github.com/os-autoinst/openqa-trigger-from-obs (push)
)
But strangely the last change there is quite old.
Updated by openqa_review over 1 year ago
- Due date set to 2023-05-24
Setting due date based on mean cycle time of SUSE QE Tools
Updated by andriinikitin over 1 year ago
This should be fixed now:
geekotest@openqa:/opt/openqa-trigger-from-ibs> bash SUSE\:SLE-15-SP3\:Update\:BCI/read_files.sh
Unexpected local arg: /opt/openqa-trigger-from-ibs/SUSE:SLE-15-SP3:Update:BCI/Media1_SLE-BCI-aarch64.lst
If arg is a remote file/dir, prefix it with a colon (:).
rsync error: syntax or usage error (code 1) at main.c(1528) [Receiver=3.2.3]
geekotest@openqa:/opt/openqa-trigger-from-ibs> rm SUSE\:SLE-15-SP3\:Update\:BCI/*lst
geekotest@openqa:/opt/openqa-trigger-from-ibs> bash SUSE\:SLE-15-SP3\:Update\:BCI/read_files.sh
geekotest@openqa:/opt/openqa-trigger-from-ibs>
The problem is a combination of rare things: we should avoid using '*' in file names in the XML, and there were leftover .lst files (because of a reverted commit in openqa-trigger-from-obs).
I will clean the tasks in gru as well.
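To illustrate why leftover .lst files plus a '*' in the file name can produce the rsync error, here is a minimal sketch. It assumes the generated script passes the pattern unquoted to a command that expects a single local path; the real read_files.sh may build its rsync call differently. The helper `count_args` and the temporary directory are hypothetical and exist only for this demonstration. When the pattern matches more than one file, the command receives extra local arguments, which is consistent with rsync reporting "Unexpected local arg".

```shell
#!/usr/bin/env bash
# Hypothetical reproduction in a scratch directory; not the real read_files.sh.
workdir=$(mktemp -d)

# Illustrative helper: print how many arguments the shell actually passed.
count_args() { echo "$#"; }

# Two leftover list files, as could remain after a reverted commit.
touch "$workdir/Media1_SLE-BCI-aarch64.lst" "$workdir/Media1_SLE-BCI-x86_64.lst"

# Unquoted pattern: the shell expands it to both files, so a command that
# expects one local path receives two.
with_leftovers=$(count_args "$workdir"/Media1_SLE-BCI-*.lst)

# After removing the leftovers nothing matches; bash (without nullglob)
# passes the pattern through as a single literal word.
rm "$workdir"/*.lst
after_cleanup=$(count_args "$workdir"/Media1_SLE-BCI-*.lst)

echo "with leftovers: $with_leftovers, after cleanup: $after_cleanup"
rm -rf "$workdir"
```

This matches the transcript above, where the script stopped failing once the *lst files were removed. Avoiding '*' in the file name removes the dependency on whatever files happen to be in the directory.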
Updated by livdywan over 1 year ago
- Status changed from In Progress to Feedback
andriinikitin wrote:
The problem is a combination of rare things: we should avoid using '*' in file names in the XML, and there were leftover .lst files (because of a reverted commit in openqa-trigger-from-obs).
I will clean the tasks in gru as well.
Thank you for taking care of it!
Updated by okurz over 1 year ago
- Status changed from Feedback to In Progress
We still have a way too high number of failed minion jobs, see https://monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&viewPanel=19
Updated by livdywan over 1 year ago
- Status changed from In Progress to Feedback
okurz wrote:
We still have a way too high number of failed minion jobs, see https://monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&viewPanel=19
Ack. I hadn't checked the "other ones" yet. There are some really old ones here.
I have now also deleted a bunch of save_needle jobs failing like this:
error: |-
<strong>Failed to save installation-autoyast-boot-20230505.</strong><br><pre>Unable to commit via Git: On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
</pre>
and limit_{results_and_logs,screenshots} failing like this:
notes:
gru_id: 33669590
signal_handler: Received signal TERM, scheduling retry and releasing locks
Updated by livdywan over 1 year ago
- Status changed from Feedback to Resolved
No new ones over the weekend. I think we're good here.