action #176013
closed
coordination #161414: [epic] Improved salt based infrastructure management
[alert] web UI: Too many Minion job failures alert size:S
Added by tinita 20 days ago.
Updated 14 days ago.
Category:
Regressions/Crashes
Description
Observation
Date: Wed, 22 Jan 2025 22:44:38 +0100
https://monitor.qa.suse.de/alerting/grafana/liA25iB4k/view?orgId=1
Looking at https://openqa.suse.de/minion/jobs?state=failed most failed jobs seem to be obs_rsync.
---
args:
- project: SUSE:ALP:Source:Standard:1.0:Staging:Y
  url: https://api.suse.de/build/SUSE:ALP:Source:Standard:1.0:Staging:Y/_result?package=000product
attempts: 1
children: []
created: 2025-01-22T16:47:20.274616Z
delayed: 2025-01-22T16:47:20.274616Z
expires: ~
finished: 2025-01-22T16:47:20.667998Z
id: 14216909
lax: 0
notes:
  gru_id: 39649588
  project_lock: 1
parents: []
priority: 100
queue: default
result:
  code: 256
  message: read_files.sh failed for SUSE:ALP:Source:Standard:1.0:Staging:Y in enviroment
    SUSE:ALP:Source:Standard:1.0:Staging:Y
retried: ~
retries: 0
started: 2025-01-22T16:47:20.277566Z
state: failed
task: obs_rsync_run
time: 2025-01-23T10:07:07.353823Z
worker: 1894
However, more recent failures look like this:
---
args:
- project: SUSE:ALP:Source:Standard:1.0:Staging:Z
  url: https://api.suse.de/build/SUSE:ALP:Source:Standard:1.0:Staging:Z/_result?package=000product
attempts: 1
children: []
created: 2025-01-23T10:06:00.665092Z
delayed: 2025-01-23T10:06:00.665092Z
expires: ~
finished: 2025-01-23T10:06:01.794449Z
id: 14225762
lax: 0
notes:
  gru_id: 39657486
  project_lock: 1
parents: []
priority: 100
queue: default
result:
  code: 256
  message: |-
    rsync: [sender] change_dir "/SUSE:/ALP:/Source:/Standard:/1.0:/Staging:/Z/images/repo/SL-Micro*/repodata" (in repos) failed: No such file or directory (2)
    rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1877) [Receiver=3.2.7]
    read_files.sh failed for SUSE:ALP:Source:Standard:1.0:Staging:Z in enviroment SUSE:ALP:Source:Standard:1.0:Staging:Z
retried: ~
retries: 0
started: 2025-01-23T10:06:00.668233Z
state: failed
task: obs_rsync_run
time: 2025-01-23T10:07:07.353823Z
worker: 1899
Suggestions
- Confirm whether this is a (temporary) network connectivity issue OR a case of deleted repos that are still getting picked up by OBS sync
- Look into ignoring related failures OR adjusting repo configs
- Or ask nicely if one of the maintainers would care to fix the config
- Also check older failed minion jobs, consider filing new tickets if there are separate issues there
- Look at schedules configured in
- Related to action #175710: OSD openqa.ini is corrupted, invalid characters, again 2025-01-17 added
- Parent task set to #161414
- Subject changed from [alert] web UI: Too many Minion job failures alert to [alert] web UI: Too many Minion job failures alert size:S
- Description updated (diff)
- Status changed from New to Workable
- Assignee set to livdywan
- Parent task deleted (#161414)
- Parent task set to #161414
- Copied to action #176124: OSD influxdb minion route seemingly returns only a very small number of failed minion jobs, not all added
- Tags set to infra, reactive work, osd, alert
I reported a separate ticket about the differing numbers (#176124), so in the current ticket we can focus on the infra aspect.
Did a quick check on OSD to see that the Minion API works as expected, and everything looks fine. The numbers from the Minion dashboard and the influxdb endpoint differ because of the ignored_failed_minion_jobs config.
$ /usr/bin/perl /usr/share/openqa/script/openqa eval -V 'app->minion->jobs({states => [qw(failed)]})->total'
652
$ /usr/bin/perl /usr/share/openqa/script/openqa eval -V 'app->config->{influxdb}->{ignored_failed_minion_jobs}'
[
"obs_rsync_run",
"obs_rsync_update_builds_text"
]
$ /usr/bin/perl /usr/share/openqa/script/openqa eval -V 'app->minion->jobs({states => [qw(failed)], tasks => [qw(obs_rsync_run obs_rsync_update_builds_text)]})->total'
638
It's 14 because the 638 failed jobs from those two tasks are ignored here (652 - 638 = 14). Perhaps it would make sense to expose a failed_but_ignored value on the influxdb endpoint to avoid this confusion in the future. :)
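For reference, this filtering presumably comes from the [influxdb] section of openqa.ini on OSD and is what the influxdb minion route reflects. A minimal sketch of how to cross-check both; the exact config formatting and the route path are assumptions, not verified here:
$ grep -A2 '^\[influxdb\]' /etc/openqa/openqa.ini   # assumed to show: ignored_failed_minion_jobs = obs_rsync_run obs_rsync_update_builds_text
$ curl -s https://openqa.suse.de/admin/influxdb/minion | grep minion_jobs   # assumed route path; compare the failed count with the Minion dashboard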
- Description updated (diff)
For reference, I found an affected test by searching for Marble (SUSE:ALP:Products:Marble:6.0 is too specific) and looking up tests/xfstests/install.pm on GitHub, which allowed me to confirm that xfstests run on SL and, more specifically, via openqa-trigger-from-ibs-plugin.
Might be nice if this were easier to track down, like a straight-up link from the minion task. I knew what I was looking for, but couldn't remember the name of the repo and still don't know exactly what the relevant changes are here. And GitLab won't easily reveal deleted files in the search.
- Status changed from Workable to In Progress
Currently going through relevant jobs and cleaning up very old ones.
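A minimal sketch of how such a cleanup can be done from the command line, assuming the Minion job command is reachable through the same openQA script wrapper used above (the job id is just the first example from this ticket):
$ /usr/share/openqa/script/openqa minion job -S failed -t obs_rsync_run   # list failed obs_rsync_run jobs
$ /usr/share/openqa/script/openqa minion job --remove 14216909            # remove one old failed job by id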
- Status changed from In Progress to Feedback
- Assignee changed from livdywan to ybonatakis
About
rsync: [sender] change_dir "/SUSE:/ALP:/Source:/Standard:/1.0:/Staging:/Z/images/repo/SL-Micro*/repodata" (in repos) failed: No such file or directory (2)
To me it seems like a very sporadic thing. The directory is created for every build the Maintenance Coordinators push to the project.
Currently, it exists: https://download.suse.de/ibs/SUSE:/ALP:/Source:/Standard:/1.0:/Staging:/Z/images/repo/SL-Micro-6.0-x86_64/repodata/
But maybe when the error appeared, the project was faulty and OBS sync failed... and then another build fixed the issue.
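A quick way to re-check whether the repodata is currently published, assuming the standard repodata layout (the repomd.xml file name is an assumption):
$ curl -sI https://download.suse.de/ibs/SUSE:/ALP:/Source:/Standard:/1.0:/Staging:/Z/images/repo/SL-Micro-6.0-x86_64/repodata/repomd.xml | head -n1   # 200 means the directory exists right now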
I actually found a similar issue of a missing file and reported it:
https://suse.slack.com/archives/C067JPZF4JH/p1737558056228559
About
SUSE:ALP:Source:Standard:1.0:Staging:Y
I have re-run read_files.sh directly on OSD without any issue... Not sure what happened, but it seems very sporadic. It's difficult to know the reason without more logs.
jlausuch wrote in #note-14:
About
rsync: [sender] change_dir "/SUSE:/ALP:/Source:/Standard:/1.0:/Staging:/Z/images/repo/SL-Micro*/repodata" (in repos) failed: No such file or directory (2)
To me it seems like a very sporadic thing. The directory is created for every build the Maintenance Coordinators push to the project.
Currently, it exists: https://download.suse.de/ibs/SUSE:/ALP:/Source:/Standard:/1.0:/Staging:/Z/images/repo/SL-Micro-6.0-x86_64/repodata/
But maybe when the error appeared, the project was faulty and OBS sync failed... and then another build fixed the issue.
I actually found a similar issue of a missing file and reported it:
https://suse.slack.com/archives/C067JPZF4JH/p1737558056228559
About
SUSE:ALP:Source:Standard:1.0:Staging:Y
I have re-run read_files.sh directly on OSD without any issue... Not sure what happened, but it seems very sporadic. It's difficult to know the reason without more logs.
I will quote jlausuch's proposal from Slack for completeness on the above comment:
"would be nice that the minion prints the output of running read_files.sh, maybe with bash -x read_files?"
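A minimal sketch of what that would look like when reproducing manually on OSD; the script location and argument below are placeholders, since they are not spelled out in this ticket:
$ bash -x /path/to/read_files.sh SUSE:ALP:Source:Standard:1.0:Staging:Z 2>&1 | tee /tmp/read_files_trace.log   # keep the trace for later inspection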
As there are not as many failures in the last few days, should we keep this at high priority?
ybonatakis wrote in #note-16:
As there are not as many failures in the last few days, should we keep this at high priority?
We won't get alerted about it anyway, see #176124.
The ini file corruption caused this: the value we usually ignore wasn't ignored anymore.
So I would say it can be set to Normal.
Theoretically we would never have been alerted anyway, so we wouldn't even know about this.
- Priority changed from High to Normal
Looking at the suggestions, I don't know if there is anything else left for this ticket. The jobs I have seen failing from obs_rsync tasks are mostly some staging builds (which work after a while, see previous messages and Slack), so they can be ignored and no new tickets are needed.
I just noticed that the ticket is already marked with Feedback status. I am leaving it as-is for @livdywan.
- Status changed from Feedback to Resolved
Fine with that. We cross-checked and can resolve.