action #176013

closed

coordination #161414: [epic] Improved salt based infrastructure management

[alert] web UI: Too many Minion job failures alert size:S

Added by tinita 20 days ago. Updated 14 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Start date: 2025-01-23
Due date:
% Done: 0%
Estimated time:

Description

Observation

Date: Wed, 22 Jan 2025 22:44:38 +0100

https://monitor.qa.suse.de/alerting/grafana/liA25iB4k/view?orgId=1

Looking at https://openqa.suse.de/minion/jobs?state=failed, most failed jobs seem to be obs_rsync jobs, for example:

---
args:
- project: SUSE:ALP:Source:Standard:1.0:Staging:Y
  url: https://api.suse.de/build/SUSE:ALP:Source:Standard:1.0:Staging:Y/_result?package=000product
attempts: 1
children: []
created: 2025-01-22T16:47:20.274616Z
delayed: 2025-01-22T16:47:20.274616Z
expires: ~
finished: 2025-01-22T16:47:20.667998Z
id: 14216909
lax: 0
notes:
  gru_id: 39649588
  project_lock: 1
parents: []
priority: 100
queue: default
result:
  code: 256
  message: read_files.sh failed for SUSE:ALP:Source:Standard:1.0:Staging:Y in enviroment
    SUSE:ALP:Source:Standard:1.0:Staging:Y
retried: ~
retries: 0
started: 2025-01-22T16:47:20.277566Z
state: failed
task: obs_rsync_run
time: 2025-01-23T10:07:07.353823Z
worker: 1894

However, more recent failures look like this:

---
args:
- project: SUSE:ALP:Source:Standard:1.0:Staging:Z
  url: https://api.suse.de/build/SUSE:ALP:Source:Standard:1.0:Staging:Z/_result?package=000product
attempts: 1
children: []
created: 2025-01-23T10:06:00.665092Z
delayed: 2025-01-23T10:06:00.665092Z
expires: ~
finished: 2025-01-23T10:06:01.794449Z
id: 14225762
lax: 0
notes:
  gru_id: 39657486
  project_lock: 1
parents: []
priority: 100
queue: default
result:
  code: 256
  message: |-
    rsync: [sender] change_dir "/SUSE:/ALP:/Source:/Standard:/1.0:/Staging:/Z/images/repo/SL-Micro*/repodata" (in repos) failed: No such file or directory (2)
    rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1877) [Receiver=3.2.7]
    read_files.sh failed for SUSE:ALP:Source:Standard:1.0:Staging:Z in enviroment SUSE:ALP:Source:Standard:1.0:Staging:Z
retried: ~
retries: 0
started: 2025-01-23T10:06:00.668233Z
state: failed
task: obs_rsync_run
time: 2025-01-23T10:07:07.353823Z
worker: 1899
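
The two failure shapes above can be told apart mechanically. The following is a minimal sketch, assuming the `result.message` text of each job is already available as a string; the function name `classify` and the category labels are hypothetical and not part of openQA or Minion. The patterns are taken only from the two messages quoted in this ticket.

```python
import re

# Pattern derived from the rsync error quoted above: the sender could not
# change into a directory because it does not exist on the remote side.
MISSING_DIR = re.compile(
    r'rsync: \[sender\] change_dir "(?P<path>[^"]+)" .*No such file or directory'
)


def classify(message: str) -> str:
    """Return a coarse, hypothetical label for an obs_rsync_run failure message."""
    if MISSING_DIR.search(message):
        # The repo path is gone (or never existed), so this is likely not a network issue.
        return "missing-remote-dir"
    if "read_files.sh failed" in message:
        # Only the generic wrapper message is present, so the cause is unknown.
        return "read-files-unspecified"
    return "other"
```

A job labelled `missing-remote-dir` points towards a deleted or renamed repo rather than a connectivity problem, whereas `read-files-unspecified` needs a look at the worker logs.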

Suggestions

  • Confirm whether this is a (temporary) network connectivity issue OR a case of deleted repos which are still getting picked up by the OBS sync
  • Look into ignoring related failures OR adjusting repo configs
    • Or ask nicely if one of the maintainers would care to fix the config
  • Also check older failed minion jobs and consider filing new tickets if there are separate issues there
  • Look at schedules configured in
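
For the check of older failed jobs, a per-task count helps to see whether anything besides obs_rsync is failing. This is a minimal sketch, assuming the job records have already been loaded as dictionaries with the `task` and `state` fields seen in the dumps above; the function name `count_failed_by_task` is hypothetical, and how the records are fetched is not covered here.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_failed_by_task(jobs: Iterable[Mapping[str, object]]) -> Counter:
    """Count jobs in state "failed" per task name."""
    return Counter(
        str(job["task"]) for job in jobs if job.get("state") == "failed"
    )


# The two jobs quoted in this ticket, reduced to the fields used here.
jobs = [
    {"id": 14216909, "task": "obs_rsync_run", "state": "failed"},
    {"id": 14225762, "task": "obs_rsync_run", "state": "failed"},
]
failed_per_task = count_failed_by_task(jobs)
```

Any task other than `obs_rsync_run` showing up in the result would be a candidate for a separate ticket.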

Related issues 2 (1 open, 1 closed)

Related to openQA Infrastructure (public) - action #175710: OSD openqa.ini is corrupted, invalid characters, again 2025-01-17 (Blocked, okurz, 2024-07-10)

Copied to openQA Infrastructure (public) - action #176124: OSD influxdb minion route seemingly returns only a very small number of failed minion jobs, not all (Resolved, tinita)