action #163340

open

OBSRSync regularly fails minion jobs - nobody cares, tools gets alerted (e.g. "Munin - minion Minion Jobs") size:M

Added by nicksinger 6 months ago. Updated about 20 hours ago.

Status: Feedback
Priority: Normal
Assignee:
Category: Regressions/Crashes
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Observation

We receive emails with the subject "Munin - minion Minion Jobs" and content like this:

opensuse.org :: openqa.opensuse.org :: Minion Jobs - see https://openqa.opensuse.org/minion/jobs?state=failed
        CRITICALs: failed is 501.00 (outside range [:500]).
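
The alert fires because the number of failed jobs (501) crossed the critical limit of 500. A minimal sketch of how such a range can be read, assuming the `min:max` threshold notation where either bound may be omitted; the function names `parse_range` and `outside_range` are hypothetical and not part of Munin:

```python
from typing import Optional, Tuple


def parse_range(spec: str) -> Tuple[Optional[float], Optional[float]]:
    """Parse a 'min:max' threshold such as '[:500]'; an empty bound means unbounded."""
    spec = spec.strip().strip("[]")
    if ":" not in spec:
        # Only the colon form shown in the alert email is handled in this sketch.
        raise ValueError(f"expected 'min:max', got {spec!r}")
    low, _, high = spec.partition(":")
    return (float(low) if low else None, float(high) if high else None)


def outside_range(value: float, spec: str) -> bool:
    """Return True when value lies outside the allowed range."""
    low, high = parse_range(spec)
    if low is not None and value < low:
        return True
    if high is not None and value > high:
        return True
    return False


print(outside_range(501.00, "[:500]"))  # True: 501 exceeds the upper bound of 500
print(outside_range(500.00, "[:500]"))  # False: the bound itself is still allowed here
```

Whether the bound itself counts as inside the range is an assumption of this sketch; the email only shows that 501 is outside `[:500]`.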

We also see the same on OSD, see https://progress.opensuse.org/issues/163340#note-6 for some examples.
The failed jobs listed at https://openqa.opensuse.org/minion/jobs?state=failed include many obs_rsync_run jobs such as:

---
args:
- project: openSUSE:Slowroll:Build:2
  url: https://api.opensuse.org/public/build/openSUSE:Slowroll:Build:2/_result?package=000product
attempts: 1
children: []
created: 2024-07-04T08:33:32.788609Z
delayed: 2024-07-04T08:33:32.788609Z
expires: ~
finished: 2024-07-04T08:33:33.044735Z
id: 4068028
lax: 0
notes:
  gru_id: 20378045
  project_lock: 1
parents: []
priority: 100
queue: default
result:
  code: 256
  message: |-
    rsync: [sender] change_dir "/openSUSE:Slowroll:Build:2/images/x86_64/kiwi-templates-Minimal:kvm-and-xen" (in openqa) failed: No such file or directory (2)
    rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1835) [Receiver=3.2.3]
    read_files.sh failed for openSUSE:Slowroll:Build:2 in enviroment openSUSE:Slowroll:Build:2
retried: ~
retries: 0
started: 2024-07-04T08:33:32.792054Z
state: failed
task: obs_rsync_run
time: 2024-07-04T11:25:28.671226Z
worker: 2253

Acceptance criteria

  • AC1: We don't receive those e-mails anymore (unless there is really an actionable problem)
  • AC2: Errors are visible on the web UI pages under "OBS Sync"

Suggestions

Rollback steps


Related issues 3 (1 open, 2 closed)

Related to openQA Infrastructure (public) - action #155743: OBSRSync fails to sync openSUSE:Factory:PowerPC:ToTest (was: WARNINGs: failed is 452.00 in Munin - minion Minion Jobs on o3) | Blocked | livdywan | 2024-02-21

Related to openQA Infrastructure (public) - action #163067: [alert] Munin - minion Minion Jobs - see https://openqa.opensuse.org/minion/jobs?state=failed - opensuse.org :: openqa.opensuse.org | Rejected | 2024-07-01

Related to QA (public) - action #112871: obs_rsync_run Minion tasks fail with no error message size:M | Resolved | livdywan

Actions #1

Updated by nicksinger 6 months ago

  • Copied from action #155743: OBSRSync fails to sync openSUSE:Factory:PowerPC:ToTest (was: WARNINGs: failed is 452.00 in Munin - minion Minion Jobs on o3) added
Actions #2

Updated by nicksinger 6 months ago

  • Copied from deleted (action #155743: OBSRSync fails to sync openSUSE:Factory:PowerPC:ToTest (was: WARNINGs: failed is 452.00 in Munin - minion Minion Jobs on o3))
Actions #3

Updated by nicksinger 6 months ago

  • Related to action #155743: OBSRSync fails to sync openSUSE:Factory:PowerPC:ToTest (was: WARNINGs: failed is 452.00 in Munin - minion Minion Jobs on o3) added
Actions #4

Updated by tinita 6 months ago

  • Related to action #163067: [alert] Munin - minion Minion Jobs - see https://openqa.opensuse.org/minion/jobs?state=failed - opensuse.org :: openqa.opensuse.org added
Actions #5

Updated by okurz 6 months ago

  • Category set to Regressions/Crashes
  • Target version set to Ready
Actions #6

Updated by nicksinger 6 months ago

  • Description updated (diff)

Some examples from OSD as well:

---
args:
- alias: SUSE:SLFO:Products:SL-Micro:6.1:ToTest
attempts: 1
children: []
created: 2024-07-05T07:00:35.834961Z
delayed: 2024-07-05T07:00:35.834961Z
expires: ~
finished: 2024-07-05T07:00:36.373966Z
id: 12004122
lax: 0
notes:
  gru_id: 37809476
parents: []
priority: 0
queue: default
result:
  code: 256
  message: ''
retried: ~
retries: 0
started: 2024-07-05T07:00:35.837364Z
state: failed
task: obs_rsync_update_builds_text
time: 2024-07-05T11:43:17.137624Z
worker: 1725
---
args:
- project: SUSE:ALP:Source:Standard:1.0:Staging:L
  url: https://api.suse.de/build/SUSE:ALP:Source:Standard:1.0:Staging:L/_result?package=000product
attempts: 1
children: []
created: 2024-07-05T09:31:36.042208Z
delayed: 2024-07-05T09:31:36.042208Z
expires: ~
finished: 2024-07-05T09:31:37.036903Z
id: 12007191
lax: 0
notes:
  gru_id: 37811813
  project_lock: 1
parents: []
priority: 100
queue: default
result:
  code: 256
  message: |-
    rsync: [sender] change_dir "/SUSE:/ALP:/Source:/Standard:/1.0:/Staging:/L/images/repo/SL-Micro*/repodata" (in repos) failed: No such file or directory (2)
    rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1835) [Receiver=3.2.3]
    read_files.sh failed for SUSE:ALP:Source:Standard:1.0:Staging:L in enviroment SUSE:ALP:Source:Standard:1.0:Staging:L
retried: ~
retries: 0
started: 2024-07-05T09:31:36.045513Z
state: failed
task: obs_rsync_run
time: 2024-07-05T11:43:17.137624Z
worker: 1725
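
To see at a glance which tasks fail and which failures carry no error message, the records quoted in this ticket can be grouped with a few lines of Python. This is only a sketch: the `failed_jobs` list is a hand-trimmed copy of the three dumps above, and the helper names `count_by_task` and `without_message` are hypothetical, not part of openQA or Minion.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Trimmed copies of the three job records quoted in this ticket. Only the
# fields used below are kept, and each message is cut to its last line.
failed_jobs = [
    {
        "id": 4068028,
        "task": "obs_rsync_run",
        "message": "read_files.sh failed for openSUSE:Slowroll:Build:2 "
                   "in enviroment openSUSE:Slowroll:Build:2",
    },
    {"id": 12004122, "task": "obs_rsync_update_builds_text", "message": ""},
    {
        "id": 12007191,
        "task": "obs_rsync_run",
        "message": "read_files.sh failed for SUSE:ALP:Source:Standard:1.0:Staging:L "
                   "in enviroment SUSE:ALP:Source:Standard:1.0:Staging:L",
    },
]


def count_by_task(jobs: Iterable[Dict]) -> Counter:
    """Count failed jobs per Minion task name."""
    return Counter(job["task"] for job in jobs)


def without_message(jobs: Iterable[Dict]) -> List[int]:
    """Return the ids of failed jobs whose result message is empty."""
    return [job["id"] for job in jobs if not job["message"].strip()]


print(count_by_task(failed_jobs))
print(without_message(failed_jobs))  # [12004122]
```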
Actions #7

Updated by nicksinger 6 months ago

  • Related to action #112871: obs_rsync_run Minion tasks fail with no error message size:M added
Actions #8

Updated by nicksinger 6 months ago

  • Description updated (diff)
Actions #9

Updated by livdywan 5 months ago

  • Subject changed from OBSRSync regularly fails minion jobs - nobody cares, tools gets alerted (e.g. "Munin - minion Minion Jobs") to OBSRSync regularly fails minion jobs - nobody cares, tools gets alerted (e.g. "Munin - minion Minion Jobs") size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #10

Updated by dheidler 5 months ago

  • Status changed from Workable to In Progress
  • Assignee set to dheidler
Actions #11

Updated by openqa_review 5 months ago

  • Due date set to 2024-08-08

Setting due date based on mean cycle time of SUSE QE Tools

Actions #12

Updated by okurz 5 months ago

  • Description updated (diff)
Actions #13

Updated by dheidler 5 months ago

Findings so far:

The rsync error message refers to a missing directory on the rsync server and not on the client.
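
The hint is in the message itself: the failing `change_dir` is reported by the `[sender]` role, which is the remote side when files are pulled, while the follow-up line is tagged `[Receiver=3.2.3]` for the local side. The `(in openqa)` and `(in repos)` parts appear to name the rsync module on the server. A minimal sketch that pulls these fields out of such a line; the regular expression is written against the messages quoted in this ticket only, and `parse_change_dir_error` is a hypothetical helper:

```python
import re
from typing import Dict, Optional

# Matches lines such as:
#   rsync: [sender] change_dir "/path" (in module) failed: No such file or directory (2)
CHANGE_DIR_ERROR = re.compile(
    r'^rsync: \[(?P<role>\w+)\] change_dir "(?P<path>[^"]+)" '
    r'\(in (?P<module>[^)]+)\) failed: (?P<reason>.+?) \((?P<errno>\d+)\)$'
)


def parse_change_dir_error(line: str) -> Optional[Dict[str, str]]:
    """Return role, path, module, reason and errno, or None if the line does not match."""
    match = CHANGE_DIR_ERROR.match(line.strip())
    return match.groupdict() if match else None


line = (
    'rsync: [sender] change_dir "/openSUSE:Slowroll:Build:2/images/x86_64/'
    'kiwi-templates-Minimal:kvm-and-xen" (in openqa) failed: '
    'No such file or directory (2)'
)
info = parse_change_dir_error(line)
print(info["role"], info["module"])  # sender openqa
```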

Actions #14

Updated by dheidler 5 months ago

Posted on #proj-agama about the wrong paths:

https://suse.slack.com/archives/C02TLF25571/p1721919187484699

Actions #15

Updated by livdywan 5 months ago

  • Status changed from In Progress to Workable

Putting back to Workable as discussed last week

Actions #16

Updated by livdywan 5 months ago

  • Due date deleted (2024-08-08)
Actions #17

Updated by dheidler 4 months ago

  • Status changed from Workable to In Progress

The openqa-trigger-from-obs config was updated in https://github.com/os-autoinst/openqa-trigger-from-obs/commit/4f683924d171c4d3d6a7f1fce9b9f8db65c2ba00

Updated the checkout on ariel accordingly:

dheidler@ariel:/opt/openqa-trigger-from-obs> sudo rm -r systemsmanagement\:Agama\:Staging/

The systemsmanagement:Agama:Devel files seem to have already been updated.

Actions #18

Updated by dheidler 4 months ago

  • Status changed from In Progress to Resolved

I'm closing this ticket now because technically the ACs are both fulfilled.

I fixed the concrete problem but made no further changes in openQA, because the error of a minion job is already accessible via the obs_sync page ("last failure" column).
