action #70768 (closed)

obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequently

Added by mkittler about 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
Start date: 2020-09-01
Due date: -
% Done: 0%
Estimated time: 16.00 h

Description

Observation

The obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequently on OSD (not o3).

The failing `obs_rsync_run` jobs can be observed using the following query parameters: https://openqa.suse.de/minion/jobs?soffset=0&task=obs_rsync_run&state=failed

I'm going to remove most of these jobs to calm down the alert, but right now 43 jobs have piled up over 22 days. However, the problem has actually existed for longer than 22 days; I assume somebody cleaned up the Minion dashboard at some point.
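For the cleanup itself, something along the following lines should do (a minimal sketch, assuming a recent Minion version and execution via openQA's script/openqa eval -V helper; the task and state values are the ones from this ticket):

  # remove all failed obs_rsync_run jobs
  my $jobs = app->minion->jobs({tasks => ['obs_rsync_run'], states => ['failed']});
  while (my $info = $jobs->next) {
      app->minion->job($info->{id})->remove;
  }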

The job arguments are always like this:

  "args" => [
    {
      "project" => "SUSE:SLE-15-SP3:GA:Staging:E"
    }
  ],

The results always look like one of these:

  "result" => {
    "code" => 256,
    "message" => "rsync: change_dir \"/SUSE:/SLE-15-SP3:/GA:/Staging:/S/images/iso\" (in repos) failed: No such file or directory (2)\nrsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1674) [Receiver=3.1.3]"
  },
  "result" => {
    "code" => 256,
    "message" => "No file found: {SUSE:SLE-15-SP3:GA:Staging:B/read_files.sh}"
  },

So there are two different cases in which something cannot be found.

There are also some failing obs_rsync_update_builds_text jobs which look like this:

  "result" => {
    "code" => 256,
    "message" => "rsync: change_dir \"/SUSE:/SLE-15-SP3:/GA:/Staging:/E/images/iso\" (in repos) failed: No such file or directory (2)\nrsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1674) [Receiver=3.1.3]\n"
  },

Suggestions

It looks like these failures are not a practical problem - at least we haven't received any negative feedback. Likely it works again on the next run, or the job was not needed anymore anyway. If that's true, these jobs should not end up as failures¹ or shouldn't have been created in the first place. Maybe there's also an actual bug we need to fix.

¹ Failures in the sense of the Minion dashboard are jobs which should be manually investigated, but I doubt these failing jobs need manual investigation when they occur.
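If we conclude these errors are not actionable, the usual Minion pattern is to let the task finish with an explanatory result instead of failing (a sketch only: finish and fail are standard Minion::Job methods, but the _rsync_project helper and the error patterns are hypothetical):

  sub _run {
      my ($job, $args) = @_;
      my $error = _rsync_project($args->{project});    # hypothetical helper returning an error string
      if ($error && $error =~ /No such file or directory|No file found/) {
          # record the message without putting the job into the "failed" state
          return $job->finish("Ignoring non-actionable error: $error");
      }
      return $job->fail($error) if $error;
      $job->finish;
  }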

  • Check low-level commands executed by obs_rsync, potentially try to reproduce manually
  • Check if source project folders exist or not (see the sketch below)
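For the second suggestion, the remote directory from the error message can be checked directly. The rsync module name "repos" is taken from the error output; the host below is only a placeholder:

  # list the directory rsync failed to enter; a non-zero exit status
  # suggests the source project folder indeed does not exist (yet)
  my $remote = 'rsync://OBS_HOST_PLACEHOLDER/repos/SUSE:/SLE-15-SP3:/GA:/Staging:/E/images/iso/';
  system('rsync', '--list-only', $remote) == 0
    or warn 'directory listing failed with exit code ' . ($? >> 8) . "\n";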

Just adjust our monitoring to ignore obs_rsync failures. If test reviewers find missing assets in their tests, those tests will end up incomplete, and https://openqa.suse.de/minion/jobs?state=failed has additional information available on demand.
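As a rough illustration of what ignoring obs_rsync failures could look like, assuming the check counted failures via the Minion API (the actual OSD alert queries the database, and the threshold here is made up):

  my $failed = app->minion->jobs({states => ['failed']});
  my $actionable = 0;
  while (my $info = $failed->next) {
      # skip the task family we consider non-actionable
      $actionable++ unless $info->{task} =~ /^obs_rsync/;
  }
  warn "too many failed Minion jobs: $actionable\n" if $actionable > 10;    # hypothetical threshold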


Related issues 3 (1 open, 2 closed)

Related to openQA Infrastructure - action #70975: [alert] too many failed minion jobs (Resolved, okurz, 2020-09-04)

Related to openQA Project - coordination #96263: [epic] Exclude certain Minion tasks from "Too many Minion job failures alert" alert (New, 2020-09-01)

Copied to QA - action #112871: obs_rsync_run Minion tasks fail with no error message size:M (Resolved, livdywan)
