action #69553

closed

coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues

coordination #62420: [epic] Distinguish all types of incompletes

job incompletes with "Failed to rsync tests: exit code 10":retry, improve user feedback

Added by okurz over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Low
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2020-08-04
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://openqa.opensuse.org/tests/1352135 shows reason "setup failure: Failed to rsync tests: exit code 10", autoinst-log.txt shows:

[2020-08-03T19:34:17.0982 UTC] [info] Rsync from 'rsync://openqa1-opensuse/tests' to '/var/lib/openqa/cache/openqa1-opensuse', request #5593 sent to Cache Service
[2020-08-03T19:34:44.0427 UTC] [info] Output of rsync:
[info] [#5593] Calling: rsync -avHP rsync://openqa1-opensuse/tests/ --delete /var/lib/openqa/cache/openqa1-opensuse/tests/

[2020-08-03T19:34:44.0655 UTC] [info] +++ worker notes +++
[2020-08-03T19:34:44.0655 UTC] [info] End time: 2020-08-03 19:34:44
[2020-08-03T19:34:44.0656 UTC] [info] Result: setup failure

Steps to reproduce

It is unclear how this can be reproduced, but as long as auto_review is finding related issues we can find these jobs with https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label:

openqa-query-for-job-label poo#69553

Suggestions

  • "exit code 10" means "Error in socket I/O" (see the rsync man page). Improve feedback to users, e.g. what the possible reasons for the problem are and what to do to fix or work around it
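
As a rough illustration of the kind of feedback meant here, the following sketch maps a few rsync exit codes to the descriptions from the rsync man page and appends a hint for the user. The function name, the selection of codes and the hint wording are assumptions made for this example; this is not how openQA itself builds the reason string.

```python
# Descriptions copied from the "EXIT VALUES" section of the rsync man page.
RSYNC_EXIT_CODES = {
    10: "Error in socket I/O",
    12: "Error in rsync protocol data stream",
    23: "Partial transfer due to error",
    30: "Timeout in data send/receive",
    35: "Timeout waiting for daemon connection",
}

# Hypothetical hints; the wording is an assumption, not taken from openQA.
HINTS = {
    10: "possibly a temporary network issue, retriggering the job may help",
}


def describe_rsync_failure(code):
    """Build a more helpful reason string for a failed test sync."""
    message = f"Failed to rsync tests: exit code {code}"
    if code in RSYNC_EXIT_CODES:
        message += f" ({RSYNC_EXIT_CODES[code]})"
    if code in HINTS:
        message += f"; {HINTS[code]}"
    return message


if __name__ == "__main__":
    print(describe_rsync_failure(10))
```

Unknown exit codes fall back to the plain message, so the sketch never hides the original number.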

Workaround

Retriggering the job should work


Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #46658: openqaworker4 caching fails (Resolved, okurz, 2019-01-25)
Related to openQA Project - action #73396: job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23":retry (Resolved, Xiaojing_liu, 2020-10-15)

Actions #1

Updated by okurz over 3 years ago

Actions #2

Updated by okurz over 3 years ago

  • Description updated (diff)
  • Status changed from New to Workable
  • Priority changed from Normal to Low

I looked for recent occurrences:

$ ssh o3 "sudo -u geekotest psql openqa -c \"select jobs.id,t_finished,state,result,test,reason,host from jobs, comments, workers where t_finished >= '2020-01-01' and jobs.id = comments.job_id and comments.text ~ 'poo#69553' order by t_finished desc limit 10;\""
   id    |     t_finished      | state |   result   | test |                       reason                       |      host      
---------+---------------------+-------+------------+------+----------------------------------------------------+----------------
 1352135 | 2020-08-03 19:34:49 | done  | incomplete | jeos | setup failure: Failed to rsync tests: exit code 10 | openqaworker7
 1352135 | 2020-08-03 19:34:49 | done  | incomplete | jeos | setup failure: Failed to rsync tests: exit code 10 | openqa-aarch64
 1352135 | 2020-08-03 19:34:49 | done  | incomplete | jeos | setup failure: Failed to rsync tests: exit code 10 | power8
 1352135 | 2020-08-03 19:34:49 | done  | incomplete | jeos | setup failure: Failed to rsync tests: exit code 10 | power8
 1352135 | 2020-08-03 19:34:49 | done  | incomplete | jeos | setup failure: Failed to rsync tests: exit code 10 | oss-apollo7004
 1352135 | 2020-08-03 19:34:49 | done  | incomplete | jeos | setup failure: Failed to rsync tests: exit code 10 | NONE
 1352135 | 2020-08-03 19:34:49 | done  | incomplete | jeos | setup failure: Failed to rsync tests: exit code 10 | openqa-aarch64
 1352135 | 2020-08-03 19:34:49 | done  | incomplete | jeos | setup failure: Failed to rsync tests: exit code 10 | openqa-aarch64
 1352135 | 2020-08-03 19:34:49 | done  | incomplete | jeos | setup failure: Failed to rsync tests: exit code 10 | openqa-aarch64
 1352135 | 2020-08-03 19:34:49 | done  | incomplete | jeos | setup failure: Failed to rsync tests: exit code 10 | power8
(10 rows)

So this did not happen again after 2020-08-03. However, I think we can still improve at least the user feedback.
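
Note that all ten rows above show job 1352135 with different hosts, which is what a query returns when workers is listed in the FROM clause without a join condition. The following sketch reproduces that effect with an in-memory SQLite database. The three-table schema, the column name assigned_worker_id and all data are assumptions for illustration only, not the openQA schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy schema and data, made up for this example.
conn.executescript("""
CREATE TABLE workers (id INTEGER PRIMARY KEY, host TEXT);
CREATE TABLE jobs (id INTEGER PRIMARY KEY, reason TEXT, assigned_worker_id INTEGER);
CREATE TABLE comments (job_id INTEGER, text TEXT);
INSERT INTO workers VALUES (1, 'worker-a'), (2, 'worker-b'), (3, 'worker-c');
INSERT INTO jobs VALUES (100, 'setup failure: Failed to rsync tests: exit code 10', 2);
INSERT INTO comments VALUES (100, 'poo#69553');
""")

# Without a condition on workers, every job row is paired with every worker.
cross = conn.execute("""
    SELECT jobs.id, host FROM jobs, comments, workers
    WHERE jobs.id = comments.job_id AND comments.text LIKE '%poo#69553%'
    ORDER BY workers.id
""").fetchall()

# With an explicit join on the (assumed) worker reference, one row per job remains.
joined = conn.execute("""
    SELECT jobs.id, host FROM jobs
    JOIN comments ON comments.job_id = jobs.id
    JOIN workers ON workers.id = jobs.assigned_worker_id
    WHERE comments.text LIKE '%poo#69553%'
""").fetchall()

if __name__ == "__main__":
    print(len(cross), "rows without join condition")
    print(len(joined), "row with join condition")
```

Because the real query sorts by t_finished and limits to ten rows, the duplicates do not change the conclusion that the latest occurrence is from 2020-08-03.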

Actions #3

Updated by okurz over 3 years ago

  • Subject changed from job incompletes with auto_review:"Failed to rsync tests: exit code 10":retry to job incompletes with auto_review:"Failed to rsync tests: exit code 10":retry, improve user feedback
Actions #4

Updated by okurz over 3 years ago

  • Tags changed from caching, worker to caching, worker, cache, rsync, minion, ux, incomplete
Actions #5

Updated by okurz over 3 years ago

  • Description updated (diff)
Actions #6

Updated by okurz over 3 years ago

  • Related to action #73396: job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23":retry added
Actions #7

Updated by okurz over 3 years ago

  • Description updated (diff)
Actions #8

Updated by kraih over 3 years ago

Yes, this does look very harmless, probably a temporary network issue. Retriggering the job should be fine.

Actions #9

Updated by kraih over 3 years ago

  • Assignee set to kraih
Actions #10

Updated by okurz over 3 years ago

It seems this can easily be solved together with #73396, and you already saw the other, related pull request.

Actions #11

Updated by okurz over 3 years ago

  • Parent task set to #62420
Actions #12

Updated by kraih over 3 years ago

  • Status changed from Workable to In Progress
Actions #14

Updated by okurz over 3 years ago

  • Status changed from In Progress to Feedback

Merged. I suggest we wait until this is deployed to o3 and osd, then wait for some "auto-review" runs and check whether this ticket is still used for auto-labelling. If not, we remove the auto_review: keyword from the ticket subject line and set the ticket to "Resolved". As a shortcut, after deployment we can look for the corresponding log message in the log files and then do the other steps.

Actions #15

Updated by okurz over 3 years ago

  • Subject changed from job incompletes with auto_review:"Failed to rsync tests: exit code 10":retry, improve user feedback to job incompletes with "Failed to rsync tests: exit code 10":retry, improve user feedback
  • Status changed from Feedback to Resolved

Since kraih is likely working on something different, I checked just now with

env host="o3 osd" failed_since="NOW() - interval '90 day'" sh -ex $(which openqa-query-for-job-label) 69553

and could only find a single job from 2020-09-13 on osd that is still listed, so this seems to work fine with the internal retrying.
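
For readers unfamiliar with the idea, internal retrying can be sketched roughly as below. This is only an illustration under stated assumptions, not the code that was merged: the helper names, the number of attempts and the set of exit codes treated as transient are all hypothetical.

```python
import subprocess
import time

# Assumption: these rsync exit codes are treated as transient and worth retrying.
RETRYABLE_EXIT_CODES = frozenset({10, 23})


def run_with_retry(run, attempts=3, delay=0.0, retryable=RETRYABLE_EXIT_CODES):
    """Call run() until it succeeds, fails with a non-retryable code,
    or the attempts are used up. Returns (exit_code, attempts_used)."""
    code = None
    for attempt in range(1, attempts + 1):
        code = run()
        if code == 0 or code not in retryable:
            return code, attempt
        if attempt < attempts:
            time.sleep(delay)  # wait before the next try
    return code, attempts


def rsync_tests(source, target):
    """Run the rsync call shown in the log once and return its exit code."""
    return subprocess.run(["rsync", "-avHP", source, "--delete", target]).returncode
```

A caller could then use something like run_with_retry(lambda: rsync_tests(source, target), delay=5) and only report a setup failure if the final exit code is still non-zero.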
