action #159444

closed

Many minion jobs failing with rc_hook error code because progress is unavailable

Added by livdywan 6 months ago. Updated 6 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Observation

http://stats.openqa-monitor.qa.suse.de/d/WebuiDb?orgId=1&viewPanel=201

From the journal, e.g. sudo journalctl -S today | grep known-issues:

Error fetching url [...] Got Status 503

These seem to be hooks that finish with result 1 (or at least not 0), as in the example below. Possibly because grep timed out?

args:
- env enable_force_result=true email_unreviewed=true from_email=openqa-review@suse.de
 notification_address=discuss-openqa-auto-r-aaaagmhuypu2hq2kmzgovutmqm@suse.slack.com
 host=openqa.suse.de exclude_name_regex='.*(SAPHanaSR|saptune).*' exclude_group_regex='.*(Development|Public
 Cloud|Released|Others|Kernel|Virtualization|BCI).*' grep_timeout=60 nice ionice
 -c idle /opt/os-autoinst-scripts/openqa-label-known-issues-and-investigate-hook
- 14122329
- delay: 60
 kill_timeout: 10s
 retries: 1440
 skip_rc: 142
 timeout: 10m
attempts: 1
children: []
created: 2024-04-23T09:28:55.327396Z
delayed: 2024-04-23T09:28:55.327396Z
expires: ~
finished: 2024-04-23T09:29:00.375086Z
id: 11087913
lax: 0
notes:
 hook_cmd: env enable_force_result=true email_unreviewed=true from_email=openqa-review@suse.de
   notification_address=discuss-openqa-auto-r-aaaagmhuypu2hq2kmzgovutmqm@suse.slack.com
   host=openqa.suse.de exclude_name_regex='.*(SAPHanaSR|saptune).*' exclude_group_regex='.*(Development|Public
   Cloud|Released|Others|Kernel|Virtualization|BCI).*' grep_timeout=60 nice ionice
   -c idle /opt/os-autoinst-scripts/openqa-label-known-issues-and-investigate-hook
 hook_rc: 1
 hook_result: ''

https://openqa.suse.de/minion/jobs?id=11087902

Note that this is a "successful" minion job. See all hook jobs: https://openqa.suse.de/minion/jobs?task=hook_script&state=finished&queue=&note=hook_rc
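
Since these minion jobs count as finished even when the hook itself failed, the failures only show up in notes.hook_rc. A minimal sketch of how such jobs could be picked out, assuming the job records are available as Python dicts shaped like the dump above (the function names and the two extra sample records are made up for illustration and are not part of openQA):

```python
from collections import Counter


def count_hook_rcs(jobs):
    """Tally the hook_rc values found in the notes of minion job records."""
    counts = Counter()
    for job in jobs:
        rc = job.get("notes", {}).get("hook_rc")
        if rc is not None:  # jobs without a hook_rc note are ignored
            counts[rc] += 1
    return counts


def failed_hook_ids(jobs):
    """Return ids of jobs whose hook exited with a non-zero return code."""
    return [
        job["id"]
        for job in jobs
        if job.get("notes", {}).get("hook_rc") not in (None, 0)
    ]


# The first record mirrors the job shown above; the others are hypothetical.
jobs = [
    {"id": 11087913, "state": "finished", "notes": {"hook_rc": 1, "hook_result": ""}},
    {"id": 1, "state": "finished", "notes": {"hook_rc": 0, "hook_result": "ok"}},
    {"id": 2, "state": "finished", "notes": {}},
]

print(count_hook_rcs(jobs))
print(failed_hook_ids(jobs))
```

This only inspects the recorded return code; it does not tell apart a grep timeout from an unavailable remote service, which still needs a look at the journal.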

Actions #1

Updated by livdywan 6 months ago

  • Subject changed from s390x kvm jobs incomplete with auto_review:"cache failure: Failed to send asset request for SLE-Micro-.*Cache service enqueue error 500: Internal Server Error" to Many minion jobs failing with rc_hook error code because grep timed out
Actions #2

Updated by livdywan 6 months ago

  • Subject changed from Many minion jobs failing with rc_hook error code because grep timed out to Many minion jobs failing with rc_hook error code because progress is unavailable
  • Description updated (diff)
  • Status changed from New to Feedback
  • Assignee set to livdywan

While investigating this, I concluded that it was most likely caused by progress being unavailable, so it should be fine from here on.

Actions #3

Updated by livdywan 6 months ago

  • Description updated (diff)
Actions #4

Updated by livdywan 6 months ago

  • Description updated (diff)
  • Status changed from Feedback to Resolved

The issue no longer occurs. For now we are not going to add retries or extend the timeout.

Actions #5

Updated by livdywan 6 months ago

  • Description updated (diff)