action #99741

closed

Minion jobs for job hooks failed silently on o3 size:M

Added by livdywan about 3 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2021-10-04
Due date:
% Done: 0%
Estimated time:

Description

Minion jobs for job hooks failed silently on o3

Observation

Gru logs show entries like this for minion jobs:

Oct 01 14:25:44 ariel openqa-gru[30835]: Can't exec "/bin/sh": Permission denied at /usr/share/openqa/script/../lib/OpenQA/Task/Job/FinalizeResults.pm line 63.

Despite this error, the affected minion jobs are shown as "finished" rather than "failed", e.g. https://openqa.opensuse.org/minion/jobs?id=800152 with the following details:

---
args:
- 1951060
- ~
attempts: 1
children: []
created: 2021-10-02T13:46:14.14573Z
delayed: 2021-10-02T13:46:14.14573Z
expires: ~
finished: 2021-10-02T13:46:14.41935Z
id: 800152
lax: 0
notes:
  gru_id: 17752756
  hook_cmd: env scheme=http exclude_group_regex='(Development|Open Build Service|Others|Kernel).*/.*'
    /opt/os-autoinst-scripts/openqa-label-known-issues-and-investigate-hook
  hook_rc: -1
parents: []
priority: -10
queue: default
result: Job successfully executed
retried: ~
retries: 0
started: 2021-10-02T13:46:14.15145Z
state: finished
task: finalize_job_results
time: 2021-10-04T13:59:22.27403Z
worker: 744
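
The mismatch above (state "finished" and result "Job successfully executed" next to hook_rc: -1) can be detected mechanically. The following is a minimal sketch, not openQA code: it assumes the job details are available as a plain dictionary shaped like the dump above, and the helper name is_silent_hook_failure is hypothetical. A hook_rc of -1 is consistent with Perl's system() reporting that the command could not be started, which would match the "Can't exec" log entry.

```python
def is_silent_hook_failure(job: dict) -> bool:
    """Return True if a job is marked finished although its hook did not succeed.

    `job` is assumed to be a dict shaped like the minion job details above;
    this helper is illustrative and not part of openQA or Minion.
    """
    notes = job.get("notes") or {}
    hook_rc = notes.get("hook_rc")
    # Jobs without a configured hook carry no hook_rc note and are not counted.
    if hook_rc is None:
        return False
    # Any non-zero return code (including -1) means the hook did not succeed.
    return job.get("state") == "finished" and hook_rc != 0


# Subset of the fields shown for minion job 800152.
job_800152 = {
    "id": 800152,
    "state": "finished",
    "task": "finalize_job_results",
    "notes": {"gru_id": 17752756, "hook_rc": -1},
}

print(is_silent_hook_failure(job_800152))  # True
```

A check like this only flags the symptom; whether such jobs should instead be marked as "failed" is a separate design decision.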

Acceptance criteria

  • AC1: Alerts are received on both osd and o3 if a high (configurable?) number (or ratio) of hook scripts fail
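
One way to read AC1 is as a threshold on the ratio of failed hook scripts. The sketch below only illustrates that idea under stated assumptions: the function names, the default threshold of 0.5, and the minimum sample size of 10 are hypothetical and not taken from the ticket, and the input is assumed to be a list of hook return codes collected from recent jobs.

```python
def hook_failure_ratio(hook_rcs) -> float:
    """Return the fraction of hook return codes that are non-zero."""
    rcs = list(hook_rcs)
    if not rcs:
        return 0.0
    failed = sum(1 for rc in rcs if rc != 0)
    return failed / len(rcs)


def should_alert(hook_rcs, max_ratio: float = 0.5, min_samples: int = 10) -> bool:
    """Decide whether to alert on the given hook return codes.

    max_ratio and min_samples are assumed, configurable values; min_samples
    avoids alerting when only a handful of hooks have run.
    """
    rcs = list(hook_rcs)
    return len(rcs) >= min_samples and hook_failure_ratio(rcs) > max_ratio


# Eight hooks that could not be executed (-1) and two that succeeded (0).
recent = [-1] * 8 + [0] * 2
print(hook_failure_ratio(recent))  # 0.8
print(should_alert(recent))        # True
```

A ratio rather than an absolute count keeps the same threshold usable on instances with very different job volumes, which is one reason to prefer it for both osd and o3.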

Suggestions


Related issues 5 (2 open, 3 closed)

Related to openQA Infrastructure - action #57239: Add/fix openqa_logwarn for o3 and osd sending to o3-admins@suse.de and osd-admins@suse.de respectively (Workable, 2019-09-23)

Related to openQA Project - action #128405: Missing investigate jobs on both o3+osd since months? size:M (Resolved, tinita, 2023-04-28)

Related to openQA Project - action #132665: [alert] openqa-label-known-issues-and-investigate minion hook failed on o3 size:S (Resolved, tinita, 2023-07-13)

Copied from openQA Infrastructure - action #99195: Upgrade o3 webUI host to openSUSE Leap 15.3 size:M (Resolved, livdywan)

Copied to openQA Infrastructure - action #130778: Treat some openqa-clone-job failures non-fatal in openqa-investigate hook (New)
