action #80736: Trigger 'auto-review' from within openQA when jobs incomplete (or fail) , for testing: auto_review:"tests died: unable to load main.pm, check the log for the cause" - openQA Project (public) - openSUSE Project Management Tool


closed

coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues

coordination #80828: [epic] Trigger 'auto-review' and 'openqa-investigate' from within openQA when jobs incomplete or fail on o3+osd

Trigger 'auto-review' from within openQA when jobs incomplete (or fail) , for testing: auto_review:"tests died: unable to load main.pm, check the log for the cause"

Added by okurz over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee: okurz
Category: Feature requests
Target version: Ready

Description

Motivation

auto-review does a good job, but we could benefit from running it more often. We should trigger it directly when openQA jobs incomplete or fail.

Acceptance criteria

AC1: auto-review is triggered on o3 when jobs incomplete

Suggestions

DONE: After some days check if this still works fine on o3
DONE: Fix reading from config for gru service -> https://github.com/os-autoinst/openQA/pull/3622
DONE: Enable for failed as well (with label-known+investigate)
DONE: Review openQA documentation for the current support, e.g. to consider apparmor, config vs. env, etc. -> http://open.qa/docs/#_enable_custom_hook_scripts_on_job_done_based_on_result
monitor impact on o3
Think about a better approach for apparmor

Related issues: 2 (0 open, 2 closed)


Updated by okurz over 4 years ago

Copied from action #77944: Run "auto-review" more often but alarm less added


Updated by okurz over 4 years ago

Start date deleted (was 2020-11-14)


Updated by okurz over 4 years ago

Subject changed from Trigger "auto-review" from within openQA when jobs incomplete (or fail) to Trigger "auto-review" from within openQA when jobs incomplete (or fail) , for testing: auto_review:"tests died: unable to load main.pm, check the log for the cause"

On o3 as root:

cd /opt/
mkdir os-autoinst-scripts
chown geekotest os-autoinst-scripts
sudo -u geekotest git clone git@github.com:os-autoinst/scripts.git os-autoinst-scripts

In /etc/cron.d/os-autoinst-scripts-update-git:

-*/3    * * * *  geekotest     git -C /opt/os-autoinst-scripts pull --quiet --rebase origin master

In /etc/openqa/openqa.ini:

[hooks]
job_done_hook_incomplete = /opt/os-autoinst-scripts/openqa-label-known-issues-hook

and

systemctl restart openqa-webui openqa-gru

I triggered a job that would end up incomplete with openqa-cli api --o3 -X post jobs test=okurz_poo80736 and found the minion job https://openqa.opensuse.org/minion/jobs?id=335327, which shows:

{
  "args" => [
    1495066,
    undef
  ],
  "attempts" => 1,
  "children" => [],
  "created" => "2020-12-04T19:33:19.6905Z",
  "delayed" => "2020-12-04T19:33:19.6905Z",
  "expires" => undef,
  "finished" => "2020-12-04T19:33:19.77125Z",
  "id" => 335327,
  "lax" => 0,
  "notes" => {
    "gru_id" => 17288276
  },
  "parents" => [],
  "priority" => 0,
  "queue" => "default",
  "result" => "Job successfully executed",
  "retried" => undef,
  "retries" => 0,
  "started" => "2020-12-04T19:33:19.6963Z",
  "state" => "finished",
  "task" => "finalize_job_results",
  "time" => "2020-12-04T19:35:14.04025Z",
  "worker" => 471
}

So there is no sign of the hook having run. Calling the script manually with a job ID as argument reveals the problem:

/opt/os-autoinst-scripts/openqa-label-known-issues-hook 1495074

This fails with "Connection refused" because the script tries https://openqa.opensuse.org, which does not work when called from within the o3 network; plain http has to be used there. Running env scheme=http /opt/os-autoinst-scripts/openqa-label-known-issues-hook 1495074 works, so we can set that in the config as well.
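As a hypothetical sketch of how such an override can work, assume the script builds its base URL from a scheme variable that defaults to https. The host variable, the function name and both defaults are assumptions for illustration, not taken from openqa-label-known-issues-hook itself:

```shell
#!/bin/sh
# Hypothetical sketch: derive the openQA base URL from overridable variables.
# "scheme" mirrors the "env scheme=http ..." call above; "host" and the
# defaults are assumptions, not copied from the real hook script.
openqa_base_url() (
    printf '%s://%s\n' "${scheme:-https}" "${host:-openqa.opensuse.org}"
)
```

With host unset, scheme=http openqa_base_url prints http://openqa.opensuse.org, which is the plain-http access that works from inside the o3 network. Keeping the default on https means only the o3-internal call site has to opt out.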

I set a custom auto_review regex in the subject line of this ticket, which should label the job accordingly.


Updated by okurz over 4 years ago

Subject changed from Trigger "auto-review" from within openQA when jobs incomplete (or fail) , for testing: auto_review:"tests died: unable to load main.pm, check the log for the cause" to Trigger 'auto-review' from within openQA when jobs incomplete (or fail) , for testing: auto_review:"tests died: unable to load main.pm, check the log for the cause"


Updated by okurz over 4 years ago

Description updated (diff)
Status changed from In Progress to Feedback

At first nothing happened. Then, on o3, I added some temporary debug statements directly to /usr/share/openqa/lib/OpenQA/Task/Job/FinalizeResults.pm. My hypothesis is that the config variables are not read by the gru service, so I tried environment variables instead and put the following into /etc/systemd/system/openqa-gru.service.d/override.conf:

[Service]
Environment="OPENQA_JOB_DONE_HOOK_INCOMPLETE=env scheme=http /opt/os-autoinst-scripts/openqa-label-known-issues-hook"

This yields "Permission denied" in journalctl -u openqa-gru, due to apparmor. To test the following steps, I used the minion dashboard to retrigger the same minion job (selecting it and clicking "Retry") instead of posting a new openQA job every time. I switched the profile to complain mode with aa-complain /etc/apparmor.d/usr.share.openqa.script.openqa and then ran aa-logprof /var/log/audit/audit.log, which suggested the following additions to /etc/apparmor.d/usr.share.openqa.script.openqa:

  /opt/os-autoinst-scripts/** rix,
  /usr/bin/cat rix,
  /usr/bin/curl rix,
  /usr/bin/jq rix,
  /usr/bin/mktemp rix,
  /usr/share/openqa/script/client rix,

and then switched the profile back to enforce mode with aa-enforce /etc/apparmor.d/usr.share.openqa.script.openqa.
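Before switching back to enforce mode, a quick sanity check could confirm that each executable named in the new rules actually exists at that path. This is a minimal sketch with a hypothetical function name; keep in mind that apparmor matches the resolved path, so a binary reached through a symlink needs a rule for its target:

```shell
#!/bin/sh
# Hypothetical helper: report every given path that is missing, is a
# directory, or is not executable. Exits non-zero if any path fails.
check_executables() (
    status=0
    for path in "$@"; do
        if [ -d "$path" ] || [ ! -x "$path" ]; then
            printf 'missing or not executable: %s\n' "$path" >&2
            status=1
        fi
    done
    exit "$status"
)
```

For example, check_executables /usr/bin/cat /usr/bin/curl /usr/bin/jq /usr/bin/mktemp /usr/share/openqa/script/client prints any path from the rules above that does not resolve to an executable file.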

The job https://openqa.opensuse.org/tests/1495076#comments shows the results of my experiments as job labels.

The strange thing is that with only the config variable set, I see neither output from my temporary debugging statements nor any Perl warning or error, while with the environment variable in place all the output is there. Using that output, I found out that the config is not read at all. Most likely the gru service process never reads the config.
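The hook is selected per job result: the ini key job_done_hook_incomplete and the environment variable OPENQA_JOB_DONE_HOOK_INCOMPLETE both encode the result in their name. Below is a hypothetical POSIX shell sketch of such a result-keyed lookup via the environment. It is not openQA's own Perl implementation, and the function names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical sketch of a result-keyed "job done" hook lookup.

hook_var_name() (
    # "incomplete" -> "OPENQA_JOB_DONE_HOOK_INCOMPLETE"; any character other
    # than A-Z, 0-9 and "_" becomes "_" so the name is safe to use with eval.
    suffix=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr -c 'A-Z0-9_' '_')
    printf 'OPENQA_JOB_DONE_HOOK_%s\n' "$suffix"
)

run_job_done_hook() (
    result=$1 job_id=$2
    var=$(hook_var_name "$result")
    eval "cmd=\${$var:-}"
    # No hook configured for this result: nothing to do.
    [ -n "$cmd" ] || exit 0
    # Deliberately unquoted so a value such as
    # "env scheme=http /opt/os-autoinst-scripts/openqa-label-known-issues-hook"
    # is split into words; the job ID is appended as the last argument.
    $cmd "$job_id"
)
```

With OPENQA_JOB_DONE_HOOK_INCOMPLETE set as in the override.conf above, run_job_done_hook incomplete 1495074 would execute env scheme=http /opt/os-autoinst-scripts/openqa-label-known-issues-hook 1495074, which is the same call that worked when run manually.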

I also checked on progress.i.o.o, with htop and in /opt/redmine/log/production.log, that the repeated requests for tickets over the Redmine API cause no harm.

I have created a hook wrapper script for "label-known+investigate-unknown":
https://github.com/os-autoinst/scripts/pull/53

After merging, this will be deployed automatically on o3, and we can then enable it as well with an env variable for the GRU service for failed (or even incomplete) jobs.
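The actual wrapper lives in the pull request above. As a hypothetical sketch of the idea only, assume the labelling step exits with status 0 when it matched a known issue and non-zero otherwise (an assumption, not confirmed in this ticket). The chaining could then look like this, where label_cmd and investigate_cmd are made-up override variables:

```shell
#!/bin/sh
# Hypothetical "label-known+investigate-unknown" wrapper, for illustration.
# ASSUMPTION: the labelling command exits 0 only when a known issue matched.
label_known_investigate_unknown() (
    job_id=$1
    label=${label_cmd:-openqa-label-known-issues-hook}
    investigate=${investigate_cmd:-openqa-investigate}
    # Deliberately unquoted so multi-word commands are split into words.
    if $label "$job_id"; then
        exit 0  # known issue: the job was labelled, nothing more to do
    fi
    # Unknown failure: hand the same job over to the investigation step.
    $investigate "$job_id"
)
```

Chaining in one wrapper keeps a single hook entry per result, so the GRU service only needs one environment variable for failed jobs.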

Updated the description with suggestions on what to do next.


Updated by okurz over 4 years ago

Description updated (diff)


Updated by okurz over 4 years ago

Description updated (diff)

Created https://github.com/os-autoinst/openQA/pull/3622 to remove the faulty config reading.
Checked on o3 and everything still seems to work fine, e.g. https://openqa.opensuse.org/tests/1497351#comments, although I wonder whether I can make the comments show up as coming from "auto-review" instead of "geekotest". That would likely require using a different user for the openQA client.


Updated by okurz over 4 years ago

Description updated (diff)


Updated by okurz over 4 years ago

Copied to action #80826: Trigger 'auto-review' from within openQA when jobs incomplete on osd as well added


Updated by okurz over 4 years ago

Copied to coordination #80828: [epic] Trigger 'auto-review' and 'openqa-investigate' from within openQA when jobs incomplete or fail on o3+osd added


Updated by okurz over 4 years ago

Description updated (diff)
Status changed from Feedback to Resolved
Parent task changed from #39719 to #80828

Split out the other specific tasks into separate tickets and created the parent epic #80828.

auto-review is now triggered on o3 directly from hook scripts for incomplete jobs: when a job incompletes, "openqa-label-known-issues" is called almost immediately.
