coordination #102864

open

coordination #102861: [saga][epic] Improved openQA for multi-user environments

[epic] Inform openQA webUI users about potential worker class mismatch or long delays

Added by okurz over 2 years ago. Updated over 2 years ago.

Status: New
Priority: Low
Assignee: -
Category: Feature requests
Target version:
Start date: 2021-09-13
Due date:
% Done: 0%
Estimated time:

Description

Motivation

In #98562 the idea came up to cancel jobs with an "invalid" worker class, but whether a worker class is invalid is time-dependent. Then in #100973 we implemented automatic cancellation of all jobs after a (longer) timeout so that jobs don't hang around forever. Now we can take the next step and improve the feedback to users about potential worker class mismatches or expected long delays in job execution.

Acceptance criteria

  • AC1: Given a scheduled job When worker class does not match any worker entry Then inform user about that fact and that the job is likely misconfigured
  • AC2: Given a scheduled job When worker class does match a worker entry And there are currently no online workers for this worker class And the time since a worker of this class was last online is below a configurable threshold, e.g. 10 minutes, Then inform user about that fact and that the job will likely be executed later
  • AC3: Given a scheduled job When worker class does match a worker entry And there are currently no online workers for this worker class And the time since a worker of this class was last online is above a configurable threshold, e.g. 10 minutes, Then inform user about that fact and that there is likely an infrastructure problem and admins should be contacted
  • AC4: Given a scheduled job When worker class does match a worker entry And there are currently no free workers for this worker class And the ratio of "scheduled for this worker class / available worker instances for this worker class" is high Then inform user that longer delays are to be expected
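
The four criteria above could be read as one decision function. The following is only a rough sketch in Python to illustrate the branching, not openQA code: the `Worker` fields, the `Hint` enum, `classify`, and the default ratio of 5.0 are all hypothetical names and values. It also assumes a job and a worker each carry a single worker class string, that "available worker instances" means online workers, and that a last-seen age exactly equal to the threshold counts as AC3.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional, Sequence


class Hint(Enum):
    """Hypothetical feedback categories, one per acceptance criterion."""
    MISCONFIGURED = "AC1"    # no worker entry with this class at all
    EXECUTED_LATER = "AC2"   # workers exist, offline only briefly
    INFRA_PROBLEM = "AC3"    # workers exist, offline for too long
    LONG_DELAY = "AC4"       # workers online but all busy, long queue


@dataclass
class Worker:
    """Hypothetical, simplified view of a worker entry."""
    worker_class: str
    online: bool
    busy: bool
    last_seen: datetime


def classify(
    job_class: str,
    workers: Sequence[Worker],
    scheduled_for_class: int,
    now: datetime,
    offline_threshold: timedelta = timedelta(minutes=10),
    high_ratio: float = 5.0,  # assumed value, would need to be configurable
) -> Optional[Hint]:
    """Return the hint to show for a scheduled job, or None if nothing applies."""
    matching = [w for w in workers if w.worker_class == job_class]
    if not matching:
        return Hint.MISCONFIGURED

    online = [w for w in matching if w.online]
    if not online:
        # Compare the most recent "last seen" time against the threshold.
        offline_for = now - max(w.last_seen for w in matching)
        if offline_for < offline_threshold:
            return Hint.EXECUTED_LATER
        return Hint.INFRA_PROBLEM

    free = [w for w in online if not w.busy]
    if not free and scheduled_for_class / len(online) > high_ratio:
        return Hint.LONG_DELAY
    return None
```

Returning `None` when no criterion applies keeps the "nothing to report" case explicit, so the web UI would only show a message when one of AC1 to AC4 actually holds.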

Related issues 1 (1 open, 0 closed)

Copied from openQA Project - action #98562: Cancel jobs with invalid WORKER_CLASS after a timeout (New, 2021-09-13)

#1

Updated by okurz over 2 years ago

  • Copied from action #98562: Cancel jobs with invalid WORKER_CLASS after a timeout added
#2

Updated by livdywan over 2 years ago

I think yesterday I hit this case again:

  1. Configured a worker
  2. Checked the web UI /workers page
  3. Waited 10 minutes while my job was not picked up
  4. No errors in logs anywhere
  5. This matches AC3: worker class does match a worker entry / there are currently no online workers for this worker class / the last online time is above a configurable threshold, e.g. 10 minutes