action #80678

open

Make sure that a certain number of workers of a given unique class are running

Added by asmorodskyi about 4 years ago. Updated about 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Feature requests
Target version:
QA (public, currently private due to #173521) - future
Start date:
2020-12-03
Due date:
% Done:

0%

Estimated time:

Description

Goal: Introduce a workflow that lets us be sure that a defined number of worker instances is running.

Use case:
Currently we have a limited number of workers with WORKER_CLASS=pc_azure (the same applies to pc_ec2, pc_gce, pc_azure_qam, etc.). Due to technical issues, one of the workers running these instances goes down. This introduces several problems:

  1. There is no notification at that level. It was announced that an openQA worker is down, but there was no warning about the consequences of this event.
  2. If the worker stays down for a long period, the process to solve it is quite complicated (you need to create a PR for a certain repo, pass review, and get it merged and deployed). During all this time, jobs that can be processed only by this worker class hang in the queue.

Ideas how to solve it:

  1. Introduce an API call that allows adding/removing a WORKER_CLASS at runtime.
  2. Introduce a service that pings dedicated worker instances and acts accordingly when they go down or come back up.
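
To make the second idea more concrete, here is a minimal sketch of the check such a service could run. It is an illustration only, not existing openQA code. The worker record shape (`host`, `status`, `worker_class`), the "down" states, and the minimum counts in `REQUIRED` are all assumptions; only the class names are taken from this ticket. How the worker list is fetched and what action follows a shortfall (a notification, or the API call from the first idea) is left open.

```python
from collections import Counter

# Hypothetical minimums per worker class. The class names come from the
# ticket; the numbers are made up for illustration.
REQUIRED = {"pc_azure": 2, "pc_ec2": 2, "pc_gce": 1}

# Worker states treated as "not available" (assumed values).
DOWN_STATES = {"dead", "broken"}


def count_available(workers):
    """Count available workers per worker class.

    Each worker is a dict with a 'status' string and a comma-separated
    'worker_class' string (assumed shape).
    """
    counts = Counter()
    for worker in workers:
        if worker["status"] in DOWN_STATES:
            continue
        for cls in worker["worker_class"].split(","):
            cls = cls.strip()
            if cls:
                counts[cls] += 1
    return counts


def find_shortfalls(workers, required):
    """Return {class: (available, required)} for classes below their minimum."""
    counts = count_available(workers)
    return {
        cls: (counts[cls], minimum)
        for cls, minimum in required.items()
        if counts[cls] < minimum
    }


if __name__ == "__main__":
    # Sample data standing in for whatever the service would poll.
    workers = [
        {"host": "worker1", "status": "idle", "worker_class": "pc_azure,pc_ec2"},
        {"host": "worker2", "status": "dead", "worker_class": "pc_azure,pc_gce"},
        {"host": "worker3", "status": "running", "worker_class": "pc_ec2"},
    ]
    for cls, (have, need) in find_shortfalls(workers, REQUIRED).items():
        print(f"WARNING: {cls}: {have} of {need} required workers available")
```

Reporting the shortfall per class rather than per host addresses the first problem above: the warning states which job classes are affected, not just that some worker is down.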