action #80678
Make sure that a certain number of workers of a given class are running
Status:
New
Priority:
Normal
Assignee:
-
Category:
Feature requests
Target version:
QA (public, currently private due to #173521) - future
Start date:
2020-12-03
Due date:
% Done:
0%
Estimated time:
Description
Goal: Introduce a workflow that ensures a defined number of worker instances is running.
Use case:
Currently we have a limited number of workers with WORKER_CLASS=pc_azure (the same applies to pc_ec2, pc_gce, pc_azure_qam, etc.). When one of the workers serving these classes goes down due to technical issues, this introduces several problems:
- There is no notification at that level. There was an announcement that an openQA worker is down, but no warning about the consequences of this event.
- If the worker stays down for a long period, fixing this is quite complicated (you need to create a PR for a certain repo, pass review, and get it merged and deployed). During all this time, jobs that can only be processed by this worker class hang in the queue.
Ideas how to solve it:
- Introduce an API call that allows adding/removing a WORKER_CLASS at runtime
- Introduce a service that pings dedicated worker instances and acts accordingly when they go down or come back up
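A minimal sketch of the check such a monitoring service could run, to make the second idea more concrete. It assumes worker records have already been fetched as a list of dicts with a `status` field and a `properties` dict containing `WORKER_CLASS` as a comma-separated string. That record shape, the `DOWN_STATUSES` set, and the function names are illustrative assumptions, not an existing openQA interface.

```python
from collections import Counter

# Statuses treated as "unavailable". Assumption: adjust to the real worker states.
DOWN_STATUSES = {"dead", "broken"}


def count_live_workers(workers):
    """Count workers per WORKER_CLASS value, skipping workers that are down."""
    counts = Counter()
    for worker in workers:
        if worker.get("status") in DOWN_STATUSES:
            continue
        raw = worker.get("properties", {}).get("WORKER_CLASS", "")
        # A worker may advertise several comma-separated classes;
        # the set avoids counting a duplicated class name twice.
        for cls in {c.strip() for c in raw.split(",") if c.strip()}:
            counts[cls] += 1
    return counts


def find_shortfalls(workers, required):
    """Return {class: (live, wanted)} for every class below its required minimum."""
    live = count_live_workers(workers)
    return {
        cls: (live[cls], wanted)
        for cls, wanted in required.items()
        if live[cls] < wanted
    }


if __name__ == "__main__":
    # Hypothetical sample data: one of two pc_azure workers is down.
    sample = [
        {"status": "idle", "properties": {"WORKER_CLASS": "pc_azure,pc_ec2"}},
        {"status": "dead", "properties": {"WORKER_CLASS": "pc_azure"}},
        {"status": "running", "properties": {"WORKER_CLASS": "pc_gce"}},
    ]
    required = {"pc_azure": 2, "pc_ec2": 1, "pc_gce": 1}
    for cls, (live, wanted) in sorted(find_shortfalls(sample, required).items()):
        print(f"WARNING: {cls}: {live} live worker(s), {wanted} required")
```

The result of `find_shortfalls` could feed a notification that names the affected worker class, or trigger the add/remove call from the first idea once such a call exists.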