action #106666

closed

Improve worker startup in our salt states or "openqa-worker-auto-restart repeatedly failing on grenache-1.qa.suse.de"

Added by nicksinger almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2022-02-11
Due date:
% Done: 0%
Estimated time:
Description

Motivation

We sometimes disable single worker instances on openQA workers (e.g. https://progress.opensuse.org/issues/106257#note-9). If we use the mask approach, our deployment pipeline fails, because our states try to start every worker instance configured in the "numofworkers" field (https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls#L44). This happens here: https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/worker.sls#L190-194
So even commenting out the affected instances wouldn't work, since the states start instances by count.

Suggestions

The following flow would allow us to simply comment out instances in addition to masking them manually:

  1. Iterate over every key for each worker (https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls#L52) and use their instance number to explicitly start them
  2. Take the last explicitly defined instance number, subtract it from "numofworkers", and start only the remaining instances
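
The two steps above could be sketched roughly as follows. This is only an illustration in Python, not the actual Salt/Jinja state: the function name `instances_to_start`, the 1-based instance numbering, and the example values are assumptions, and the unit name is derived from the service named in the ticket title.

```python
from typing import Iterable, List

# Assumption: systemd template unit name, derived from the ticket title.
UNIT_TEMPLATE = "openqa-worker-auto-restart@{}"


def instances_to_start(numofworkers: int, explicit: Iterable[int]) -> List[int]:
    """Return the instance numbers to start, in ascending order."""
    # Step 1: every explicitly defined instance is started by its own number.
    defined = sorted(set(explicit))
    # Step 2: start only the instances after the last explicitly defined
    # one, up to numofworkers (assumes numbering starts at 1).
    last = defined[-1] if defined else 0
    remaining = list(range(last + 1, numofworkers + 1))
    return defined + remaining


if __name__ == "__main__":
    # Hypothetical data: instance 3 is commented out in the pillar.
    for number in instances_to_start(6, [1, 2, 4]):
        print(UNIT_TEMPLATE.format(number))
```

With this approach a commented-out instance below the last explicitly defined one is simply never started, so a masked unit would no longer break the deployment.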

Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure (public) - action #106832: Monitor masked units on our infrastructure (Resolved, okurz, 2022-02-15)

Has duplicate openQA Infrastructure (public) - action #106753: openqa-worker-auto-restart repeatedly failing on grenache-1.qa.suse.de (Rejected, 2022-02-14)
