action #133352
closed
Activating systemd target openqa-worker.target when openqa-worker-auto-restart@ is already used causes havoc size:M
Added by okurz over 1 year ago.
Updated over 1 year ago.
Category:
Feature requests
Description
Observation
On the o3 worker openqaworker4 I recently called systemctl start openqa-worker.target. Soon after, this repeatedly caused many incomplete openQA jobs with messages like "Reason: abandoned: associated worker openqaworker4:10 re-connected but abandoned the job", for example https://openqa.opensuse.org/tests/3455867 . This happens because conflicting systemd services are started: openqa-worker, which is a link to openqa-worker-plain, and openqa-worker-auto-restart.
Acceptance criteria
- AC1: Starting openqa-worker.target does not cause conflicts with already existing openqa-worker-auto-restart
Suggestions
- Look into how openqa-worker.target is generated by the script systemd/systemd-openqa-generator in the openQA repo
- Maybe this can be solved by "documentation" that we need to update the symlink or something
- It is OK if services fail, preventing the admin from doing stupid things, but we should prevent the situation where the services actually start, pick up jobs and cause them to end up incomplete
- Subject changed from Activating systemd target openqa-worker.target when openqa-worker-auto-restart@ is already used causes havoc to Activating systemd target openqa-worker.target when openqa-worker-auto-restart@ is already used causes havoc size:M
- Description updated (diff)
- Status changed from New to Workable
- Assignee set to jbaier_cz
- Status changed from Workable to In Progress
I looked around on openqaworker4 to see the current setup. If I understood correctly, the problem is that both openqa-worker-plain@.service and openqa-worker-auto-restart@.service are PartOf=openqa-worker.target. The easy workaround would be to mask the other (unused) service; if I interpret the history correctly, that was done on that particular worker, and we can document this as the solution. However, I believe we can also improve this by introducing Conflicts= in our unit files. According to the documentation:
If a unit has a Conflicts= setting on another unit, starting the former will stop the latter and vice versa.
This should at least prevent both services from running simultaneously. In our case, we are starting both services, so the following applies:
If unit A that conflicts with unit B is scheduled to be started at the same time as B, the transaction will either fail (in case both are required parts of the transaction) or be modified to be fixed (in case one or both jobs are not a required part of the transaction). In the latter case, the job that is not required will be removed, or in case both are not required, the unit that conflicts will be started and the unit that is conflicted is stopped.
I will need to test the behavior and find out where to put the Conflicts= for the best result.
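To illustrate the Conflicts= idea discussed above, here is a minimal sketch of a drop-in for the auto-restart template unit. The drop-in path and file name are hypothetical, and this is not necessarily the change that was eventually made in the openQA unit files; it only assumes the two template unit names mentioned in this ticket.

```ini
# Hypothetical drop-in, e.g.
# /etc/systemd/system/openqa-worker-auto-restart@.service.d/conflicts.conf
[Unit]
# %i expands to the instance name, so openqa-worker-auto-restart@10.service
# would conflict only with openqa-worker-plain@10.service.
Conflicts=openqa-worker-plain@%i.service
```

Per the documentation quoted above, the setting works in both directions ("and vice versa"), so declaring it on one of the two templates should in principle be enough. Adding the mirrored line to openqa-worker-plain@.service as well would make the intent explicit in both unit files.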
- Related to action #109734: Better way to prevent conflicts between openqa-worker@ and openqa-worker-auto-restart@ variants size:M added
- Due date set to 2023-09-20
Setting due date based on mean cycle time of SUSE QE Tools
- Status changed from In Progress to Feedback
- Status changed from Feedback to Resolved
I did some tests. Manual activation of openqa-worker-auto-restart@X.service now stops the conflicting openqa-worker-plain@X.service before starting, and vice versa. Activating the worker target started the missing services (via the openqa-worker@X.service symlink) but did not start the conflicting one.
- Due date deleted (2023-09-20)