action #109734

closed

Better way to prevent conflicts between openqa-worker@ and openqa-worker-auto-restart@ variants size:M

Added by okurz almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Low
Assignee:
Category: Feature requests
Target version:
Start date: 2022-04-09
Due date:
% Done: 0%
Estimated time:
Description

Motivation

Multiple times we have found users trying to start the systemd services openqa-worker@ on hosts where we are running openqa-worker-auto-restart@ instead. The most recent instance was #109055. We should find an approach that is less confusing to users. Ideally we would have only openqa-worker@ and use configuration to cover the auto-restart requirement.

Acceptance criteria

  • AC1: We have an unambiguous solution for providing both worker mode variants that does not confuse users
  • AC2: Ensure documentation covers the updated way

Suggestions

  • DONE: Research about systemd service best practices
  • DONE: Research why we chose to have separate systemd service at the time -> Likely because we need different settings on systemd level
  • DONE: Conduct a brainstorming session together, different ideas:
    • Replace openqa-worker@ by a symlink pointing to the real solution, e.g. "openqa-worker-auto-restart" and "openqa-worker-plain", similar to "network.service" which for us is a symlink in /etc/systemd/system pointing to e.g. /usr/lib/systemd/system/NetworkManager.service or /usr/lib/systemd/system/wicked.service
    • Alternative: just provide a drop-in file instead of separate systemd service file
    • Alternative: Potentially provide the drop-in file in an openSUSE package
    • Alternative: Solve it within a process itself so that systemd is not involved, e.g. same as hypnotoad or nginx
  • Should the services be actual "conflicts" on the level of systemd? -> Likely yes, but not a full solution. Could be done on top
  • Update documentation
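
The "conflicts on the level of systemd" idea could be sketched as a drop-in file. This is only an illustration under assumptions: the helper name `write_conflicts_dropin`, the file name `30-conflicts.conf` and the `unit_root` argument (standing in for /etc/systemd/system) are hypothetical and not taken from openQA. `Conflicts=` and the `%i` instance specifier are standard systemd features.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that makes each plain worker instance
# conflict with the auto-restart instance of the same number.
# unit_root stands in for /etc/systemd/system so the sketch runs unprivileged.
write_conflicts_dropin() {
    unit_root=$1
    dropin_dir="$unit_root/openqa-worker@.service.d"
    mkdir -p "$dropin_dir"
    # %i expands to the instance name, so openqa-worker@1 only conflicts
    # with openqa-worker-auto-restart@1.
    cat > "$dropin_dir/30-conflicts.conf" <<'EOF'
[Unit]
Conflicts=openqa-worker-auto-restart@%i.service
EOF
}
```

On a real host this would be followed by `systemctl daemon-reload`. With such a setting, starting one variant stops the other, which prevents both running at once but does not by itself tell the user which variant is the intended one.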

Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #109055: Broken workers alert (Resolved, okurz, 2022-03-28)

Related to openQA Project - action #133352: Activating systemd target openqa-worker.target when openqa-worker-auto-restart@ is already used causes havoc size:M (Resolved, jbaier_cz, 2023-07-26)

Actions #1

Updated by okurz almost 2 years ago

Actions #2

Updated by okurz almost 2 years ago

  • Priority changed from Normal to Low
Actions #3

Updated by okurz almost 2 years ago

  • Subject changed from Better way to prevent conflicts between openqa-worker@ and openqa-worker-auto-restart@ variants to Better way to prevent conflicts between openqa-worker@ and openqa-worker-auto-restart@ variants size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #4

Updated by okurz almost 2 years ago

Wicked solves it with the following code in the spec file:

%if %{with systemd}

%pre service
# upgrade from sysconfig[-network] scripts
_id=`readlink /etc/systemd/system/network.service 2>/dev/null` || :
if test "x${_id##*/}" = "xnetwork.service" -a -x /etc/init.d/network ; then
    /etc/init.d/network stop-all-dhcp-clients || :
fi
%{service_add_pre wicked.service}

%post service
%{service_add_post wicked.service}
# See bnc#843526: presets do not apply for upgrade / are not sufficient
#                 to handle sysconfig-network|wicked -> wicked migration
_id=`readlink /etc/systemd/system/network.service 2>/dev/null` || :
case "${_id##*/}" in
""|wicked.service|network.service)
    /usr/bin/systemctl --system daemon-reload || :
    /usr/bin/systemctl --force enable wicked.service || :
;;
esac

%preun service
# stop the daemons on removal
# - stopping wickedd should be sufficient ... other just to be sure.
# - stopping of the wicked.service does not stop network, but removes
#   the wicked.service --> network.service link and resets its status.
%{service_del_preun wickedd.service wickedd-auto4.service wickedd-dhcp4.service wickedd-dhcp6.service wickedd-nanny.service wicked.service}

%postun service
# restart wickedd after upgrade
%{service_del_postun wickedd.service}
…
%endif

However, neither NetworkManager nor systemd-network provided that when I tried installing them in a container environment. I guess one should still research systemd ecosystem best practices here.
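
The core of the wicked %post scriptlet is a guard that only claims the alias when it is unset or already points to wicked itself. A minimal stand-alone sketch of that guard follows; the function name `may_claim_alias` and its arguments are hypothetical, only the `readlink` plus `${var##*/}` pattern is taken from the spec file above.

```shell
#!/bin/sh
# Sketch of the guard from the %post scriptlet: decide whether a package may
# (re)claim an alias symlink. Function name and arguments are hypothetical.
may_claim_alias() {
    alias_path=$1   # e.g. /etc/systemd/system/network.service
    own_unit=$2     # e.g. wicked.service
    alias_name=${alias_path##*/}
    # readlink fails for a missing path or a non-symlink; treat that as unset
    target=$(readlink "$alias_path" 2>/dev/null) || target=""
    # ${target##*/} strips everything up to the last slash (the basename)
    case "${target##*/}" in
        ""|"$own_unit"|"$alias_name") return 0 ;;  # unset, ours, or self-reference
        *) return 1 ;;                             # another provider was selected
    esac
}
```

Called as `may_claim_alias /etc/systemd/system/network.service wicked.service`, it returns success unless the link points to a different provider such as NetworkManager.service.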

Actions #5

Updated by okurz almost 2 years ago

  • Tags set to reactive work
  • Parent task deleted (#80908)
Actions #6

Updated by jbaier_cz almost 2 years ago

I think we can do the transition in two phases:

  1. Move from openqa-worker@.service to openqa-worker-plain@.service and provide a symlink for the old service name pointing to the new one
  2. Update the spec file to find out if openqa-worker@.service is a symlink and do not change it during updates

Did I miss something?

The PR for the first step is: https://github.com/os-autoinst/openQA/pull/4687
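
The two phases could look roughly like this in a packaging script. This is a sketch under assumptions, not the content of the PR: the helper name `install_worker_alias` and the `unit_dir` argument (standing in for the systemd unit directory) are hypothetical.

```shell
#!/bin/sh
# Hypothetical packaging helper covering both phases: install the
# compatibility alias, but leave an existing symlink alone so that an
# administrator's choice survives updates.
install_worker_alias() {
    unit_dir=$1
    link="$unit_dir/openqa-worker@.service"
    if [ -L "$link" ]; then
        return 0   # phase 2: keep whatever the symlink currently selects
    fi
    # phase 1: by default the old name points to the renamed plain unit
    ln -s openqa-worker-plain@.service "$link"
}
```

If the link has been repointed to openqa-worker-auto-restart@.service, a later call leaves it untouched.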

Actions #7

Updated by okurz almost 2 years ago

  • Due date set to 2022-06-16
  • Status changed from Workable to In Progress
  • Assignee set to jbaier_cz
Actions #8

Updated by jbaier_cz almost 2 years ago

As the PR is merged, the next step should be some spec file magic to avoid overwriting the openqa-worker@.service symlink.

Actions #9

Updated by jbaier_cz almost 2 years ago

  • Status changed from In Progress to Feedback

I just realized there is an even better mechanism to preserve user changes: just put them in /etc where they belong.

MR for salt to do exactly that: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/698

Actions #10

Updated by jbaier_cz almost 2 years ago

  • Status changed from Feedback to Resolved

Changes are merged and deployed; documentation was also updated in https://github.com/os-autoinst/openQA/pull/4695

systemd is listing the correct unit:

openqaworker5:~>  systemctl cat openqa-worker@
# /usr/lib/systemd/system/openqa-worker-auto-restart@.service
...

So hopefully, this will help.
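
The same check can be tried without systemd against a scratch directory. The sketch below is simplified and makes assumptions: the function name `effective_unit_file` and the `root` prefix are hypothetical, and it only models that /etc/systemd/system is searched before /usr/lib/systemd/system (other search paths such as /run are ignored, and only one level of symlink is read).

```shell
#!/bin/sh
# Hypothetical check: print the name of the unit file a unit name resolves to,
# looking in /etc/systemd/system before /usr/lib/systemd/system.
# root is a prefix so the sketch can be tried against a scratch directory.
effective_unit_file() {
    root=$1
    unit=$2
    for dir in "$root/etc/systemd/system" "$root/usr/lib/systemd/system"; do
        if [ -e "$dir/$unit" ] || [ -L "$dir/$unit" ]; then
            # a symlink yields its target, a regular file yields itself
            target=$(readlink "$dir/$unit") || target=$unit
            echo "${target##*/}"
            return 0
        fi
    done
    return 1
}
```

With a symlink in the /etc directory pointing to openqa-worker-auto-restart@.service, `effective_unit_file "$root" openqa-worker@.service` prints the auto-restart unit name, matching the `systemctl cat` output above.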

Actions #11

Updated by okurz almost 2 years ago

  • Status changed from Resolved to Feedback

Please review https://gitlab.suse.de/openqa/salt-states-openqa#remarks-about-the-systemd-units-used-to-start-workers as well. Maybe we can simplify the section and make it less error-prone now.

Please also consider using the openqa-worker@ symlinked service definition directly.

Actions #12

Updated by okurz almost 2 years ago

  • Due date deleted (2022-06-16)
  • Status changed from Feedback to Resolved
Actions #13

Updated by jbaier_cz 7 months ago

  • Related to action #133352: Activating systemd target openqa-worker.target when openqa-worker-auto-restart@ is already used causes havoc size:M added