action #109734

Better way to prevent conflicts between openqa-worker@ and openqa-worker-auto-restart@ variants size:M

Added by okurz 3 months ago. Updated 12 days ago.

Status: Resolved
Priority: Low
Assignee:
Category: Feature requests
Target version:
Start date: 2022-04-09
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Motivation

Users have repeatedly tried to start the systemd service openqa-worker@ on hosts where we run openqa-worker-auto-restart@ instead. The most recent instance was #109055. We should find an approach that is less confusing to users. Ideally we would have only openqa-worker@ and cover the auto-restart requirement through configuration.

Acceptance criteria

  • AC1: We have an unambiguous solution for providing both variants of worker modes that does not confuse users
  • AC2: Ensure documentation covers the updated way

Suggestions

  • DONE: Research systemd service best practices
  • DONE: Research why we chose to have a separate systemd service at the time -> Likely because we need different settings on systemd level
  • DONE: Conduct a brainstorming session together, different ideas:
    • Replace openqa-worker@ by a symlink pointing to the real solution, e.g. "openqa-worker-auto-restart" and "openqa-worker-plain", similar to "network.service" which for us is a symlink in /etc/systemd/system pointing to e.g. /usr/lib/systemd/system/NetworkManager.service or /usr/lib/systemd/system/wicked.service
    • Alternative: just provide a drop-in file instead of separate systemd service file
    • Alternative: Potentially provide the drop-in file in an openSUSE package
    • Alternative: Solve it within a process itself so that systemd is not involved, e.g. same as hypnotoad or nginx
  • Should the services be actual "conflicts" on the level of systemd? -> Likely yes, but not a full solution. Could be done on top
  • Update documentation
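
The symlink idea from the brainstorming can be sketched in shell as follows. This is only an illustration under assumptions: the function name `select_worker_variant` and its root-prefix argument (so the sketch can be tried in a scratch directory instead of the real `/`) are hypothetical, and the unit file names are derived from the "openqa-worker-auto-restart" and "openqa-worker-plain" names suggested above.

```shell
#!/bin/sh
# Hypothetical helper: point the generic openqa-worker@ alias at one variant.
# $1: root prefix ("" for the real system, a scratch dir for experiments)
# $2: variant, either "plain" or "auto-restart"
select_worker_variant() {
    root=$1
    variant=$2
    case "$variant" in
        plain|auto-restart) ;;
        *) echo "unknown variant: $variant" >&2; return 1 ;;
    esac
    mkdir -p "$root/etc/systemd/system"
    # -n: replace an existing symlink instead of following it
    ln -sfn "/usr/lib/systemd/system/openqa-worker-${variant}@.service" \
        "$root/etc/systemd/system/openqa-worker@.service"
}
```

On a real host one would presumably call it with an empty prefix, e.g. `select_worker_variant "" auto-restart`, and then run `systemctl daemon-reload` so that systemd picks up the changed link.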

Related issues

Related to openQA Infrastructure - action #109055: Broken workers alert - Resolved - 2022-03-28

History

#1 Updated by okurz 3 months ago

#2 Updated by okurz 3 months ago

  • Priority changed from Normal to Low

#3 Updated by okurz about 2 months ago

  • Subject changed from Better way to prevent conflicts between openqa-worker@ and openqa-worker-auto-restart@ variants to Better way to prevent conflicts between openqa-worker@ and openqa-worker-auto-restart@ variants size:M
  • Description updated (diff)
  • Status changed from New to Workable

#4 Updated by okurz about 2 months ago

Wicked solves it with the following code in the spec file:

%if %{with systemd}

%pre service
# upgrade from sysconfig[-network] scripts
_id=`readlink /etc/systemd/system/network.service 2>/dev/null` || :
if test "x${_id##*/}" = "xnetwork.service" -a -x /etc/init.d/network ; then
    /etc/init.d/network stop-all-dhcp-clients || :
fi
%{service_add_pre wicked.service}

%post service
%{service_add_post wicked.service}
# See bnc#843526: presets do not apply for upgrade / are not sufficient
#                 to handle sysconfig-network|wicked -> wicked migration
_id=`readlink /etc/systemd/system/network.service 2>/dev/null` || :
case "${_id##*/}" in
""|wicked.service|network.service)
    /usr/bin/systemctl --system daemon-reload || :
    /usr/bin/systemctl --force enable wicked.service || :
;;
esac

%preun service
# stop the daemons on removal
# - stopping wickedd should be sufficient ... other just to be sure.
# - stopping of the wicked.service does not stop network, but removes
#   the wicked.service --> network.service link and resets its status.
%{service_del_preun wickedd.service wickedd-auto4.service wickedd-dhcp4.service wickedd-dhcp6.service wickedd-nanny.service wicked.service}

%postun service
# restart wickedd after upgrade
%{service_del_postun wickedd.service}
…
%endif

However, when I tried out the install in a container environment, neither NetworkManager nor systemd-network provided anything like that. I guess we should still research whether there is a best practice for this in the systemd ecosystem.

#5 Updated by okurz about 1 month ago

  • Tags set to reactive work
  • Parent task deleted (#80908)

#6 Updated by jbaier_cz 24 days ago

I think we can do the transition in two phases:

  1. Move from openqa-worker@.service to openqa-worker-plain@.service and provide a symlink for the old service name pointing to the new one
  2. Update the spec file to find out whether openqa-worker@.service is a symlink and, if so, not change it during updates

Did I miss something?

The PR for the first step is: https://github.com/os-autoinst/openQA/pull/4687
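
The second phase could look roughly like the following scriptlet logic. This is a minimal sketch under assumptions, not what the openQA spec file actually does: the function name `ensure_default_alias` and both of its arguments are hypothetical, and the paths passed in would have to match the real packaging.

```shell
#!/bin/sh
# Hypothetical scriptlet helper: create the default alias only when the
# admin has not already chosen a variant.
# $1: path of the alias, e.g. .../openqa-worker@.service
# $2: default target the alias should point to on fresh installs
ensure_default_alias() {
    alias_path=$1
    default_target=$2
    if [ -L "$alias_path" ]; then
        # An existing symlink is treated as a deliberate choice: keep it.
        return 0
    fi
    if [ -e "$alias_path" ]; then
        # A regular file is unexpected here; refuse to overwrite it.
        echo "$alias_path exists and is not a symlink" >&2
        return 1
    fi
    ln -s "$default_target" "$alias_path"
}
```

Checking with `-L` before `-e` means that a dangling symlink is also preserved, which seems like the safer default during package updates.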

#7 Updated by okurz 23 days ago

  • Due date set to 2022-06-16
  • Status changed from Workable to In Progress
  • Assignee set to jbaier_cz

#8 Updated by jbaier_cz 23 days ago

As the PR is merged, the next step should be some spec file magic to not overwrite the openqa-worker@.service symlink.

#9 Updated by jbaier_cz 18 days ago

  • Status changed from In Progress to Feedback

I just realized there is an even better mechanism to preserve user changes: just put them in /etc where they belong.

MR for salt to do exactly that: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/698

#10 Updated by jbaier_cz 17 days ago

  • Status changed from Feedback to Resolved

The changes are merged and deployed, and the documentation was also updated in https://github.com/os-autoinst/openQA/pull/4695

systemd is listing the correct unit:

openqaworker5:~>  systemctl cat openqa-worker@
# /usr/lib/systemd/system/openqa-worker-auto-restart@.service
...

So hopefully, this will help.

#11 Updated by okurz 17 days ago

  • Status changed from Resolved to Feedback

Please review https://gitlab.suse.de/openqa/salt-states-openqa#remarks-about-the-systemd-units-used-to-start-workers as well. Maybe we can simplify the section and make it less error-prone now.

Please also consider using the symlinked openqa-worker@ service definition directly.

#12 Updated by okurz 12 days ago

  • Due date deleted (2022-06-16)
  • Status changed from Feedback to Resolved
