action #109298

salt-minion on grenache-1 is not working, preventing OSD deployment

Added by mkittler 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2022-03-31
Due date:
% Done: 0%
Estimated time:

Description

The service keeps restarting:

Mär 31 10:35:12 grenache-1 systemd[1]: Starting The Salt Minion...
Mär 31 10:35:13 grenache-1 systemd[1]: Started The Salt Minion.
Mär 31 10:35:14 grenache-1 salt-minion[754750]: The Salt Minion is shutdown.
Mär 31 10:35:14 grenache-1 systemd[1]: salt-minion.service: Main process exited, code=exited, status=1/FAILURE
Mär 31 10:35:14 grenache-1 systemd[1]: salt-minion.service: Failed with result 'exit-code'.
Mär 31 10:35:14 grenache-1 systemd[1]: salt-minion.service: Unit process 4311 (salt-minion) remains running after unit stopped.
Mär 31 10:35:14 grenache-1 systemd[1]: salt-minion.service: Unit process 565957 (salt-minion) remains running after unit stopped.
Mär 31 10:35:29 grenache-1 systemd[1]: salt-minion.service: Scheduled restart job, restart counter is at 1645.
Mär 31 10:35:29 grenache-1 systemd[1]: Stopped The Salt Minion.
Mär 31 10:35:29 grenache-1 systemd[1]: salt-minion.service: Found left-over process 4311 (salt-minion) in control group while starting unit. Ignoring.
Mär 31 10:35:29 grenache-1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mär 31 10:35:29 grenache-1 systemd[1]: salt-minion.service: Found left-over process 565957 (salt-minion) in control group while starting unit. Ignoring.
Mär 31 10:35:29 grenache-1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mär 31 10:35:29 grenache-1 systemd[1]: Starting The Salt Minion...
Mär 31 10:35:29 grenache-1 systemd[1]: Started The Salt Minion.
Mär 31 10:35:30 grenache-1 salt-minion[754846]: The Salt Minion is shutdown.
Mär 31 10:35:30 grenache-1 systemd[1]: salt-minion.service: Main process exited, code=exited, status=1/FAILURE
Mär 31 10:35:30 grenache-1 systemd[1]: salt-minion.service: Failed with result 'exit-code'.
Mär 31 10:35:30 grenache-1 systemd[1]: salt-minion.service: Unit process 4311 (salt-minion) remains running after unit stopped.
Mär 31 10:35:30 grenache-1 systemd[1]: salt-minion.service: Unit process 565957 (salt-minion) remains running after unit stopped.
Mär 31 10:35:45 grenache-1 systemd[1]: salt-minion.service: Scheduled restart job, restart counter is at 1646.

and the minion appears unresponsive when addressed via salt:

grenache-1.qa.suse.de:
    Minion did not return. [Not connected]
ERROR: Minions returned with non-zero exit code
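
To see at a glance which processes systemd keeps reporting as left over, the PIDs can be pulled out of the journal text. This is only a minimal sketch: the function name leftover_pids is made up for illustration, and it matches just the two message shapes that appear in the log above.

```shell
#!/bin/sh
# Hypothetical helper (not part of salt or systemd): print the unique PIDs
# that systemd reports as left over for a unit. Reads journal text on stdin.
leftover_pids() {
    sed -n \
        -e 's/.*Found left-over process \([0-9][0-9]*\) (.*/\1/p' \
        -e 's/.*Unit process \([0-9][0-9]*\) (.*remains running.*/\1/p' \
    | sort -n -u
}
```

Possible usage: journalctl -u salt-minion.service --no-pager | leftover_pids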

History

#1 Updated by mkittler 3 months ago

/var/log/salt/minion:

<stream>
<message type="error">System management is locked by the application with pid 565869 (zypper).
Close this application before trying again.</message>
</stream>
2022-03-31 03:02:58,268 [salt.loaded.int.module.cmdmod:853 ][ERROR   ][565957] retcode: 7
2022-03-31 03:03:03,312 [salt.loaded.int.module.cmdmod:847 ][ERROR   ][565957] Command 'zypper' failed with return code: 7
2022-03-31 03:03:03,313 [salt.loaded.int.module.cmdmod:849 ][ERROR   ][565957] stdout: <?xml version='1.0'?>
<stream>
<message type="error">System management is locked by the application with pid 565869 (zypper).
Close this application before trying again.</message>
</stream>
2022-03-31 03:03:03,313 [salt.loaded.int.module.cmdmod:853 ][ERROR   ][565957] retcode: 7

Looks like zypper was just running in a user session:

grenache-1:/home/martchus # zypper dup
System management is locked by the application with pid 752307 (zypper).
Close this application before trying again.
grenache-1:/home/martchus # systemctl status 752307
● session-7.scope - Session 7 of user osukup

zypper is closed now, but salt-minion is still not coming up.
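
For reference, the PID holding the lock can be extracted from the zypper message and then passed to systemctl status as above. A rough sketch, assuming the message wording stays as shown in this ticket; zypper_lock_pid is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: print the unique PIDs named in zypper's
# "System management is locked by the application with pid N" message.
# Reads text (terminal output or a log file) on stdin.
zypper_lock_pid() {
    sed -n 's/.*locked by the application with pid \([0-9][0-9]*\) (.*/\1/p' \
    | sort -n -u
}
```

Possible usage: zypper_lock_pid < /var/log/salt/minion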

#2 Updated by mkittler 3 months ago

  • Status changed from In Progress to Feedback

Despite the service being stopped, an instance is still running:

grenache-1:/home/martchus # /usr/bin/salt-minion --log-level=debug
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: grenache-1.qa.suse.de
[DEBUG   ] Configuration file path: /etc/salt/minion
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[INFO    ] Setting up the Salt Minion "grenache-1.qa.suse.de"
[INFO    ] An instance is already running. Exiting the Salt Minion
[INFO    ] Shutting down the Salt Minion
[DEBUG   ] Stopping the multiprocessing logging queue listener
[DEBUG   ] closing multiprocessing queue
[DEBUG   ] joining multiprocessing queue thread
[DEBUG   ] Stopped the multiprocessing logging queue listener

Stopped the remaining process in htop, started the actual service again and re-triggered the deployment, which has now passed the steps that previously failed.
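
The same cleanup could probably be scripted. This is only a sketch, not what was actually done here (the process was stopped by hand in htop): it swaps in pkill, and the names run and restart_minion_cleanly are made up. By default it only prints the commands. Note that pkill -x salt-minion would also hit a minion started manually in a terminal.

```shell
#!/bin/sh
# Sketch: clear lingering salt-minion processes, then start the unit again.
# DRY_RUN=1 (the default) only prints each command instead of running it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

restart_minion_cleanly() {
    unit=${1:-salt-minion.service}
    run systemctl stop "$unit"
    # Ask every process whose name is exactly "salt-minion" to exit;
    # that is the process name systemd shows in the log above.
    run pkill -TERM -x salt-minion
    run systemctl start "$unit"
}
```

Calling restart_minion_cleanly prints the three commands; with DRY_RUN=0 set in the environment it would execute them (as root).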

#3 Updated by mkittler 3 months ago

  • Status changed from Feedback to Resolved

Not sure why salt-minion was lingering around, but the OSD deployment works now.
