action #131249
[alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt-minion does not return, error message "No response" size:M (closed)
Added by okurz over 1 year ago. Updated about 1 year ago.
Description
Observation
From https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1648467 and reproduced locally with sudo salt --no-color -C 'G@roles:worker' test.ping:
grenache-1.qa.suse.de:
Minion did not return. [No response]
The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
salt-run jobs.lookup_jid 20230622084232610255
worker5.oqa.suse.de:
Minion did not return. [No response]
The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
salt-run jobs.lookup_jid 20230622084232610255
worker2.oqa.suse.de:
Minion did not return. [No response]
The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
salt-run jobs.lookup_jid 20230622084232610255
Acceptance criteria
- AC1: salt \* test.ping and salt \* state.apply succeed consistently for more than one day
- AC2: our salt states and pillars and OSD deployment pipelines are green and stable again
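AC1 could be checked with a simple polling loop on OSD, similar to the one used later in this ticket (run count and sleep interval are arbitrary):
# minimal sketch: run test.ping once a minute for a bit more than a day
for i in {1..1500}; do
    echo "### Run $i -- $(date -Is)"
    sudo salt --no-color \* test.ping
    sleep 60
done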
Suggestions
- DONE It seems we might have had this problem for a while but never really this severely. Now it looks like those machines can end up with "no response" again even if we trigger a reboot and restart salt-minion. Maybe we can revert some recent package updates (see the sketch after this list)? From /var/log/zypp/history there is
2023-06-22 03:01:12|install|python3-pyzmq|17.1.2-150000.3.5.2|x86_64||repo-sle-update|e2d9d07654cffc31e5199f40aa1ba9fee1e114c4ca5abd78f7fdc78b2e6cc21a|
- DONE Debug the actual problem of the hanging salt-minion. Maybe we can actually try to trigger the problem more reliably instead of preventing it?
- DONE Research upstream, apply workarounds, potentially try upgrading to Leap 15.5 if that might fix something
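For the package-revert suggestion above, a minimal sketch (the old python3-pyzmq version number is taken from the pending-update list later in this ticket; availability in the configured repos is an assumption):
# downgrade the suspect package and lock it so auto-update does not re-upgrade it
sudo zypper install --oldpackage python3-pyzmq-17.1.2-3.3.1
sudo zypper addlock python3-pyzmq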
Rollback steps
- DONE on worker2, worker3, worker5, grenache-1, openqaworker-arm-2, openqaworker-arm-3 run
sudo mv /etc/systemd/system/auto-update.$i{.disabled_poo131249,} && sudo systemctl enable --now auto-update.timer && sudo systemctl start auto-update
then remove the manual override /etc/systemd/system/auto-update.service.d/override.conf, wait for the upgrade to complete and reboot
- DONE re-enable osd-deployment https://gitlab.suse.de/openqa/osd-deployment/-/pipeline_schedules/36/edit
- DONE remove silence https://stats.openqa-monitor.qa.suse.de/alerting/silences "alertname=Failed systemd services alert (except openqa.suse.de)"
- DONE remove package locks for anything related to salt
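The per-host part of these rollback steps could be scripted roughly like this (host names and file paths from this ticket; SSH and sudo access are assumed):
for host in worker2.oqa.suse.de worker3.oqa.suse.de worker5.oqa.suse.de grenache-1.qa.suse.de openqaworker-arm-2.suse.de openqaworker-arm-3.suse.de; do
    ssh "$host" 'for i in service timer; do sudo mv /etc/systemd/system/auto-update.$i{.disabled_poo131249,}; done &&
        sudo rm -f /etc/systemd/system/auto-update.service.d/override.conf &&
        sudo systemctl daemon-reload &&
        sudo systemctl enable --now auto-update.timer &&
        sudo systemctl start auto-update'
done
# additionally run on each host to remove salt-related package locks, e.g.:
# sudo zypper removelock salt salt-minion python3-salt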
Updated by okurz over 1 year ago
I could still log in to worker5 over SSH; the salt-minion still runs there.
Updated by osukup over 1 year ago
on worker2:
worker2:/home/osukup # ps -aux | grep salt
root 17052 0.1 0.0 0 0 ? Z 04:01 0:30 [salt-minion] <defunct>
root 24047 0.0 0.0 50452 27904 ? Ss 03:01 0:00 /usr/bin/python3 /usr/bin/salt-minion
root 24054 0.0 0.0 786284 74092 ? Sl 03:01 0:05 /usr/bin/python3 /usr/bin/salt-minion
Updated by mkittler over 1 year ago
- Status changed from New to In Progress
- Assignee set to mkittler
Updated by mkittler over 1 year ago
Jun 22 03:01:12 grenache-1 salt-minion[693500]: /usr/lib/python3.6/site-packages/salt/transport/client.py:81: DeprecationWarning: This module is deprecated. Please use salt.channel.client instead.
Jun 22 03:01:12 grenache-1 salt-minion[693500]: "This module is deprecated. Please use salt.channel.client instead.",
Jun 22 03:01:12 grenache-1 salt-minion[693500]: [WARNING ] Got events for closed stream None
Jun 22 04:05:23 grenache-1 salt-minion[699658]: /usr/lib/python3.6/site-packages/salt/states/x509.py:214: DeprecationWarning: The x509 modules are deprecated. Please migrate to the replacement modules (x509_v2). They are the default from Salt 3008 (Argon) onwards.
Jun 22 04:05:23 grenache-1 salt-minion[699658]: "The x509 modules are deprecated. Please migrate to the replacement "
We get at least the last deprecation warning on other hosts as well, so those warnings are likely not the culprit. Otherwise nothing has been logged since the last restart of the service except for
2023-06-22 03:01:12,130 [tornado.general :444 ][WARNING ][693500] Got events for closed stream None
Updated by mkittler over 1 year ago
Restarting the Minion services on the affected hosts helped. This is not the first time I have seen salt-minion.service being stuck. Not sure what is causing this from time to time.
Yesterday afternoon it definitely still worked on all workers so the relevant timeframe is quite small. However, I couldn't spot anything in the logs.
Updated by okurz over 1 year ago
- Status changed from In Progress to Resolved
Yes, all salt minions seem to be reachable again. Next time it is reproduced I suggest attaching to the blocked processes with strace or checking open file handles with lsof.
I restarted failed tests in
https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/710619
The deployment step has just started and all machines are reachable again so we are good here.
Updated by mkittler over 1 year ago
- Status changed from Resolved to In Progress
Looks like it got stuck again on grenache-1:
martchus@grenache-1:~> sudo strace -p 731998
strace: Process 731998 attached
wait4(732003,
^Cstrace: Process 731998 detached
<detached ...>
martchus@grenache-1:~> sudo strace -p 732003
strace: Process 732003 attached
futex(0x7fff7c001c10, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY
^Cstrace: Process 732003 detached
<detached ...>
martchus@grenache-1:~> ps aux | grep -i salt
root 731998 0.0 0.0 42560 37376 ? Ss 12:36 0:00 /usr/bin/python3 /usr/bin/salt-minion
root 732003 0.1 0.0 560960 83328 ? Sl 12:36 0:05 /usr/bin/python3 /usr/bin/salt-minion
root 738594 4.9 0.0 0 0 ? Z 13:36 1:03 [salt-minion] <defunct>
martchus 743308 0.0 0.0 8192 1344 pts/1 S+ 13:58 0:00 grep --color=auto -i salt
lsof -p … doesn't show anything special and the list is empty for the zombie process. It looks like one of the processes is stuck in a deadlock. There are no coredumps, by the way.
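A quick way to double-check for coredumps, assuming systemd-coredump is in use on these hosts (which is an assumption):
coredumpctl list salt-minion   # lists recorded crashes/dumps, if any
sysctl kernel.core_pattern     # shows where the kernel would write cores otherwise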
Updated by mkittler over 1 year ago
Backtrace via gdb of the process stuck on the futex wait:
* 1 Thread 0x7fff9c5d49b0 (LWP 732003) "salt-minion" 0x00007fff9c117e0c in do_futex_wait.constprop () from /lib64/libpthread.so.0
2 Thread 0x7fff9a11f180 (LWP 732004) "salt-minion" 0x00007fff9c00c888 in select () from /lib64/libc.so.6
3 Thread 0x7fff935af180 (LWP 732007) "ZMQbg/0" 0x00007fff9c01a56c in epoll_wait () from /lib64/libc.so.6
4 Thread 0x7fff92d9f180 (LWP 732008) "ZMQbg/1" 0x00007fff9c01a56c in epoll_wait () from /lib64/libc.so.6
5 Thread 0x7fff9258f180 (LWP 732013) "ZMQbg/4" 0x00007fff9c01a56c in epoll_wait () from /lib64/libc.so.6
6 Thread 0x7fff91d7f180 (LWP 732014) "ZMQbg/5" 0x00007fff9c01a56c in epoll_wait () from /lib64/libc.so.6
7 Thread 0x7fff83fff180 (LWP 741516) "salt-minion" 0x00007fff9c01a56c in epoll_wait () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fff9c117e0c in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1 0x00007fff9c117fb8 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2 0x00007fff9c34cff4 in PyThread_acquire_lock_timed () from /usr/lib64/libpython3.6m.so.1.0
#3 0x00007fff9c3544ec in ?? () from /usr/lib64/libpython3.6m.so.1.0
#4 0x00007fff9c354700 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#5 0x00007fff9c244aac in _PyCFunction_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#6 0x00007fff9c2ed908 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#7 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#8 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#9 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#10 0x00007fff9c2ed3c8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#11 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#12 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#13 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#14 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#15 0x00007fff9c2ed3c8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#16 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#17 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#18 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#19 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#20 0x00007fff9c2ed3c8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#21 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#22 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#23 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#24 0x00007fff9c208b2c in ?? () from /usr/lib64/libpython3.6m.so.1.0
#25 0x00007fff9c2e55ac in ?? () from /usr/lib64/libpython3.6m.so.1.0
#26 0x00007fff9c2449f4 in _PyCFunction_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#27 0x00007fff9c2ed908 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#28 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#29 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#30 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#31 0x00007fff9c2ed3c8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#32 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#33 0x00007fff9c2f4558 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#34 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#35 0x00007fff9c208b2c in ?? () from /usr/lib64/libpython3.6m.so.1.0
#36 0x00007fff9c2e55ac in ?? () from /usr/lib64/libpython3.6m.so.1.0
#37 0x00007fff9c2449f4 in _PyCFunction_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#38 0x00007fff9c2ed908 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#39 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#40 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#41 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#42 0x00007fff9c2ed3c8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#43 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#44 0x00007fff9c2f4558 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#45 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#46 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#47 0x00007fff9c2ed3c8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#48 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#49 0x00007fff9c2f4558 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#50 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#51 0x00007fff9c208b2c in ?? () from /usr/lib64/libpython3.6m.so.1.0
#52 0x00007fff9c2e55ac in ?? () from /usr/lib64/libpython3.6m.so.1.0
#53 0x00007fff9c2449f4 in _PyCFunction_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#54 0x00007fff9c2ed908 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#55 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#56 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#57 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#58 0x00007fff9c2ed3c8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#59 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#60 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#61 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#62 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#63 0x00007fff9c2ed9f8 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.6m.so.1.0
#64 0x00007fff9c215148 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#65 0x00007fff9c1d4ae8 in PyObject_Call () from /usr/lib64/libpython3.6m.so.1.0
#66 0x00007fff9c2f3e88 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#67 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#68 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#69 0x00007fff9c2ed9f8 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.6m.so.1.0
#70 0x00007fff9c215148 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#71 0x00007fff9c1d4ae8 in PyObject_Call () from /usr/lib64/libpython3.6m.so.1.0
#72 0x00007fff9c2f3e88 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#73 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#74 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#75 0x00007fff9c2f8b4c in _PyFunction_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#76 0x00007fff9c1d4ed0 in _PyObject_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#77 0x00007fff9c1d5210 in _PyObject_Call_Prepend () from /usr/lib64/libpython3.6m.so.1.0
#78 0x00007fff9c1f43e8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#79 0x00007fff9c1d4ae8 in PyObject_Call () from /usr/lib64/libpython3.6m.so.1.0
#80 0x00007fff9c2f3e88 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
--Type <RET> for more, q to quit, c to continue without paging--c
#81 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#82 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#83 0x00007fff9c2f8c5c in _PyFunction_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#84 0x00007fff9c1d4ed0 in _PyObject_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#85 0x00007fff9c381d1c in ?? () from /usr/lib64/libpython3.6m.so.1.0
#86 0x00007fff9c1d4d84 in _PyObject_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#87 0x00007fff9c2ed654 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#88 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#89 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#90 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#91 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#92 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#93 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#94 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#95 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#96 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#97 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#98 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#99 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#100 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#101 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#102 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#103 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#104 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#105 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#106 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#107 0x00007fff9c2ed3c8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#108 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#109 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#110 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#111 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#112 0x00007fff9c2ed9f8 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.6m.so.1.0
#113 0x00007fff9c215148 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#114 0x00007fff9c1d4ae8 in PyObject_Call () from /usr/lib64/libpython3.6m.so.1.0
#115 0x00007fff9c2f3e88 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#116 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#117 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#118 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#119 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#120 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#121 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#122 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#123 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#124 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#125 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#126 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#127 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#128 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#129 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#130 0x00007fff9c2f8d4c in _PyFunction_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#131 0x00007fff9c1d4ed0 in _PyObject_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#132 0x00007fff9c1d5210 in _PyObject_Call_Prepend () from /usr/lib64/libpython3.6m.so.1.0
#133 0x00007fff9c1f43e8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#134 0x00007fff9c1d4ae8 in PyObject_Call () from /usr/lib64/libpython3.6m.so.1.0
#135 0x00007fff9c26c858 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#136 0x00007fff9c267230 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#137 0x00007fff9c1d4d84 in _PyObject_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#138 0x00007fff9c2ed654 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#139 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#140 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#141 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#142 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#143 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#144 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#145 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#146 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#147 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#148 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#149 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#150 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#151 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#152 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#153 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#154 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#155 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#156 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#157 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#158 0x00007fff9c2ed1c0 in PyEval_EvalCode () from /usr/lib64/libpython3.6m.so.1.0
#159 0x00007fff9c32bc54 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#160 0x00007fff9c32ec58 in PyRun_FileExFlags () from /usr/lib64/libpython3.6m.so.1.0
#161 0x00007fff9c32eeb8 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython3.6m.so.1.0
#162 0x00007fff9c350860 in Py_Main () from /usr/lib64/libpython3.6m.so.1.0
#163 0x000000012f9d0ea8 in main ()
(gdb) bt
#0 0x00007fff9c00c888 in select () from /lib64/libc.so.6
#1 0x00007fff9c39df54 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#2 0x00007fff9c24491c in _PyCFunction_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#3 0x00007fff9c2ed908 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#4 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#5 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#6 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#7 0x00007fff9c2ed9f8 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.6m.so.1.0
#8 0x00007fff9c215148 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#9 0x00007fff9c1d4ae8 in PyObject_Call () from /usr/lib64/libpython3.6m.so.1.0
#10 0x00007fff9c2f3e88 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#11 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#12 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#13 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#14 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#15 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#16 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#17 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#18 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#19 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#20 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#21 0x00007fff9c2f8d4c in _PyFunction_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#22 0x00007fff9c1d4ed0 in _PyObject_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#23 0x00007fff9c1d5210 in _PyObject_Call_Prepend () from /usr/lib64/libpython3.6m.so.1.0
#24 0x00007fff9c1f43e8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#25 0x00007fff9c1d4ae8 in PyObject_Call () from /usr/lib64/libpython3.6m.so.1.0
#26 0x00007fff9c2ee248 in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython3.6m.so.1.0
#27 0x00007fff9c353f84 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#28 0x00007fff9c34ca60 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#29 0x00007fff9c109748 in start_thread () from /lib64/libpthread.so.0
#30 0x00007fff9c01a084 in clone () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fff9c01a56c in epoll_wait () from /lib64/libc.so.6
#1 0x00007fff9966c4d8 in ?? () from /usr/lib64/libzmq.so.5
#2 0x00007fff996b3ab0 in ?? () from /usr/lib64/libzmq.so.5
#3 0x00007fff9c109748 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fff9c01a084 in clone () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fff9c01a56c in epoll_wait () from /lib64/libc.so.6
#1 0x00007fff9966c4d8 in ?? () from /usr/lib64/libzmq.so.5
#2 0x00007fff996b3ab0 in ?? () from /usr/lib64/libzmq.so.5
#3 0x00007fff9c109748 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fff9c01a084 in clone () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fff9c01a56c in epoll_wait () from /lib64/libc.so.6
#1 0x00007fff9966c4d8 in ?? () from /usr/lib64/libzmq.so.5
#2 0x00007fff996b3ab0 in ?? () from /usr/lib64/libzmq.so.5
#3 0x00007fff9c109748 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fff9c01a084 in clone () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fff9c01a56c in epoll_wait () from /lib64/libc.so.6
#1 0x00007fff9966c4d8 in ?? () from /usr/lib64/libzmq.so.5
#2 0x00007fff996b3ab0 in ?? () from /usr/lib64/libzmq.so.5
#3 0x00007fff9c109748 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fff9c01a084 in clone () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fff9c01a56c in epoll_wait () from /lib64/libc.so.6
#1 0x00007fff9b7c28c0 in ?? () from /usr/lib64/python3.6/lib-dynload/select.cpython-36m-powerpc64le-linux-gnu.so
#2 0x00007fff9c244aac in _PyCFunction_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#3 0x00007fff9c2ed908 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#4 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#5 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#6 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#7 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#8 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#9 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#10 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#11 0x00007fff9c2ed3c8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#12 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#13 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#14 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#15 0x00007fff9c2ed060 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#16 0x00007fff9c2f8b4c in _PyFunction_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#17 0x00007fff9c1d4ed0 in _PyObject_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#18 0x00007fff9c1d5210 in _PyObject_Call_Prepend () from /usr/lib64/libpython3.6m.so.1.0
#19 0x00007fff9c1f43e8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#20 0x00007fff9c1d4ae8 in PyObject_Call () from /usr/lib64/libpython3.6m.so.1.0
#21 0x00007fff9c2f3e88 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#22 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#23 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#24 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#25 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#26 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#27 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#28 0x00007fff9c2ed7a8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#29 0x00007fff9c2f1694 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.6m.so.1.0
#30 0x00007fff9c2ec3d4 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.6m.so.1.0
#31 0x00007fff9c2ed284 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#32 0x00007fff9c2f8d4c in _PyFunction_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#33 0x00007fff9c1d4ed0 in _PyObject_FastCallDict () from /usr/lib64/libpython3.6m.so.1.0
#34 0x00007fff9c1d5210 in _PyObject_Call_Prepend () from /usr/lib64/libpython3.6m.so.1.0
#35 0x00007fff9c1f43e8 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#36 0x00007fff9c1d4ae8 in PyObject_Call () from /usr/lib64/libpython3.6m.so.1.0
#37 0x00007fff9c2ee248 in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython3.6m.so.1.0
#38 0x00007fff9c353f84 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#39 0x00007fff9c34ca60 in ?? () from /usr/lib64/libpython3.6m.so.1.0
#40 0x00007fff9c109748 in start_thread () from /lib64/libpthread.so.0
#41 0x00007fff9c01a084 in clone () from /lib64/libc.so.6
On other hosts none of the processes is stuck on a futex wait, so this is indeed likely not normal.
I created coredumps of both running processes. One could open them on grenache via sudo gdb --core=/home/martchus/core.732003 but unfortunately this then lacks the symbol names.
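For reference, such a dump of a live process can be created with gcore (shipped with gdb); loading the core together with the executable usually lets gdb resolve at least the interpreter symbols — a sketch using the PID from above:
sudo gcore -o /home/martchus/core 732003              # writes /home/martchus/core.732003
sudo gdb /usr/bin/python3 --core=/home/martchus/core.732003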
I also tried to generate a Python backtrace but it is useless because debug info is missing. gdb says one should install it via zypper install python3-base-debuginfo-3.6.15-150300.10.48.1.ppc64le but that particular version doesn't exist and just installing python3-base-debuginfo doesn't help.
Updated by mkittler over 1 year ago
Not sure how to make sense of this without diving deeply into salt's internals. It at least looks like we're not the only ones having trouble with salt being stuck:
- https://github.com/saltstack/salt/issues/55710 (stale and supposedly unresolved)
- https://github.com/saltstack/salt/issues/58159 (still open)
Both issues mention futex_wait specifically.
Updated by okurz over 1 year ago
mkittler wrote:
Not sure how to make sense of this without diving deeply into salt's internals
I recommend:
- write "me too" with a reference to this ticket in at least one of the upstream ones. Preferrably more details than just "me too" :)
- apply or at least document in this ticket a workaround that works for us, e.g. reboot machine or whatever
Updated by openqa_review over 1 year ago
- Due date set to 2023-07-07
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz over 1 year ago
- Subject changed from [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt minion does not return to [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt minion does not return size:M
- Description updated (diff)
Updated by okurz over 1 year ago
Again test.ping does not return for worker2, worker5 and grenache-1, so exactly the same machines that were problematic before. From worker2:
$ ssh worker2.oqa.suse.de
Last login: Mon Jun 19 11:24:42 2023 from 2620:113:80c0:8360::107a
okurz@worker2:~> sudo ps auxf | grep minion
okurz 14681 0.0 0.0 8200 768 pts/0 S+ 12:54 0:00 \_ grep --color=auto minion
root 25456 0.0 0.0 50452 27872 ? Ss Jun22 0:00 /usr/bin/python3 /usr/bin/salt-minion
root 25461 0.0 0.0 647540 73528 ? Sl Jun22 0:06 \_ /usr/bin/python3 /usr/bin/salt-minion
root 14984 0.0 0.0 0 0 ? Z Jun22 0:26 \_ [salt-minion] <defunct>
okurz@worker2:~> systemctl status salt-minion
● salt-minion.service - The Salt Minion
Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2023-06-22 12:38:18 CEST; 24h ago
Main PID: 25456 (salt-minion)
Tasks: 9 (limit: 4915)
CGroup: /system.slice/salt-minion.service
├─ 25456 /usr/bin/python3 /usr/bin/salt-minion
└─ 25461 /usr/bin/python3 /usr/bin/salt-minion
Warning: some journal files were not opened due to insufficient permissions.
okurz@worker2:~> sudo strace -p 25461
strace: Process 25461 attached
futex(0x7f3f70000b50, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY^Cstrace: Process 25461 detached
Maybe we have better luck debugging here.
Updated by mkittler over 1 year ago
I installed debug packages but couldn't produce a backtrace. Maybe I can try again tomorrow if the issue happens again. Otherwise, the debug packages should be removed again before closing this ticket (zypper rm $(zypper se -i debuginfo | grep -i name | sed -e 's|.*name="\([^"]*\)".*|\1|')).
apply or at least document in this ticket a workaround that works for us, e.g. reboot machine or whatever
I guess the workaround is to simply restart the service. You've already come up with a restart loop so maybe we can run something similar in a more automated way.
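A sketch of such an automated variant, restarting the minion on exactly the hosts the master considers down (the parsing of the salt-run output and direct SSH access are assumptions):
for host in $(sudo salt-run --no-color manage.down 2>/dev/null | sed -n 's/^- *//p'); do
    echo "restarting salt-minion on $host"
    ssh "$host" 'sudo systemctl restart salt-minion'
done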
Updated by okurz over 1 year ago
I did
for i in worker5.oqa.suse.de openqaworker-arm-2.suse.de worker2.oqa.suse.de grenache-1.qa.suse.de openqaworker-arm-3.suse.de; do ssh $i "sudo systemctl restart salt-minion"; done && ssh osd "sudo salt \* test.ping"
and after that the same for openqaworker18.qa.suse.cz, then tried again three times and eventually test.ping returned ok for all currently salt-controlled machines. So at least that works as a workaround.
Next day, 2023-06-24, again w2, w5, w18, arm-2, arm-3 and grenache-1 would not respond, the others are fine. Trying for i in openqaworker18.qa.suse.cz worker5.oqa.suse.de openqaworker-arm-2.suse.de worker2.oqa.suse.de grenache-1.qa.suse.de openqaworker-arm-3.suse.de; do ssh $i "sudo systemctl restart salt-minion"; done && ssh osd "sudo salt \* cmd.run 'uptime; rpm -q ffmpeg-4'". Immediately after restarting the salt-minion the commands are executed just fine.
Now I am trying timeout 7200 sh -c 'for i in {1..7200}; do echo "### Run $i -- $(date -Is)" && salt --no-color \* test.ping; done' to see when or how the responsiveness breaks down.
In run 917, after around 1h the first minion failed to respond, w5. In run 919 w2 followed, in run 921 grenache-1 followed.
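To pinpoint such runs without scrolling, assuming the loop output was captured to a file (e.g. with | tee ping.log, which is an assumption here):
# print the run header directly preceding the first non-responding minion
grep -E '^### Run |Minion did not return' ping.log | grep -B1 'did not return' | head -n 2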
In parallel, to find out if I can make another machine break, I checked whether any updates are missing on the so far unaffected machines. On worker13 I found some updates pending so I installed them now:
The following 20 packages are going to be upgraded:
libatomic1 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libgcc_s1 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libgcc_s1-32bit 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libgfortran5 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libgomp1 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libitm1 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
liblsan0 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libquadmath0 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libstdc++6 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libstdc++6-32bit 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libstdc++6-pp 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libstdc++6-pp-32bit 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
openQA-client 4.6.1687510203.8d9fc92-lp154.5910.1 -> 4.6.1687532073.e11feac-lp154.5912.1 x86_64 devel_openQA obs://build.opensuse.org/devel:openQA
openQA-common 4.6.1687510203.8d9fc92-lp154.5910.1 -> 4.6.1687532073.e11feac-lp154.5912.1 x86_64 devel_openQA obs://build.opensuse.org/devel:openQA
openQA-worker 4.6.1687510203.8d9fc92-lp154.5910.1 -> 4.6.1687532073.e11feac-lp154.5912.1 x86_64 devel_openQA obs://build.opensuse.org/devel:openQA
os-autoinst 4.6.1687515905.8e765fc-lp154.1597.1 -> 4.6.1687532294.4a46169-lp154.1598.1 x86_64 devel_openQA obs://build.opensuse.org/devel:openQA
os-autoinst-devel 4.6.1687515905.8e765fc-lp154.1597.1 -> 4.6.1687532294.4a46169-lp154.1598.1 x86_64 devel_openQA obs://build.opensuse.org/devel:openQA
os-autoinst-distri-opensuse-deps 1.1687520240.f97f61fa-lp154.12375.1 -> 1.1687528477.44bada55-lp154.12376.1 noarch devel_openQA obs://build.opensuse.org/devel:openQA
os-autoinst-openvswitch 4.6.1687515905.8e765fc-lp154.1597.1 -> 4.6.1687532294.4a46169-lp154.1598.1 x86_64 devel_openQA obs://build.opensuse.org/devel:openQA
os-autoinst-swtpm 4.6.1687515905.8e765fc-lp154.1597.1 -> 4.6.1687532294.4a46169-lp154.1598.1 x86_64 devel_openQA obs://build.opensuse.org/devel:openQA
But so far the host did not stop responding.
Updated by okurz over 1 year ago
- Description updated (diff)
I want to crosscheck if any recent package installations triggered this. On worker5:
# snapper ls
# | Type | Pre # | Date | User | Cleanup | Description | Userdata
------+--------+-------+--------------------------+------+---------+-----------------------+--------------
0 | single | | | root | | current |
1* | single | | Fri Jan 13 10:36:01 2017 | root | | first root filesystem |
2895 | pre | | Sun May 14 03:37:22 2023 | root | number | zypp(zypper) | important=yes
2896 | post | 2895 | Sun May 14 03:38:55 2023 | root | number | | important=yes
2905 | pre | | Thu May 18 08:11:58 2023 | root | number | zypp(zypper) | important=yes
2906 | post | 2905 | Thu May 18 08:13:06 2023 | root | number | | important=yes
2921 | pre | | Thu May 25 07:57:26 2023 | root | number | zypp(zypper) | important=yes
2922 | post | 2921 | Thu May 25 07:58:38 2023 | root | number | | important=yes
2947 | pre | | Thu Jun 8 08:14:50 2023 | root | number | zypp(zypper) | important=yes
2948 | post | 2947 | Thu Jun 8 08:16:17 2023 | root | number | | important=yes
2955 | pre | | Tue Jun 13 07:22:58 2023 | root | number | zypp(zypper) | important=yes
2956 | post | 2955 | Tue Jun 13 07:23:28 2023 | root | number | | important=yes
2985 | pre | | Thu Jun 22 13:27:51 2023 | root | number | zypp(zypper) | important=no
2986 | pre | | Thu Jun 22 13:28:06 2023 | root | number | zypp(zypper) | important=no
2987 | pre | | Thu Jun 22 13:28:22 2023 | root | number | zypp(zypper) | important=no
2988 | pre | | Thu Jun 22 13:28:37 2023 | root | number | zypp(zypper) | important=no
2989 | pre | | Thu Jun 22 14:15:38 2023 | root | number | zypp(zypper) | important=no
2990 | post | 2989 | Thu Jun 22 14:15:46 2023 | root | number | | important=no
2991 | pre | | Fri Jun 23 14:38:32 2023 | root | number | zypp(zypper) | important=no
2992 | post | 2991 | Fri Jun 23 14:38:39 2023 | root | number | | important=no
2993 | pre | | Fri Jun 23 14:39:36 2023 | root | number | zypp(zypper) | important=no
2994 | post | 2993 | Fri Jun 23 14:40:10 2023 | root | number | | important=no
worker5:/home/okurz # snapper rollback 2955
Ambit is classic.
Creating read-only snapshot of current system. (Snapshot 2995.)
Creating read-write snapshot of snapshot 2955. (Snapshot 2996.)
Setting default subvolume to snapshot 2996.
# sudo systemctl disable --now auto-update.timer
# reboot
Added rollback steps for worker5. Again running the experiment: restart salt-minion on worker2.oqa.suse.de, grenache-1.qa.suse.de, openqaworker-arm-2.suse.de and openqaworker-arm-3.suse.de and then run a test.ping salt call in a loop.
https://github.com/saltstack/salt/issues/56467 looks related, last update in 2022-02, potentially also https://bugzilla.suse.com/show_bug.cgi?id=1135756
EDIT: 2023-06-25: hm, worker5 was unresponsive again but the auto-update timer was also enabled again. Well, yesterday I couldn't mask it because auto-update.timer is a custom timer deployed by our salt states. So instead I will prevent the enablement of the timer and service with:
for i in service timer; do mv /etc/systemd/system/auto-update.$i{,.disabled_poo131249}; done
snapper rollback 2955
reboot
By the way, salt-run manage.status provides a good overview of which nodes are considered down from the OSD salt point of view. Right now this shows many down:
down:
- grenache-1.qa.suse.de
- openqaworker-arm-2.suse.de
- openqaworker-arm-3.suse.de
- openqaworker16.qa.suse.cz
- openqaworker17.qa.suse.cz
- openqaworker18.qa.suse.cz
- worker2.oqa.suse.de
- worker3.oqa.suse.de
- worker5.oqa.suse.de
which I consider quite bad. Ok, worker5.oqa.suse.de is back for now due to my recovery. For the others I am restarting the salt-minion as before, then continuing my test.ping experiment.
After roughly 10h w5 at least is still reachable. Interestingly so are w3, w16, w17 and w18, though w2, g1, arm2 and arm3 are still unreachable. On w2 I did
for i in service timer; do sudo mv /etc/systemd/system/auto-update.$i{,.disabled_poo131249}; done && sudo snapper rollback 3031 && sudo reboot
Correspondingly on arm2:
for i in service timer; do sudo mv /etc/systemd/system/auto-update.$i{,.disabled_poo131249}; done && sudo snapper rollback 2698 && sudo reboot
and on arm3:
for i in service timer; do sudo mv /etc/systemd/system/auto-update.$i{,.disabled_poo131249}; done && sudo snapper rollback 2426 && sudo reboot
and on grenache-1:
for i in service timer; do sudo mv /etc/systemd/system/auto-update.$i{,.disabled_poo131249}; done && sudo snapper rollback 1263 && sudo reboot
Waiting for all to reboot, then let's see if that helps. Running "test.ping" in a loop again.
EDIT: Eventually g1, arm2, arm3, w2 and w5 were unresponsive again. But again auto-update was running, at least on w5. I was again making mistakes. It's either salt enabling the auto-update service again or it's because I disabled the auto-update service before rolling back. Instead I have now rolled back on worker5 and then, after the reboot, with systemctl edit auto-update.service replaced ExecStart with an echo call instead of the real one; /etc/systemd/system/auto-update.service.d/override.conf is now:
[Service]
ExecStart=
ExecStart=/usr/bin/echo 'Not running auto-update, see https://progress.opensuse.org/issues/131249'
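To verify the drop-in actually takes effect (standard systemd tooling):
sudo systemctl daemon-reload
systemctl cat auto-update.service   # should show the overriding ExecStart lines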
Updated by okurz over 1 year ago
- Related to action #130835: salt high state fails after recent merge requests in salt pillars size:M added
Updated by okurz over 1 year ago
- Assignee changed from mkittler to okurz
Continuing my downgrade experiment as discussed with mkittler.
Updated by okurz over 1 year ago
On grenache-1:
sudo snapper rollback 1263 && sudo reboot
wait for reboot and then
sudo mkdir -p /etc/systemd/system/auto-update.service.d && echo -e "[Service]\nExecStart=\nExecStart=/usr/bin/echo 'Not running auto-update, see https://progress.opensuse.org/issues/131249'" | sudo tee /etc/systemd/system/auto-update.service.d/override.conf && sudo systemctl daemon-reload
and now test.ping is failing with 'str' object has no attribute 'pop' most of the time. Ok, after some time it seems to work fine again.
EDIT: 2023-06-27: w2, arm2 and arm3 stopped responding, the others still seem fine, supporting my hypothesis of a regression due to a package upgrade. From worker5 the full list of packages with pending upgrades:
autoyast2 4.4.43-150400.3.16.1 -> 4.4.45-150400.3.19.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
autoyast2-installation 4.4.43-150400.3.16.1 -> 4.4.45-150400.3.19.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
cups 2.2.7-150000.3.43.1 -> 2.2.7-150000.3.46.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
cups-client 2.2.7-150000.3.43.1 -> 2.2.7-150000.3.46.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
cups-config 2.2.7-150000.3.43.1 -> 2.2.7-150000.3.46.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libatomic1 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libbluetooth3 5.62-150400.4.10.3 -> 5.62-150400.4.13.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libcups2 2.2.7-150000.3.43.1 -> 2.2.7-150000.3.46.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libcups2-32bit 2.2.7-150000.3.43.1 -> 2.2.7-150000.3.46.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libcupscgi1 2.2.7-150000.3.43.1 -> 2.2.7-150000.3.46.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libcupsimage2 2.2.7-150000.3.43.1 -> 2.2.7-150000.3.46.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libcupsmime1 2.2.7-150000.3.43.1 -> 2.2.7-150000.3.46.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libcupsppdc1 2.2.7-150000.3.43.1 -> 2.2.7-150000.3.46.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libgcc_s1 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libgcc_s1-32bit 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libgfortran5 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libgomp1 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libitm1 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libldap-2_4-2 2.4.46-150200.14.11.2 -> 2.4.46-150200.14.14.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libldap-2_4-2-32bit 2.4.46-150200.14.11.2 -> 2.4.46-150200.14.14.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libldap-data 2.4.46-150200.14.11.2 -> 2.4.46-150200.14.14.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
liblsan0 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libopenssl1_0_0 1.0.2p-150000.3.76.1 -> 1.0.2p-150000.3.79.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libopenssl1_0_0-32bit 1.0.2p-150000.3.76.1 -> 1.0.2p-150000.3.79.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libpython3_6m1_0 3.6.15-150300.10.45.1 -> 3.6.15-150300.10.48.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libquadmath0 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libsolv-tools 0.7.24-150400.3.6.4 -> 0.7.24-150400.3.8.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libstdc++6 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libstdc++6-32bit 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libX11-6 1.6.5-150000.3.27.1 -> 1.6.5-150000.3.30.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libX11-data 1.6.5-150000.3.27.1 -> 1.6.5-150000.3.30.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libX11-devel 1.6.5-150000.3.27.1 -> 1.6.5-150000.3.30.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libX11-xcb1 1.6.5-150000.3.27.1 -> 1.6.5-150000.3.30.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libyui16 4.3.3-150400.1.5 -> 4.3.7-150400.3.3.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libyui-ncurses16 4.3.3-150400.1.5 -> 4.3.7-150400.3.3.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libyui-ncurses-pkg16 4.3.3-150400.1.8 -> 4.3.7-150400.3.3.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libzck1 1.1.16-150400.3.2.1 -> 1.1.16-150400.3.4.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libzypp 17.31.11-150400.3.25.2 -> 17.31.13-150400.3.32.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
openQA-client 4.6.1686317795.57b586f-lp154.5868.1 -> 4.6.1687790479.74f3352-lp154.5916.1 x86_64 devel_openQA obs://build.opensuse.org/devel:openQA
openQA-common 4.6.1686317795.57b586f-lp154.5868.1 -> 4.6.1687790479.74f3352-lp154.5916.1 x86_64 devel_openQA obs://build.opensuse.org/devel:openQA
openQA-worker 4.6.1686317795.57b586f-lp154.5868.1 -> 4.6.1687790479.74f3352-lp154.5916.1 x86_64 devel_openQA obs://build.opensuse.org/devel:openQA
os-autoinst 4.6.1686321776.9b5f5e8-lp154.1588.1 -> 4.6.1687771504.520c460-lp154.1600.1 x86_64 devel_openQA obs://build.opensuse.org/devel:openQA
os-autoinst-devel 4.6.1686321776.9b5f5e8-lp154.1588.1 -> 4.6.1687771504.520c460-lp154.1600.1 x86_64 devel_openQA obs://build.opensuse.org/devel:openQA
os-autoinst-distri-opensuse-deps 1.1686319656.79e363bc-lp154.12295.1 -> 1.1687792629.4b158c58-lp154.12382.1 noarch devel_openQA obs://build.opensuse.org/devel:openQA
os-autoinst-openvswitch 4.6.1686321776.9b5f5e8-lp154.1588.1 -> 4.6.1687771504.520c460-lp154.1600.1 x86_64 devel_openQA obs://build.opensuse.org/devel:openQA
os-autoinst-swtpm 4.6.1686321776.9b5f5e8-lp154.1588.1 -> 4.6.1687771504.520c460-lp154.1600.1 x86_64 devel_openQA obs://build.opensuse.org/devel:openQA
python3 3.6.15-150300.10.45.1 -> 3.6.15-150300.10.48.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-base 3.6.15-150300.10.45.1 -> 3.6.15-150300.10.48.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-curses 3.6.15-150300.10.45.1 -> 3.6.15-150300.10.48.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-dbm 3.6.15-150300.10.45.1 -> 3.6.15-150300.10.48.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-ply 3.10-1.27 -> 3.10-150000.3.3.4 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-pyzmq 17.1.2-3.3.1 -> 17.1.2-150000.3.5.2 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-salt 3004-150400.8.25.1 -> 3006.0-150400.8.34.2 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-simplejson 3.17.2-1.10 -> 3.17.2-150300.3.2.3 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-solv 0.7.24-150400.3.6.4 -> 0.7.24-150400.3.8.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-tk 3.6.15-150300.10.45.1 -> 3.6.15-150300.10.48.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python-solv 0.7.24-150400.3.6.4 -> 0.7.24-150400.3.8.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-accel-qtest 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-accel-tcg-x86 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-audio-spice 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-block-curl 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-block-iscsi 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-block-rbd 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-block-ssh 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-chardev-baum 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-chardev-spice 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-hw-display-qxl 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-hw-display-virtio-gpu 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-hw-display-virtio-gpu-pci 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-hw-display-virtio-vga 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-hw-s390x-virtio-gpu-ccw 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-hw-usb-host 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-hw-usb-redirect 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-hw-usb-smartcard 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-ipxe 1.0.0+-150400.37.14.2 -> 1.0.0+-150400.37.17.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-ivshmem-tools 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-ksm 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-kvm 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-microvm 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-seabios 1.15.0_0_g2dd4b9b-150400.37.14.2 -> 1.15.0_0_g2dd4b9b-150400.37.17.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-sgabios 8-150400.37.14.2 -> 8-150400.37.17.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-skiboot 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-tools 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-ui-curses 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-ui-gtk 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-ui-opengl 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-ui-spice-app 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-ui-spice-core 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-vgabios 1.15.0_0_g2dd4b9b-150400.37.14.2 -> 1.15.0_0_g2dd4b9b-150400.37.17.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
qemu-x86 6.2.0-150400.37.14.2 -> 6.2.0-150400.37.17.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
ruby-solv 0.7.24-150400.3.6.4 -> 0.7.24-150400.3.8.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
salt 3004-150400.8.25.1 -> 3006.0-150400.8.34.2 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
salt-bash-completion 3004-150400.8.25.1 -> 3006.0-150400.8.34.2 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
salt-minion 3004-150400.8.25.1 -> 3006.0-150400.8.34.2 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
systemd-rpm-macros 12-150000.7.30.1 -> 13-150000.7.33.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
vim 9.0.1443-150000.5.43.1 -> 9.0.1572-150000.5.46.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
vim-data 9.0.1443-150000.5.43.1 -> 9.0.1572-150000.5.46.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
vim-data-common 9.0.1443-150000.5.43.1 -> 9.0.1572-150000.5.46.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
yast2-network 4.4.56-150400.3.18.1 -> 4.4.57-150400.3.21.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
yast2-pkg-bindings 4.4.5-150400.3.3.1 -> 4.4.6-150400.3.6.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
The following 2 NEW packages are going to be installed:
python3-jmespath 0.9.3-150000.3.3.4 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-looseversion 1.0.2-150100.3.3.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
From that list, I suspect one or multiple of:
libopenssl1_0_0 1.0.2p-150000.3.76.1 -> 1.0.2p-150000.3.79.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libopenssl1_0_0-32bit 1.0.2p-150000.3.76.1 -> 1.0.2p-150000.3.79.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libpython3_6m1_0 3.6.15-150300.10.45.1 -> 3.6.15-150300.10.48.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libquadmath0 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libsolv-tools 0.7.24-150400.3.6.4 -> 0.7.24-150400.3.8.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libstdc++6 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libstdc++6-32bit 12.2.1+git416-150000.1.7.1 -> 12.3.0+git1204-150000.1.10.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libyui16 4.3.3-150400.1.5 -> 4.3.7-150400.3.3.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libyui-ncurses16 4.3.3-150400.1.5 -> 4.3.7-150400.3.3.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libyui-ncurses-pkg16 4.3.3-150400.1.8 -> 4.3.7-150400.3.3.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libzck1 1.1.16-150400.3.2.1 -> 1.1.16-150400.3.4.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
libzypp 17.31.11-150400.3.25.2 -> 17.31.13-150400.3.32.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3 3.6.15-150300.10.45.1 -> 3.6.15-150300.10.48.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-base 3.6.15-150300.10.45.1 -> 3.6.15-150300.10.48.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-dbm 3.6.15-150300.10.45.1 -> 3.6.15-150300.10.48.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-ply 3.10-1.27 -> 3.10-150000.3.3.4 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-pyzmq 17.1.2-3.3.1 -> 17.1.2-150000.3.5.2 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-salt 3004-150400.8.25.1 -> 3006.0-150400.8.34.2 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-simplejson 3.17.2-1.10 -> 3.17.2-150300.3.2.3 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-solv 0.7.24-150400.3.6.4 -> 0.7.24-150400.3.8.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-tk 3.6.15-150300.10.45.1 -> 3.6.15-150300.10.48.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python-solv 0.7.24-150400.3.6.4 -> 0.7.24-150400.3.8.1 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC
salt 3004-150400.8.25.1 -> 3006.0-150400.8.34.2 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
salt-minion 3004-150400.8.25.1 -> 3006.0-150400.8.34.2 x86_64 Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC
python3-jmespath 0.9.3-150000.3.3.4 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
python3-looseversion 1.0.2-150100.3.3.1 noarch Update repository with updates from SUSE Linux Enterprise 15 SUSE LLC <https://www.suse.com/>
so on the three failing machines I am trying selective downgrades to fix the issue:
- worker2:
zypper install --force salt=3004-150400.8.25.1 salt-minion=3004-150400.8.25.1 salt-bash-completion=3004-150400.8.25.1 python3-salt=3004-150400.8.25.1
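A quick sanity check after such a downgrade, as a sketch (to be run on OSD; assumes the glob matches only this minion):
ssh worker2.oqa.suse.de 'rpm -q salt salt-minion python3-salt'  # confirm the pinned versions
sudo salt --no-color 'worker2*' test.ping                       # confirm the minion answers again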
openqaworker-arm-2+3 were actually downgraded but still failed; that is an indication against my hypothesis. On arm2 I found that I am on snapshot 2698, the "post" snapshot of 2023-06-13, so I am trying the "pre" snapshot from that day, snapshot 2697. The same happened on arm3, where I am going back to 2425, the "pre" snapshot of 2023-06-13.
I don't think we need cups on workers, so I removed it together with its dependencies on openqaworker-arm-3 and subsequently on all salt-controlled machines.
EDIT: My experiments caused #131447, so I shouldn't run salt jobs quite that often ;) Now I am running
systemctl start salt-master && for i in {1..7200}; do echo "### Run $i -- $(date -Is)" && salt --no-color \* test.ping ; df -i / ; salt-run jobs.list_jobs | wc -l && sleep 60; done | tee -a log_salt_test_ping_poo131249_$(date -Is).log
Updated by nicksinger over 1 year ago
Your continuous "test.ping" on OSD causes the salt job history to grow very quickly, exhausting our inodes on that machine. We stopped the master for now to mitigate the issue over the lunch period.
Updated by kraih over 1 year ago
- Related to action #131447: Some jobs incomplete due to auto_review:"api failure: 400.*/tmp/.*png.*No space left on device.*Utils.pm line 285":retry but enough space visible on machines added
Updated by okurz over 1 year ago
I am running a slightly adapted experiment due to #131249-24
for i in {1..7200}; do echo "### Run $i -- $(date -Is)" && salt --no-color \* test.ping ; df -i / ; salt-run jobs.list_jobs | wc -l && salt --no-color \* saltutil.kill_all_jobs && sleep 60 && rm -rf /var/cache/salt/master/jobs/*; done | tee -a log_salt_test_ping_poo131249_$(date -Is).log
So far, since downgrading all affected machines, I could not reproduce the error, which might also be an artifact of my adapted reproduction attempt. I will let this run overnight.
Updated by okurz over 1 year ago
- Description updated (diff)
Good news everyone! w2 became unresponsive, others are still ok. I will check with zypper dup --dry-run --details. w2 has salt-3006.0-150400.8.34.2.x86_64; the downgraded w5 has salt-3004-150400.8.25.1.x86_64.
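(A sketch of one way to obtain such a comparison, using the two hosts above that are on different versions:)
ssh worker5.oqa.suse.de 'rpm -q --changelog salt' > changelog_salt_3004.txt
ssh worker2.oqa.suse.de 'rpm -q --changelog salt' > changelog_salt_3006.txt
diff -u changelog_salt_3004.txt changelog_salt_3006.txt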
changelog diff:
* Mon Jun 19 2023 pablo.suarezhernandez@suse.com
- Make master_tops compatible with Salt 3000 and older minions (bsc#1212516) (bsc#1212517)
- Added:
* make-master_tops-compatible-with-salt-3000-and-older.patch
* Mon May 29 2023 yeray.gutierrez@suse.com
- Avoid failures due transactional_update module not available in Salt 3006.0 (bsc#1211754)
- Added:
* define-__virtualname__-for-transactional_update-modu.patch
* Wed May 24 2023 pablo.suarezhernandez@suse.com
- Avoid conflicts with Salt dependencies versions (bsc#1211612)
- Added:
* avoid-conflicts-with-dependencies-versions-bsc-12116.patch
* Fri May 05 2023 alexander.graul@suse.com
- Update to Salt release version 3006.0 (jsc#PED-4360)
* See release notes: https://docs.saltproject.io/en/latest/topics/releases/3006.0.html
- Add missing patch after rebase to fix collections Mapping issues
- Add python3-looseversion as new dependency for salt
- Add python3-packaging as new dependency for salt
- Allow entrypoint compatibility for "importlib-metadata>=5.0.0" (bsc#1207071)
- Create new salt-tests subpackage containing Salt tests
- Drop conflictive patch dicarded from upstream
- Fix SLS rendering error when Jinja macros are used
- Fix version detection and avoid building and testing failures
- Prevent deadlocks in salt-ssh executions
- Require python3-jmespath runtime dependency (bsc#1209233)
- Added:
* 3005.1-implement-zypper-removeptf-573.patch
* control-the-collection-of-lvm-grains-via-config.patch
* fix-version-detection-and-avoid-building-and-testing.patch
* make-sure-the-file-client-is-destroyed-upon-used.patch
* skip-package-names-without-colon-bsc-1208691-578.patch
* use-rlock-to-avoid-deadlocks-in-salt-ssh.patch
- Modified:
* activate-all-beacons-sources-config-pillar-grains.patch
* add-custom-suse-capabilities-as-grains.patch
* add-environment-variable-to-know-if-yum-is-invoked-f.patch
* add-migrated-state-and-gpg-key-management-functions-.patch
* add-publish_batch-to-clearfuncs-exposed-methods.patch
* add-salt-ssh-support-with-venv-salt-minion-3004-493.patch
* add-sleep-on-exception-handling-on-minion-connection.patch
* add-standalone-configuration-file-for-enabling-packa.patch
* add-support-for-gpgautoimport-539.patch
* allow-vendor-change-option-with-zypper.patch
* async-batch-implementation.patch
* avoid-excessive-syslogging-by-watchdog-cronjob-58.patch
* bsc-1176024-fix-file-directory-user-and-group-owners.patch
* change-the-delimeters-to-prevent-possible-tracebacks.patch
* debian-info_installed-compatibility-50453.patch
* dnfnotify-pkgset-plugin-implementation-3002.2-450.patch
* do-not-load-pip-state-if-there-is-no-3rd-party-depen.patch
* don-t-use-shell-sbin-nologin-in-requisites.patch
* drop-serial-from-event.unpack-in-cli.batch_async.patch
* early-feature-support-config.patch
* enable-passing-a-unix_socket-for-mysql-returners-bsc.patch
* enhance-openscap-module-add-xccdf_eval-call-386.patch
* fix-bsc-1065792.patch
* fix-for-suse-expanded-support-detection.patch
* fix-issue-2068-test.patch
* fix-missing-minion-returns-in-batch-mode-360.patch
* fix-ownership-of-salt-thin-directory-when-using-the-.patch
* fix-regression-with-depending-client.ssh-on-psutil-b.patch
* fix-salt-ssh-opts-poisoning-bsc-1197637-3004-501.patch
* fix-salt.utils.stringutils.to_str-calls-to-make-it-w.patch
* fix-the-regression-for-yumnotify-plugin-456.patch
* fix-traceback.print_exc-calls-for-test_pip_state-432.patch
* fixes-for-python-3.10-502.patch
* include-aliases-in-the-fqdns-grains.patch
* info_installed-works-without-status-attr-now.patch
* let-salt-ssh-use-platform-python-binary-in-rhel8-191.patch
* make-aptpkg.list_repos-compatible-on-enabled-disable.patch
* make-setup.py-script-to-not-require-setuptools-9.1.patch
* pass-the-context-to-pillar-ext-modules.patch
* prevent-affection-of-ssh.opts-with-lazyloader-bsc-11.patch
* prevent-pkg-plugins-errors-on-missing-cookie-path-bs.patch
* prevent-shell-injection-via-pre_flight_script_args-4.patch
* read-repo-info-without-using-interpolation-bsc-11356.patch
* restore-default-behaviour-of-pkg-list-return.patch
* return-the-expected-powerpc-os-arch-bsc-1117995.patch
* revert-fixing-a-use-case-when-multiple-inotify-beaco.patch
* run-salt-api-as-user-salt-bsc-1064520.patch
* run-salt-master-as-dedicated-salt-user.patch
* save-log-to-logfile-with-docker.build.patch
* switch-firewalld-state-to-use-change_interface.patch
* temporary-fix-extend-the-whitelist-of-allowed-comman.patch
* update-target-fix-for-salt-ssh-to-process-targets-li.patch
* use-adler32-algorithm-to-compute-string-checksums.patch
* use-salt-bundle-in-dockermod.patch
* x509-fixes-111.patch
* zypperpkg-ignore-retcode-104-for-search-bsc-1176697-.patch
- Removed:
* 3003.3-do-not-consider-skipped-targets-as-failed-for.patch
* 3003.3-postgresql-json-support-in-pillar-423.patch
* add-amazon-ec2-detection-for-virtual-grains-bsc-1195.patch
* add-missing-ansible-module-functions-to-whitelist-in.patch
* add-rpm_vercmp-python-library-for-version-comparison.patch
* add-support-for-name-pkgs-and-diff_attr-parameters-t.patch
* adds-explicit-type-cast-for-port.patch
* align-amazon-ec2-nitro-grains-with-upstream-pr-bsc-1.patch
* backport-syndic-auth-fixes.patch
* batch.py-avoid-exception-when-minion-does-not-respon.patch
* check-if-dpkgnotify-is-executable-bsc-1186674-376.patch
* clarify-pkg.installed-pkg_verify-documentation.patch
* detect-module.run-syntax.patch
* do-not-crash-when-unexpected-cmd-output-at-listing-p.patch
* enhance-logging-when-inotify-beacon-is-missing-pyino.patch
* fix-62092-catch-zmq.error.zmqerror-to-set-hwm-for-zm.patch
* fix-crash-when-calling-manage.not_alive-runners.patch
* fixes-pkg.version_cmp-on-openeuler-systems-and-a-few.patch
* fix-exception-in-yumpkg.remove-for-not-installed-pac.patch
* fix-for-cve-2022-22967-bsc-1200566.patch
* fix-inspector-module-export-function-bsc-1097531-481.patch
* fix-ip6_interface-grain-to-not-leak-secondary-ipv4-a.patch
* fix-issues-with-salt-ssh-s-extra-filerefs.patch
* fix-jinja2-contextfuntion-base-on-version-bsc-119874.patch
* fix-multiple-security-issues-bsc-1197417.patch
* fix-salt-call-event.send-call-with-grains-and-pillar.patch
* fix-salt.states.file.managed-for-follow_symlinks-tru.patch
* fix-state.apply-in-test-mode-with-file-state-module-.patch
* fix-test_ipc-unit-tests.patch
* fix-the-regression-in-schedule-module-releasded-in-3.patch
* fix-wrong-test_mod_del_repo_multiline_values-test-af.patch
* fixes-56144-to-enable-hotadd-profile-support.patch
* fopen-workaround-bad-buffering-for-binary-mode-563.patch
* force-zyppnotify-to-prefer-packages.db-than-packages.patch
* ignore-erros-on-reading-license-files-with-dpkg_lowp.patch
* ignore-extend-declarations-from-excluded-sls-files.patch
* ignore-non-utf8-characters-while-reading-files-with-.patch
* implementation-of-held-unheld-functions-for-state-pk.patch
* implementation-of-suse_ip-execution-module-bsc-10999.patch
* improvements-on-ansiblegate-module-354.patch
* include-stdout-in-error-message-for-zypperpkg-559.patch
* make-pass-renderer-configurable-other-fixes-532.patch
* make-sure-saltcacheloader-use-correct-fileclient-519.patch
* mock-ip_addrs-in-utils-minions.py-unit-test-443.patch
* normalize-package-names-once-with-pkg.installed-remo.patch
* notify-beacon-for-debian-ubuntu-systems-347.patch
* refactor-and-improvements-for-transactional-updates-.patch
* retry-if-rpm-lock-is-temporarily-unavailable-547.patch
* set-default-target-for-pip-from-venv_pip_target-envi.patch
* state.apply-don-t-check-for-cached-pillar-errors.patch
* state.orchestrate_single-does-not-pass-pillar-none-4.patch
* support-transactional-systems-microos.patch
* wipe-notify_socket-from-env-in-cmdmod-bsc-1193357-30.patch
zypper se --details --match-exact salt
shows me that there is no intermediate version available. I am applying the mitigation on w2 by restarting the salt-minion, but on worker5 I am upgrading salt as well, deliberately trying to trigger the problem again.
On worker5 I called sudo zypper al --comment "poo#131249 - potential salt regression, unresponsive salt-minion" salt salt-minion salt-bash-completion python3-salt
and am upgrading with sudo zypper dup --details.
salt --no-color -L 'worker3.oqa.suse.de,worker5.oqa.suse.de,openqaworker-arm-2.suse.de,openqaworker-arm-3.suse.de,grenache-1.qa.suse.de' cmd.run 'zypper al --comment "poo#131249 - potential salt regression, unresponsive salt-minion" salt salt-minion salt-bash-completion python3-salt && zypper -n dup --download-only && zypper -n dup'
and then for all affected machines
salt --no-color -L 'worker2.oqa.suse.de,worker3.oqa.suse.de,worker5.oqa.suse.de,openqaworker-arm-2.suse.de,openqaworker-arm-3.suse.de,grenache-1.qa.suse.de' cmd.run 'rm /etc/systemd/system/auto-update.service.d/override.conf && rm -f /etc/systemd/system/auto-update.*.disabled_poo131249 && systemctl daemon-reload && systemctl enable --now auto-update.timer && systemctl start auto-update'
With salt --no-color \* cmd.run 'zypper --no-refresh -n dup --dry-run'
I am now checking the general state of updates. It seems some machines have problems that need manual fixing, which I will also try to do.
I enabled osd-deployment again and triggered a pipeline, monitoring https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/715992
I found that many updates are not installed. Checking the output of the auto-update and auto-upgrade services I found that we run both nightly at the same time, so one always aborts because zypper is already running. Better to separate them:
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/898
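A minimal sketch of the underlying idea (not necessarily what the MR does; the times are made up): stagger the two timers with systemd drop-ins so zypper is never invoked twice concurrently.
sudo mkdir -p /etc/systemd/system/auto-update.timer.d /etc/systemd/system/auto-upgrade.timer.d
# reset OnCalendar first (empty assignment), then set the staggered time
printf '[Timer]\nOnCalendar=\nOnCalendar=*-*-* 00:30:00\n' | sudo tee /etc/systemd/system/auto-update.timer.d/override.conf
printf '[Timer]\nOnCalendar=\nOnCalendar=*-*-* 03:30:00\n' | sudo tee /etc/systemd/system/auto-upgrade.timer.d/override.conf
sudo systemctl daemon-reload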
EDIT: Deployment succeeded: https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1659740
Updated by okurz over 1 year ago
- Copied to action #131540: openqa-piworker fails to upgrade many packages. vendor change is not enabled as our salt states so far only do that for openQA machines, not generic machines size:M added
Updated by okurz over 1 year ago
- Related to action #107932: Handling broken RPM databases does not handle certain cases added
Updated by okurz over 1 year ago
- Copied to action #131543: We have machines with both auto-update&auto-upgrade deployed, we should have only one at a time size:M added
Updated by okurz over 1 year ago
- Description updated (diff)
- Status changed from In Progress to Feedback
I reported the issue as
https://bugzilla.opensuse.org/show_bug.cgi?id=1212816
for now
And on worker2 I applied
zypper -n in --oldpackage --allow-downgrade salt=3004-150400.8.25.1 && zypper al --comment "poo#131249 - potential salt regression, unresponsive salt-minion" salt salt-minion salt-bash-completion python3-salt
A subsequent salt --no-color --state-output=changes \* state.apply
was clean.
I will not enable alerts right now as we have quite a few firing already, so my plan is to check again explicitly over the next days.
Updated by mkittler over 1 year ago
After setting up sapworker3 the problem is now also reproducible on that host, see #128528#note-22. That means Leap 15.5 is also affected, and it really looks like the same problem. It is also odd that sapworker1 and 2 are not affected while the problem could be reproduced on sapworker3 a few times within a relatively short timeframe; all three workers run the same software and the hardware also seems very similar.
As mentioned in #128528#note-22, restarting the minion helped. However, stopping the stuck instance really took a while and it looks like it was not stopped cleanly:
martchus@sapworker3:~> sudo systemctl status salt-minion
● salt-minion.service - The Salt Minion
Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; vendor preset: disabled)
Active: deactivating (stop-sigterm) since Thu 2023-06-29 14:45:07 CEST; 1min 28s ago
Main PID: 3561 (salt-minion)
Tasks: 6 (limit: 19660)
CGroup: /system.slice/salt-minion.service
├─ 3561 /usr/bin/python3 /usr/bin/salt-minion
└─ 3696 /usr/bin/python3 /usr/bin/salt-minion
Jun 29 14:45:07 sapworker3 salt-minion[3696]: The Salt Minion is shutdown. Minion received a SIGTERM. Exited.
Jun 29 14:45:07 sapworker3 salt-minion[3696]: The minion failed to return the job information for job req. This is often due to the master being shut down or overloaded. If the master is running, consider increasing the worker_threads value.
Jun 29 14:45:07 sapworker3 salt-minion[3696]: Future <salt.ext.tornado.concurrent.Future object at 0x7f36ad91ebe0> exception was never retrieved: Traceback (most recent call last):
Jun 29 14:45:07 sapworker3 salt-minion[3696]: File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 309, in wrapper
Jun 29 14:45:07 sapworker3 salt-minion[3696]: yielded = next(result)
Jun 29 14:45:07 sapworker3 salt-minion[3696]: File "/usr/lib/python3.6/site-packages/salt/minion.py", line 2927, in handle_event
Jun 29 14:45:07 sapworker3 salt-minion[3696]: self._return_pub(data, ret_cmd="_return", sync=False)
Jun 29 14:45:07 sapworker3 salt-minion[3696]: File "/usr/lib/python3.6/site-packages/salt/minion.py", line 2267, in _return_pub
Jun 29 14:45:07 sapworker3 salt-minion[3696]: log.trace("ret_val = %s", ret_val) # pylint: disable=no-member
Jun 29 14:45:07 sapworker3 salt-minion[3696]: UnboundLocalError: local variable 'ret_val' referenced before assignment
Note that the salt master definitely was able to ping other minions at the time so I don't think it was generally overloaded.
I'll keep sapworker3 running for now as an additional machine to reproduce the issue. Right now all of these machines look good, though:
martchus@openqa:~> sudo salt -C 'G@nodename:sapworker1 or G@nodename:sapworker2 or G@nodename:sapworker3' -l error --state-output=changes test.ping
sapworker1.qe.nue2.suse.org:
True
sapworker3.qe.nue2.suse.org:
True
sapworker2.qe.nue2.suse.org:
True
(In fact, right now all machines are pingable via salt.)
Updated by jbaier_cz over 1 year ago
It seems that it is not limited to the already mentioned workers:
sapworker2.qe.nue2.suse.org:
Minion did not return. [Not connected]
openqaworker18.qa.suse.cz:
Minion did not return. [Not connected]
worker8.oqa.suse.de:
Minion did not return. [Not connected]
openqaworker17.qa.suse.cz:
Minion did not return. [Not connected]
worker9.oqa.suse.de:
Minion did not return. [Not connected]
openqaworker16.qa.suse.cz:
Minion did not return. [Not connected]
sapworker3.qe.nue2.suse.org:
Minion did not return. [Not connected]
worker3.oqa.suse.de:
Minion did not return. [Not connected]
On one of the workers:
openqaworker16:~> ps ax | grep salt
22469 ? Ss 0:00 /usr/bin/python3 /usr/bin/salt-minion
22978 ? Sl 0:05 /usr/bin/python3 /usr/bin/salt-minion
39136 ? Z 0:14 [salt-minion] <defunct>
61089 pts/0 S+ 0:00 grep --color=auto salt
Updated by okurz over 1 year ago
jbaier_cz wrote:
It seems that it is not limited to the already mentioned workers:
sapworker2.qe.nue2.suse.org:
Minion did not return. [Not connected]
openqaworker18.qa.suse.cz:
Minion did not return. [Not connected]
worker8.oqa.suse.de:
Minion did not return. [Not connected]
openqaworker17.qa.suse.cz:
Minion did not return. [Not connected]
worker9.oqa.suse.de:
Minion did not return. [Not connected]
openqaworker16.qa.suse.cz:
Minion did not return. [Not connected]
sapworker3.qe.nue2.suse.org:
Minion did not return. [Not connected]
worker3.oqa.suse.de:
Minion did not return. [Not connected]
On one of the workers:
openqaworker16:~> ps ax | grep salt
22469 ?        Ss     0:00 /usr/bin/python3 /usr/bin/salt-minion
22978 ?        Sl     0:05 /usr/bin/python3 /usr/bin/salt-minion
39136 ?        Z      0:14 [salt-minion] <defunct>
61089 pts/0    S+     0:00 grep --color=auto salt
Ok, so the process list excerpt looks like it is about the same problem. However, so far I would have considered only nodes with "No response" to suffer from this issue, not "Not connected", which can also happen if a host is down or deliberately disabled.
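Since a stuck minion does not answer queries sent through salt itself, a fleet scan for the zombie pattern would have to go over ssh; a minimal sketch with a few of the hosts listed above:
for i in openqaworker16.qa.suse.cz worker8.oqa.suse.de worker9.oqa.suse.de; do
  echo "### $i" && ssh "$i" 'ps -eo pid,stat,cmd | grep "[s]alt-minion"'
done
# a line with state Z and "[salt-minion] <defunct>" indicates the stuck condition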
Updated by okurz over 1 year ago
Ok, I applied the workaround as well now on the affected Leap 15.4 machines:
for i in openqaworker16.qa.suse.cz openqaworker17.qa.suse.cz openqaworker18.qa.suse.cz worker3.oqa.suse.de worker8.oqa.suse.de worker9.oqa.suse.de; do echo "### $i" && ssh $i 'sudo zypper -n in --oldpackage --allow-downgrade salt=3004-150400.8.25.1 && sudo zypper al --comment "poo#131249 - potential salt regression, unresponsive salt-minion" salt salt-minion salt-bash-completion python3-salt'; done
for Leap 15.5 we need to look up the corresponding 15.4 package in the repos manually and force-install it. Found on http://download.opensuse.org/update/leap/15.4/sle/x86_64/?P=salt*
-> http://download.opensuse.org/update/leap/15.4/sle/x86_64/salt-3004-150400.8.25.1.x86_64.rpm
so on sapworker2 and sapworker3 I did:
sudo zypper -n in --oldpackage --allow-downgrade http://download.opensuse.org/update/leap/15.4/sle/x86_64/salt-3004-150400.8.25.1.x86_64.rpm http://download.opensuse.org/update/leap/15.4/sle/x86_64/salt-minion-3004-150400.8.25.1.x86_64.rpm http://download.opensuse.org/update/leap/15.4/sle/x86_64/python3-salt-3004-150400.8.25.1.x86_64.rpm
On worker3 the new salt package was installed despite the lock; likely I made a mistake there. I removed the lock, applied the downgrade again and re-applied the locks.
Retriggered jobs in https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/719953
I found that https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1&from=now-1h&to=now shows multiple related alerts about "snapper-cleanup" failing on machines where we conducted a rollback. In #102942 we had snapper-cleanup failing, but that was due to docker subvolumes blocking the delete. /usr/lib/snapper/systemd-helper --cleanup
says that it fails to delete a snapshot but does not state which one. https://www.opensuse-forum.de/thread/64330-snapper-cleanup-nach-rollback-nicht-mehr-m%C3%B6glich/ had an open question which I answered now, but I don't expect to receive any help. How could we find out which snapshot the systemd-helper tries to delete? It turned out to be the very same problem as in #102942; I don't know why it did not show up in /var/log/snapper.log when I looked earlier. On worker2 I now manually deleted the btrfs subvolumes that blocked the deletion with btrfs subvolume list -a / | grep containers
and btrfs subvolume delete /.snapshots/1/…containers…
. Maybe we need a script for an automatic recovery, a sketch of which follows below.
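A minimal sketch of such a recovery script, assuming the leftover container subvolumes are the only blockers (same pattern as the manual commands above):
#!/bin/bash
# hedged recovery sketch for blocked snapper cleanups (poo#131249 / #102942):
# delete leftover container subvolumes, then retry the failed cleanup unit
set -euo pipefail
btrfs subvolume list / | sed -n 's/^.*path @\(.*containers.*\)/\1/p' | while read -r subvol; do
    btrfs subvolume delete "$subvol"
done
if systemctl is-failed --quiet snapper-cleanup; then
    systemctl restart snapper-cleanup
fi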
Updated by okurz over 1 year ago
- Related to action #102942: Failed systemd services alert: snapper-cleanup on QA-Power8-4-kvm fails size:M added
Updated by okurz over 1 year ago
- Due date deleted (2023-07-07)
- Status changed from Feedback to Blocked
- Priority changed from Urgent to Normal
I am using sudo btrfs subvolume delete $(sudo btrfs subvolume list / | sed -n 's/^.*path @\(.*containers.*\)/\1/p')
on all machines
sudo salt --no-color \* cmd.run "sudo btrfs subvolume delete \$(sudo btrfs subvolume list / | sed -n 's/^.*path @\(.*containers.*\)/\1/p') && sudo systemctl is-failed snapper-cleanup | grep -q failed && sudo systemctl restart snapper-cleanup"
Surely we could do that more safely :) (see the sketch below)
But with this,
https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1&from=now-1h&to=now no longer shows any failed snapper-cleanup.
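A slightly safer variant of the one-liner, as a sketch: xargs -r skips hosts without any matching subvolume so btrfs subvolume delete is never invoked with an empty argument list, and the unit is only restarted when it actually failed.
sudo salt --no-color \* cmd.run 'btrfs subvolume list / | sed -n "s/^.*path @\(.*containers.*\)/\1/p" | xargs -r -n1 btrfs subvolume delete; systemctl is-failed --quiet snapper-cleanup && systemctl restart snapper-cleanup; true'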
We have workarounds in place. We provided more information in both the snapper cleanup upstream report as well as in a salt regression bug. Waiting for anything to happen there.
Blocking on https://bugzilla.opensuse.org/show_bug.cgi?id=1212816
Updated by nicksinger over 1 year ago
applied your workaround/lock on openqaworker14.qa.suse.cz as well
Updated by okurz over 1 year ago
- Subject changed from [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt minion does not return size:M to [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt-minion does not return, error message "No response" size:M
Updated by okurz over 1 year ago
- Related to action #132137: Setup new PRG2 openQA worker for osd size:M added
Updated by okurz about 1 year ago
- Related to action #134906: osd-deployment failed due to openqaworker1 showing "No response" in salt size:M added
Updated by okurz about 1 year ago
- Target version changed from Ready to Tools - Next
Updated by mkittler about 1 year ago
The workers worker-arm1 and worker-arm2 were stuck again:
martchus@openqa:~> sudo salt -C 'G@roles:worker' test.ping
…
worker-arm1.oqa.prg2.suse.org:
Minion did not return. [No response]
The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
salt-run jobs.lookup_jid 20230915145848949891
worker-arm2.oqa.prg2.suse.org:
Minion did not return. [No response]
The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
salt-run jobs.lookup_jid 20230915145848949891
They responded again after systemctl kill salt-minion
and systemctl restart salt-minion
(just restart didn't work; they were really stuck, according to strace in some futex lock).
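For reference, a sketch of the full recovery sequence on an affected worker (the strace call is optional diagnosis; detach with Ctrl-C):
sudo strace -p "$(pgrep -o -f salt-minion)" -e trace=futex  # shows the hang in a futex wait
sudo systemctl kill salt-minion    # plain restart hung, so signal all unit processes first
sudo systemctl restart salt-minion
sudo salt --no-color 'worker-arm*' test.ping  # verify from the master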
Updated by okurz about 1 year ago
maybe salt-minion-3005 is also affected and we should really go back to 3004
Updated by okurz about 1 year ago
- Related to action #136325: salt deploy fails due to multiple offline workers in qe.nue2.suse.org+prg2.suse.org added
Updated by okurz about 1 year ago
- Status changed from Blocked to In Progress
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1865994#L185
sapworker2.qe.nue2.suse.org:
----------
ID: lock_salt-bash-completion_pkg
Function: cmd.run
Name: zypper rl salt-bash-completion; (zypper -n in --oldpackage --allow-downgrade 'salt-bash-completion<=3005' || zypper -n in --oldpackage --allow-downgrade 'salt-bash-completion<=3005.1') && zypper al -m 'poo#131249 - potential salt regression, unresponsive salt-minion' salt-bash-completion
Result: False
Comment: Command "zypper rl salt-bash-completion; (zypper -n in --oldpackage --allow-downgrade 'salt-bash-completion<=3005' || zypper -n in --oldpackage --allow-downgrade 'salt-bash-completion<=3005.1') && zypper al -m 'poo#131249 - potential salt regression, unresponsive salt-minion' salt-bash-completion" run
Started: 10:12:55.246902
Duration: 3047.714 ms
Changes:
----------
pid:
80582
retcode:
4
stderr:
No provider of 'salt-bash-completion<=3005' found.
stdout:
No lock has been removed.
Loading repository data...
Reading installed packages...
'salt-bash-completion<=3005' not found in package names. Trying capabilities.
Loading repository data...
Reading installed packages...
Resolving package dependencies...
Problem: the to be installed salt-bash-completion-3005.1-150500.2.13.noarch requires 'salt = 3005.1-150500.2.13', but this requirement cannot be provided
not installable providers: salt-3005.1-150500.2.13.x86_64[distribution/leap/$releasever/repo/oss]
Solution 1: Following actions will be done:
remove lock to allow installation of salt-3005.1-150500.2.13.x86_64[distribution/leap/$releasever/repo/oss]
remove lock to allow installation of python3-salt-3005.1-150500.2.13.x86_64[distribution/leap/$releasever/repo/oss]
remove lock to allow removal of salt-3004-150400.8.25.1.x86_64
remove lock to allow removal of python3-salt-3004-150400.8.25.1.x86_64
remove lock to allow removal of salt-minion-3004-150400.8.25.1.x86_64
Solution 2: do not install salt-bash-completion-3005.1-150500.2.13.noarch
Solution 3: break salt-bash-completion-3005.1-150500.2.13.noarch by ignoring some of its dependencies
Choose from above solutions by number or cancel [1/2/3/c/d/?] (c): c
Summary for sapworker2.qe.nue2.suse.org
--------------
Succeeded: 453 (changed=1)
Failed: 1
Updated by okurz about 1 year ago
- Status changed from In Progress to Blocked
I think I could solve that problem with a manual application of zypper al -m 'poo#131249 - potential salt regression, unresponsive salt-minion' salt-bash-completion
after ensuring that salt-bash-completion is actually not installed at all. Retriggered failed salt-pillars-openqa deploy job https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1870710
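(In sketch form, that manual fix boils down to removing the package if it is still present and only then adding the lock:)
rpm -q salt-bash-completion && sudo zypper -n rm salt-bash-completion  # ensure it is really gone
sudo zypper al -m 'poo#131249 - potential salt regression, unresponsive salt-minion' salt-bash-completion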
Updated by okurz about 1 year ago
- Status changed from Blocked to In Progress
Same problem on sapworker3 now: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1872212#L1163
Updated by okurz about 1 year ago
- Status changed from In Progress to Blocked
Fixed in same way, back to blocked on https://bugzilla.opensuse.org/show_bug.cgi?id=1212816
Updated by okurz about 1 year ago
- Status changed from Blocked to In Progress
- Target version changed from Tools - Next to Ready
https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1876545#L2899
that is sapworker1 showing problems with salt-3005; sapworker2+3 are fine with salt-3004.
Added a comment on https://bugzilla.opensuse.org/show_bug.cgi?id=1212816
We have observed that multiple machines running Leap 15.5 with salt-3005 eventually show the same "No response" problem. A forced install of the Leap 15.4 salt-3004 package on Leap 15.5 seems to work fine.
So following https://progress.opensuse.org/projects/openqav3/wiki/#Network-legacy-boot-via-PXE-and-OSworker-setup
I did
zypper -n rm salt-bash-completion
arch=$(uname -m)
sudo zypper -n in --oldpackage --allow-downgrade http://download.opensuse.org/update/leap/15.4/sle/$arch/salt-3004-150400.8.25.1.$arch.rpm http://download.opensuse.org/update/leap/15.4/sle/$arch/salt-minion-3004-150400.8.25.1.$arch.rpm http://download.opensuse.org/update/leap/15.4/sle/$arch/python3-salt-3004-150400.8.25.1.$arch.rpm && sudo zypper al --comment "poo#131249 - potential salt regression, unresponsive salt-minion" salt salt-minion salt-bash-completion python3-salt
sudo salt --no-color 'sapworker*' cmd.run 'rpm -qa | grep -i salt'
looks better now:
sapworker1.qe.nue2.suse.org:
salt-3004-150400.8.25.1.x86_64
python3-salt-3004-150400.8.25.1.x86_64
salt-minion-3004-150400.8.25.1.x86_64
sapworker2.qe.nue2.suse.org:
salt-3004-150400.8.25.1.x86_64
salt-minion-3004-150400.8.25.1.x86_64
python3-salt-3004-150400.8.25.1.x86_64
sapworker3.qe.nue2.suse.org:
salt-3004-150400.8.25.1.x86_64
salt-minion-3004-150400.8.25.1.x86_64
python3-salt-3004-150400.8.25.1.x86_64
retriggered https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1876780
Updated by okurz about 1 year ago
- Status changed from In Progress to Blocked
https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/826506 passed deployment and is in the monitoring step. Back to blocked on https://bugzilla.opensuse.org/show_bug.cgi?id=1212816
Updated by okurz about 1 year ago
- Status changed from Blocked to Feedback
https://bugzilla.suse.com/show_bug.cgi?id=1212816#c6 suggests trying
3006.0-150400.8.44.1
sudo salt --no-color '*' cmd.run 'zypper --no-refresh se --details salt-minion | grep -q 8.44 && zypper rl salt salt-minion salt-bash-completion python3-salt && zypper -n in salt salt-minion python3-salt'
Current version installed on all machines, via sudo salt --no-color --out txt '*' cmd.run 'rpm -q salt-minion' queue=True | sort:
backup-qam.qe.nue2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
backup.qa.suse.de: salt-minion-3006.0-150400.8.44.1.x86_64
baremetal-support.qa.suse.de: salt-minion-3006.0-150400.8.44.1.x86_64
diesel.qe.nue2.suse.org: salt-minion-3006.0-150400.8.44.1.ppc64le
imagetester.qe.nue2.suse.org: salt-minion-3005.1-150500.2.13.x86_64
jenkins.qa.suse.de: salt-minion-3006.0-150400.8.44.1.x86_64
openqa-monitor.qa.suse.de: salt-minion-3006.0-150400.8.44.1.x86_64
openqa-piworker.qa.suse.de: salt-minion-3005.1-150500.2.13.aarch64
openqa.suse.de: salt-minion-3006.0-150400.8.44.1.x86_64
openqaw5-xen.qa.suse.de: salt-minion-3006.0-150400.8.44.1.x86_64
openqaworker14.qa.suse.cz: salt-minion-3006.0-150400.8.44.1.x86_64
openqaworker16.qa.suse.cz: salt-minion-3006.0-150400.8.44.1.x86_64
openqaworker17.qa.suse.cz: salt-minion-3006.0-150400.8.44.1.x86_64
openqaworker18.qa.suse.cz: salt-minion-3006.0-150400.8.44.1.x86_64
openqaworker1.qe.nue2.suse.org: salt-minion-3006.0-150400.8.44.1.x86_64
openqaworker-arm-2.suse.de: salt-minion-3006.0-150400.8.44.1.aarch64
openqaworker-arm-3.suse.de: salt-minion-3006.0-150400.8.44.1.aarch64
petrol.qe.nue2.suse.org: salt-minion-3006.0-150400.8.44.1.ppc64le
powerqaworker-qam-1.qa.suse.de: salt-minion-3006.0-150400.8.44.1.ppc64le
qamasternue.qa.suse.de: salt-minion-3006.0-150400.8.44.1.x86_64
qesapworker-prg4.qa.suse.cz: salt-minion-3004-150400.8.25.1.x86_64
qesapworker-prg5.qa.suse.cz: salt-minion-3004-150400.8.25.1.x86_64
qesapworker-prg6.qa.suse.cz: salt-minion-3004-150400.8.25.1.x86_64
qesapworker-prg7.qa.suse.cz: salt-minion-3004-150400.8.25.1.x86_64
sapworker1.qe.nue2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
sapworker2.qe.nue2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
sapworker3.qe.nue2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
schort-server.qa.suse.de: salt-minion-3006.0-150400.8.44.1.x86_64
storage.oqa.suse.de: salt-minion-3006.0-150400.8.44.1.x86_64
tumblesle.qa.suse.de: salt-minion-3006.0-150400.8.44.1.x86_64
worker29.oqa.prg2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
worker2.oqa.suse.de: salt-minion-3006.0-150400.8.44.1.x86_64
worker30.oqa.prg2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
worker31.oqa.prg2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
worker32.oqa.prg2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
worker33.oqa.prg2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
worker34.oqa.prg2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
worker35.oqa.prg2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
worker36.oqa.prg2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
worker37.oqa.prg2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
worker38.oqa.prg2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
worker39.oqa.prg2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
worker40.oqa.prg2.suse.org: salt-minion-3004-150400.8.25.1.x86_64
worker-arm1.oqa.prg2.suse.org: salt-minion-3004-150400.8.25.1.aarch64
worker-arm2.oqa.prg2.suse.org: salt-minion-3004-150400.8.25.1.aarch64
Updated by okurz about 1 year ago
Downgraded imagetester as it had 3005 and was showing "No response". sudo salt --no-color \* test.ping
is good again.
Updated by okurz about 1 year ago
I have been running salt-minion with a fixed version, as mentioned in https://bugzilla.opensuse.org/show_bug.cgi?id=1212816, on multiple hosts for more than a week now and no further problems were observed, so we can remove this workaround again:
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1015
I will handle the removal of locks and upgrades manually.
Updated by okurz about 1 year ago
MR merged
sudo salt --state-output=changes -C \* cmd.run 'zypper rl salt salt-minion salt-bash-completion && zypper rl -t patch openSUSE-SLE-15.4-2023-2571 openSUSE-SLE-15.4-2023-3145 openSUSE-SLE-15.4-2023-3863 && zypper -n in salt-minion' | grep -av 'Result: Clean'
From today:
openqa:~ # sudo salt --state-output=changes -C \* cmd.run 'rpm -q salt-minion' | grep -av 'Result: Clean'
s390zl13.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.s390x
s390zl12.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.s390x
worker36.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
worker35.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
worker33.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
worker39.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
worker34.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
worker38.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
worker32.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
worker40.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
worker31.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
worker37.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
backup-qam.qe.nue2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
worker29.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
worker30.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
sapworker3.qe.nue2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
worker-arm1.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.aarch64
worker-arm2.oqa.prg2.suse.org:
salt-minion-3006.0-150500.4.19.1.aarch64
sapworker1.qe.nue2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
sapworker2.qe.nue2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
openqaworker16.qa.suse.cz:
salt-minion-3006.0-150400.8.44.1.x86_64
openqaworker17.qa.suse.cz:
salt-minion-3006.0-150400.8.44.1.x86_64
openqaworker18.qa.suse.cz:
salt-minion-3006.0-150400.8.44.1.x86_64
openqaworker1.qe.nue2.suse.org:
salt-minion-3006.0-150400.8.44.1.x86_64
qesapworker-prg7.qa.suse.cz:
salt-minion-3006.0-150500.4.19.1.x86_64
qesapworker-prg5.qa.suse.cz:
salt-minion-3006.0-150500.4.19.1.x86_64
qesapworker-prg4.qa.suse.cz:
salt-minion-3006.0-150500.4.19.1.x86_64
qesapworker-prg6.qa.suse.cz:
salt-minion-3006.0-150500.4.19.1.x86_64
openqa.suse.de:
salt-minion-3006.0-150400.8.44.1.x86_64
qamaster.qe.nue2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
openqaw5-xen.qa.suse.de:
salt-minion-3006.0-150500.4.19.1.x86_64
openqaworker14.qa.suse.cz:
salt-minion-3006.0-150400.8.44.1.x86_64
petrol.qe.nue2.suse.org:
salt-minion-3006.0-150400.8.44.1.ppc64le
imagetester.qe.nue2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
monitor.qe.nue2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
jenkins.qe.nue2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
backup-vm.qe.nue2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
baremetal-support.qe.nue2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
diesel.qe.nue2.suse.org:
salt-minion-3006.0-150400.8.44.1.ppc64le
openqa-piworker.qe.nue2.suse.org:
salt-minion-3006.0-150500.4.19.1.aarch64
tumblesle.qe.nue2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
schort-server.qe.nue2.suse.org:
salt-minion-3006.0-150500.4.19.1.x86_64
all systems seem to have an up-to-date salt-minion and are responsive. No related alerts. Checking rollback steps and ACs.
Updated by okurz about 1 year ago
- Description updated (diff)
- Status changed from Feedback to Resolved
All rollback steps and ACs fulfilled as well, done here
Updated by okurz 12 months ago
- Related to action #150965: At least diesel+petrol+mania fail to auto-update due to kernel locks preventing patches size:M added