action #128969
closed
[alert][grafana] Failed systemd services alert (except openqa.suse.de) Salt (Uk02cifVkz)
Description
Observation
Multiple alert emails were received on 2023-05-09:
https://stats.openqa-monitor.qa.suse.de/alerting/grafana/Uk02cifVkz/view?orgId=1
Updated by osukup over 1 year ago
looks like a broken package from security-sensor.repo ...
==> Suppress alert and wait for the newer working build?
Updated by livdywan over 1 year ago
osukup wrote:
looks like a broken package from security-sensor.repo ...
==> Suppress alert and wait for the newer working build?
There's a Go stack trace. I would guess we need to report this somewhere, otherwise we may not get a fix. Not sure where, though.
Updated by livdywan over 1 year ago
- Status changed from New to In Progress
- Assignee set to livdywan
Well, let's see if I can find out what to do here. Asking in Slack for now.
Btw since I was asked, here's how I double-checked that these are all Leap 15.4 machines:
sudo salt -C '*' cmd.run 'grep PRETTY /etc/os-release'
Updated by openqa_review over 1 year ago
- Due date set to 2023-05-24
Setting due date based on mean cycle time of SUSE QE Tools
Updated by livdywan over 1 year ago
- Status changed from In Progress to Feedback
cdywan wrote:
Well, let's see if I can find out what to do here. Asking in Slack for now.
Btw since I was asked, here's how I double-checked that these are all Leap 15.4 machines:
sudo salt -C '*' cmd.run 'grep PRETTY /etc/os-release'
Apparently a fix for a regression in libbpfgo was applied, and after restarting the service it seems to be running fine again:
sudo salt -C '*' cmd.run 'systemctl restart velociraptor-client'
storage.oqa.suse.de:
[...]
sudo salt -C '*' cmd.run 'systemctl is-active velociraptor-client'
storage.oqa.suse.de:
active
[...]
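With many minions the is-active output gets long, so a small filter can help show only the hosts that still need attention. The following is a rough sketch, not something that was run as part of this ticket: the function name not_active is made up for illustration, and it assumes Salt's default text output (a "minion-id:" line followed by indented result lines).

```shell
#!/bin/sh
# Hypothetical helper: read Salt's default text output on stdin and print
# only the minions whose reported state is something other than "active".
# Assumption: each minion appears as an unindented "minion-id:" line,
# followed by one or more indented result lines.
not_active() {
    awk '
        # Unindented line ending in ":" starts a new minion section.
        /^[^ \t].*:$/ { minion = substr($0, 1, length($0) - 1); next }
        # Any non-empty result line that is not exactly "active" is reported.
        NF && $1 != "active" {
            sub(/^[ \t]+/, "")
            print minion ": " $0
        }
    '
}
```

It could be used by piping the check into it, for example: sudo salt -C '*' cmd.run 'systemctl is-active velociraptor-client' | not_active. No output would then mean every responding minion reported "active"; minions that return an error message instead of a state are listed as well.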
Updated by okurz over 1 year ago
I followed the discussion. That's the benefit of having the main developer in the house :D
I suggest you crosscheck all velociraptor-client statuses on all OSD salt maintained machines, should be an easy salt exercise for you ;)
Updated by livdywan over 1 year ago
okurz wrote:
I suggest you crosscheck all velociraptor-client statuses on all OSD salt maintained machines, should be an easy salt exercise for you ;)
I did. See above. Or do you mean something not covered by cmd.run?
Updated by okurz over 1 year ago
Oh, right. https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?viewPanel=6&orgId=1 is good again. Resolve then?