action #159186
closed
[alert] Systemd-services alert failing due to unit "rsnapshot@alpha" on host "storage"
0%
Description
Observation
See https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1&from=1713364104162&to=1713367862661. I am not sure why this is coming up now. According to salt-key -L, the storage host is not even in salt anymore.
Looks like the number of failing alerts went down to zero again so it is probably not useful to pause the alert.
Rollback steps
- Re-enable and start timers and salt-minion on storage.qe.prg2.suse.org.
- Remove silence "alertname=Failed systemd services alert (except openqa.suse.de)" from https://stats.openqa-monitor.qa.suse.de/alerting/silences
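The first rollback step could be scripted roughly as follows. This is only a sketch: the ticket does not name the timers that were disabled, so `example.timer` is a hypothetical placeholder, and `rollback_commands` and `run_rollback` are illustrative helper names. The only external command used is `systemctl enable --now UNIT`, which enables and starts a unit in one call. Removing the silence is done in the web UI and is not covered here.

```python
import subprocess


def rollback_commands(units):
    """Build one 'systemctl enable --now' command per unit."""
    return [["systemctl", "enable", "--now", unit] for unit in units]


def run_rollback(units, dry_run=True):
    """Print the commands (dry run) or execute them on the host."""
    for cmd in rollback_commands(units):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if systemctl fails
            subprocess.run(cmd, check=True)


# 'example.timer' is a placeholder; replace it with the real timer units.
units = ["salt-minion.service", "example.timer"]
run_rollback(units, dry_run=True)
```

With `dry_run=True` nothing is changed on the host, so the command list can be reviewed before running it as root on storage.qe.prg2.suse.org.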
Updated by nicksinger 13 days ago
- Related to action #153742: Move of OSD machine NUE1 to PRG2 - storage.qe.prg2.suse.org added
Updated by mkittler 7 days ago
Since progress was down I couldn't check what's currently being done about this. So I created https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1161 when I saw the alert again. It nevertheless looks like the storage host cannot connect to both o3 and OSD.
Updated by openqa_review 5 days ago
- Due date set to 2024-05-10
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz 4 days ago
- Due date deleted (2024-05-10)
- Status changed from In Progress to Blocked
The manual run of rsnapshot@alpha ended successfully. I updated the host entry in /etc/salt/minion_id and started salt-minion again. But apparently the minion cannot reach the salt master:
https://sd.suse.com/servicedesk/customer/portal/1/SD-155344
Updated by livdywan 4 days ago
Unfortunately it still doesn't look that successful on the failed systemd services alert:
2024-04-26 14:53:40 storage rsnapshot@alpha 1
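For checking rows like the one above in bulk, a small parser can help. This is a minimal sketch that assumes the row layout is date, time, host, unit, and a numeric value, separated by whitespace. Reading the last column as a "failed" value is an assumption based on the alert name, and `FailedUnitRow` and `parse_row` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FailedUnitRow:
    timestamp: datetime
    host: str
    unit: str
    failed: int  # assumed meaning: non-zero while the unit is failed


def parse_row(line: str) -> FailedUnitRow:
    # Assumed layout: "<date> <time> <host> <unit> <value>"
    date, time, host, unit, value = line.split()
    return FailedUnitRow(
        timestamp=datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S"),
        host=host,
        unit=unit,
        failed=int(value),
    )


row = parse_row("2024-04-26 14:53:40 storage rsnapshot@alpha 1")
print(row.host, row.unit, row.failed)
```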
Updated by okurz about 7 hours ago
I set up a silence and monitored: rsnapshot@alpha was fine again after the last service. Everything is updated with salt. I realized that the host is still on Leap 15.4, so I am doing https://progress.opensuse.org/projects/openqav3/wiki/#Distribution-upgrades
Updated by okurz about 7 hours ago
- Due date deleted (2024-05-13)
- Status changed from In Progress to Resolved