action #170338
No monitoring data from OSD since 2024-11-25 1449Z size:M (open)
Status: In Progress
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2024-11-27
Due date: 2024-12-12 (Due in 8 days)
% Done: 0%
Estimated time:
Tags:
Description
Observation
Acceptance criteria
- AC1: There is current monitoring data from OSD itself on monitor.qa.suse.de
- AC2: There is also monitoring data after reboots of monitor+OSD
Acceptance tests
- AT1-1: https://monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=now-7d&to=now&var-host_disks=$__all&refresh=15m&viewPanel=panel-78 shows current data
- AT2-1: Same as AT1-1, but after rebooting monitor first
- AT2-2: Same as AT1-1, but after rebooting OSD first
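To make "shows current data" checkable the same way each time AT1-1 is repeated after a reboot, one option is to compare the newest data point against a freshness threshold. The following is a minimal Python sketch, assuming the panel's timestamps have already been fetched (that step is not shown). The helpers `newest_point` and `has_current_data` are hypothetical, and so is the 15-minute `MAX_AGE`, which is merely borrowed from the `refresh=15m` parameter in the AT1-1 URL.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

# Hypothetical threshold: data counts as "current" if the newest point is at
# most this old. 15 minutes mirrors refresh=15m in the AT1-1 URL.
MAX_AGE = timedelta(minutes=15)


def newest_point(timestamps: Iterable[datetime]) -> Optional[datetime]:
    """Return the most recent timestamp, or None if there are no points."""
    return max(timestamps, default=None)


def has_current_data(timestamps: Iterable[datetime],
                     now: Optional[datetime] = None,
                     max_age: timedelta = MAX_AGE) -> bool:
    """True if the newest data point is no older than max_age."""
    if now is None:
        now = datetime.now(timezone.utc)
    newest = newest_point(timestamps)
    return newest is not None and now - newest <= max_age


if __name__ == "__main__":
    stop = datetime(2024, 11, 25, 14, 49, tzinfo=timezone.utc)
    # Data that stopped at 2024-11-25 14:49Z is not current two days later.
    print(has_current_data([stop], now=stop + timedelta(days=2)))  # False
    print(has_current_data([stop], now=stop + timedelta(minutes=5)))  # True
```

Passing `now` explicitly keeps the check deterministic, so the same function can be reused for AT2-1 and AT2-2 with timestamps taken after each reboot.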
Suggestions
- Handle the IPv4+IPv6 double-routing problems that appeared after setting up WireGuard tunnels and that also disrupt our monitoring
- Decide which routing approach to take with the VPN in place, considering both the source and the target host of each connection
- Changes might be needed on multiple hosts
- Make the changes persistent in Salt
- Ensure reboot consistency
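When working out which path traffic takes with the VPN in place, it helps to see, per address family, which local source address the kernel selects towards a given host. The following is a minimal Python sketch of one common technique, assuming nothing about the actual setup: connecting a UDP socket sends no packets but makes the kernel perform a route lookup, and `getsockname()` then reports the chosen source address. The helper name `source_address` is hypothetical. The sketch shows only the forward lookup on the host where it runs, so it would need to be run on both source and target hosts to cover return paths.

```python
import socket
from typing import Optional


def source_address(target: str, family: int, port: int = 9) -> Optional[str]:
    """Return the local address the kernel would use to reach `target`.

    Hypothetical helper. Connecting a UDP socket sends no packets; it only
    triggers a route lookup, whose chosen source address getsockname()
    reports. Returns None if the name has no address in this family or
    there is no usable route.
    """
    try:
        infos = socket.getaddrinfo(target, port, family, socket.SOCK_DGRAM)
    except socket.gaierror:
        return None  # name does not resolve in this address family
    for fam, socktype, proto, _canonname, sockaddr in infos:
        with socket.socket(fam, socktype, proto) as sock:
            try:
                sock.connect(sockaddr)
            except OSError:
                continue  # e.g. "Network is unreachable" for this address
            return sock.getsockname()[0]
    return None
```

For example, comparing `source_address("monitor.qa.suse.de", socket.AF_INET)` with `source_address("monitor.qa.suse.de", socket.AF_INET6)` on OSD, and the reverse lookup on monitor, would show whether one family unexpectedly picks a tunnel address. Re-running the comparison after a reboot is one way to check that the Salt-managed routing comes back consistently.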
Rollback actions
- Remove the alert silence "rule_uid=~host_up_alert.*" from https://monitor.qa.suse.de/alerting/silences
- Remove the alert silence "alertname=Failed systemd services alert (except openqa.suse.de)" from https://monitor.qa.suse.de/alerting/silences