action #151597
Closed: [alert] osiris-1 (osiris-1: partitions usage (%) alert Generic partitions_usage_alert_osiris-1 generic)
Description
Observation
Alert from Grafana:
1 firing alert instance
GROUPED BY
hostname=osiris-1
1 firing instances
Firing [stats.openqa-monitor.qa.suse.de]
osiris-1: partitions usage (%) alert
View alert [stats.openqa-monitor.qa.suse.de]
Values
A0=96.03554068410229
Labels
alertname=osiris-1: partitions usage (%) alert
grafana_folder=Generic
hostname=osiris-1
rule_uid=partitions_usage_alert_osiris-1
type=generic
Silence [stats.openqa-monitor.qa.suse.de]
View dashboard [stats.openqa-monitor.qa.suse.de]
View panel [stats.openqa-monitor.qa.suse.de]
Observed 32s before this notification was delivered, at 2023-11-28 11:49:00 +0100 CET
http://stats.openqa-monitor.qa.suse.de/alerting/grafana/partitions_usage_alert_osiris-1/view?orgId=1
Updated by tinita about 1 year ago
- Related to action #138650: partition usage panels show a long list of undefined and no reasonable graphs at least for some generic machines size:M added
Updated by tinita about 1 year ago · Edited
- Status changed from New to In Progress
Since the alert doesn't really tell me which partition is problematic (see #138650), I had a look at osiris-1, and it's /var/lib/libvirt/images/dist.suse.de
Updated by tinita about 1 year ago
The mentioned partition is of the type nfs4.
The alert is supposed to ignore nfs mounts, but it only checks for the exact string nfs.
I just changed it into a regex:
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1053 Ignore all nfs partitions
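To illustrate why the exact string comparison let the nfs4 mount through, here is a minimal Python sketch of the two filter variants. It only mimics the InfluxQL conditions `fstype != 'nfs'` and `fstype !~ /^nfs/` quoted in this ticket. The list of filesystem types and the function names are hypothetical and not taken from the salt states.

```python
import re

# Hypothetical sample of values the "fstype" tag might carry on a host.
FSTYPES = ["ext4", "nfs", "nfs4", "udf", "btrfs"]

# Same pattern as in the updated query: any fstype starting with "nfs".
NFS_PREFIX = re.compile(r"^nfs")


def kept_by_exact_match(fstype: str) -> bool:
    """Mimics the old condition: fstype != 'nfs' AND fstype != 'udf'."""
    return fstype != "nfs" and fstype != "udf"


def kept_by_regex(fstype: str) -> bool:
    """Mimics the new condition: fstype !~ /^nfs/ AND fstype != 'udf'."""
    return NFS_PREFIX.search(fstype) is None and fstype != "udf"


old_kept = [f for f in FSTYPES if kept_by_exact_match(f)]
new_kept = [f for f in FSTYPES if kept_by_regex(f)]

print(old_kept)  # nfs4 is still evaluated by the alert
print(new_kept)  # nfs and nfs4 are both ignored
```

With the exact comparison, only the literal value nfs is excluded, so an nfs4 partition still counts towards the usage alert. The anchored regex excludes every type that starts with nfs.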
Updated by tinita about 1 year ago
It should be noted that the alert is about disk usage regarding size:
SELECT mean("used_percent") AS "used_percent" FROM "disk" WHERE ("host" = 'osiris-1' AND fstype !~ /^nfs/ AND fstype != 'udf') AND $timeFilter GROUP BY time($interval), "device", "fstype" fill(null)
while the linked panel is about diskio:
SELECT non_negative_derivative(mean(reads),1s) as "read" FROM "diskio" WHERE "host" = 'osiris-1' AND $timeFilter GROUP BY time($interval), *
I think we should have two different panels.
Updated by okurz about 1 year ago
The actual "alert query" still needs to be adjusted, e.g. see https://stats.openqa-monitor.qa.suse.de/alerting/grafana/partitions_usage_alert_osiris-1/view?returnTo=%2Falerting%2Flist%3Fsearch%3Dstate%3Afiring . See the two occurrences here: https://gitlab.suse.de/search?search=fstype%20!%3D%20%27nfs%27%20AND&nav_source=navbar&project_id=743&group_id=39&search_code=true&repository_ref=master
Updated by tinita about 1 year ago
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1054 Ignore all nfs partitions in alerts