action #138650
closed
partition usage panels show a long list of undefined and no reasonable graphs at least for some generic machines size:M
Description
Observation
The partition usage panels of multiple generic machines show a long list of "undefined" entries and no reasonable graphs, for example:
- https://monitor.qa.suse.de/d/GDada/dashboard-for-ada?orgId=1&refresh=1m&viewPanel=65090
- https://monitor.qa.suse.de/d/GDunreal6/dashboard-for-unreal6?orgId=1&from=1698210981717&to=1698391062616&viewPanel=65090
- https://monitor.qa.suse.de/d/GDbackup-qam/dashboard-for-backup-qam?orgId=1&refresh=1m&viewPanel=65090
Expected result
https://monitor.qa.suse.de/d/GDopenqaworker1/dashboard-for-openqaworker1?orgId=1&refresh=1m&viewPanel=65090 looks fine, showing partition names and proper storage usage graphs.
Steps to reproduce
See https://monitor.qa.suse.de/d/GDada/dashboard-for-ada?orgId=1&refresh=1m&viewPanel=65090
Acceptance criteria
- AC1: All Salt-controlled machines show reasonable graphs for partition usage
Suggestions
- Look up more panels to find out whether really all generic machines are affected while openQA workers are fine. Also check other roles, e.g. the openQA webUI
- Check whether we have differing queries for the partition usage in different files
- Fix and harmonize
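When looking up more panels, it can help to first tell missing data apart from a wrong panel query by checking whether a host reports partition data at all. A minimal InfluxQL sketch, assuming the hosts run Telegraf's disk input (which writes a "disk" measurement tagged with "path" per mount point) and using ada as an example host:

```
SHOW TAG VALUES FROM "disk" WITH KEY = "path" WHERE "host" = 'ada'
```

If this lists mount points, the data is there and the panel query is the likely culprit; an empty result instead points at data collection on that host.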
Updated by okurz about 1 year ago
- Related to action #138518: unreal6 partition usage alert added
Updated by okurz about 1 year ago
- Related to action #150887: [alert] [FIRING:1] s390zl12 (s390zl12: partitions usage (%) alert Generic partitions_usage_alert_s390zl12 generic), also s390zl13 size:M added
Updated by okurz about 1 year ago
- Target version changed from Tools - Next to Ready
Updated by mkittler about 1 year ago
- Subject changed from partition usage panels show a long list of undefined and no reasonable graphs at least for some generic machines to partition usage panels show a long list of undefined and no reasonable graphs at least for some generic machines size:M
- Status changed from New to Workable
Updated by tinita about 1 year ago
- Related to action #151597: [alert] osiris-1 (osiris-1: partitions usage (%) alert Generic partitions_usage_alert_osiris-1 generic added
Updated by tinita about 1 year ago
The query for this graph is:
SELECT non_negative_derivative(mean(reads),1s) as "read" FROM "diskio" WHERE "host" = 'osiris-1' AND $timeFilter GROUP BY time($interval), *
which is almost the same as in the Disk I/O requests panel:
SELECT non_negative_derivative(mean(read_bytes),1s) as "read" FROM "diskio" WHERE "host" =~ /$server$/ AND $timeFilter GROUP BY time($interval), *
So the partition usage panel actually shows disk I/O reads. For the storage usage we would need to select from the "disk" measurement instead.
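A minimal sketch of such a query, assuming Telegraf's disk input with its "used_percent" field and "path" tag; this is only an illustration, not necessarily the query the eventual fix uses:

```
SELECT mean("used_percent") AS "used" FROM "disk" WHERE "host" = 'osiris-1' AND $timeFilter GROUP BY time($interval), "path"
```

Grouping by "path" instead of * yields one series per mount point, so the legend lists partitions rather than every combination of the measurement's tags.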
Updated by tinita about 1 year ago
- Status changed from Workable to In Progress
- Assignee set to tinita
Updated by tinita about 1 year ago
- File partitions-panel.png added
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1057 Fix partitions usage panel for generic machines
Updated by tinita about 1 year ago
- File partitions-panel.png partitions-panel.png added
Updated by tinita about 1 year ago
- Status changed from Feedback to Resolved