action #138650

closed

partition usage panels show a long list of undefined and no reasonable graphs at least for some generic machines size:M

Added by okurz 6 months ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2023-10-27
Due date:
% Done: 0%
Estimated time:

Description

Observation

The partition usage panels of multiple generic machines show a long list of "undefined" entries and no reasonable graphs, for example:

  • https://monitor.qa.suse.de/d/GDada/dashboard-for-ada?orgId=1&refresh=1m&viewPanel=65090
  • https://monitor.qa.suse.de/d/GDunreal6/dashboard-for-unreal6?orgId=1&from=1698210981717&to=1698391062616&viewPanel=65090
  • https://monitor.qa.suse.de/d/GDbackup-qam/dashboard-for-backup-qam?orgId=1&refresh=1m&viewPanel=65090

Expected result

https://monitor.qa.suse.de/d/GDopenqaworker1/dashboard-for-openqaworker1?orgId=1&refresh=1m&viewPanel=65090 looks fine, showing names and proper graphs of storage usage.

Steps to reproduce

See https://monitor.qa.suse.de/d/GDada/dashboard-for-ada?orgId=1&refresh=1m&viewPanel=65090

Acceptance criteria

  • AC1: All Salt-controlled machines show reasonable graphs for partition usage

Suggestions

  • Look at more panels to find out whether really all generic machines are affected while openQA workers are fine. Also check other roles, e.g. the openQA webUI
  • Check whether we have differing queries for the partition usage in different files
  • Fix and harmonize

Files

partitions-panel.png (49.9 KB) tinita, 2023-11-28 13:30

Related issues 3 (0 open, 3 closed)

  • Related to openQA Infrastructure - action #138518: unreal6 partition usage alert (Resolved, okurz, 2023-10-25)
  • Related to openQA Infrastructure - action #150887: [alert] [FIRING:1] s390zl12 (s390zl12: partitions usage (%) alert Generic partitions_usage_alert_s390zl12 generic), also s390zl13 size:M (Resolved, okurz, 2023-11-15)
  • Related to openQA Infrastructure - action #151597: [alert] osiris-1 (osiris-1: partitions usage (%) alert Generic partitions_usage_alert_osiris-1 generic (Resolved, tinita, 2023-11-28)

Actions #1

Updated by okurz 6 months ago

Actions #2

Updated by okurz 6 months ago

  • Related to action #150887: [alert] [FIRING:1] s390zl12 (s390zl12: partitions usage (%) alert Generic partitions_usage_alert_s390zl12 generic), also s390zl13 size:M added
Actions #3

Updated by okurz 5 months ago

  • Target version changed from Tools - Next to Ready
Actions #4

Updated by mkittler 5 months ago

  • Subject changed from partition usage panels show a long list of undefined and no reasonable graphs at least for some generic machines to partition usage panels show a long list of undefined and no reasonable graphs at least for some generic machines size:M
  • Status changed from New to Workable
Actions #5

Updated by tinita 5 months ago

  • Related to action #151597: [alert] osiris-1 (osiris-1: partitions usage (%) alert Generic partitions_usage_alert_osiris-1 generic added
Actions #6

Updated by tinita 5 months ago

The query for this graph is

SELECT non_negative_derivative(mean(reads),1s) as "read" FROM "diskio" WHERE "host" = 'osiris-1' AND $timeFilter GROUP BY time($interval), *

which is almost the same as in the Disk I/O requests panel:

SELECT non_negative_derivative(mean(read_bytes),1s) as "read" FROM "diskio" WHERE "host" =~ /$server$/ AND $timeFilter GROUP BY time($interval), *

So the partition usage panel is actually querying disk I/O data. For the storage usage we would need to select from the "disk" measurement instead of "diskio".
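
A minimal sketch of what a storage usage query could look like, assuming the data comes from Telegraf's disk input plugin with its default schema (field "used_percent", tags "device" and "path"). The field and tag names are assumptions and are not taken from the actual fix:

```sql
-- Sketch only: "used_percent", "device" and "path" assume Telegraf's
-- default schema for the "disk" measurement.
SELECT mean("used_percent") AS "used_percent"
FROM "disk"
WHERE "host" = 'osiris-1' AND $timeFilter
GROUP BY time($interval), "device", "path" fill(null)
```

Grouping by explicit tags instead of * would give one named series per partition, which might also avoid the "undefined" legend entries, depending on how the panel's alias is configured.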

Actions #7

Updated by okurz 5 months ago

  • Description updated (diff)
Actions #8

Updated by tinita 5 months ago

  • Status changed from Workable to In Progress
  • Assignee set to tinita
Actions #9

Updated by tinita 5 months ago

  • File partitions-panel.png added

https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1057 Fix partitions usage panel for generic machines

Actions #10

Updated by tinita 5 months ago

  • File deleted (partitions-panel.png)
Actions #12

Updated by tinita 5 months ago

  • Status changed from In Progress to Feedback