action #175791

open

coordination #161414: [epic] Improved salt based infrastructure management

[alert] storage: partitions usage (%) alert size:S

Added by jbaier_cz 12 days ago. Updated 5 days ago.

Status: Blocked
Priority: High
Assignee:
Category: Regressions/Crashes
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Observation

Values
A0=85.08932639307272 
Labels
alertname     storage: partitions usage (%) alert
grafana_folder     Generic
hostname     storage
rule_uid     partitions_usage_alert_storage
type     generic

So sda on the host storage is too full (85 % full).

http://monitor.qa.suse.de/d/GDstorage?orgId=1&viewPanel=65090
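
For reference, a minimal sketch of how such a percentage can be reproduced on the host itself. The /storage mount point and the 85 % threshold are assumptions taken from this ticket, not from the actual alert rule definition:

    #!/usr/bin/env python3
    # Minimal sketch: reproduce a "partitions usage (%)" value locally.
    # Assumptions: the alert reflects used/total space of the filesystem
    # mounted at /storage; the 85 % threshold is taken from this ticket.
    import shutil

    MOUNT = "/storage"      # assumed mount point behind the alert
    THRESHOLD = 85.0        # percent, as reported in this ticket

    usage = shutil.disk_usage(MOUNT)
    percent = usage.used / usage.total * 100
    print(f"{MOUNT}: {percent:.2f} % used")
    if percent >= THRESHOLD:
        print("would fire the partitions usage alert")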

Suggestions

  • Clean up storage, probably taken by backup of backup VM (see related ticket); a sketch for spotting the largest directories follows after this list
  • Do not adjust the alert itself, it is perfectly fine
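
A possible starting point for the clean-up suggestion above: sum up the size of each top-level directory below /storage so the biggest consumers stand out. The path is an assumption based on this ticket and the snippet is only a sketch, not part of our tooling:

    #!/usr/bin/env python3
    # Sketch: report the size of each top-level directory below /storage,
    # largest first, to see where the space goes. Path is an assumption.
    import os

    ROOT = "/storage"

    def tree_size(path):
        """Sum file sizes below path, skipping symlinks and unreadable entries."""
        total = 0
        for dirpath, _dirnames, filenames in os.walk(path, onerror=lambda e: None):
            for name in filenames:
                full = os.path.join(dirpath, name)
                try:
                    if not os.path.islink(full):
                        total += os.path.getsize(full)
                except OSError:
                    pass
        return total

    sizes = {
        entry.name: tree_size(entry.path)
        for entry in os.scandir(ROOT)
        if entry.is_dir(follow_symlinks=False)
    }
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 2**30:10.1f} GiB  {name}")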

Related issues: 2 (0 open, 2 closed)

Related to openQA Infrastructure (public) - action #173347: Ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc. size:S (Resolved, gpathak)

Copied from openQA Infrastructure (public) - action #150887: [alert] [FIRING:1] s390zl12 (s390zl12: partitions usage (%) alert Generic partitions_usage_alert_s390zl12 generic), also s390zl13 size:M (Resolved, okurz, 2023-11-15)

Actions #1

Updated by jbaier_cz 12 days ago

  • Copied from action #150887: [alert] [FIRING:1] s390zl12 (s390zl12: partitions usage (%) alert Generic partitions_usage_alert_s390zl12 generic), also s390zl13 size:M added
Actions #2

Updated by gpathak 12 days ago

Related #173347

Actions #3

Updated by gpathak 12 days ago

gpathak wrote in #note-2:

Related #173347

Maybe we can get rid of /storage/backup/backup-vm/ as we have a continuous backup at /storage/rsnapshot/
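
Before removing it, a quick sanity check could confirm that the rsnapshot copy really covers everything in backup-vm/. This is only a sketch; both paths and the layout of the newest rsnapshot snapshot are assumptions about the local setup:

    #!/usr/bin/env python3
    # Sketch: before deleting /storage/backup/backup-vm/, check whether every
    # file in it also exists (same relative path and size) in the newest
    # rsnapshot snapshot. Paths and snapshot layout are assumptions.
    import os

    OLD = "/storage/backup/backup-vm"          # candidate for deletion
    SNAPSHOT = "/storage/rsnapshot/alpha.0"    # assumed newest rsnapshot dir

    missing = []
    for dirpath, _dirs, files in os.walk(OLD):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, OLD)
            dst = os.path.join(SNAPSHOT, rel)
            if not os.path.exists(dst) or os.path.getsize(dst) != os.path.getsize(src):
                missing.append(rel)

    if missing:
        print(f"{len(missing)} files not covered by the snapshot, e.g. {missing[0]}")
    else:
        print("everything in backup-vm/ is also present in the snapshot")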

Actions #4

Updated by okurz 12 days ago

  • Category set to Regressions/Crashes
Actions #5

Updated by okurz 12 days ago

  • Related to action #173347: Ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc. size:S added
Actions #6

Updated by okurz 12 days ago

  • Parent task set to #161414
Actions #7

Updated by okurz 11 days ago

  • Priority changed from High to Urgent
  • Start date deleted (2023-11-15)

Repeated alert

Actions #8

Updated by gpathak 11 days ago

  • Assignee set to gpathak
Actions #9

Updated by gpathak 11 days ago

  • Status changed from New to In Progress
Actions #10

Updated by gpathak 11 days ago · Edited

@okurz
I am planning to delete /storage/backup/backup-vm/ since it is a duplicate of /storage/rsnapshot/.
/storage/rsnapshot/ always holds the latest, up-to-date backup. If we choose to delete /storage/backup/backup-vm/, I will have to update https://gitlab.suse.de/suse/wiki/-/blob/main/qe_infrastructure.md#backup-of-additional-services-running-on-qamaster accordingly.

What are your thoughts? Can we move /storage/backup/backup-vm/ to some other machine?

Actions #11

Updated by okurz 11 days ago

Ok, go ahead

Actions #12

Updated by okurz 11 days ago

  • Subject changed from [alert] storage: partitions usage (%) alert to [alert] storage: partitions usage (%) alert size:S
  • Description updated (diff)
Actions #13

Updated by gpathak 11 days ago

Cleaned up /storage/backup/backup-vm/ and created MR https://gitlab.suse.de/suse/wiki/-/merge_requests/8/diffs

Actions #14

Updated by gpathak 11 days ago

  • Status changed from In Progress to Feedback
Actions #15

Updated by livdywan 11 days ago

  • Status changed from Feedback to Resolved

gpathak wrote in #note-13:

Cleaned up /storage/backup/backup-vm/ and created MR https://gitlab.suse.de/suse/wiki/-/merge_requests/8/diffs

Please remember an Urgent ticket should not remain in Feedback. If I see this correctly, it should be fixed, so let's resolve and re-open if there are any issues.

Actions #16

Updated by gpuliti 10 days ago

I've approved the MR.

Actions #17

Updated by okurz 10 days ago

  • Status changed from Resolved to Workable

@livdywan as I told you, our monitoring data tells us whether we are done. Please check again.

Actions #18

Updated by gpathak 10 days ago · Edited

@okurz @livdywan
Deleting the backup of backup-vm under /storage/backup/backup-vm/ freed up around 222 GiB.
If more disk space is needed, we need to check for old data on storage that can be deleted; I will look into it later.
Since the alert from Grafana is resolved, maybe we can lower the priority.
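
Regarding checking for old data to delete: a small sketch for listing candidates by age and size. The path and the one-year cut-off are assumptions for illustration, not agreed policy:

    #!/usr/bin/env python3
    # Sketch: list the largest files under /storage that have not been
    # modified for more than a year, as candidates for further clean-up.
    # Path and age cut-off are assumptions, not agreed policy.
    import os
    import time

    ROOT = "/storage"
    MAX_AGE = 365 * 24 * 3600   # one year, in seconds
    now = time.time()

    candidates = []
    for dirpath, _dirs, files in os.walk(ROOT, onerror=lambda e: None):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue
            if now - st.st_mtime > MAX_AGE:
                candidates.append((st.st_size, path))

    for size, path in sorted(candidates, reverse=True)[:20]:
        print(f"{size / 2**30:8.2f} GiB  {path}")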

Actions #19

Updated by okurz 10 days ago

I think I misunderstood your proposal to delete backup-vm/. I assumed you had an additional copy of backup-data.qcow2. Deleting backup-vm/ is in conflict with #173347. I suggest bringing back backup-vm/ and finding more space elsewhere, either by removing other data or by ordering additional storage hardware.
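
If we do restore it, a minimal sketch for copying the data back from the newest rsnapshot snapshot; the snapshot path and the relative location of the backup-vm data inside it are assumptions about the local layout:

    #!/usr/bin/env python3
    # Sketch: restore /storage/backup/backup-vm/ from the newest rsnapshot
    # snapshot. Source path inside the snapshot is an assumption.
    import shutil

    SNAPSHOT_SRC = "/storage/rsnapshot/alpha.0/backup-vm"  # assumed source
    RESTORE_DST = "/storage/backup/backup-vm"              # original location

    shutil.copytree(SNAPSHOT_SRC, RESTORE_DST, dirs_exist_ok=True)
    print(f"restored {RESTORE_DST} from {SNAPSHOT_SRC}")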

Actions #20

Updated by livdywan 10 days ago

  • Status changed from Workable to In Progress
Actions #21

Updated by gpathak 9 days ago

okurz wrote in #note-19:

I think I misunderstood your proposal to delete backup-vm/. I assumed you had an additional copy of backup-data.qcow2. Deleting backup-vm/ is in conflict with #173347. I suggest bringing back backup-vm/ and finding more space elsewhere, either by removing other data or by ordering additional storage hardware.

We cannot delete anything more from storage. Bringing back backup-vm/ will cause the Grafana alert to trigger again; we need to silence the alert until we have additional storage.
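
A sketch of how such a silence could be created, assuming monitor.qa.suse.de runs Grafana unified alerting with its Alertmanager-compatible silences API; the endpoint, token, matcher label and duration are assumptions to verify against the real setup:

    #!/usr/bin/env python3
    # Sketch: silence the "partitions usage (%)" alert on monitor.qa.suse.de
    # until additional storage is available. Assumes Grafana unified alerting
    # and its Alertmanager-compatible silences API; URL, token and duration
    # are assumptions to be checked against the real setup.
    import json
    import urllib.request
    from datetime import datetime, timedelta, timezone

    GRAFANA = "http://monitor.qa.suse.de"
    TOKEN = "..."  # placeholder for a Grafana API token with alerting permissions

    now = datetime.now(timezone.utc)
    silence = {
        "matchers": [
            {"name": "rule_uid", "value": "partitions_usage_alert_storage",
             "isRegex": False, "isEqual": True},
        ],
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(days=30)).isoformat(),
        "createdBy": "gpathak",
        "comment": "poo#175791: silence until additional storage is available",
    }

    req = urllib.request.Request(
        f"{GRAFANA}/api/alertmanager/grafana/api/v2/silences",
        data=json.dumps(silence).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {TOKEN}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read().decode())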

Actions #22

Updated by gpathak 9 days ago

  • Priority changed from Urgent to High

Reducing the priority from Urgent to High, as the Grafana alert is resolved for now.
We can change the priority again if needed.

Actions #23

Updated by openqa_review 9 days ago

  • Due date set to 2025-02-06

Setting due date based on mean cycle time of SUSE QE Tools

Actions #24

Updated by okurz 8 days ago

  • Status changed from In Progress to Workable
Actions #25

Updated by gpathak 5 days ago

  • Due date deleted (2025-02-06)
  • Status changed from Workable to Blocked