action #138518

closed

unreal6 partition usage alert

Added by livdywan 7 months ago. Updated 6 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2023-10-25
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

 A0=87.99948619941982  

Alert firing since 13:00 CEST

Acceptance criteria

  • AC1: No alerts about partition usage on unreal6

Rollback steps

  • Unsilence alert

Suggestions


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #138650: partition usage panels show a long list of undefined and no reasonable graphs at least for some generic machines size:M (Resolved, tinita, 2023-10-27)

Actions #2

Updated by okurz 7 months ago

See #131546 as well. There is a rule in our salt states that adds a cron job cleaning up assets. Maybe we can use the second SSD in the machine or have it replaced with a bigger one; there are still some on the shelf.

Actions #3

Updated by okurz 7 months ago

  • Tags set to infra
  • Priority changed from High to Urgent

It seems the silence is not effective or was never created.

Actions #4

Updated by livdywan 7 months ago

  • Priority changed from Urgent to High

Yes. Trying again.

Actions #5

Updated by okurz 7 months ago

  • Assignee set to okurz

Discussing paths to improvement with mloviska. I am thinking of replacing the second disk with a larger one.

Actions #6

Updated by okurz 7 months ago

  • Tags changed from infra to infra, next-frankencampus-visit
  • Status changed from New to In Progress

The machine has only two physical storage slots, so I cannot simply add a third disk. I will power down the machine and replace the disk holding the VG with an 800 GB to 1 TB one. Afterwards we can also check network connectivity again.

unreal6:~ # lsblk -o NAME,SERIAL
NAME                    SERIAL
sda                     CVDA410106EZ1207GN
└─sda1                  
  └─openqa_vg-openqa_lv 
sdb                     CVDA410105VG1207GN
├─sdb1                  
├─sdb2                  
└─sdb3          

So I will need to remove the disk with serial CVDA410106EZ1207GN (sda, which holds openqa_vg) and replace it with a bigger one.
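For the record, the lookup done by eye above can be sketched as a small filter. This is only a sketch: `disk_holding` is a hypothetical helper name, not something that exists on the machine, and it assumes the tree-style layout that `lsblk -o NAME,SERIAL` printed above, where top-level disks start in column one with a letter.

```shell
# Hypothetical helper (not part of any existing tooling): reads
# `lsblk -o NAME,SERIAL` output on stdin and prints "NAME SERIAL" of the
# top-level disk whose subtree contains the given device name.
# Intended use: lsblk -o NAME,SERIAL | disk_holding openqa_vg-openqa_lv
disk_holding() {
    awk -v target="$1" '
        NR == 1 { next }                                  # skip the header row
        $1 ~ /^[a-zA-Z]/ { disk = $1; serial = $2; next } # top-level disk line
        index($0, target) { print disk, serial; exit }    # child line matching the target
    '
}
```

Fed with the output pasted above, this should point at sda and its serial, which is the disk to pull.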

Actions #7

Updated by okurz 7 months ago

Swapped the disks and added a 960 GB disk as the second one. The system booted into emergency mode (expected). There I disabled the mount point that relied on the no longer existing VG, continued the boot, and then ran:

pvcreate /dev/sdb
vgcreate openqa_vg /dev/sdb
lvcreate -l 100%VG -n openqa_lv openqa_vg
mkfs.ext4 /dev/openqa_vg/openqa_lv
# enable back entry in /etc/fstab
mount -a
systemctl start libvirtd
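The fstab steps (disabling the entry in emergency mode, enabling it again before `mount -a`) were done by hand. A minimal sketch of how they could be scripted follows; `fstab_disable` and `fstab_enable` are hypothetical helper names, and matching the entry by its mount point `/var/lib/libvirt/images` is an assumption, since the actual fstab line is not shown in this ticket.

```shell
# Hypothetical helpers, not existing tooling. Both take a search string
# (for example the mount point) and the path of an fstab-style file.

# Comment out every active line that mentions the search string.
fstab_disable() {
    needle=$1 file=$2
    tmp=$(mktemp) || return 1
    awk -v needle="$needle" '
        !/^#/ && index($0, needle) { print "#" $0; next }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps owner and mode of the file
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Reverse step: strip one leading "#" from commented lines mentioning the string.
fstab_enable() {
    needle=$1 file=$2
    tmp=$(mktemp) || return 1
    awk -v needle="$needle" '
        /^#/ && index($0, needle) { print substr($0, 2); next }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

On the machine this would be something like `fstab_disable /var/lib/libvirt/images /etc/fstab` before continuing the boot and `fstab_enable /var/lib/libvirt/images /etc/fstab` after `mkfs.ext4`, ideally tried on a copy of the file first.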

Monitoring jobs from https://openqa.suse.de/admin/workers referencing "unreal6" and retriggered the corresponding failures, e.g. https://openqa.suse.de/tests/12685089 scheduled just now, https://openqa.suse.de/tests/12684276#live running right now.

Actions #8

Updated by openqa_review 6 months ago

  • Due date set to 2023-11-10

Setting due date based on mean cycle time of SUSE QE Tools

Actions #9

Updated by okurz 6 months ago

  • Status changed from In Progress to Feedback
  • Priority changed from High to Normal

A lot of tests passed using unreal6, e.g. https://openqa.suse.de/tests/12693711, https://openqa.suse.de/tests/12684276 and https://openqa.suse.de/tests/12685122. The alert is ok again. I observed that the partition usage graphs are broken and reported that as #138650. Removed the silence accordingly. I will keep monitoring for a while.

Actions #10

Updated by okurz 6 months ago

  • Tags changed from infra, next-frankencampus-visit to infra
  • Due date deleted (2023-11-10)
  • Status changed from Feedback to Resolved

No related alert, monitoring graphs still broken.

/dev/mapper/openqa_vg-openqa_lv       880G   84G  751G  11% /var/lib/libvirt/images

has more than enough free space.
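While the monitoring graphs are broken, a manual check against the `df` output is one way to confirm AC1. The following is a sketch only: `usage_below` is a hypothetical helper, and the limit of 85 used in the example is an assumption, not the threshold configured in the actual alert rule.

```shell
# Hypothetical check (not existing tooling): reads `df -P <mountpoint>`
# style output on stdin and succeeds only if the Use% value of the last
# line is below the given limit.
# Intended use: df -P /var/lib/libvirt/images | usage_below 85
usage_below() {
    awk -v limit="$1" '
        { pct = $5 }                    # fifth column holds the Use% value
        END {
            sub(/%$/, "", pct)          # "11%" -> "11"
            exit !(pct + 0 < limit + 0) # exit 0 when below the limit, 1 otherwise
        }
    '
}
```

With the line pasted above (11% used) the check passes for any sensible limit, whereas the roughly 88% from the original observation would fail a limit of 85.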

Actions #11

Updated by okurz 6 months ago

  • Related to action #138650: partition usage panels show a long list of undefined and no reasonable graphs at least for some generic machines size:M added