action #173947

closed

[alert] s390zl12: partitions usage (%) alert Generic partitions_usage_alert_s390zl12 generic size:S

Added by tinita 3 months ago. Updated 18 days ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
Regressions/Crashes
Start date:
2024-12-09
Due date:
% Done:

0%

Estimated time:

Description

Date: Mon, 09 Dec 2024 03:41:27 +0100

https://monitor.qa.suse.de/alerting/grafana/de4e513999b0b67eaa549ebfb7adb270d1735cf9/view?orgId=1

hostname=s390zl12

Suggestions


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #170122: [alert][FIRING:1] s390zl13 (s390zl13: partitions usage (%) alert Generic partitions_usage_alert_s390zl13 generic) - Resolved - okurz - 2024-11-21

Actions #1

Updated by okurz 3 months ago

  • Related to action #170122: [alert][FIRING:1] s390zl13 (s390zl13: partitions usage (%) alert Generic partitions_usage_alert_s390zl13 generic) added
Actions #2

Updated by okurz 3 months ago

  • Tags set to infra, reactive work, s390x
Actions #3

Updated by okurz 3 months ago

  • Priority changed from High to Urgent
Actions #4

Updated by tinita 3 months ago

Looking at the last 90 days, the used disk space just keeps going up. There seems to be some cleanup happening occasionally, which resolves the alert for a while.
https://monitor.qa.suse.de/d/GDs390zl12/dashboard-for-s390zl12?viewPanel=panel-65090&from=now-90d&to=now&timezone=browser&var-datasource=000000001&refresh=1m
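The alert fires on partition usage percentage. For reference, a minimal local equivalent of such a check could look like the sketch below; the 90 % threshold and the excluded filesystem types are assumptions, not taken from the actual alert rule.

```shell
# Warn when any real partition exceeds a usage threshold.
# THRESHOLD is a guess; the real Grafana alert rule may differ.
THRESHOLD=90
df --output=pcent,target -x tmpfs -x devtmpfs | tail -n +2 |
while read -r pcent target; do
    usage=${pcent%\%}                      # strip the trailing "%"
    if [ "$usage" -ge "$THRESHOLD" ]; then
        echo "ALERT: $target at $pcent"
    fi
done
```

This only mirrors the idea of the check; the production alert runs in Grafana against telegraf metrics.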

Actions #5

Updated by mkittler 3 months ago

  • Subject changed from [alert] s390zl12: partitions usage (%) alert Generic partitions_usage_alert_s390zl12 generic to [alert] s390zl12: partitions usage (%) alert Generic partitions_usage_alert_s390zl12 generic size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #6

Updated by mkittler 3 months ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler
Actions #7

Updated by mkittler 3 months ago

I deleted/moved problematic images and mentioned it on Slack. It is unfortunate that the root partition on this machine is rather small for a full-blown openSUSE installation with btrfs snapshots.
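A generic way to triage which directories are filling the root partition (a sketch only; the specific image paths that were deleted are not named in the ticket):

```shell
# List the 15 largest directories up to two levels below /, staying on
# one filesystem (-x) so other mounts don't skew the numbers.
du -xh --max-depth=2 / 2>/dev/null | sort -rh | head -n 15
```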

We're back at 79.8 %, which should be sufficient for now.

Actions #8

Updated by mkittler 3 months ago

  • Status changed from In Progress to Resolved

There's not much to be gained from cleaning up further data in home directories, but snapshots take a considerable amount of disk space:

martchus@s390zl12:~> sudo btrfs filesystem du -s /
     Total   Exclusive  Set shared  Filename
  45.29GiB     6.95GiB     5.49GiB  /
martchus@s390zl12:~> sudo btrfs filesystem du -s /home
     Total   Exclusive  Set shared  Filename
  88.00KiB    88.00KiB       0.00B  /home
martchus@s390zl12:~> sudo btrfs filesystem du -s /usr
     Total   Exclusive  Set shared  Filename
   3.86GiB       0.00B     3.86GiB  /usr
martchus@s390zl12:~> sudo btrfs filesystem du -s /.snapshots
     Total   Exclusive  Set shared  Filename
  40.11GiB     1.76GiB     5.49GiB  /.snapshots

Although it doesn't look that bad according to snapper:

martchus@s390zl12:~> sudo snapper list
   # | Type   | Pre # | Date                             | User | Used Space | Cleanup | Description           | Userdata     
-----+--------+-------+----------------------------------+------+------------+---------+-----------------------+--------------
  0  | single |       |                                  | root |            |         | current               |              
400* | single |       | Wed 08 May 2024 10:53:45 PM CEST | root |  29.92 MiB |         | writable copy of #391 |              
568  | pre    |       | Fri 15 Nov 2024 03:32:53 AM CET  | root |   1.16 GiB | number  | zypp(zypper)          | important=yes
569  | post   |   568 | Fri 15 Nov 2024 03:37:52 AM CET  | root |   7.55 MiB | number  |                       | important=yes
570  | pre    |       | Sun 17 Nov 2024 03:31:46 AM CET  | root |   2.69 MiB | number  | zypp(zypper)          | important=yes
571  | post   |   570 | Sun 17 Nov 2024 03:32:07 AM CET  | root | 205.95 MiB | number  |                       | important=yes
594  | pre    |       | Mon 09 Dec 2024 03:32:55 AM CET  | root | 469.26 MiB | number  | zypp(zypper)          | important=no 
595  | post   |   594 | Mon 09 Dec 2024 03:33:08 AM CET  | root |   2.02 MiB | number  |                       | important=no 
596  | pre    |       | Tue 10 Dec 2024 03:37:26 PM CET  | root | 352.00 KiB | number  | zypp(zypper)          | important=no 
597  | post   |   596 | Tue 10 Dec 2024 03:37:28 PM CET  | root | 416.00 KiB | number  |                       | important=no

I also checked all the other subvolumes but none contain a significant amount of data. So maybe the snapshots do use a lot of space but snapper doesn't show it.

Considering we're below 80 %, I'll leave it at that for now, although it would be interesting if someone could shed some light on the snapshotting situation.
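One possible explanation for the mismatch is that snapper's "Used Space" column is only accurate when btrfs quota groups are enabled. A hedged sketch of how that could be checked, assuming a btrfs root and root privileges (the guard makes it a safe no-op elsewhere):

```shell
# Enable btrfs qgroups so snapper can report per-snapshot exclusive space.
# Guarded so the script does nothing on machines without snapper or root.
if command -v snapper >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    snapper setup-quota        # sets up the qgroup snapper uses internally
    btrfs qgroup show -p /     # referenced/exclusive bytes per subvolume
    snapper list               # "Used Space" should now be meaningful
else
    echo "skipping: requires root and snapper"
fi
```

Whether quota was actually disabled on s390zl12 is not established in the ticket; this is only one plausible avenue to investigate.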

Actions #9

Updated by ybonatakis 18 days ago · Edited

  • Status changed from Resolved to Workable

Reopening, as there was a new alert today: http://monitor.qa.suse.de/goto/hzIQSi5Hg?orgId=1

Actions #10

Updated by ybonatakis 18 days ago

But it resolved two hours later; not sure whether someone did anything.

Actions #12

Updated by mkittler 18 days ago

  • Status changed from In Progress to Resolved

I couldn't come up with a quick fix, so I closed this 2-month-old ticket again and created #177159, which we should discuss/estimate.
