action #137804

closed

coordination #137714: [qe-core] proposal for new tests that covers btrfsmaintenance

[qe-core] btrfsmaintenance - test btrfs balance on large disks

Added by szarate 7 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: New test
Target version:
Start date: 2023-10-12
Due date:
% Done: 10%
Estimated time:
Difficulty:
Sprint: QE-Core: December Sprint 23 (Dec 13 - Jan 10)

Description

Balance

This test needs at least 2 extra block devices, which can be separate (virtual) disks or partitions of the same disk.
Create a raid0 filesystem on a single 5 GB disk (the second disk is added later):

 # mkfs.btrfs -d raid0 -m raid0 /dev/vdb
 # mkdir -p /mnt/raid
 # mount /dev/vdb /mnt/raid
 # df -h | grep raid
 /dev/vdb        5.0G  3.4M  4.8G   1% /mnt/raid

check statistics:

 # btrfs filesystem df /mnt/raid 
Data, RAID0: total=512.00MiB, used=64.00KiB
Data, single: total=512.00MiB, used=256.00KiB
System, RAID0: total=8.00MiB, used=0.00B
System, single: total=32.00MiB, used=16.00KiB
Metadata, RAID0: total=256.00MiB, used=112.00KiB
Metadata, single: total=256.00MiB, used=0.00B
GlobalReserve, single: total=3.25MiB, used=0.00B
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
WARNING:   Data: single, raid0
WARNING:   Metadata: single, raid0
WARNING:   System: single, raid0
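A test could detect the mixed-profile state shown above without relying on the WARNING lines. The following is a minimal sketch, assuming the `btrfs filesystem df` output format shown in this ticket; the function name `mixed_profiles` is made up for illustration.

```shell
#!/bin/sh
# mixed_profiles: reads `btrfs filesystem df <mountpoint>` output on stdin
# and prints each block group type (Data, Metadata, System) that is listed
# with more than one profile. GlobalReserve and WARNING lines are ignored.
mixed_profiles() {
    awk -F'[,:]' '
        /^(Data|Metadata|System),/ { count[$1]++ }
        END { for (t in count) if (count[t] > 1) print t }
    ' | sort
}

# Example (needs a mounted btrfs filesystem):
#   btrfs filesystem df /mnt/raid | mixed_profiles
```

An empty result means every block group type uses a single profile.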

create a big binary file

 # time dd if=/dev/random of=/mnt/raid/bigfile.bin bs=4M count=1024
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 46.4441 s, 92.5 MB/s

real    0m46.445s
user    0m0.000s
sys 0m45.580s

check statistics to see disk usage

 # btrfs device usage  /mnt/raid/
/dev/vdb, ID: 1
   Device size:             5.00GiB
   Device slack:              0.00B
   Data,single:             3.94GiB
   Data,RAID0/1:          512.00MiB
   Metadata,single:       256.00MiB
   Metadata,RAID0/1:      256.00MiB
   System,single:          32.00MiB
   System,RAID0/1:          8.00MiB
   Unallocated:            21.00MiB

notice the difference between free space reported by 'df' and unallocated space reported by btrfs:

 # df -h | grep -E '(raid|Filesystem)'
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        5.0G  4.1G  470M  90% /mnt/raid
# btrfs filesystem df  /mnt/raid/
Data, RAID0: total=512.00MiB, used=448.50MiB
Data, single: total=3.94GiB, used=3.56GiB
System, RAID0: total=8.00MiB, used=0.00B
System, single: total=32.00MiB, used=16.00KiB
Metadata, RAID0: total=256.00MiB, used=4.33MiB
Metadata, single: total=256.00MiB, used=0.00B
GlobalReserve, single: total=3.80MiB, used=0.00B
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
WARNING:   Data: single, raid0
WARNING:   Metadata: single, raid0
WARNING:   System: single, raid0

now add a new disk device to the raid0:

 # btrfs device add -f /dev/vdc /mnt/raid/
 # btrfs device usage /mnt/raid/
/dev/vdb, ID: 1
   Device size:             5.00GiB
   Device slack:              0.00B
   Data,single:             3.94GiB
   Data,RAID0/1:          512.00MiB
   Metadata,single:       256.00MiB
   Metadata,RAID0/1:      256.00MiB
   System,single:          32.00MiB
   System,RAID0/1:          8.00MiB
   Unallocated:            21.00MiB

/dev/vdc, ID: 2
   Device size:             5.00GiB
   Device slack:              0.00B
   Unallocated:             5.00GiB
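To compare devices in an automated check, the unallocated space per device can be pulled out of this output. This is only a sketch that assumes the human-readable `btrfs device usage` layout shown above; `unallocated_per_device` is a hypothetical helper name.

```shell
#!/bin/sh
# unallocated_per_device: reads `btrfs device usage <mountpoint>` output on
# stdin and prints one "<device> <unallocated>" line per device.
unallocated_per_device() {
    awk '
        # Device header line, e.g. "/dev/vdb, ID: 1"
        /^\/dev\// { dev = $1; sub(/,$/, "", dev) }
        # Indented "Unallocated:" line belonging to the last seen device
        $1 == "Unallocated:" { print dev, $2 }
    '
}

# Example (needs a mounted btrfs filesystem):
#   btrfs device usage /mnt/raid/ | unallocated_per_device
```

With the output above this would list 21.00MiB for /dev/vdb and 5.00GiB for /dev/vdc, which makes the imbalance easy to spot.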

the aggregate filesystem is now larger, but the available space does not reflect the new size because the filesystem is *unbalanced*; we need to trigger a balance of the filesystem. The next operation is I/O intensive:

# df -h | grep -E '(raid|Filesystem)'
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb         10G  4.1G  490M  90% /mnt/raid

# btrfs balance start --full-balance -v /mnt/raid 

now free space is available:

# df -h | grep -E '(raid|Filesystem)'
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb         10G  3.6G  6.3G  37% /mnt/raid

Here we can either force the balance, or just "start" it and let it run in the background, using the script in the maintenance package called /usr/share/btrfsmaintenance/btrfs-balance.sh
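If the balance is started in the background, the test has to wait for it to finish before checking free space. A minimal polling sketch follows; it assumes the exit status semantics described in btrfs-balance(8), where `btrfs balance status` exits 0 when no balance is running. The helper name `wait_until_idle` and the timeout value are made up for illustration.

```shell
#!/bin/sh
# wait_until_idle <timeout_seconds> <command...>
# Re-runs <command> about once per second until it exits 0.
# Returns 0 on success, 1 if the timeout expires first.
wait_until_idle() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example (needs root and a mounted btrfs filesystem):
#   btrfs balance start --bg --full-balance /mnt/raid
#   wait_until_idle 3600 btrfs balance status /mnt/raid
```

The command to poll is passed as arguments, so the same helper works whether the balance was started directly or through the maintenance script.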

Acceptance Criteria

  • AC1: BTRFS balance test is scheduled in Staging and product in development for SLES and ALP
  • AC2: A good performance baseline is established, so we know when a bug or other defect hinders the performance.

Notes

  • Contact the Kernel team to ask for an iSCSI disk or similar of 100 GB, or better 1 TB, for these tests to be run on
  • See documentation for backends at: https://github.com/os-autoinst/os-autoinst/blob/master/doc/backend_vars.asciidoc
  • The objective is to test the functionality of the balance tools provided by the package rather than testing the tool from a performance perspective; we're looking to avoid scenarios where tests are failing due to
  • Stability of the test is proven by scheduling multiple test runs with different disk types and configurations, preferably using disks attached over the network via NetApp or something similar.

Related issues 1 (0 open, 1 closed)

Related to openQA Tests - action #40163: [core][aarch64][s390x] test fails in btrfs_qgroups - needs to be scheduled on two-disk-machine. (Resolved, rfan1)

Actions #1

Updated by amanzini 6 months ago

My 2c:

If the objective of the test is to test the pure btrfs-balance functionality, e.g. start with an "unbalanced" setup and then rebalance it, the test might as well be done with "small" (20 GB) disks and be scheduled without dedicated hardware, something like what we already do for the RAID1 setup. Bigger disks may be required to better reflect the customer experience and to reason about performance / system load during the balance operation.

Actions #2

Updated by amanzini 5 months ago

  • Assignee set to amanzini

Need to clarify scheduling on bare metal or iSCSI-attached storage for bigger disk sizes; in the meantime, development will go forward with virtual disks.

Actions #3

Updated by szarate 4 months ago

  • Sprint changed from QE-Core: October Sprint 23 (Oct 11 - Nov 08) to QE-Core: December Sprint 23 (Dec 13 - Jan 10)
  • Tags changed from qe-core-october-sprint to qe-core-october-sprint, qe-core-december-sprint
Actions #4

Updated by amanzini 4 months ago

  • Status changed from Workable to In Progress
Actions #5

Updated by amanzini 4 months ago

some early notes:

  • by default /etc/sysconfig/btrfsmaintenance only balances "/"; we need to change it to "auto" or point it to our raid0 volume

  • on 15SP6 it seems the maintenance scripts are just symlinks to /bin/true?

need to check on older versions
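The first note could be handled in the test setup by rewriting the mount point setting before running the balance script. The sketch below assumes the setting is the `BTRFS_BALANCE_MOUNTPOINTS` variable in /etc/sysconfig/btrfsmaintenance and that GNU sed is available; `set_sysconfig_var` is a hypothetical helper, not part of the package.

```shell
#!/bin/sh
# set_sysconfig_var <file> <name> <value>
# Sets NAME="value" in a sysconfig-style file, replacing an existing
# assignment or appending a new one. Values containing "|" or "&" are not
# handled, which is fine for "auto" or a plain mount point path.
set_sysconfig_var() {
    file=$1
    name=$2
    value=$3
    if grep -q "^${name}=" "$file"; then
        sed -i "s|^${name}=.*|${name}=\"${value}\"|" "$file"
    else
        printf '%s="%s"\n' "$name" "$value" >> "$file"
    fi
}

# Example (needs root):
#   set_sysconfig_var /etc/sysconfig/btrfsmaintenance \
#       BTRFS_BALANCE_MOUNTPOINTS /mnt/raid
```

Passing `auto` instead of `/mnt/raid` would cover the other option mentioned in the note.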

Actions #6

Updated by amanzini 4 months ago

  • Status changed from In Progress to Feedback
  • % Done changed from 0 to 10
Actions #7

Updated by amanzini 4 months ago

As the test requires NUMDISKS=3, we need some clarification on where and how to schedule it.

Actions #8

Updated by amanzini 4 months ago

  • Status changed from Feedback to Blocked
Actions #9

Updated by amanzini 4 months ago

  • Status changed from Blocked to Feedback
Actions #10

Updated by amanzini 4 months ago

comment about scheduling on functional ? https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18348#issuecomment-1867334240

criteria says
AC1: BTRFS balance test is scheduled in Staging and product in development for SLES and ALP

Actions #11

Updated by szarate 4 months ago · Edited

amanzini wrote in #note-10:

comment about scheduling on functional ? https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18348#issuecomment-1867334240

criteria says
AC1: BTRFS balance test is scheduled in Staging and product in development for SLES and ALP

Functional is Product in development (there's one for each codestream of ALP and SLES). In case staging hasn't been enabled yet, let's leave it out for now, but do create a ticket about it for the future.

Actions #13

Updated by amanzini 4 months ago

  • Status changed from Feedback to Resolved
Actions #14

Updated by szarate 3 months ago

  • Related to action #40163: [core][aarch64][s390x] test fails in btrfs_qgroups - needs to be scheduled on two-disk-machine. added
Actions #15

Updated by szarate 3 months ago

There's also #40163 to look at
