action #137804: [qe-core] btrfsmaintenance - test btrfs balance on large disks - openQA Tests (public) - openSUSE Project Management Tool

closed

coordination #137714: [qe-core] proposal for new tests that covers btrfsmaintenance

[qe-core] btrfsmaintenance - test btrfs balance on large disks

Added by szarate over 1 year ago. Updated about 1 year ago.

Status:

Resolved

Priority:

Normal

Assignee:

amanzini

Category:

New test

Target version:

QA (public) - QE-Core: Ready

Start date:

2023-10-12

Due date:

% Done:

10%

Estimated time:

Difficulty:

Sprint:

QE-Core: December Sprint 23 (Dec 13 - Jan 10)

Tags:

qe-core-december-sprint, qe-core-october-sprint

Description

Balance

This test needs at least 2 extra block devices, which can be separate (virtual) disks or partitions of the same disk.
Start by creating a raid0 filesystem on a single 5 GB disk:

 # mkfs.btrfs -d raid0 -m raid0 /dev/vdb 
 # mount /dev/vdb /mnt/raid
 # df -h | grep raid
 /dev/vdb        5.0G  3.4M  4.8G   1% /mnt/raid

check statistics:

 # btrfs filesystem df /mnt/raid 
Data, RAID0: total=512.00MiB, used=64.00KiB
Data, single: total=512.00MiB, used=256.00KiB
System, RAID0: total=8.00MiB, used=0.00B
System, single: total=32.00MiB, used=16.00KiB
Metadata, RAID0: total=256.00MiB, used=112.00KiB
Metadata, single: total=256.00MiB, used=0.00B
GlobalReserve, single: total=3.25MiB, used=0.00B
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
WARNING:   Data: single, raid0
WARNING:   Metadata: single, raid0
WARNING:   System: single, raid0
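
The warnings above show that each block group type exists with two profiles at once. A test module could detect this state by parsing the `btrfs filesystem df` output. The following is only a minimal sketch: the function name `mixed_profiles` is hypothetical, and it assumes the output format shown in this ticket.

```shell
# Reads `btrfs filesystem df` output on stdin and prints every block group
# type (Data, Metadata, System) that appears with more than one profile.
# GlobalReserve and WARNING lines are ignored.
mixed_profiles() {
    awk -F'[,:]' '
        /^(Data|Metadata|System),/ {
            type = $1
            profile = $2
            sub(/^ +/, "", profile)          # strip the space after the comma
            if (!seen[type SUBSEP profile]++) count[type]++
        }
        END {
            for (t in count) if (count[t] > 1) print t
        }
    ' | sort
}
```

It could be used as `btrfs filesystem df /mnt/raid | mixed_profiles`; empty output would mean that no type has more than one profile.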

create a big binary file

 # time dd if=/dev/random of=/mnt/raid/bigfile.bin bs=4M count=1024
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 46.4441 s, 92.5 MB/s

real    0m46.445s
user    0m0.000s
sys     0m45.580s

check statistics to see disk usage

 # btrfs device usage  /mnt/raid/
/dev/vdb, ID: 1
   Device size:             5.00GiB
   Device slack:              0.00B
   Data,single:             3.94GiB
   Data,RAID0/1:          512.00MiB
   Metadata,single:       256.00MiB
   Metadata,RAID0/1:      256.00MiB
   System,single:          32.00MiB
   System,RAID0/1:          8.00MiB
   Unallocated:            21.00MiB

notice the difference between free space reported by 'df' and unallocated space reported by btrfs:

 # df -h | grep -E '(raid|Filesystem)'
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        5.0G  4.1G  470M  90% /mnt/raid

# btrfs filesystem df  /mnt/raid/
Data, RAID0: total=512.00MiB, used=448.50MiB
Data, single: total=3.94GiB, used=3.56GiB
System, RAID0: total=8.00MiB, used=0.00B
System, single: total=32.00MiB, used=16.00KiB
Metadata, RAID0: total=256.00MiB, used=4.33MiB
Metadata, single: total=256.00MiB, used=0.00B
GlobalReserve, single: total=3.80MiB, used=0.00B
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
WARNING:   Data: single, raid0
WARNING:   Metadata: single, raid0
WARNING:   System: single, raid0

now add a new disk device to the raid0:

 # btrfs device add -f /dev/vdc /mnt/raid/
 # btrfs device usage /mnt/raid/
/dev/vdb, ID: 1
   Device size:             5.00GiB
   Device slack:              0.00B
   Data,single:             3.94GiB
   Data,RAID0/1:          512.00MiB
   Metadata,single:       256.00MiB
   Metadata,RAID0/1:      256.00MiB
   System,single:          32.00MiB
   System,RAID0/1:          8.00MiB
   Unallocated:            21.00MiB

/dev/vdc, ID: 2
   Device size:             5.00GiB
   Device slack:              0.00B
   Unallocated:             5.00GiB
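
To compare the devices in an automated check, the unallocated space per device can be extracted from the `btrfs device usage` output. This is a rough sketch, assuming the output layout shown above; the function name `unallocated_per_device` is hypothetical.

```shell
# Reads `btrfs device usage` output on stdin and prints one line per device
# in the form "<device> <unallocated>", e.g. "/dev/vdb 21.00MiB".
unallocated_per_device() {
    awk '
        /^\/dev\// { dev = $1; sub(/,$/, "", dev) }   # "/dev/vdb," -> "/dev/vdb"
        $1 == "Unallocated:" { print dev, $2 }
    '
}
```

Usage would be `btrfs device usage /mnt/raid/ | unallocated_per_device`. Before the balance, the new device is expected to report nearly its full size as unallocated while the old one is almost full.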

the aggregate filesystem is now larger, but the available space does not reflect the new size because the filesystem is *unbalanced*; we need to trigger a balance of the filesystem. Note that the balance operation below is I/O intensive:

# df -h | grep -E '(raid|Filesystem)'
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb         10G  4.1G  490M  90% /mnt/raid

# btrfs balance start --full-balance -v /mnt/raid

now free space is available:

# df -h | grep -E '(raid|Filesystem)'
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb         10G  3.6G  6.3G  37% /mnt/raid

Here we can either force the balance, or just "start" it and let it run in the background, using the script from the btrfsmaintenance package: /usr/share/btrfsmaintenance/btrfs-balance.sh
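
The two ways of running the balance could be wrapped roughly as follows. This is a sketch, not what btrfs-balance.sh does internally: the function names and the overridable `BTRFS` variable are assumptions made for illustration, while `--bg` and `balance status` are regular btrfs-progs options.

```shell
# BTRFS can be overridden (e.g. BTRFS=echo) for a dry run; this variable is
# an assumption of this sketch, not something btrfs-progs defines.
BTRFS=${BTRFS:-btrfs}

# Foreground: blocks until the whole filesystem has been rebalanced.
balance_foreground() {
    "$BTRFS" balance start --full-balance -v "$1"
}

# Background: returns immediately while the balance keeps running.
balance_background() {
    "$BTRFS" balance start --full-balance --bg "$1"
}

# Reports whether a balance is currently running on the filesystem.
balance_status() {
    "$BTRFS" balance status "$1"
}
```

For example, `balance_background /mnt/raid` followed by periodic `balance_status /mnt/raid` calls would let the test continue with other checks while the balance is running.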

Acceptance Criteria

AC1: BTRFS balance test is scheduled in Staging and product in development for SLES and ALP
AC2: A good performance baseline is established, so we know when a bug or other defect hinders the performance.
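
For AC2, one possible approach is to time the balance and compare the duration against a recorded baseline with some tolerance. The helpers below are a hypothetical sketch; the names, the baseline value and the tolerance are placeholders that would have to come from real measurements.

```shell
# Runs the given command and prints its wall-clock duration in whole seconds
# on stdout; the command's own stdout is redirected to stderr.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >&2
    status=$?
    end=$(date +%s)
    echo $((end - start))
    return $status
}

# Succeeds if <elapsed> does not exceed <baseline> by more than
# <tolerance_percent> percent (integer arithmetic, seconds).
within_baseline() {
    elapsed=$1
    baseline=$2
    tolerance=$3
    [ "$elapsed" -le $((baseline + baseline * tolerance / 100)) ]
}
```

A run could then look like `t=$(elapsed_seconds btrfs balance start --full-balance /mnt/raid)` followed by `within_baseline "$t" 120 25`, where 120 and 25 are made-up example values.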

Notes

Contact the Kernel team to ask for an iSCSI disk or similar, with 100 GB or, better, 1 TB, on which these tests can be run
See documentation for backends at: https://github.com/os-autoinst/os-autoinst/blob/master/doc/backend_vars.asciidoc
The objective is to test the functionality of the balance tools provided by the package rather than testing the tool from a performance perspective; we're looking to avoid scenarios where tests are failing due to
stability of the test is proven by scheduling multiple test runs with different disk types and configurations, preferably using disks attached over the network via Netapp or something similar.

Files

clipboard-202312151432-kgvlo.png (8.27 KB)

amanzini, 2023-12-15 13:32

Related issues 1 (0 open — 1 closed)

Updated by amanzini over 1 year ago

My 2c:

if the objective of the test is to test the pure btrfs-balance functionality, e.g. start with an "unbalanced" setup and then rebalance it, the test might as well be done with "small" (20GB) disks and be scheduled without dedicated hardware, something like we already do for the RAID1 setup. Bigger disks may be required to better reflect the customer experience and to reason about performance / system load during the balance operation.

Updated by amanzini about 1 year ago

Assignee set to amanzini

need to clarify scheduling on bare metal or iscsi-attached storage for bigger disk sizes; in the meantime for development will go forward with virtual disks

Updated by szarate about 1 year ago

Sprint changed from QE-Core: October Sprint 23 (Oct 11 - Nov 08) to QE-Core: December Sprint 23 (Dec 13 - Jan 10)
Tags changed from qe-core-october-sprint to qe-core-october-sprint, qe-core-december-sprint

Updated by amanzini about 1 year ago

Status changed from Workable to In Progress

Updated by amanzini about 1 year ago

File clipboard-202312151432-kgvlo.png added

some early notes:

by default /etc/sysconfig/btrfsmaintenance only balances "/"; need to change it to "auto" or point it to our raid0 volume
on 15SP6 it seems the maintenance scripts are just symlinks to /bin/true? (see attached screenshot clipboard-202312151432-kgvlo.png)

need to check on older versions
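
Changing the balanced mount points could be scripted along these lines. This is a sketch under assumptions: the helper name is hypothetical, it relies on GNU `sed -i`, and it assumes the setting in question is the `BTRFS_BALANCE_MOUNTPOINTS` variable of the sysconfig file.

```shell
# Sets BTRFS_BALANCE_MOUNTPOINTS in the given sysconfig file.
#   $1: config file (e.g. /etc/sysconfig/btrfsmaintenance)
#   $2: new value ("auto" or a colon-separated list of mount points)
set_balance_mountpoints() {
    conf=$1
    value=$2
    if grep -q '^BTRFS_BALANCE_MOUNTPOINTS=' "$conf"; then
        # Replace the existing assignment in place.
        sed -i "s|^BTRFS_BALANCE_MOUNTPOINTS=.*|BTRFS_BALANCE_MOUNTPOINTS=\"$value\"|" "$conf"
    else
        # No assignment yet: append one.
        printf 'BTRFS_BALANCE_MOUNTPOINTS="%s"\n' "$value" >> "$conf"
    fi
}
```

For the setup in this ticket, either `set_balance_mountpoints /etc/sysconfig/btrfsmaintenance auto` or the same call with `/mnt/raid` as the value would cover the raid0 volume.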

Updated by amanzini about 1 year ago

Status changed from In Progress to Feedback
% Done changed from 0 to 10

PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18348

Updated by amanzini about 1 year ago

as the test requires NUMDISKS=3 , need some clarification on where and how to schedule it

Updated by amanzini about 1 year ago

Status changed from Feedback to Blocked

Updated by amanzini about 1 year ago

Status changed from Blocked to Feedback

#10

Updated by amanzini about 1 year ago

comment about scheduling on functional ? https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18348#issuecomment-1867334240

criteria says
AC1: BTRFS balance test is scheduled in Staging and product in development for SLES and ALP

#11

Updated by szarate about 1 year ago · Edited

amanzini wrote in #note-10:

comment about scheduling on functional ? https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18348#issuecomment-1867334240

criteria says
AC1: BTRFS balance test is scheduled in Staging and product in development for SLES and ALP

Functional is Product in development (there's one for each codestream of ALP, SLES). In case staging hasn't been enabled yet, let's leave it out for now, but do create a ticket about it for the future
