coordination #137714
[qe-core] proposal for new tests that cover btrfsmaintenance
Description
The scope of the proposed new test is to check the functionality and system-load impact of the btrfsmaintenance
package. This is basically a set of scheduled tasks that perform 4 BTRFS maintenance operations:
- balance
- defrag
- scrub
- trim
I'd declare the trim operation out of scope for now, since it requires specific hardware (SSD disks), but we can think about scheduling tests on specific machines.
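As a first smoke test we could simply verify that the package ships one script per task. A minimal sketch, assuming the scripts are installed under /usr/share/btrfsmaintenance with names btrfs-&lt;task&gt;.sh (the prefix is parameterized so the check can be dry-run against any directory tree):

```shell
#!/bin/bash
# Sanity check: verify the btrfsmaintenance per-task scripts are present
# and executable. The prefix defaults to the assumed install path but can
# be overridden, e.g. for testing this helper itself.
check_btrfsmaintenance() {
    local prefix=${1:-/usr/share/btrfsmaintenance}
    local missing=0
    for task in balance defrag scrub trim; do
        if [ -x "$prefix/btrfs-$task.sh" ]; then
            echo "found: btrfs-$task.sh"
        else
            echo "missing: btrfs-$task.sh"
            missing=1
        fi
    done
    return "$missing"
}
# check_btrfsmaintenance    # run against the real install path
```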
Balance
This test needs at least 2 extra block devices, which can be separate (virtual) disks or partitions of the same disk.
Create a raid0 filesystem on a single 5 GiB disk first (the second device will be added later):
# mkfs.btrfs -d raid0 -m raid0 /dev/vdb
# mount /dev/vdb /mnt/raid
# df -h | grep raid
/dev/vdb 5.0G 3.4M 4.8G 1% /mnt/raid
check statistics:
# btrfs filesystem df /mnt/raid
Data, RAID0: total=512.00MiB, used=64.00KiB
Data, single: total=512.00MiB, used=256.00KiB
System, RAID0: total=8.00MiB, used=0.00B
System, single: total=32.00MiB, used=16.00KiB
Metadata, RAID0: total=256.00MiB, used=112.00KiB
Metadata, single: total=256.00MiB, used=0.00B
GlobalReserve, single: total=3.25MiB, used=0.00B
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
WARNING: Data: single, raid0
WARNING: Metadata: single, raid0
WARNING: System: single, raid0
create a big binary file:
# time dd if=/dev/random of=/mnt/raid/bigfile.bin bs=4M count=1024
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 46.4441 s, 92.5 MB/s
real 0m46.445s
user 0m0.000s
sys 0m45.580s
check statistics to see disk usage
# btrfs device usage /mnt/raid/
/dev/vdb, ID: 1
Device size: 5.00GiB
Device slack: 0.00B
Data,single: 3.94GiB
Data,RAID0/1: 512.00MiB
Metadata,single: 256.00MiB
Metadata,RAID0/1: 256.00MiB
System,single: 32.00MiB
System,RAID0/1: 8.00MiB
Unallocated: 21.00MiB
notice the difference between free space reported by 'df' and unallocated space reported by btrfs:
# df -h | grep -E '(raid|Filesystem)'
Filesystem Size Used Avail Use% Mounted on
/dev/vdb 5.0G 4.1G 470M 90% /mnt/raid
# btrfs filesystem df /mnt/raid/
Data, RAID0: total=512.00MiB, used=448.50MiB
Data, single: total=3.94GiB, used=3.56GiB
System, RAID0: total=8.00MiB, used=0.00B
System, single: total=32.00MiB, used=16.00KiB
Metadata, RAID0: total=256.00MiB, used=4.33MiB
Metadata, single: total=256.00MiB, used=0.00B
GlobalReserve, single: total=3.80MiB, used=0.00B
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
WARNING: Data: single, raid0
WARNING: Metadata: single, raid0
WARNING: System: single, raid0
now add a new disk to the raid0 filesystem:
# btrfs device add -f /dev/vdc /mnt/raid/
# btrfs device usage /mnt/raid/
/dev/vdb, ID: 1
Device size: 5.00GiB
Device slack: 0.00B
Data,single: 3.94GiB
Data,RAID0/1: 512.00MiB
Metadata,single: 256.00MiB
Metadata,RAID0/1: 256.00MiB
System,single: 32.00MiB
System,RAID0/1: 8.00MiB
Unallocated: 21.00MiB
/dev/vdc, ID: 2
Device size: 5.00GiB
Device slack: 0.00B
Unallocated: 5.00GiB
the aggregate filesystem is now larger, but the available space does not reflect the new size, because the filesystem is *unbalanced*; we need to trigger a balance of the filesystem. The next operation is I/O intensive:
# df -h | grep -E '(raid|Filesystem)'
Filesystem Size Used Avail Use% Mounted on
/dev/vdb 10G 4.1G 490M 90% /mnt/raid
# btrfs balance start --full-balance -v /mnt/raid
now free space is available:
# df -h | grep raid
Filesystem Size Used Avail Use% Mounted on
/dev/vdb 10G 3.6G 6.3G 37% /mnt/raid
Here we can either force the balance, or just start it and let it run in the background, using the script shipped in the maintenance package, /usr/share/btrfsmaintenance/btrfs-balance.sh
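For the background option, the schedule and filters are driven by the package's sysconfig file. A sketch of the relevant knobs (variable names assumed from the btrfsmaintenance defaults, values illustrative for this test setup):

```shell
# /etc/sysconfig/btrfsmaintenance (excerpt, illustrative values)
BTRFS_BALANCE_PERIOD="weekly"                # how often the balance task runs
BTRFS_BALANCE_MOUNTPOINTS="/mnt/raid"        # filesystems to balance
BTRFS_BALANCE_DUSAGE="1 5 10 20 30 40 50"    # data usage filters, applied in turn
BTRFS_BALANCE_MUSAGE="1 5 10 20 30"          # metadata usage filters
```

The usage filters make the scheduled balance much cheaper than the --full-balance above, which is exactly the load difference this test should measure.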
Scrub
Scrub is a pass over all filesystem data and metadata that verifies the checksums. If a valid copy is available (replicated block-group profiles), the damaged one is repaired. All copies of the replicated profiles are validated.
To test btrfs-scrub the idea is to deliberately create a defective filesystem.
This can be accomplished in many ways, for example:
- manually corrupt data blocks
- delete important metadata (using dd or similar)
- unmount the filesystem abruptly
- leverage the kernel's fault-injection features: https://www.kernel.org/doc/Documentation/fault-injection/fault-injection.txt
let's start by creating a RAID1 filesystem:
# mkfs.btrfs -f -d raid1 -m raid1 /dev/vdb /dev/vdc
# mount /dev/vdb /mnt/raid
# df -h | grep -E '(raid|Filesystem)'
Filesystem Size Used Avail Use% Mounted on
/dev/vdb 5.0G 3.7M 4.8G 1% /mnt/raid
# echo "this is a test" > /mnt/raid/myfile.txt
# umount /mnt/raid
now we need to find out at which offset in the block device the string we just wrote is stored:
# grep --only-matching --byte-offset --max-count=1 --text "this is a test" /dev/vdc
9616961:this is a test
and now we can overwrite the data on only one of the two disks, by writing directly to the block device:
# echo "THIS IS A MESS" | dd of=/dev/vdc bs=1 conv=notrunc seek=9616961
15+0 records in
15+0 records out
15 bytes copied, 0.00108664 s, 13.8 kB/s
remount RAID device:
# mount /dev/vdc /mnt/raid
create many small random files, just to increase data and metadata usage:
#!/bin/bash
# run from inside the filesystem under test
cd /mnt/raid || exit 1
for d in {1..50}; do
    dir="dir$(printf %03d "$d")"
    mkdir "$dir" && pushd "$dir" > /dev/null || continue
    for n in {0..999}; do
        dd if=/dev/zero of="file$(printf %03d "$n").bin" \
           bs=10 count=$(( RANDOM % 1024 )) >& /dev/null
    done
    popd > /dev/null
done
start scrub:
# btrfs scrub start -B -f /mnt/raid/
The scrubbing status is recorded in /var/lib/btrfs/ in textual files named scrub.status.UUID for a filesystem identified by the given UUID. (Progress state is communicated through a named pipe in file scrub.progress.UUID in the same directory.) The status file is updated every 5 seconds. A resumed scrub will continue from the last saved position.
in dmesg we can find some details:
[ 9537.259295] BTRFS info (device vdb): scrub: started on devid 2
[ 9537.259631] BTRFS info (device vdb): scrub: started on devid 1
[ 9537.260543] BTRFS info (device vdb): scrub: finished on devid 1 with status: 0
[ 9537.260563] BTRFS warning (device vdb): checksum error at logical 30572544 on dev /dev/vdc, physical 9601024: metadata leaf (level 0) in tree 5
[ 9537.260565] BTRFS warning (device vdb): checksum error at logical 30572544 on dev /dev/vdc, physical 9601024: metadata leaf (level 0) in tree 5
[ 9537.260566] BTRFS error (device vdb): bdev /dev/vdc errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 9537.260954] BTRFS error (device vdb): fixed up error at logical 30572544 on dev /dev/vdc
[ 9537.261073] BTRFS info (device vdb): scrub: finished on devid 2 with status: 0
# ls -l /var/lib/btrfs/scrub.status.*
-rw------- 1 root root 723 Oct 7 11:28 /var/lib/btrfs/scrub.status.71589e19-f992-464b-9113-5d5b8a19480d
-rw------- 1 root root 763 Oct 9 15:10 /var/lib/btrfs/scrub.status.ded4e395-487a-4ad1-aee3-a077ea110fb6
# cat /var/lib/btrfs/scrub.status.ded4e395-487a-4ad1-aee3-a077ea110fb6
scrub status:1
ded4e395-487a-4ad1-aee3-a077ea110fb6:1|data_extents_scrubbed:5|tree_extents_scrubbed:8|data_bytes_scrubbed:327680|tree_bytes_scrubbed:131072|read_errors:0|csum_errors:0|verify_errors:0|no_csum:80|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:1372585984|t_start:1696857002|t_resumed:0|duration:0|canceled:0|finished:1
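The counters in the status file are what the test should assert on (csum_errors, corrected_errors, uncorrectable_errors). A minimal sketch of extracting them, based on the '|'-separated key:value format shown above (the helper name is hypothetical):

```shell
#!/bin/bash
# Extract selected counters from a scrub.status.<UUID> file: the record
# is a single line of '|'-separated key:value pairs.
parse_scrub_status() {
    tr '|' '\n' < "$1" | awk -F: '
        $1 == "csum_errors"          { print "csum_errors=" $2 }
        $1 == "corrected_errors"     { print "corrected_errors=" $2 }
        $1 == "uncorrectable_errors" { print "uncorrectable_errors=" $2 }'
}
```

After the scrub run above, the test would expect corrected_errors to be non-zero and uncorrectable_errors to be zero.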
Defrag
To test btrfs-defrag we need to create a fragmented filesystem.
This is hard, because Btrfs is designed to handle data fragmentation automatically, aiming for optimal performance and data organization.
What we can do is:
- copy and delete random data
- append data to existing files
- create snapshots
This is a long test to run and requires a lot of crunching time. One option is to prepare a "fragmented" QCOW2 disk offline and give it to the test; let's discuss whether that is feasible.
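As a starting point for the discussion, the append option can be sketched as follows (paths and counts are illustrative; run it inside the mounted btrfs, e.g. /mnt/raid). Appending small chunks with synced writes tends to prevent the allocator from coalescing them into one extent:

```shell
#!/bin/bash
# Grow a file by appending small synced chunks so the writes cannot be
# merged into a single extent. Directory argument is for convenience.
workdir=${1:-.}
frag="$workdir/frag.bin"
rm -f "$frag"
for i in $(seq 1 100); do
    dd if=/dev/urandom of="$frag" bs=4K count=1 \
       oflag=append,sync conv=notrunc 2> /dev/null
done
# On btrfs, compare extent counts before and after defragmenting:
#   filefrag "$frag"
#   btrfs filesystem defragment -v "$frag"
#   filefrag "$frag"
```

Whether 100 appends produce enough fragmentation to make the defrag measurable is exactly what we'd need to tune, which is why the pre-built QCOW2 image may still be the better option.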