action #137246
closed: [qe-core] test fails in force_scheduled_tasks
Added by amanzini about 1 year ago. Updated about 1 year ago.
Description
Observation
openQA test in scenario sle-15-SP1-Server-DVD-Incidents-Minimal-x86_64-qam-minimal-lvm@64bit fails in
force_scheduled_tasks
Test suite description
Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. Disk image for grub2_lvm.
minimal = base pattern; the minimal (enhanced base) pattern adds additional convenience packages.
Reproducible
Fails since (at least) Build :30547:glibc (current job)
Expected result
Last good: :30115:openssl-1_1 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by amanzini about 1 year ago
- Status changed from New to In Progress
First observation: this test often fails while waiting for the load average to "settle".
Maybe we can tweak the definition of "loadavg is low when < 0.10" a bit on newer systems?
Another point to consider: the test may take up to 19 minutes to run, since it waits for the system's load average to drop before proceeding. This might not reflect real-world customer environments.
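For reference, the settle-wait logic under discussion could be sketched like this (a hedged sketch: the threshold and timeout values are illustrative, and `load_is_low` / `wait_for_settle` are hypothetical helpers, not the actual os-autoinst implementation):

```shell
#!/bin/bash
# Sketch of "wait until loadavg settles" with a relaxed, configurable threshold.

load_is_low() {
    # $1 = 1-minute load average, $2 = threshold; success if load <= threshold
    awk -v l="$1" -v t="$2" 'BEGIN { exit !(l <= t) }'
}

wait_for_settle() {
    local threshold=$1 timeout=$2 elapsed=0
    while ! load_is_low "$(cut -d' ' -f1 /proc/loadavg)" "$threshold"; do
        # give up after the budget instead of dying on the first check
        (( elapsed >= timeout )) && return 1
        sleep 10
        (( elapsed += 10 ))
    done
}

# e.g. relaxed threshold of 0.25 instead of 0.10, with a 19-minute budget:
# wait_for_settle 0.25 1140 || echo "load never settled"
```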
Updated by amanzini about 1 year ago
did some investigation runs with different RAM sizes, to see whether the time taken by this test varies; of course it can be influenced by the worker's resources and the instantaneous I/O load at the time the test runs.
QEMURAM=1536 https://openqa.suse.de/tests/12359963 5m 10s
QEMURAM=1536 https://openqa.suse.de/tests/12365819 2m 26s
QEMURAM=1536 https://openqa.suse.de/tests/12372082 3m 2s
QEMURAM=2048 https://openqa.suse.de/tests/12359964 5m 9s
QEMURAM=2048 https://openqa.suse.de/tests/12365820 3m 4s
QEMURAM=2048 https://openqa.suse.de/tests/12372083 3m 25s
QEMURAM=3072 https://openqa.suse.de/tests/12359965 4m 13s
QEMURAM=3072 https://openqa.suse.de/tests/12365821 10m 10s
QEMURAM=3072 https://openqa.suse.de/tests/12372084 3m 2s
QEMURAM=4096 https://openqa.suse.de/tests/12359966 2m 57s
QEMURAM=4096 https://openqa.suse.de/tests/12365822 3m 8s
QEMURAM=4096 https://openqa.suse.de/tests/12372085 3m 31s
QEMURAM=6144 https://openqa.suse.de/tests/12359967 3m 2s
QEMURAM=6144 https://openqa.suse.de/tests/12365823 3m 10s
QEMURAM=6144 https://openqa.suse.de/tests/12372086 5m 14s
QEMURAM=8192 https://openqa.suse.de/tests/12359968 3m 8s
QEMURAM=8192 https://openqa.suse.de/tests/12365824 5m 28s
QEMURAM=8192 https://openqa.suse.de/tests/12372087 6m 37s
Looking at the video recording, it seems the load can drop to 0.11, then something else kicks in and brings it up again. So a quick fix to gain leverage here could be to relax the constraint to check for load <= 0.11, and to not fail the test on the first load-settling attempt.
As a side question: the variable $is_settled
is always 1, because when the load does not settle the whole program dies. So what is the point of checking the variable at line 61?
Updated by amanzini about 1 year ago
the same test also fails often in a migration test on the s390x architecture
Updated by amanzini about 1 year ago
@szarate raised an interesting point about keeping a low loadavg in ticket https://bugzilla.opensuse.org/show_bug.cgi?id=1063638: high load also impacted test results, due to lost characters in serial communication. So my proposal has two parts:
- Since the failure is observed only on older systems, relax the < 0.10 constraint a bit, to address just this test's flakiness.
- Create a new ticket to improve dedicated filesystem tests such as btrfsmaintenance, including a check of the system load average, after agreeing with stakeholders on what a "sane" load average threshold is. To make the test more realistic, we could also create a "complex" BTRFS situation on purpose, with quotas, snapshots, and an unbalanced filesystem.
Updated by amanzini about 1 year ago
- Status changed from In Progress to Feedback
PR for first part: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/17914
Updated by szarate about 1 year ago
- Sprint set to QE-Core: September Sprint 23 (Sep 06 - Oct 04)
- Tags set to bugbusters
create a new ticket to improve on dedicated filesystem tests like btrfsmaintenance including a check about system load average, after agreeing with stakeholders what can be a "sane" load average threshold. To make the test more realistic, we could also create a "complex" BTRFS situation with quotas, snapshots, and an unbalanced filesystem on purpose.
I like where this is going. Once the settling of the load is "managed", I'd love to hear your idea for a more complex btrfs situation with quotas and dirty filesystems. We could be crossing over into kernel territory; I know somebody on @pcervinka's team does some filesystem testing, but I can't remember who.
Updated by amanzini about 1 year ago
The scope of the proposed new test is to check the functionality and system-load impact of the btrfsmaintenance
package. This is basically a set of scheduled tasks that perform four BTRFS maintenance operations:
- balance
- defrag
- scrub
- trim
I'd declare the trim
operation out of scope for now, since it involves specific hardware (SSD disks), but we can think about scheduling tests on specific machines.
Balance
This test needs at least 2 extra block devices, which can be separate (virtual) disks or partitions of the same disk.
Create a raid0 filesystem on a single 5 GiB disk:
# mkfs.btrfs -d raid0 -m raid0 /dev/vdb
# mount /dev/vdb /mnt/raid
# df -h | grep raid
/dev/vdb 5.0G 3.4M 4.8G 1% /mnt/raid
check statistics:
# btrfs filesystem df /mnt/raid
Data, RAID0: total=512.00MiB, used=64.00KiB
Data, single: total=512.00MiB, used=256.00KiB
System, RAID0: total=8.00MiB, used=0.00B
System, single: total=32.00MiB, used=16.00KiB
Metadata, RAID0: total=256.00MiB, used=112.00KiB
Metadata, single: total=256.00MiB, used=0.00B
GlobalReserve, single: total=3.25MiB, used=0.00B
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
WARNING: Data: single, raid0
WARNING: Metadata: single, raid0
WARNING: System: single, raid0
create a big binary file
# time dd if=/dev/random of=/mnt/raid/bigfile.bin bs=4M count=1024
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 46.4441 s, 92.5 MB/s
real 0m46.445s
user 0m0.000s
sys 0m45.580s
check statistics to see disk usage
# btrfs device usage /mnt/raid/
/dev/vdb, ID: 1
Device size: 5.00GiB
Device slack: 0.00B
Data,single: 3.94GiB
Data,RAID0/1: 512.00MiB
Metadata,single: 256.00MiB
Metadata,RAID0/1: 256.00MiB
System,single: 32.00MiB
System,RAID0/1: 8.00MiB
Unallocated: 21.00MiB
notice the difference between the free space reported by 'df' and the unallocated space reported by btrfs:
# df -h | grep -E '(raid|Filesystem)'
Filesystem Size Used Avail Use% Mounted on
/dev/vdb 5.0G 4.1G 470M 90% /mnt/raid
# btrfs filesystem df /mnt/raid/
Data, RAID0: total=512.00MiB, used=448.50MiB
Data, single: total=3.94GiB, used=3.56GiB
System, RAID0: total=8.00MiB, used=0.00B
System, single: total=32.00MiB, used=16.00KiB
Metadata, RAID0: total=256.00MiB, used=4.33MiB
Metadata, single: total=256.00MiB, used=0.00B
GlobalReserve, single: total=3.80MiB, used=0.00B
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
WARNING: Data: single, raid0
WARNING: Metadata: single, raid0
WARNING: System: single, raid0
now add a new disk device to the raid0:
# btrfs device add -f /dev/vdc /mnt/raid/
# btrfs device usage /mnt/raid/
/dev/vdb, ID: 1
Device size: 5.00GiB
Device slack: 0.00B
Data,single: 3.94GiB
Data,RAID0/1: 512.00MiB
Metadata,single: 256.00MiB
Metadata,RAID0/1: 256.00MiB
System,single: 32.00MiB
System,RAID0/1: 8.00MiB
Unallocated: 21.00MiB
/dev/vdc, ID: 2
Device size: 5.00GiB
Device slack: 0.00B
Unallocated: 5.00GiB
The aggregate filesystem is now larger, but the available space does not reflect the new size, because the filesystem is unbalanced; we need to trigger a balance of the filesystem. The next operation is I/O intensive:
# df -h | grep -E '(raid|Filesystem)'
Filesystem Size Used Avail Use% Mounted on
/dev/vdb 10G 4.1G 490M 90% /mnt/raid
# btrfs balance start --full-balance -v /mnt/raid
now free space is available:
# df -h | grep raid
Filesystem Size Used Avail Use% Mounted on
/dev/vdb 10G 3.6G 6.3G 37% /mnt/raid
Here we can either force the balance, or just start it and let it run in the background using the script shipped in the maintenance package, /usr/share/btrfsmaintenance/btrfs-balance.sh
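In the new test, the packaged script could be driven through its sysconfig variables instead of a manual `btrfs balance start`. A sketch under assumptions: the variable names below are taken from the btrfsmaintenance sysconfig file as I remember it, and whether an exported environment overrides the sourced defaults should be verified against the installed package.

```shell
# Point the maintenance package at the test mount point (variable names
# assumed from /etc/sysconfig/btrfsmaintenance; verify on the target system).
export BTRFS_BALANCE_MOUNTPOINTS="/mnt/raid"
export BTRFS_BALANCE_DUSAGE="1 5 10"   # balance data chunks below these usage %
export BTRFS_BALANCE_MUSAGE="1 5 10"   # same for metadata chunks

# then run the packaged script instead of a manual balance:
# /usr/share/btrfsmaintenance/btrfs-balance.sh
```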
Scrub
Scrub is a pass over all filesystem data and metadata that verifies the checksums. If a valid copy is available (replicated block group profiles), the damaged one is repaired. All copies of the replicated profiles are validated.
To test btrfs-scrub, the idea is to deliberately create a defective filesystem.
This can be accomplished in many ways, for example:
- manually corrupting data blocks
- deleting important metadata (using dd or similar)
- unmounting the filesystem abruptly
- leveraging the kernel's fault-injection features: https://www.kernel.org/doc/Documentation/fault-injection/fault-injection.txt
let's start by creating a RAID1 filesystem:
# mkfs.btrfs -f -d raid1 -m raid1 /dev/vdb /dev/vdc
# mount /dev/vdb /mnt/raid
# df -h | grep -E '(raid|Filesystem)'
Filesystem Size Used Avail Use% Mounted on
/dev/vdb 5.0G 3.7M 4.8G 1% /mnt/raid
# echo "this is a test" > /mnt/raid/myfile.txt
# umount /mnt/raid
now we need to find out at which offset in the block device the string we just wrote is stored:
# grep --only-matching --byte-offset --max-count=1 --text "this is a test" /dev/vdc
9616961:this is a test
and now we can corrupt the data on only one of the two disks, by writing directly to the block device:
# echo "THIS IS A MESS" | dd of=/dev/vdc bs=1 conv=notrunc seek=9616961
15+0 records in
15+0 records out
15 bytes copied, 0.00108664 s, 13.8 kB/s
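The same grep-for-offset plus dd-with-notrunc mechanics can be sanity-checked on a plain file first (a sketch; the temp file and marker strings are illustrative, and the offset is whatever grep reports):

```shell
# Demonstrate the overwrite-in-place trick on a regular file.
f=$(mktemp)
printf 'AAAA this is a test AAAA' > "$f"

# locate the byte offset of the marker string
off=$(grep --only-matching --byte-offset --text "this is a test" "$f" | cut -d: -f1)

# overwrite at that offset without truncating the file
# (replacement has the same length as the original, 14 bytes)
printf 'THIS IS A MESS' | dd of="$f" bs=1 conv=notrunc seek="$off" 2>/dev/null

grep -q "THIS IS A MESS" "$f" && echo "overwrite landed"
```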
remount RAID device:
# mount /dev/vdc /mnt/raid
create many small random files, just to increase data and metadata usage:
#!/bin/bash
# run from inside the mounted filesystem, e.g. cd /mnt/raid
for d in {1..50}; do
    mkdir "dir$(printf %03d "$d")" && pushd "$_"
    for n in {0..999}; do
        # each file holds a random number (< 1024) of 10-byte records
        dd if=/dev/zero of="file$(printf %03d "$n").bin" bs=10 count=$(( RANDOM % 1024 )) >& /dev/null
    done
    popd
done
start scrub:
# btrfs scrub start -B -f /mnt/raid/
The scrubbing status is recorded in /var/lib/btrfs/ in textual files named scrub.status.UUID for a filesystem identified by the given UUID. (Progress state is communicated through a named pipe in file scrub.progress.UUID in the same directory.) The status file is updated every 5 seconds. A resumed scrub will continue from the last saved position.
In dmesg we can find some details:
[ 9537.259295] BTRFS info (device vdb): scrub: started on devid 2
[ 9537.259631] BTRFS info (device vdb): scrub: started on devid 1
[ 9537.260543] BTRFS info (device vdb): scrub: finished on devid 1 with status: 0
[ 9537.260563] BTRFS warning (device vdb): checksum error at logical 30572544 on dev /dev/vdc, physical 9601024: metadata leaf (level 0) in tree 5
[ 9537.260565] BTRFS warning (device vdb): checksum error at logical 30572544 on dev /dev/vdc, physical 9601024: metadata leaf (level 0) in tree 5
[ 9537.260566] BTRFS error (device vdb): bdev /dev/vdc errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 9537.260954] BTRFS error (device vdb): fixed up error at logical 30572544 on dev /dev/vdc
[ 9537.261073] BTRFS info (device vdb): scrub: finished on devid 2 with status: 0
# ls -l /var/lib/btrfs/scrub.status.*
-rw------- 1 root root 723 Oct 7 11:28 /var/lib/btrfs/scrub.status.71589e19-f992-464b-9113-5d5b8a19480d
-rw------- 1 root root 763 Oct 9 15:10 /var/lib/btrfs/scrub.status.ded4e395-487a-4ad1-aee3-a077ea110fb6
# cat /var/lib/btrfs/scrub.status.ded4e395-487a-4ad1-aee3-a077ea110fb6
scrub status:1
ded4e395-487a-4ad1-aee3-a077ea110fb6:1|data_extents_scrubbed:5|tree_extents_scrubbed:8|data_bytes_scrubbed:327680|tree_bytes_scrubbed:131072|read_errors:0|csum_errors:0|verify_errors:0|no_csum:80|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:1372585984|t_start:1696857002|t_resumed:0|duration:0|canceled:0|finished:1
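The new test could derive its pass/fail verdict from this status file. A sketch of a parser for the pipe-separated fields (`scrub_field` is a hypothetical helper; the sample line is abridged from the output above):

```shell
# Extract one counter from a btrfs scrub.status line of the form
# <uuid>:1|key:value|key:value|...
scrub_field() {
    # $1 = status line, $2 = field name; prints the value
    echo "$1" | tr '|' '\n' | awk -F: -v k="$2" '$1 == k { print $2 }'
}

line='ded4e395-487a-4ad1-aee3-a077ea110fb6:1|data_extents_scrubbed:5|csum_errors:0|uncorrectable_errors:0|corrected_errors:0|finished:1'

scrub_field "$line" uncorrectable_errors   # -> 0
scrub_field "$line" finished               # -> 1
```

A test module could then fail if `uncorrectable_errors` is non-zero, or pass if the scrub finished with all errors corrected.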
Defrag
To test btrfs-defrag we need to create a fragmented filesystem.
This is hard, because Btrfs is designed to handle data fragmentation automatically, aiming for optimal performance and data organization.
What we can do is:
- copy and delete random data
- append data to existing files
- create snapshots
This is a long test to run and requires a lot of CPU and I/O time. One option is to prepare a "fragmented" QCOW2 disk offline and give it to the test; let's discuss whether that is feasible.
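A fragmentation generator along those lines could be sketched as follows (a hedged sketch: paths, file counts, and round counts are illustrative, and the snapshot step only has an effect on an actual btrfs mount):

```shell
#!/bin/bash
# Append to many files in round-robin so their extents interleave on disk;
# on btrfs, read-only snapshots between rounds pin the old extents so the
# filesystem cannot coalesce them.
FSROOT=${1:-$(mktemp -d)}
mkdir -p "$FSROOT/frag"

for round in 1 2 3; do            # use ~20 rounds in a real run
    for f in $(seq 0 9); do      # use hundreds of files in a real run
        # append one 4 KiB random chunk to each file in turn
        dd if=/dev/urandom bs=4K count=1 2>/dev/null >> "$FSROOT/frag/file$f.bin"
    done
    # only meaningful on btrfs; ignore failure elsewhere
    btrfs subvolume snapshot -r "$FSROOT" "$FSROOT/snap$round" 2>/dev/null || true
done

# fragmentation can then be inspected, e.g. with:
# filefrag "$FSROOT"/frag/*.bin
```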
Updated by dzedro about 1 year ago
AFAIK the "test" force_scheduled_tasks was created to force scheduled tasks (e.g. btrfsmaintenance) and wait for low load after installation,
because in the past these tasks ran at random times and random tests were failing due to "high load".
All you described is nice, but IMO that should go into a separate test.
Updated by amanzini about 1 year ago
- Status changed from Feedback to Resolved
Yes, I agree; so I will close this ticket and open a new one: https://progress.opensuse.org/issues/137714
Updated by amanzini about 1 year ago
- Related to coordination #137714: [qe-core] proposal for new tests that covers btrfsmaintenance added