action #137246

closed

[qe-core] test fails in force_scheduled_tasks

Added by amanzini about 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Bugs in existing tests
Start date:
2023-09-29
Due date:
% Done:

0%

Estimated time:
Difficulty:
Sprint:
QE-Core: September Sprint 23 (Sep 06 - Oct 04)

Description

Observation

openQA test in scenario sle-15-SP1-Server-DVD-Incidents-Minimal-x86_64-qam-minimal-lvm@64bit fails in
force_scheduled_tasks

Test suite description

Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. Disk image for grub2_lvm.
minimal = base pattern; minimal (enhanced base) pattern adds additional convenience packages.

Reproducible

Fails since (at least) Build :30547:glibc (current job)

Expected result

Last good: :30115:openssl-1_1 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 1 (1 open, 0 closed)

Related to openQA Tests (public) - coordination #137714: [qe-core] proposal for new tests that covers btrfsmaintenance (Blocked, 2023-10-12)

Actions #1

Updated by amanzini about 1 year ago

  • Status changed from New to In Progress

First observation: this test often fails while waiting for loadavg to "settle".
Maybe we can tweak the definition of "loadavg is low when < 0.10" a bit on newer systems?

Another point to consider: the test may require up to 19 minutes to run and waits for the system's load average to drop before executing. This might not reflect real-world customer environments.
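
For reference, a minimal shell sketch of the kind of load-settling wait being discussed; the threshold and timeout values are illustrative assumptions, not the exact ones used by the Perl test module:

 #! /bin/bash
# wait until the 1-minute load average drops below a threshold, or give up
THRESHOLD="0.10"   # assumed value; the test currently requires load < 0.10
TIMEOUT=1140       # seconds, roughly the 19 minutes mentioned above
for (( t = 0; t < TIMEOUT; t += 10 )); do
    load=$(cut -d' ' -f1 /proc/loadavg)
    # floating-point comparison via awk
    if awk -v l="$load" -v th="$THRESHOLD" 'BEGIN { exit !(l < th) }'; then
        echo "load settled at $load after ${t}s"
        exit 0
    fi
    sleep 10
done
echo "load did not settle below $THRESHOLD within ${TIMEOUT}s" >&2
exit 1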

Actions #2

Updated by amanzini about 1 year ago

I did some investigation runs with different RAM sizes to see whether the time taken by this test varies; of course it can also be influenced by worker resources and the instantaneous I/O load at the time the test runs.

QEMURAM=1536 https://openqa.suse.de/tests/12359963 5m 10s
QEMURAM=1536 https://openqa.suse.de/tests/12365819 2m 26s
QEMURAM=1536 https://openqa.suse.de/tests/12372082 3m 2s
QEMURAM=2048 https://openqa.suse.de/tests/12359964 5m 9s
QEMURAM=2048 https://openqa.suse.de/tests/12365820 3m 4s
QEMURAM=2048 https://openqa.suse.de/tests/12372083 3m 25s
QEMURAM=3072 https://openqa.suse.de/tests/12359965 4m 13s
QEMURAM=3072 https://openqa.suse.de/tests/12365821 10m 10s
QEMURAM=3072 https://openqa.suse.de/tests/12372084 3m 2s
QEMURAM=4096 https://openqa.suse.de/tests/12359966 2m 57s
QEMURAM=4096 https://openqa.suse.de/tests/12365822 3m 8s
QEMURAM=4096 https://openqa.suse.de/tests/12372085 3m 31s
QEMURAM=6144 https://openqa.suse.de/tests/12359967 3m 2s
QEMURAM=6144 https://openqa.suse.de/tests/12365823 3m 10s
QEMURAM=6144 https://openqa.suse.de/tests/12372086 5m 14s
QEMURAM=8192 https://openqa.suse.de/tests/12359968 3m 8s
QEMURAM=8192 https://openqa.suse.de/tests/12365824 5m 28s
QEMURAM=8192 https://openqa.suse.de/tests/12372087 6m 37s

Looking at the video recording, it seems the load can drop to about 0.11, then something else kicks in and brings it up again. So a quick fix to gain some margin here could be to relax the constraint to check for load <= 0.11, and to consider not failing the test on the first load-settling check.

As a side question: the variable $is_settled is always 1, because when the load is not settled the whole program dies. So what is the point of checking the variable at line 61?

Actions #3

Updated by amanzini about 1 year ago

The same test also often fails in a migration test for the s390x arch.

Actions #4

Updated by amanzini about 1 year ago

@szarate raised an interesting point about keeping a low loadavg, expressed in ticket https://bugzilla.opensuse.org/show_bug.cgi?id=1063638 . Basically, high load also impacted test results due to lost characters in serial communication. So my proposals are twofold:

  • since this is observed only on older systems, relax the < 0.10 constraint a bit to address just this test's flakiness.
  • create a new ticket to improve dedicated filesystem tests like btrfsmaintenance, including a check on the system load average, after agreeing with stakeholders on what a "sane" load average threshold is. To make the test more realistic, we could also create a "complex" BTRFS situation with quotas, snapshots and an unbalanced filesystem on purpose (see the sketch after this list).
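
As an illustration of the "complex" situation mentioned in the second point, a rough sketch of enabling quotas and creating snapshots on a test filesystem; the mountpoint, subvolume names and the 1G limit are assumptions for the example:

 # btrfs quota enable /mnt/test
 # btrfs subvolume create /mnt/test/data
 # btrfs qgroup limit 1G /mnt/test/data
 # btrfs subvolume snapshot /mnt/test/data /mnt/test/data-snap1
 # btrfs qgroup show /mnt/test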
Actions #5

Updated by amanzini about 1 year ago

  • Status changed from In Progress to Feedback
Actions #6

Updated by szarate about 1 year ago

  • Sprint set to QE-Core: September Sprint 23 (Sep 06 - Oct 04)
  • Tags set to bugbusters

create a new ticket to improve dedicated filesystem tests like btrfsmaintenance, including a check on the system load average, after agreeing with stakeholders on what a "sane" load average threshold is. To make the test more realistic, we could also create a "complex" BTRFS situation with quotas, snapshots, and an unbalanced filesystem on purpose.

I like where this is going. Once the settling of the load part is "managed", I'd love to hear your idea on a more complex btrfs situation with quotas and dirty filesystems. We could be crossing over into Kernel territory; I know somebody on @pcervinka's team does some filesystem testing, but I can't remember who.

Actions #7

Updated by amanzini about 1 year ago

The scope of the proposed new test is to check the functionality and system-load impact of the btrfsmaintenance package. This is basically a set of scheduled tasks that perform 4 BTRFS maintenance operations:

  • balance
  • defrag
  • scrub
  • trim

I'd declare the trim operation out of scope for now, since it involves specific hardware (SSD disks), but we can think about scheduling tests on specific machines.
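
As a first step, the test could verify that the scheduled tasks are actually registered and can be triggered on demand. A sketch, assuming the package ships systemd timers and services (older releases may use cron scripts instead, so the unit names below need to be verified on the SUT):

 # rpm -q btrfsmaintenance
 # systemctl list-timers 'btrfs-*'
 # systemctl start btrfs-balance.service
 # journalctl -u btrfs-balance.service --no-pager | tail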

Balance

This test needs at least 2 extra block devices, which can be separate (virtual) disks or partitions of the same disk.
Start by creating a raid0 filesystem on a single 5 GB disk (the second device will be added later):

 # mkfs.btrfs -d raid0 -m raid0 /dev/vdb 
 # mount /dev/vdb /mnt/raid
 # df -h | grep raid
 /dev/vdb        5.0G  3.4M  4.8G   1% /mnt/raid

check statistics:

 # btrfs filesystem df /mnt/raid 
Data, RAID0: total=512.00MiB, used=64.00KiB
Data, single: total=512.00MiB, used=256.00KiB
System, RAID0: total=8.00MiB, used=0.00B
System, single: total=32.00MiB, used=16.00KiB
Metadata, RAID0: total=256.00MiB, used=112.00KiB
Metadata, single: total=256.00MiB, used=0.00B
GlobalReserve, single: total=3.25MiB, used=0.00B
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
WARNING:   Data: single, raid0
WARNING:   Metadata: single, raid0
WARNING:   System: single, raid0

create a big binary file

 # time dd if=/dev/random of=/mnt/raid/bigfile.bin bs=4M count=1024
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 46.4441 s, 92.5 MB/s

real    0m46.445s
user    0m0.000s
sys 0m45.580s

check statistics to see disk usage

 # btrfs device usage  /mnt/raid/
/dev/vdb, ID: 1
   Device size:             5.00GiB
   Device slack:              0.00B
   Data,single:             3.94GiB
   Data,RAID0/1:          512.00MiB
   Metadata,single:       256.00MiB
   Metadata,RAID0/1:      256.00MiB
   System,single:          32.00MiB
   System,RAID0/1:          8.00MiB
   Unallocated:            21.00MiB

notice the difference between free space reported by 'df' and unallocated space reported by btrfs:

 # df -h | grep -E '(raid|Filesystem)'
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        5.0G  4.1G  470M  90% /mnt/raid
# btrfs filesystem df  /mnt/raid/
Data, RAID0: total=512.00MiB, used=448.50MiB
Data, single: total=3.94GiB, used=3.56GiB
System, RAID0: total=8.00MiB, used=0.00B
System, single: total=32.00MiB, used=16.00KiB
Metadata, RAID0: total=256.00MiB, used=4.33MiB
Metadata, single: total=256.00MiB, used=0.00B
GlobalReserve, single: total=3.80MiB, used=0.00B
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
WARNING:   Data: single, raid0
WARNING:   Metadata: single, raid0
WARNING:   System: single, raid0

now add a new disk device to the raid0:

 # btrfs device add -f /dev/vdc /mnt/raid/
 # btrfs device usage /mnt/raid/
/dev/vdb, ID: 1
   Device size:             5.00GiB
   Device slack:              0.00B
   Data,single:             3.94GiB
   Data,RAID0/1:          512.00MiB
   Metadata,single:       256.00MiB
   Metadata,RAID0/1:      256.00MiB
   System,single:          32.00MiB
   System,RAID0/1:          8.00MiB
   Unallocated:            21.00MiB

/dev/vdc, ID: 2
   Device size:             5.00GiB
   Device slack:              0.00B
   Unallocated:             5.00GiB

The aggregate filesystem is now larger, but the available space does not reflect the new size because the filesystem is *unbalanced*; we need to trigger a balance of the filesystem. The next operation is I/O intensive:

# df -h | grep -E '(raid|Filesystem)'
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb         10G  4.1G  490M  90% /mnt/raid

# btrfs balance start --full-balance -v /mnt/raid 

now free space is available:

# df -h | grep raid
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb         10G  3.6G  6.3G  37% /mnt/raid

Here we can either force the balance, or just "start" it and let it run in the background using the script shipped in the maintenance package, /usr/share/btrfsmaintenance/btrfs-balance.sh, as sketched below.
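
A possible way to exercise the packaged script instead of calling btrfs directly; the sysconfig variable name below is an assumption based on the package defaults and should be checked against /etc/sysconfig/btrfsmaintenance on the SUT:

 # grep BTRFS_BALANCE /etc/sysconfig/btrfsmaintenance
 # sed -i 's|^BTRFS_BALANCE_MOUNTPOINTS=.*|BTRFS_BALANCE_MOUNTPOINTS="/mnt/raid"|' /etc/sysconfig/btrfsmaintenance
 # /usr/share/btrfsmaintenance/btrfs-balance.sh
 # df -h /mnt/raid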

Scrub

Scrub is a pass over all filesystem data and metadata that verifies the checksums. If a valid copy is available (replicated block group profiles), the damaged one is repaired. All copies of the replicated profiles are validated.

To test btrfs-scrub, the idea is to deliberately create a defective filesystem.

This can be accomplished in many ways; for example, let's start by creating a RAID1 filesystem:

 # mkfs.btrfs -f -d raid1 -m raid1 /dev/vdb /dev/vdc

 # mount /dev/vdb /mnt/raid

 # df -h | grep -E '(raid|Filesystem)'
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        5.0G  3.7M  4.8G   1% /mnt/raid

 # echo "this is a test" > /mnt/raid/myfile.txt
 # umount /mnt/raid

Now we need to find out at which offset in the block device the string we just wrote is located:

 # grep --only-matching --byte-offset --max-count=1 --text "this is a test" /dev/vdc
9616961:this is a test

And now we can overwrite the data on only one of the two disks by writing directly to the block device:

 # echo "THIS IS A MESS" | dd of=/dev/vdc bs=1 conv=notrunc seek=9616961 
15+0 records in
15+0 records out
15 bytes copied, 0.00108664 s, 13.8 kB/s

remount RAID device:

 # mount /dev/vdc /mnt/raid

Create many small random files, just to increase data and metadata usage:

 #! /bin/bash
# run this from inside the mounted filesystem (e.g. /mnt/raid)
# create 50 directories with 1000 files each, sized randomly up to ~10 KiB
for d in {1..50}; do
  mkdir "dir$(printf %03d "$d")" && pushd "$_"
  for n in {0..999}; do
     dd if=/dev/zero of="file$(printf %03d "$n").bin" bs=10 count=$(( RANDOM % 1024 )) >& /dev/null
  done
  popd
done

start scrub:

 # btrfs scrub start -B -f /mnt/raid/

The scrubbing status is recorded in /var/lib/btrfs/ in textual files named scrub.status.UUID for a filesystem identified by the given UUID. (Progress state is communicated through a named pipe in file scrub.progress.UUID in the same directory.) The status file is updated every 5 seconds. A resumed scrub will continue from the last saved position.

in the dmesg we can find some details:

[ 9537.259295] BTRFS info (device vdb): scrub: started on devid 2
[ 9537.259631] BTRFS info (device vdb): scrub: started on devid 1
[ 9537.260543] BTRFS info (device vdb): scrub: finished on devid 1 with status: 0
[ 9537.260563] BTRFS warning (device vdb): checksum error at logical 30572544 on dev /dev/vdc, physical 9601024: metadata leaf (level 0) in tree 5
[ 9537.260565] BTRFS warning (device vdb): checksum error at logical 30572544 on dev /dev/vdc, physical 9601024: metadata leaf (level 0) in tree 5
[ 9537.260566] BTRFS error (device vdb): bdev /dev/vdc errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 9537.260954] BTRFS error (device vdb): fixed up error at logical 30572544 on dev /dev/vdc
[ 9537.261073] BTRFS info (device vdb): scrub: finished on devid 2 with status: 0

 # ls -l /var/lib/btrfs/scrub.status.*
-rw------- 1 root root 723 Oct  7 11:28 /var/lib/btrfs/scrub.status.71589e19-f992-464b-9113-5d5b8a19480d
-rw------- 1 root root 763 Oct  9 15:10 /var/lib/btrfs/scrub.status.ded4e395-487a-4ad1-aee3-a077ea110fb6

 # cat /var/lib/btrfs/scrub.status.ded4e395-487a-4ad1-aee3-a077ea110fb6 
scrub status:1
ded4e395-487a-4ad1-aee3-a077ea110fb6:1|data_extents_scrubbed:5|tree_extents_scrubbed:8|data_bytes_scrubbed:327680|tree_bytes_scrubbed:131072|read_errors:0|csum_errors:0|verify_errors:0|no_csum:80|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:1372585984|t_start:1696857002|t_resumed:0|duration:0|canceled:0|finished:1
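
The same counters can also be read back in human-readable form directly from the btrfs tool, which reports scrubbed bytes, error counts and whether errors were corrected (exact fields vary with the btrfs-progs version):

 # btrfs scrub status /mnt/raid/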

Defrag

To test btrfs-defrag we need to create a fragmented filesystem.
This is hard because Btrfs is designed to handle data fragmentation automatically, aiming for optimal performance and data organization.

What we can do is:

  • Copy and delete random data
  • Append data to existing files
  • Create snapshots

This is a long test to run and requires a lot of crunching time. One option is to prepare a "fragmented" QCOW2 disk image offline and give it to the test; let's discuss whether that is feasible. A rough sketch of the online approach follows below.
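
The sketch assumes that repeatedly appending small chunks to files between snapshots produces enough extents to make the defrag measurable; paths, sizes and iteration counts are illustrative:

 #! /bin/bash
# grow 50 files in small appends, taking a snapshot between rounds,
# then measure and defragment
mkdir -p /mnt/raid/frag
for i in {1..20}; do
    for f in /mnt/raid/frag/file{00..49}.bin; do
        # each append lands in a new extent, increasing fragmentation
        dd if=/dev/urandom bs=64K count=1 >> "$f" 2> /dev/null
    done
    btrfs subvolume snapshot -r /mnt/raid /mnt/raid/snap_"$i"
done
filefrag /mnt/raid/frag/file000.bin            # extent count before defrag
btrfs filesystem defragment -r -v /mnt/raid/frag
filefrag /mnt/raid/frag/file000.bin            # extent count after defrag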

Actions #8

Updated by dzedro about 1 year ago

AFAIK the "test" force_scheduled_tasks was created to force scheduled tasks (e.g. btrfsmaintenance) and to wait for low load after installation,
because in the past these tasks ran at random times and random tests were failing due to "high load".
All you described is nice, but IMO it should go into a separate test.

Actions #9

Updated by amanzini about 1 year ago

  • Status changed from Feedback to Resolved

Yes, I agree; so I will close this ticket and open a new one: https://progress.opensuse.org/issues/137714

Actions #10

Updated by amanzini about 1 year ago

  • Related to coordination #137714: [qe-core] proposal for new tests that covers btrfsmaintenance added