action #181790

closed

Fix HDD issue on kerosene.qe.nue2.suse.org size:S

Added by dheidler 6 days ago. Updated 3 days ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Start date:
2025-05-05
Due date:
2025-05-20
% Done:

0%

Estimated time:
Tags:

Description

Observation

The ext4 filesystem on /dev/sda (a hardware RAID0) keeps getting corrupted, likely due to a hardware failure, and this often results in failing tests.
That disk array holds /var/lib/openqa.

Acceptance criteria

  • AC1: openQA tests consistently pass on kerosene without any indication of filesystem corruption

Suggestions

  • Confirm whether the disks work properly (SMART)
  • Check the yellow thingie for error messages
  • Consider reformatting disks to a journaled filesystem
  • Set up btrfs RAID5 - okurz strongly suggests not using RAID5 but rather a combination of RAID0 and RAID1
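
The SMART suggestion above could be sketched roughly as follows. This is only an illustration, not what was run on kerosene: the function name `smart_check` and the `DRY_RUN` switch are hypothetical, and it assumes the disks are visible to the OS directly. Behind a hardware RAID controller, smartctl typically needs a controller-specific `-d` option, which is not shown here.

```shell
#!/bin/sh
# Sketch: query SMART health (-H) and attributes (-A) for each given disk.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# smart_check is a hypothetical helper name, not part of smartmontools.
smart_check() {
    for disk in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "smartctl -H -A $disk"
        else
            smartctl -H -A "$disk"
        fi
    done
}

# Device names taken from the ticket; adjust to the actual layout.
smart_check /dev/sda /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
```

Running it with `DRY_RUN=0` as root would execute the queries for real.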
Actions #1

Updated by dheidler 6 days ago

  • Description updated (diff)
Actions #2

Updated by favogt 6 days ago

We formatted one drive from the previous RAID as JBOD to access SMART data and are using that directly for /var/lib/openqa at the moment.

Note that /var/lib/openqa/ssh/ contains an ssh key + config for connecting to ariel for rsync.

Actions #3

Updated by dheidler 6 days ago

Running badblocks -w on the remaining disks.

Actions #4

Updated by openqa_review 5 days ago

  • Due date set to 2025-05-20

Setting due date based on mean cycle time of SUSE QE Tools

Actions #5

Updated by dheidler 5 days ago

badblocks reported no errors for sdc-sdg.
The root fs is on sdb and seems unaffected; sda is the remaining disk, currently in use as described in https://progress.opensuse.org/projects/openqa-infrastructure/activity?from=2025-05-05

Actions #6

Updated by dheidler 5 days ago

  • Tags set to infra
Actions #7

Updated by dheidler 5 days ago

Copied /var/lib/openqa to sdc and now testing sda with badblocks -wsv.
Also testing sdd-sdg again with that command.
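
For reference, a minimal sketch of how such a test run over several disks might be scripted. The helper name `badblocks_all` and the `DRY_RUN` switch are assumptions for illustration; only the `badblocks -wsv` invocation itself comes from this comment. Note that `-w` is a destructive write test that overwrites the whole device.

```shell
#!/bin/sh
# Sketch: run the destructive write-mode badblocks test on each given disk.
# -w write-mode test (destroys data), -s show progress, -v verbose.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# badblocks_all is a hypothetical helper name.
badblocks_all() {
    for disk in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "badblocks -wsv $disk"
        else
            # Disks are tested one after another; stop at the first failure.
            badblocks -wsv "$disk" || return 1
        fi
    done
}

badblocks_all /dev/sda /dev/sdd /dev/sde /dev/sdf /dev/sdg
```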

Actions #8

Updated by okurz 5 days ago

  • Subject changed from Fix HDD issue on kerosene.qe.nue2.suse.org to Fix HDD issue on kerosene.qe.nue2.suse.org size:S
  • Description updated (diff)
Actions #9

Updated by dheidler 4 days ago

No errors found using badblocks.

Creating a btrfs raid10 over all the storage disks now.

mkfs.btrfs -d raid10 -m raid10 /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
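
Once the filesystem is mounted, the result could be checked along these lines. This is a sketch under assumptions: the mount point /var/lib/openqa is taken from the ticket description, while the `run` and `verify_btrfs` helpers and the `DRY_RUN` switch are hypothetical. `btrfs device stats` prints per-device error counters, which is one way to spot recurring I/O or corruption errors.

```shell
#!/bin/sh
# Sketch: inspect a mounted btrfs filesystem after creation.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"
MNT="${MNT:-/var/lib/openqa}"

# run is a hypothetical wrapper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

verify_btrfs() {
    run btrfs filesystem show "$MNT"    # member devices of the filesystem
    run btrfs filesystem usage "$MNT"   # data/metadata profiles and space usage
    run btrfs device stats "$MNT"       # per-device read/write/corruption error counters
}

verify_btrfs
```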
Actions #10

Updated by dheidler 4 days ago

  • Status changed from In Progress to Resolved
Actions #11

Updated by livdywan 3 days ago

dheidler wrote in #note-10:

Test seems to run fine: https://openqa.opensuse.org/tests/5046470

Nice.

Did you confirm whether errors were visible or gone in the UI or in SMART?
