action #177973

open

openQA Project (public) - coordination #157969: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.6

Dying disk on qa-power8-3: Needs replacement?

Added by gpathak 4 days ago. Updated 4 days ago.

Status: New
Priority: Low
Assignee: -
Category: Organisational
Start date: 2024-11-14
Due date:
% Done: 0%
Estimated time:

Description

Motivation

While working on #169939, @gpathak observed the following messages in the boot logs:

[   28.432972][    C0] ipr 0001:08:00.0: 8150: Permanent IOA failure
[   28.432986][    C0] ipr: 00000000: 04448200 13512400 FFFFFFFF 103034F0
...
[   28.433297][    C0] ipr: 000003B0: 0040EF00 00A27DD0 14411245 EF000014
[   28.433302][    C0] ipr: 000003C0: 000000B0 00A27DD0 144111C3 CE000000
[   28.433307][    C0] ipr: 000003D0: 49434F4D 57414954 14410EB6 CE000000
[   28.433354][    C0] ipr 0001:08:00.0: FFF4: Disk device problem
[   28.433360][    C0] ipr: -----Failing Device Information-----
[   28.433364][    C0] ipr: World Wide Unique ID: 5000CCA01D06CF5C0000000000000000
[   28.433370][    C0] ipr: Device Resource Path: 00-03
[   28.433374][    C0] ipr: Primary Problem Description: Device detected hardware error 
[   28.433379][    C0] ipr: Secondary Problem Description:  Status Check                   
[   28.433384][    C0] ipr: SCSI Sense Data:
[   28.433387][    C0] ipr: 00000000: 70000400 00000018 00000000 44000000
[   28.433393][    C0] ipr: 00000010: 00000000 F4400000 00000000 00000000
[   28.433398][    C0] ipr: SCSI Command Descriptor Block: 
[   28.433402][    C0] ipr: 00000000: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
[   28.433407][    C0] ipr: Additional IOA Data:
[   28.433411][    C0] ipr: 00000000: 455300CC 07B00007 00000000 84000030
[   28.433416][    C0] ipr: 00000010: 00000000 00000000 0B7EDFC0 00000000
[   28.433421][    C0] ipr: 00000020: 00000000 0B7ED8A0 C8008000 00000000
[   28.433427][    C0] ipr: 00000030: 00000000 00000000 00000480 8F000000
[   28.433432][    C0] ipr: 00000040: 001F9D1B 00000000 00000000 00000000
...
[   28.433505][    C0] ipr: 00000120: 43490018 00000002 0003FFFF FFFFFFFF
[   28.433510][    C0] ipr: 00000130: 5000CCA0 1D06CF5D 00001770 545209C0
[  129.774662][   T11] sd 0:0:3:0: [sdc] Asking for cache data failed
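To pull the relevant fields out of a saved boot log quickly (for example a file written from `dmesg` output), here is a minimal sketch. It assumes the ipr driver prints the "Failing Device Information" block in the same `ipr: <Field>: <value>` shape as the excerpt above. The script name and the `failing_devices` helper are hypothetical and not part of any existing tooling.

```python
import re
import sys
from typing import Dict, List, Optional

# Matches the "ipr: <Field>: <value>" lines that follow
# "-----Failing Device Information-----" in the log excerpt above.
FIELD_RE = re.compile(
    r"ipr: (World Wide Unique ID|Device Resource Path|"
    r"Primary Problem Description|Secondary Problem Description):\s*(.*?)\s*$"
)
BLOCK_MARKER = "-----Failing Device Information-----"


def failing_devices(log_text: str) -> List[Dict[str, str]]:
    """Return one dict of ipr fields per 'Failing Device Information' block."""
    devices: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for line in log_text.splitlines():
        if BLOCK_MARKER in line:
            # Start a new record for each failing-device block.
            current = {}
            devices.append(current)
            continue
        if current is None:
            continue  # ignore field-like lines seen before any block
        match = FIELD_RE.search(line)
        if match:
            # Trailing padding (as in "Status Check      ") is stripped by the regex.
            current[match.group(1)] = match.group(2)
    return devices


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 ipr_failing.py boot.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for dev in failing_devices(fh.read()):
            for key, value in dev.items():
                print(f"{key}: {value}")
            print()
```

Run against the excerpt above, it would report resource path `00-03` with "Device detected hardware error" / "Status Check". That gives the WWID to match against the physical drive before deciding whether a replacement or a cable check is needed.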

An online search turned up this IBM website. Does it indicate a problem with the existing hard disk, or a loose cable?

Acceptance criteria

Suggestions


Files

Crash-Log.7z (2.07 MB), gpathak, 2025-01-23 05:17
Crash-Log.tar.gz (4.37 MB), gpathak, 2025-01-23 09:39
qa-power8-softlockup-2.log (112 KB), gpathak, 2025-01-24 11:29
qa-power8-3-kernel-error.log (94.2 KB), gpathak, 2025-02-15 08:43
clipboard-202502271847-zjmq6.png (30.6 KB), gpathak, 2025-02-27 13:17

Related issues: 1 (0 open, 1 closed)

Copied from openQA Infrastructure (public) - action #169939: Upgrade Power8 o3 workers to openSUSE Leap 15.6 size:M (Resolved, gpathak, 2024-11-14)
