action #177973
openQA Project (public) - coordination #157969: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.6
Dying disk on qa-power8-3: Needs replacement?
Status: New
Priority: Low
Assignee: -
Category: Organisational
Target version:
Start date: 2024-11-14
Due date:
% Done: 0%
Estimated time:
Description
Motivation
While working on #169939, @gpathak observed the following messages in the boot logs:
[ 28.432972][ C0] ipr 0001:08:00.0: 8150: Permanent IOA failure
[ 28.432986][ C0] ipr: 00000000: 04448200 13512400 FFFFFFFF 103034F0
...
[ 28.433297][ C0] ipr: 000003B0: 0040EF00 00A27DD0 14411245 EF000014
[ 28.433302][ C0] ipr: 000003C0: 000000B0 00A27DD0 144111C3 CE000000
[ 28.433307][ C0] ipr: 000003D0: 49434F4D 57414954 14410EB6 CE000000
[ 28.433354][ C0] ipr 0001:08:00.0: FFF4: Disk device problem
[ 28.433360][ C0] ipr: -----Failing Device Information-----
[ 28.433364][ C0] ipr: World Wide Unique ID: 5000CCA01D06CF5C0000000000000000
[ 28.433370][ C0] ipr: Device Resource Path: 00-03
[ 28.433374][ C0] ipr: Primary Problem Description: Device detected hardware error
[ 28.433379][ C0] ipr: Secondary Problem Description: Status Check
[ 28.433384][ C0] ipr: SCSI Sense Data:
[ 28.433387][ C0] ipr: 00000000: 70000400 00000018 00000000 44000000
[ 28.433393][ C0] ipr: 00000010: 00000000 F4400000 00000000 00000000
[ 28.433398][ C0] ipr: SCSI Command Descriptor Block:
[ 28.433402][ C0] ipr: 00000000: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
[ 28.433407][ C0] ipr: Additional IOA Data:
[ 28.433411][ C0] ipr: 00000000: 455300CC 07B00007 00000000 84000030
[ 28.433416][ C0] ipr: 00000010: 00000000 00000000 0B7EDFC0 00000000
[ 28.433421][ C0] ipr: 00000020: 00000000 0B7ED8A0 C8008000 00000000
[ 28.433427][ C0] ipr: 00000030: 00000000 00000000 00000480 8F000000
[ 28.433432][ C0] ipr: 00000040: 001F9D1B 00000000 00000000 00000000
...
[ 28.433505][ C0] ipr: 00000120: 43490018 00000002 0003FFFF FFFFFFFF
[ 28.433510][ C0] ipr: 00000130: 5000CCA0 1D06CF5D 00001770 545209C0
[ 129.774662][ T11] sd 0:0:3:0: [sdc] Asking for cache data failed
An online search turned up an IBM web page which suggests a problem with the existing hard disk or possibly a loose cable.
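To cross-check the "Device detected hardware error" line, the "SCSI Sense Data" words from the log can be decoded by hand. The following is only a minimal sketch: it assumes the ipr driver prints the sense buffer in byte order (the leading 0x70 is consistent with that) and that the data is in the standard fixed sense format. The helper name `decode_fixed_sense` and the shortened sense key table are made up for illustration.

```python
# Sketch: decode fixed-format SCSI sense data as printed in the ipr log.
# Assumption: the hex words appear in buffer byte order.

# Partial sense key table from the SCSI Primary Commands standard.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}


def decode_fixed_sense(hex_words):
    """Return response code, sense key, ASC and ASCQ from hex words."""
    raw = bytes.fromhex("".join(hex_words))
    response_code = raw[0] & 0x7F  # bit 7 is the VALID flag
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    if len(raw) < 14:
        raise ValueError("sense data too short for ASC/ASCQ")
    sense_key = raw[2] & 0x0F  # low nibble of byte 2
    return {
        "response_code": response_code,
        "sense_key": sense_key,
        "sense_key_name": SENSE_KEYS.get(sense_key, "unknown"),
        "asc": raw[12],   # additional sense code
        "ascq": raw[13],  # additional sense code qualifier
    }


# First sense data line from the boot log above.
sense = decode_fixed_sense("70000400 00000018 00000000 44000000".split())
print(
    f"sense key {sense['sense_key']:#x} ({sense['sense_key_name']}), "
    f"ASC/ASCQ {sense['asc']:02X}/{sense['ascq']:02X}"
)
```

Under these assumptions the sense key decodes to 0x4 (HARDWARE ERROR) with ASC/ASCQ 44/00, which the T10 code list names "internal target failure". That would mean the error is reported by the disk itself, but it does not by itself rule out cabling or a driver change.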
Acceptance criteria
Suggestions
Updated by gpathak 4 days ago
- Copied from action #169939: Upgrade Power8 o3 workers to openSUSE Leap 15.6 size:M added
Updated by gpathak 3 days ago
okurz wrote in #note-12:
It sure sounds like broken hardware but did those errors only appear in a newer, unstable kernel?
Oh, I totally missed this point. Actually, the crash wasn't happening on the older kernel, which is why I never checked the boot logs for this message.
Could it be due to a changed SCSI firmware in newer kernel/OS releases?