#156460 (closed): Potential FS corruption on osd due to 2 VMs accessing the same disk
Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
Start date:
2024-03-01
Due date:
% Done:
0%
Estimated time:
Tags:
Description
Observation
Users noticed slowness of osd in https://suse.slack.com/archives/C02CANHLANP/p1709297645213609; openqa-monitor.qa.suse.de also shows availability problems.
Logs on osd show a potential problem with the FS:
Mar 01 14:29:14 openqa salt-master[25856]: [ERROR ] Unable to remove /var/cache/salt/master/jobs/26/4669e8a06e5502583ba67b138a9c30b97efbfff1f8af0b92f937ad8b70035d: [Errno 117] Structure needs cleaning: '.min>
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #467326: comm salt-master: deleted inode referenced: 467329
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #467326: comm salt-master: deleted inode referenced: 467329
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #428053: comm salt-master: deleted inode referenced: 428056
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #428053: comm salt-master: deleted inode referenced: 428056
Mar 01 14:29:14 openqa salt-master[25856]: [ERROR ] Unable to remove /var/cache/salt/master/jobs/08/96cf9ed4cc58d8c044fe257e5e977516e49383070eea5680e3f8d53fc31712: [Errno 117] Structure needs cleaning: '.min>
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #358221: comm salt-master: deleted inode referenced: 358225
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #358221: comm salt-master: deleted inode referenced: 358225
Mar 01 14:29:14 openqa salt-master[25856]: [ERROR ] Unable to remove /var/cache/salt/master/jobs/eb/8843afe01ce61b501612957cc3df3a3d8371a9c2694ebd800b47d514066853: [Errno 117] Structure needs cleaning: '.min>
Mar 01 14:29:14 openqa openqa-websockets-daemon[15372]: [debug] [pid:15372] Updating seen of worker 1951 from worker_status (free)
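For triage, the repeated kernel messages above can be condensed to the set of affected inodes. A minimal sketch, assuming the journal line format shown in the excerpt (field order may differ on other kernels):

```python
import re

# Pattern for the kernel EXT4 error lines seen above (format assumed
# from this ticket's excerpt, not from any kernel documentation).
EXT4_ERR = re.compile(
    r"EXT4-fs error \(device (?P<dev>\w+)\): (?P<func>\w+):\d+: "
    r"inode #(?P<dir_inode>\d+): comm (?P<comm>[\w.-]+): "
    r"deleted inode referenced: (?P<inode>\d+)"
)

def affected_inodes(lines):
    """Return the set of (directory inode, referenced inode) pairs
    reported as corrupt, deduplicating repeated log lines."""
    pairs = set()
    for line in lines:
        m = EXT4_ERR.search(line)
        if m:
            pairs.add((int(m.group("dir_inode")), int(m.group("inode"))))
    return pairs
```

Feeding it the journal excerpt above yields two distinct corrupt directory entries despite the duplicated lines, which gives e2fsck a concrete list to check against.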
According to https://suse.slack.com/archives/C02CANHLANP/p1709299401351479?thread_ts=1709297645.213609&cid=C02CANHLANP, there might have been a situation where two VMs were running with the same backing device.
The server was rebooted to get it into a consistent state, but unfortunately, due to the FS corruption, osd is currently in maintenance mode and needs recovery.