action #47858
closed[u] test fails in first_boot - pflash overlay deleted causing: mkdir vm-snapshots: Structure needs cleaning
Description
Observation
Some jobs end up incomplete, apparently due to problems related to the pflash overlays:
[2019-02-13T12:33:17.767 UTC] [debug] Saving snapshot (Current VM state is running).
[2019-02-13T12:33:17.818 UTC] [debug] EVENT {"event":"STOP","timestamp":{"microseconds":818501,"seconds":1550061197}}
[2019-02-13T12:33:17.828 UTC] [debug] blockdev-snapshot-sync({'arguments' => {'format' => 'qcow2','node-name' => 'hd0','snapshot-file' => '/var/lib/openqa/pool/16/raid/hd0-overlay1','snapshot-node-name' => 'hd0-overlay1'},'execute' => 'blockdev-snapshot-sync'}) -> {'return' => {}}
[2019-02-13T12:33:17.837 UTC] [debug] blockdev-snapshot-sync({'arguments' => {'format' => 'qcow2','node-name' => 'cd0-overlay0','snapshot-file' => '/var/lib/openqa/pool/16/raid/cd0-overlay1','snapshot-node-name' => 'cd0-overlay1'},'execute' => 'blockdev-snapshot-sync'}) -> {'return' => {}}
[2019-02-13T12:33:17.842 UTC] [debug] blockdev-snapshot-sync({'arguments' => {'format' => 'qcow2','node-name' => 'pflash-code-overlay0','snapshot-file' => '/var/lib/openqa/pool/16/raid/pflash-code-overlay1','snapshot-node-name' => 'pflash-code-overlay1'},'execute' => 'blockdev-snapshot-sync'}) -> {'error' => {'class' => 'GenericError','desc' => 'Cannot find device= nor node_name=pflash-code-overlay0'}}
[2019-02-13T12:33:17.851 UTC] [debug] blockdev-snapshot-sync({'arguments' => {'device' => 'pflash-code-overlay0','format' => 'qcow2','snapshot-file' => '/var/lib/openqa/pool/16/raid/pflash-code-overlay1','snapshot-node-name' => 'pflash-code-overlay1'},'execute' => 'blockdev-snapshot-sync'}) -> {'return' => {}}
[2019-02-13T12:33:17.856 UTC] [debug] blockdev-snapshot-sync({'arguments' => {'format' => 'qcow2','node-name' => 'pflash-vars-overlay0','snapshot-file' => '/var/lib/openqa/pool/16/raid/pflash-vars-overlay1','snapshot-node-name' => 'pflash-vars-overlay1'},'execute' => 'blockdev-snapshot-sync'}) -> {'error' => {'class' => 'GenericError','desc' => 'Cannot find device= nor node_name=pflash-vars-overlay0'}}
[2019-02-13T12:33:17.865 UTC] [debug] blockdev-snapshot-sync({'arguments' => {'device' => 'pflash-vars-overlay0','format' => 'qcow2','snapshot-file' => '/var/lib/openqa/pool/16/raid/pflash-vars-overlay1','snapshot-node-name' => 'pflash-vars-overlay1'},'execute' => 'blockdev-snapshot-sync'}) -> {'return' => {}}
[2019-02-13T12:33:17.867 UTC] [debug] Backend process died, backend errors are reported below in the following lines:
mkdir vm-snapshots: Structure needs cleaning at /usr/lib/os-autoinst/backend/qemu.pm line 413.
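In the log above, each pflash snapshot request first fails when the overlay is addressed by 'node-name' and then succeeds when the same name is passed as 'device'. A minimal sketch of that retry pattern follows. It is not the os-autoinst implementation (which is Perl); `snapshot_with_fallback`, `send_qmp` and `fake_qmp` are hypothetical names introduced only for illustration, and `fake_qmp` merely mimics the replies seen in the log instead of talking to a real QEMU.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: takes a QMP command dict, returns the reply dict.
QmpSender = Callable[[Dict[str, Any]], Dict[str, Any]]


def snapshot_with_fallback(send_qmp: QmpSender, node: str,
                           snapshot_file: str,
                           snapshot_node: str) -> Dict[str, Any]:
    """Issue blockdev-snapshot-sync by node-name, then retry by device."""
    base_args = {
        "format": "qcow2",
        "snapshot-file": snapshot_file,
        "snapshot-node-name": snapshot_node,
    }
    reply: Dict[str, Any] = {}
    # Same order as in the log: 'node-name' first, 'device' second.
    for key in ("node-name", "device"):
        command = {
            "execute": "blockdev-snapshot-sync",
            "arguments": {**base_args, key: node},
        }
        reply = send_qmp(command)
        if "error" not in reply:
            return reply
    # Both attempts failed: hand the last error reply back to the caller.
    return reply


def fake_qmp(command: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a QMP connection that mimics the replies in the log."""
    args = command["arguments"]
    if "node-name" in args:
        return {"error": {
            "class": "GenericError",
            "desc": "Cannot find device= nor node_name=" + args["node-name"],
        }}
    return {"return": {}}


result = snapshot_with_fallback(
    fake_qmp,
    "pflash-code-overlay0",
    "/var/lib/openqa/pool/16/raid/pflash-code-overlay1",
    "pflash-code-overlay1",
)
```

With this stand-in the first attempt returns the GenericError and the second returns an empty 'return', so the node-name errors in the log are not by themselves the reason the backend died; the fatal error is the mkdir failure reported afterwards.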
Reproducible
It's sporadic:
Updated by szarate about 5 years ago
- Assignee set to rpalethorpe
I wonder if Richie can shed some light here
Updated by rpalethorpe about 5 years ago
- Status changed from New to In Progress
- Assignee changed from rpalethorpe to szarate
The inner error message appears to be from the file system:
https://unix.stackexchange.com/questions/330742/cannot-remove-file-structure-needs-cleaning#330767
So the directory/inode may be getting corrupted.
Looks like the worker has this problem on every job, but other workers don't. It always happens with the pflash overlay, but that might just be because it is always the first overlay QEMU tries to access. So it is probably an issue with the worker's file system.
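"Structure needs cleaning" is the text Linux uses for the EUCLEAN error code, which a file system returns when it detects on-disk corruption. The following sketch shows how a mkdir failure of that kind could be told apart from ordinary errors. It is an illustration only, not what qemu.pm does: `classify_mkdir_error` and `try_mkdir` are hypothetical helpers, and the fallback value 117 is the EUCLEAN number on common Linux architectures, assumed here for platforms where Python does not define the name.

```python
import errno
import os

# Python only defines errno.EUCLEAN where the platform has it; 117 is the
# value on common Linux architectures and is used here as an assumed fallback.
EUCLEAN = getattr(errno, "EUCLEAN", 117)


def classify_mkdir_error(exc: OSError) -> str:
    """Map an OSError from mkdir to a coarse category."""
    if exc.errno == EUCLEAN:
        # The file system itself reported corruption; retrying will not
        # help, the volume needs a file system check.
        return "filesystem-corruption"
    if exc.errno == errno.EEXIST:
        return "already-exists"
    return "other"


def try_mkdir(path: str) -> str:
    """Create a directory and report what happened as a short string."""
    try:
        os.mkdir(path)
    except OSError as exc:
        return classify_mkdir_error(exc)
    return "created"
```

A worker-side health check could call something like `try_mkdir` on the pool directory before accepting jobs and take itself out of rotation on "filesystem-corruption", rather than letting every job on that worker end up incomplete.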
Updated by szarate about 5 years ago
- Project changed from openQA Project to openQA Infrastructure
- Subject changed from test fails in first_boot - pflash overlay deleted causing: mkdir vm-snapshots: Structure needs cleaning to [u] test fails in first_boot - pflash overlay deleted causing: mkdir vm-snapshots: Structure needs cleaning
- Status changed from In Progress to Workable
- Assignee deleted (szarate)
Moving to infrastructure for the time being and adding a tag to keep it under radar :). I guess it will become annoying in the near future.
Updated by okurz about 5 years ago
- Target version set to Milestone 24
"under radar"? Wouldn't that imply we don't see it? ;)
Updated by okurz about 5 years ago
- Status changed from Workable to Rejected
- Assignee set to okurz
In the same scenario I have seen incompletes in two consecutive jobs but not later: https://openqa.suse.de/tests/latest?distri=sle&arch=aarch64&flavor=Installer-DVD&machine=aarch64&version=15-SP1&test=allmodules%2Ballpatterns#next_previous . Many more jobs have run since then and look OK again, so I assume for now that we do not need to do anything. I am not aware of the same error appearing anywhere else either, so I think we are good to call it "Rejected" for now: the problem has not reappeared so far, even though we did not change anything. Please reopen if you see it again. @rpalethorpe thanks for the investigation help and the helpful explanation.
Updated by okurz about 5 years ago
- Status changed from Rejected to Workable
- Assignee deleted (okurz)
Happened again: https://openqa.suse.de/tests/2534467/file/autoinst-log.txt
Updated by SLindoMansilla about 5 years ago
- Has duplicate action #49583: [arm][fs] cannot access '/var/lib/openqa/pool/16/vm-snapshots': Structure needs cleaning on openqaworker-arm-1 added
Updated by SLindoMansilla about 5 years ago
- Status changed from Workable to Rejected
Resolved in #49583