action #55595
[cloud][pcm] debug memory dump - placeholder
Status: Closed
Description
I guess you know what you are doing in https://openqa.suse.de/tests/latest?flavor=Server-DVD&distri=sle&test=sandbox_cfconrad&machine=64bit&arch=x86_64&version=12-SP5 , I just wanted to create a ticket to use as a label so I can see if I catch all incompletes. I hope you don't mind :)
Updated by cfconrad over 5 years ago
Thx for creating the ticket.
Motivation:
When https://github.com/os-autoinst/os-autoinst/pull/1182 was deployed, we found that jobs which run backend::qemu::save_memory_dump() end up incomplete.
The tests shown in the link above were an attempt to get passed, or even failed, jobs when using save_memory_dump() without PR#1182. Okurz kindly pointed me to https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8202, from which I understand that we have general problems with this function.
Logs on error:
[2019-08-15T17:34:49.111 CEST] [debug] Memory dump completed.
[2019-08-15T17:34:49.286 CEST] [debug] EVENT {"event":"RESUME","timestamp":{"microseconds":286561,"seconds":1565883289}}
ulogs/save_memory_dump_03-vm-memory-dump: 44.6 MiB / 223.2 MiB = 0.200, 2.4 MiB/s, 1:31
[2019-08-15T17:36:21.848 CEST] [debug] sysread failed:
[2019-08-15T17:36:21.849 CEST] [debug] THERE IS NOTHING TO READ 15 4 3
Current investigation:
- It doesn't always happen
- Currently no differences with a nested virtualization setup
- The issue is somehow triggered by this system() call: https://github.com/os-autoinst/os-autoinst/blob/master/backend/qemu.pm#L341
  - Replaced this call with a sleep() of equal duration; the issue didn't show up
  - Checked whether qmp getfd really passed the fd; it does
  - Checked whether closefd is needed; it really isn't
  - Checked whether the file gets touched somehow after the migration is finished; it doesn't
- The fd which fails on sysread was always https://github.com/os-autoinst/os-autoinst/blob/master/isotovideo#L333
Next steps:
- Check if the failure occurs with a busy wait
- Verify memory usage during xz
- Check file usage with lsof
- strace isotovideo
- check errno on sysread failed
- ...
Updated by cfconrad over 5 years ago
Deleted the jobs for osd, as they do not have any relevance anymore:
for i in 3253886 3253885 3253884 3252334 3250820 3250810 3250759; do openqa-client --host https://openqa.suse.de jobs/$i delete; done
Updated by cfconrad over 5 years ago
lsof just before executing xz shows that qemu still has that file open:
qemu-syst 28293 _openqa-worker 106u REG 8,17 89677667 2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
qemu-syst 28293 28296 qemu-syst _openqa-worker 106u REG 8,17 84762624 2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
qemu-syst 28293 28298 qemu-syst _openqa-worker 106u REG 8,17 85458695 2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
qemu-syst 28293 28299 qemu-syst _openqa-worker 106u REG 8,17 86212359 2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
qemu-syst 28293 28301 qemu-syst _openqa-worker 106u REG 8,17 86937600 2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
qemu-syst 28293 28354 qemu-syst _openqa-worker 106u REG 8,17 87646051 2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
qemu-syst 28293 28355 qemu-syst _openqa-worker 106u REG 8,17 88334179 2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
qemu-syst 28293 28356 qemu-syst _openqa-worker 106u REG 8,17 89055075 2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
Updated by cfconrad over 5 years ago
Looks like it is something simpler. Replacing the xz call with system('false'); also produces an incomplete job.
Updated by cfconrad over 5 years ago
- Related to action #43631: [tools] Job terminated by a SIGTERM, ending up incomplete, unclear reason for stopping even though test could have looked green so far, "Result: done" added
Updated by cfconrad over 5 years ago
sigh
So the system('false') thing was just because of autodie ':all' in qemu.pm, which makes system() die on a non-zero exit status.
Maybe we should just replace system(xz) with simple_run(xz).
Updated by okurz over 5 years ago
Sorry if that "feature" I introduced some years ago caused you trouble :/ My idea of "autodie" was really more of a "last resort", to not silently skip over errors and end up somewhere even weirder. Why not "runcmd" from osutils.pm?
Updated by cfconrad over 5 years ago
No problem, lesson learned :)
And we need to add -Q to xz, as xz returns 2 if "Something worth a warning occurred, but no actual errors occurred.".
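For illustration only, the same exit-status convention can also be handled on the caller side. The sketch below wraps a command and maps exit status 2 to success, which is roughly what -Q (--no-warn) makes xz do by itself. The helper name run_tolerating_warnings is hypothetical and not part of os-autoinst; the change discussed in this ticket is simply adding -Q.

```shell
# Hypothetical helper (an assumption for illustration, not os-autoinst code):
# run a command and treat exit status 2 (xz's "warning" status) as success.
# Any other non-zero status is passed through unchanged.
run_tolerating_warnings() {
    rc=0
    "$@" || rc=$?
    if [ "$rc" -eq 2 ]; then
        echo "warning: '$1' exited with status 2, treating as success" >&2
        return 0
    fi
    return "$rc"
}

# Usage sketch; the file name is a placeholder:
# run_tolerating_warnings xz memory-dump
```

Passing -Q keeps the call site simpler, since a caller that dies on any non-zero status then only sees real errors.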
Updated by cfconrad over 5 years ago
- Related to action #48671: [opensuse] save_memory_dump make isotovideo to fail added
Updated by cfconrad over 5 years ago
- Status changed from New to Feedback
PR was merged, let's wait for feedback.
Regarding os-autoinst PR#1182 and trying to get it in again, I created a separate ticket: https://progress.opensuse.org/issues/55883