action #55595

closed

[cloud][pcm] debug memory dump - placeholder

Added by okurz over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Bugs in existing tests
Target version: -
Start date: 2019-08-15
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

I guess you know what you are doing in https://openqa.suse.de/tests/latest?flavor=Server-DVD&distri=sle&test=sandbox_cfconrad&machine=64bit&arch=x86_64&version=12-SP5, I just wanted to create a ticket to use as a label to see if I can catch all incompletes. I hope you don't mind :)


Related issues 2 (0 open, 2 closed)

Related to openQA Project - action #43631: [tools] Job terminated by a SIGTERM, ending up incomplete, unclear reason for stopping even though test could have looked green so far, "Result: done" (Resolved, okurz, 2018-11-09)

Related to openQA Tests - action #48671: [opensuse] save_memory_dump make isotovideo to fail (Resolved, okurz, 2019-03-05)

Actions #1

Updated by cfconrad over 4 years ago

Thx for creating the ticket.

Motivation:

After https://github.com/os-autoinst/os-autoinst/pull/1182 was deployed, jobs that run backend::qemu::save_memory_dump() ended up incomplete.
The tests behind the link above were an attempt to get passed, or even just failed, jobs when using save_memory_dump() without PR#1182. Okurz kindly pointed me to https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8202, from which I understand that we have general problems with this function.

Logs on error:

[2019-08-15T17:34:49.111 CEST] [debug] Memory dump completed.
[2019-08-15T17:34:49.286 CEST] [debug] EVENT {"event":"RESUME","timestamp":{"microseconds":286561,"seconds":1565883289}}
ulogs/save_memory_dump_03-vm-memory-dump: 44.6 MiB / 223.2 MiB = 0.200, 2.4 MiB/s, 1:31
[2019-08-15T17:36:21.848 CEST] [debug] sysread failed: 
[2019-08-15T17:36:21.849 CEST] [debug] THERE IS NOTHING TO READ 15 4 3

Current investigation:

Next steps:

  • Check if the failure occurs with busy wait
  • Verify memory usage during xz
  • Check file usage with lsof
  • strace isotovideo
  • check errno on sysread failed
  • ...
Actions #2

Updated by cfconrad over 4 years ago

Deleted the jobs for osd, as they do not have any relevance anymore:

for i in 3253886 3253885 3253884 3252334 3250820 3250810 3250759; do openqa-client --host https://openqa.suse.de jobs/$i delete; done
Actions #3

Updated by cfconrad over 4 years ago

lsof just before executing xz shows that qemu still has that file open:

qemu-syst 28293                 _openqa-worker  106u      REG               8,17   89677667     2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
qemu-syst 28293 28296 qemu-syst _openqa-worker  106u      REG               8,17   84762624     2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
qemu-syst 28293 28298 qemu-syst _openqa-worker  106u      REG               8,17   85458695     2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
qemu-syst 28293 28299 qemu-syst _openqa-worker  106u      REG               8,17   86212359     2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
qemu-syst 28293 28301 qemu-syst _openqa-worker  106u      REG               8,17   86937600     2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
qemu-syst 28293 28354 qemu-syst _openqa-worker  106u      REG               8,17   87646051     2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
qemu-syst 28293 28355 qemu-syst _openqa-worker  106u      REG               8,17   88334179     2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
qemu-syst 28293 28356 qemu-syst _openqa-worker  106u      REG               8,17   89055075     2884914 /var/lib/openqa/pool/1/ulogs/save_memory_dump_01-vm-memory-dump
Actions #4

Updated by cfconrad over 4 years ago

Looks like it is something simpler. Replacing the xz call with system('false'); also produces an incomplete job.

Actions #5

Updated by cfconrad over 4 years ago

  • Related to action #43631: [tools] Job terminated by a SIGTERM, ending up incomplete, unclear reason for stopping even though test could have looked green so far, "Result: done" added
Actions #6

Updated by cfconrad over 4 years ago

sigh

So the system('false') thing was just because of autodie ':all' in qemu.pm.

Maybe we should just replace system(xz) with simple_run(xz).
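
For readers less familiar with the behaviour: with autodie ':all', Perl's system() throws an exception when the command exits non-zero. A rough shell analogue is `set -e`, sketched below. This is an illustration only and not the code from qemu.pm; the function names `strict_run` and `tolerant_run` are made up for the example.

```shell
#!/bin/sh
# Illustration only: a shell analogue of the autodie behaviour,
# not code taken from os-autoinst.

# Runs a failing command under `set -e` in a child shell.
# The child aborts at `false`, so the echo is never reached.
strict_run() {
    sh -c 'set -e; false; echo "still running"'
}

# Runs the same failing command, but records its exit status
# instead of aborting, so the caller can decide what to do.
tolerant_run() {
    sh -c 'false; rc=$?; echo "command exited with $rc, still running"'
}

strict_rc=0
strict_out=$(strict_run) || strict_rc=$?

tolerant_rc=0
tolerant_out=$(tolerant_run) || tolerant_rc=$?

echo "strict:   rc=$strict_rc output='$strict_out'"
echo "tolerant: rc=$tolerant_rc output='$tolerant_out'"
```

The strict variant stops at the first non-zero exit status, which is the same kind of surprise as system('false') killing the backend; the tolerant variant keeps going and leaves the decision to the caller.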

Actions #7

Updated by okurz over 4 years ago

Sorry if that "feature" I introduced some years ago caused you trouble :/ My idea of "autodie" was really more of a "last resort", to not silently skip over errors and end up somewhere even weirder. Why not "runcmd" from osutils.pm?

Actions #8

Updated by cfconrad over 4 years ago

no problem, lesson learned :)

And we need to add -Q to xz, as xz exits with status 2 if "Something worth a warning occurred, but no actual errors occurred."
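
To make the exit status point concrete, here is a small shell sketch. xz documents exit status 0 for success, 1 for an error and 2 for a warning. The hypothetical helper `run_allow_warning` below treats 2 as success, which is roughly the effect -Q (--no-warn) has inside xz itself. The `sh -c 'exit N'` calls are only stand-ins for a real xz invocation, so the sketch runs without a memory dump at hand.

```shell
#!/bin/sh
# Hypothetical helper: run a command and map exit status 2
# (for xz: warning only, no actual error) to success.
run_allow_warning() {
    rc=0
    "$@" || rc=$?
    if [ "$rc" -eq 2 ]; then
        return 0
    fi
    return "$rc"
}

# Stand-ins for xz exiting with 0 (success), 2 (warning) and 1 (error).
ok_rc=0
run_allow_warning sh -c 'exit 0' || ok_rc=$?
warn_rc=0
run_allow_warning sh -c 'exit 2' || warn_rc=$?
err_rc=0
run_allow_warning sh -c 'exit 1' || err_rc=$?

echo "success=$ok_rc warning=$warn_rc error=$err_rc"
```

Passing -Q to xz directly is simpler than such a wrapper; the sketch only shows why a strict caller trips over status 2 while real errors (status 1) should still be reported.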

Actions #10

Updated by cfconrad over 4 years ago

  • Related to action #48671: [opensuse] save_memory_dump make isotovideo to fail added
Actions #11

Updated by cfconrad over 4 years ago

  • Status changed from New to Feedback

PR was merged, let's wait for feedback.

Regarding os-autoinst PR#1182 and the attempt to get it in again, I created a separate ticket: https://progress.opensuse.org/issues/55883

Actions #12

Updated by cfconrad over 4 years ago

  • Status changed from Feedback to Resolved