https://progress.opensuse.org/https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?15829177842017-05-31T20:55:08ZopenSUSE Project Management ToolopenQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=520382017-05-31T20:55:08Zokurzokurz@suse.com
<ul></ul><p>As this has happened more often recently, I disabled the call to save_memory_dump in our test code with <a href="https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/2986" class="external">https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/2986</a>. Please re-add it when done here.</p>
openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=565022017-07-14T06:15:12Zokurzokurz@suse.com
<ul><li><strong>Subject</strong> changed from <i>qemu "migrate" within testapi::save_memory_dump command never finishes within 2h</i> to <i>[tools]qemu "migrate" within testapi::save_memory_dump command never finishes within 2h</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li></ul><p>This is blocking us from better investigating bootup problems, e.g. see <a href="https://openqa.suse.de/tests/1054715#step/first_boot/3" class="external">https://openqa.suse.de/tests/1054715#step/first_boot/3</a></p>
openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=724272017-11-20T08:41:03Zcoolocoolo@suse.com
<ul><li><strong>Target version</strong> set to <i>Ready</i></li></ul><p>I'm afraid I was never a believer in this feature. So if qemu can't dump within 2 hours, we have to get rid of this API call altogether.</p>
openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=724422017-11-20T09:03:08Zszarate
<ul></ul><p>I have some love for this feature; however, qemu upstream has several bugs related to live migration. (I can't find the bug ID in RH.)</p>
<p><a class="user active user-mention" href="https://progress.opensuse.org/users/17668">@okurz</a>: I was never a fan of enabling the memory dumps for all tests :) </p>
openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=726462017-11-20T14:48:56Zszarate
<ul><li><strong>Subject</strong> changed from <i>[tools]qemu "migrate" within testapi::save_memory_dump command never finishes within 2h</i> to <i>[tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2h</i></li></ul><p>This is more about investigating and finding out whom to poke, most likely agraf, and which bug numbers are related to this. I doubt that extra work will be needed on the backend side of things.</p>
openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=740892017-11-23T09:03:42Zszarate
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul><p>So I created a very small PR for this: <a href="https://github.com/os-autoinst/os-autoinst/pull/882" class="external">https://github.com/os-autoinst/os-autoinst/pull/882</a> but it only covers for qemu's fault here.</p>
openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=757152017-12-01T08:48:08Zszarate
<ul><li><strong>Assignee</strong> set to <i>szarate</i></li></ul> openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=759372017-12-01T14:35:54Zszarate
<ul><li><strong>Target version</strong> changed from <i>Ready</i> to <i>Current Sprint</i></li></ul><p>Carry over to sprint 201712.1, assigning to "Current Sprint".</p>
openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=785442017-12-08T21:55:50Zszarate
<ul></ul><p>So, boo#1072000 and boo#1072008 say two things:</p>
<p>1: For now we can't time out the memory dumps.<br>
2: Memory dumps cannot be called at any point in time.</p>
<p>PR <a href="https://github.com/os-autoinst/os-autoinst/pull/892" class="external">https://github.com/os-autoinst/os-autoinst/pull/892</a> has been created as a workaround for the problem caused by boo#1072008, but boo#1072000 needs a real fix.</p>
openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=829832018-01-12T14:44:28Zszarate
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Blocked</i></li><li><strong>Target version</strong> changed from <i>Current Sprint</i> to <i>future</i></li></ul><p>The PR has been merged, so I am setting this to Blocked per boo#1072000 and boo#1072008.</p>
openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=851682018-01-22T15:00:05Zrpalethorperichard.palethorpe@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-4 priority-default closed parent" href="/issues/30649">action #30649</a>: [tools][openqa] Improve performance by using migrations and external snapshots</i> added</li></ul> openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=859692018-01-24T15:35:09Zrpalethorperichard.palethorpe@suse.com
<ul></ul><p>As part of the other ticket I have linked to, I have converted the migration to dump to a file descriptor (fd) instead. Once I have applied some other fixes, I will make a PR.</p>
openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=892812018-02-07T13:46:44Zrpalethorperichard.palethorpe@suse.com
<ul></ul><p>The following pull request should help: <a href="https://github.com/os-autoinst/os-autoinst/pull/918">https://github.com/os-autoinst/os-autoinst/pull/918</a>. I have made a number of potential fixes, but I suspect the main issue was that the VM was still running. Doing a live migration probably requires tweaking the CPU throttling and other settings; otherwise QEMU will get stuck trying to reach the low water mark before freezing the VM.</p>
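<p>To illustrate the point about stopping the VM first, here is a minimal sketch of the QMP command sequence for an offline dump to a file descriptor. It is not the code from the pull request: the helper names and the fd name <code>dumpfd</code> are hypothetical, and the commands are only built as JSON-ready dictionaries, not sent to a real QEMU.</p>

```python
import json


def memory_dump_commands(fd_name="dumpfd"):
    """Build the QMP commands for dumping a paused VM to a passed-in fd.

    fd_name is a hypothetical label; the fd itself has to be sent
    alongside the getfd command as SCM_RIGHTS ancillary data.
    """
    return [
        {"execute": "stop"},  # freeze the guest so memory stops changing
        {"execute": "getfd", "arguments": {"fdname": fd_name}},
        {"execute": "migrate", "arguments": {"uri": "fd:" + fd_name}},
        {"execute": "query-migrate"},  # poll this until it reports completion
        {"execute": "cont"},  # resume the guest once the dump is written
    ]


def migration_finished(reply):
    """Interpret a query-migrate reply; raise if the migration failed."""
    status = reply.get("return", {}).get("status")
    if status == "failed":
        raise RuntimeError("migration failed")
    return status == "completed"


if __name__ == "__main__":
    for command in memory_dump_commands():
        print(json.dumps(command))
```

<p>Because the guest is stopped before <code>migrate</code> is issued, QEMU does not have to chase pages that the guest keeps dirtying, which is the suspected cause of the hang.</p>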
<p>I did some profiling of a full test run using the new memory dumper:<br>
CPU cycles:<br>
80% used by xz<br>
11% used by qemu<br>
6% used by OpenQA</p>
<p>Page faults:<br>
40-60% caused by OpenQA, at least 20% of which came from libopencv<br>
20% caused by xz<br>
10% caused by qemu</p>
<p>Disk I/O:<br>
Swapper wrote 600-800MB<br>
xz wrote 11-12MB<br>
qemu wrote 4-5MB</p>
<p>In all cases QEMU is not using many resources. xz uses quite a lot, but this is expected, and on CPU-limited workers it can be replaced by bzip2. I'm guessing that most of the disk I/O is attributed to swapper because the file system is deferring or combining writes, which confuses the reporting.</p>
<p>Using internal QEMU migration compression would probably save some disk I/O, but it makes the dumps difficult to read and xz/bzip2 archives are 10-40% smaller. QEMU can also migrate to a socket, so another option would be to read from a socket, compress the data using a library, then write the result to disk. That would be best done in C/C++ because Perl's library support is not so good. Alternatively we could go back to using the exec URI in migrate, but it is not clear if QEMU will report errors from bzip and sh accurately, and there are too many pipes involved for my liking.</p>
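<p>The socket option could look roughly like the following sketch. It is written in Python rather than the C/C++ suggested above, purely to show the shape of the loop; <code>compress_stream</code> and its parameters are hypothetical names, and it assumes the migration stream arrives on an already connected socket.</p>

```python
import lzma
import socket


def compress_stream(sock, out_path, chunk_size=65536):
    """Read a byte stream from sock until EOF and write it xz-compressed.

    Returns the number of uncompressed bytes received.
    """
    compressor = lzma.LZMACompressor()
    total = 0
    with open(out_path, "wb") as out:
        while True:
            chunk = sock.recv(chunk_size)
            if not chunk:  # the peer closed the connection: stream finished
                break
            total += len(chunk)
            out.write(compressor.compress(chunk))
        out.write(compressor.flush())  # emit any data still buffered
    return total
```

<p>Compressing incrementally keeps memory use bounded regardless of the dump size, and it avoids the extra shell and pipe processes that the exec URI approach needs.</p>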
<p>Interestingly, OpenCV-related code is page faulting a lot even though needles were only used during boot and the rest of the test used the serial terminal.</p>
<p>The profiling was done with:<br>
$ sudo perf record -e cycles,faults sudo -u _openqa-worker --preserve-env=QEMU ~/qa/openQA/script/worker --isotovideo ~/qa/os-autoinst/isotovideo<br>
$ perf report --hierarchy</p>
<p>and</p>
<p>$ sudo blktrace -d /dev/sdX<br>
$ blkparse -shi sdX</p>
<p>Where sdX is the drive where /var/lib/openqa/pool is mounted.</p>
<p>It would be nice to attach 'perf record' to QEMU every time it does a snapshot, memory dump, etc., with full stack traces, then detach straight after. This could be coded into os-autoinst, but it might be better to set up some monitoring software which can insert a probe to achieve the same thing without hard coding it.</p>
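<p>As a rough idea of the hard-coded variant, the sketch below wraps an operation in a profiler that is started beforehand and interrupted afterwards. The helper names and the output file name are hypothetical, this is not existing os-autoinst code, and it assumes <code>perf</code> is installed and allowed to attach to the QEMU process.</p>

```python
import contextlib
import signal
import subprocess


def perf_record_argv(pid, output):
    """Command line to attach 'perf record' with call graphs to a PID."""
    return ["perf", "record", "-g", "-p", str(pid), "-o", output]


@contextlib.contextmanager
def profiled(argv, timeout=10):
    """Run a profiler command for the duration of the with-block."""
    proc = subprocess.Popen(argv)
    try:
        yield proc
    finally:
        if proc.poll() is None:
            # perf writes out its data file and exits when interrupted
            proc.send_signal(signal.SIGINT)
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()  # last resort if the profiler ignores SIGINT
            proc.wait()
```

<p>Usage would be along the lines of <code>with profiled(perf_record_argv(qemu_pid, "dump.perf.data")): ...</code> around the snapshot or dump call, where <code>qemu_pid</code> is the PID of the running QEMU process.</p>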
openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=908202018-02-14T11:14:12Zrpalethorperichard.palethorpe@suse.com
<ul></ul><p>Pull request is now merged. When os-autoinst is deployed we need to look out for compilation failures (because I touched the C part) and increased CPU and drive usage.</p>
openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=994662018-03-09T09:38:42Zrpalethorperichard.palethorpe@suse.com
<ul><li><strong>Status</strong> changed from <i>Blocked</i> to <i>Resolved</i></li></ul><p>I hope this is fixed now. If anyone sees another failure, please reopen and let me know.</p>
openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=1297542018-06-15T19:08:02Zokurzokurz@suse.com
<ul><li><strong>Target version</strong> changed from <i>future</i> to <i>future</i></li></ul> openQA Project - action #19390: [tools][sprint 201711.2] qemu "migrate" within testapi::save_memory_dump command never finishes within 2hhttps://progress.opensuse.org/issues/19390?journal_id=1586872018-10-18T13:05:45Zokurzokurz@suse.com
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-4 status-3 priority-4 priority-default closed" href="/issues/42683">action #42683</a>: [functional][u] Make save_memory_dump work again and re-enable save_memory_dump call in tests/installation/first_boot and other boot modules</i> added</li></ul>