openSUSE Project Management Tool, https://progress.opensuse.org/, 2018-03-20T14:56:13Z
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=1039842018-03-20T14:56:13Zdasantiagodasantiago@suse.com
<ul></ul><p>Does this happen in some specific part of the test, such as on a restart, or is the CPU always at 100%?</p>
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=1039992018-03-20T15:21:39Zmichalnowakmnowak@suse.com
<ul></ul><p>On both Xen HVM &amp; Hyper-V, it happens at the end of <code>bootloader_svirt</code> / at the beginning of <code>bootloader_uefi</code>. That boundary is where the console switches from <code>svirt</code> to <code>sut</code>.</p>
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=1040112018-03-20T15:37:14Zdasantiagodasantiago@suse.com
<ul></ul><p>michalnowak wrote:</p>
<blockquote>
<p>On both Xen HVM &amp; Hyper-V, it happens at the end of <code>bootloader_svirt</code> / at the beginning of <code>bootloader_uefi</code>. That boundary is where the console switches from <code>svirt</code> to <code>sut</code>.</p>
</blockquote>
<p>Then it looks like it's because of the polling of the serial console, don't you agree? Or does the CPU usage not stabilize after that?</p>
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=1040172018-03-20T15:54:12Zmichalnowakmnowak@suse.com
<ul></ul><p>It looks that way; I tracked it to <code>define_and_start</code>.</p>
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=1139862018-04-19T10:33:22Zmichalnowakmnowak@suse.com
<ul></ul><p>Perhaps the 100% CPU utilization undermines the shared belief that two svirt workers can replace one qemu worker? That should still hold for disk IO, but CPU time is probably affected greatly. Also, running more than two svirt jobs on a laptop makes the fan go crazy.</p>
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=2136232019-05-21T11:46:03Zcoolocoolo@suse.com
<ul></ul><p>This is still the case, right?</p>
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=2138422019-05-22T07:14:28Zmichalnowakmnowak@suse.com
<ul></ul><p>Yes, it is.</p>
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=2138542019-05-22T07:25:30Zcoolocoolo@suse.com
<ul><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li><li><strong>Target version</strong> set to <i>Ready</i></li></ul>
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=2223262019-06-20T15:30:12Zokurzokurz@suse.com
<ul><li><strong>Category</strong> changed from <i>132</i> to <i>Feature requests</i></li></ul>
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=2502562019-10-15T13:05:51Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>mkittler</i></li><li><strong>Target version</strong> changed from <i>Ready</i> to <i>Current Sprint</i></li></ul><p><a class="user active user-mention" href="https://progress.opensuse.org/users/24986">@dasantiago</a> is right. I've added some debug printing in the relevant functions in <code>baseclass.pm</code> to confirm the theory. There is also already a related warning visible in the log:</p>
<pre><code>Calling Net::SSH2::Channel::readline in non-blocking mode is usually a programming error at /hdd/openqa-devel/repos/os-autoinst/backend/baseclass.pm line 1225.
</code></pre>
<p>It likely can't be made blocking without impairing the backend's responsiveness. I have to dig into the backend code to find a solution. It might not be trivial.</p>
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=2504392019-10-15T14:46:29Zmkittlermarius.kittler@suse.com
<ul></ul><p>The code actually uses <code>IO::Select</code> to only read from the SSH channel when the underlying socket is ready to read. But apparently that's not sufficient. The socket appears to be always ready to read although reading from the SSH channel mostly results in the error "operation would block".</p>
<p>I changed the code from reading line by line to use <a href="https://metacpan.org/pod/Net::SSH2::Channel#read2-(-[max_size]-)" class="external">Net::SSH2::Channel::read2</a> so the extended data would be consumed as well. However, that doesn't change a thing.</p>
<p>So I'm not sure how to integrate Net::SSH2::Channel into our async processing.</p>
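<p>To make the symptom concrete, here is a minimal sketch of how read readiness and a non-blocking read normally line up on a plain socket. It is written in Python rather than Perl and uses a local socket pair as a stand-in for the SSH connection, so it does not reproduce <code>IO::Select</code> or <code>Net::SSH2::Channel</code> behaviour; the helper name <code>try_read</code> and the sample payload are hypothetical.</p>

```python
import select
import socket


def try_read(sock, max_size=4096):
    """Return the bytes read, or None if the read would block."""
    try:
        return sock.recv(max_size)
    except BlockingIOError:
        # Same condition as the "operation would block" error in the ticket.
        return None


# A connected local socket pair stands in for the SSH connection (assumption).
reader, writer = socket.socketpair()
reader.setblocking(False)

# Nothing has been sent yet: the read would block and select() reports
# no readable socket when polled with a zero timeout.
before = try_read(reader)
ready_before, _, _ = select.select([reader], [], [], 0)

# Once data arrives, select() reports the socket readable and the read succeeds.
writer.sendall(b"login: ")
ready_after, _, _ = select.select([reader], [], [], 1.0)
after = try_read(reader)

reader.close()
writer.close()
```

<p>On a plain socket the two agree: no readiness means the read would block, readiness means data is returned. The behaviour reported above is the case where they disagree, with the socket reported ready while the channel read still fails with "operation would block".</p>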
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=2514982019-10-21T09:17:42Zcoolocoolo@suse.com
<ul><li><strong>Category</strong> changed from <i>Feature requests</i> to <i>Regressions/Crashes</i></li></ul>
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=2516152019-10-21T15:09:50Zmkittlermarius.kittler@suse.com
<ul></ul><p>Apparently the SSH socket was just passed to the write FDs for <code>IO::Select</code>. This PR attempts to fix it: <a href="https://github.com/os-autoinst/os-autoinst/pull/1239" class="external">https://github.com/os-autoinst/os-autoinst/pull/1239</a></p>
<p>It actually decreases the CPU usage to almost nothing. However, it seems to break other things (or it is just my local setup).</p>
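<p>The effect of watching a socket for writability instead of readability can be sketched as follows. This is a Python illustration under stated assumptions, not the os-autoinst code: a local socket pair replaces the SSH socket, and the function <code>count_wakeups</code> and its parameters are hypothetical. An idle connected socket is almost always writable, so a loop that selects on the write set returns immediately on every pass, while the same loop on the read set sleeps until the timeout.</p>

```python
import select
import socket
import time


def count_wakeups(sock, watch_for_write, duration=0.2, timeout=0.05):
    """Count how many times select() returns within `duration` seconds."""
    wakeups = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        read_fds = [] if watch_for_write else [sock]
        write_fds = [sock] if watch_for_write else []
        # Returns at once if any watched socket is ready, else after `timeout`.
        select.select(read_fds, write_fds, [], timeout)
        wakeups += 1
    return wakeups


# A connected local socket pair stands in for the SSH socket (assumption).
a, b = socket.socketpair()
a.setblocking(False)

# No data is ever sent, so the socket is never readable but always writable.
spinning = count_wakeups(a, watch_for_write=True)
idle = count_wakeups(a, watch_for_write=False)

a.close()
b.close()
```

<p>In this sketch <code>spinning</code> ends up far larger than <code>idle</code>, which is the busy-loop pattern that would keep one core fully occupied.</p>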
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=2528692019-10-25T12:39:24Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li></ul><p>The PR has been merged but not deployed on all relevant production workers.</p>
openQA Project - action #33529: isotovideo: backend takes 100 % of CPU when driving svirt jobhttps://progress.opensuse.org/issues/33529?journal_id=2557112019-11-07T17:06:08Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li><li><strong>Target version</strong> changed from <i>Current Sprint</i> to <i>Done</i></li></ul><p>I've just had a look at the CPU usage on openqaworker2. It runs a few svirt jobs but none of the cores is constantly busy.</p>
<p>The change likely caused a regression. There's <a href="https://progress.opensuse.org/issues/59190" class="external">another ticket for it</a> so I'll close this one.</p>