https://progress.opensuse.org/https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?15829177842019-11-28T16:09:14ZopenSUSE Project Management ToolopenQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2606182019-11-28T16:09:14Zokurzokurz@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-6 priority-high2 closed" href="/issues/59267">action #59267</a>: test incompletes in installation_overview - when trying to switch to root-ssh?</i> added</li></ul> openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2606242019-11-28T16:10:05Zokurzokurz@suse.com
<ul><li><strong>Project</strong> changed from <i>openQA Tests</i> to <i>openQA Project</i></li><li><strong>Subject</strong> changed from <i>test fails in login_console</i> to <i>Better feedback for "The console .* is not responding (half-open socket?)"</i></li><li><strong>Category</strong> changed from <i>Bugs in existing tests</i> to <i>Feature requests</i></li></ul> openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2607232019-11-29T08:41:09Zokurzokurz@suse.com
<ul><li><strong>Subject</strong> changed from <i>Better feedback for "The console .* is not responding (half-open socket?)"</i> to <i>Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of "svirt serial: unable to read: transport read (error code: -43)"</i></li><li><strong>Category</strong> changed from <i>Feature requests</i> to <i>Regressions/Crashes</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>Workable</i></li><li><strong>Assignee</strong> set to <i>cfconrad</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>Urgent</i></li></ul><p><a class="user active user-mention" href="https://progress.opensuse.org/users/30028">@cfconrad</a> I have the suspicion that <a href="https://github.com/os-autoinst/os-autoinst/pull/1166" class="external">https://github.com/os-autoinst/os-autoinst/pull/1166</a> caused this issue with the super-big logfiles as this seems to occur more often now. Can you please look into this as soon as possible? We can also revert your change again for now if you prefer.</p>
<p>Btw, I found this issue while monitoring incompletes. The output of daily monitoring and alerting jobs can be seen in e.g. <a href="https://gitlab.suse.de/openqa/auto-review/-/jobs/144344" class="external">https://gitlab.suse.de/openqa/auto-review/-/jobs/144344</a></p>
openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2607292019-11-29T08:52:34Zcfconradcfamullaconrad@suse.com
<ul></ul><p>ok, I'm on it.</p>
openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2607382019-11-29T09:54:15Zcfconradcfamullaconrad@suse.com
<ul></ul><p>I don't understand our <code>check_ssh_serial()</code> error handling.</p>
<p>When I read this <a href="https://www.libssh2.org/libssh2_channel_read_ex.html" class="external">https://www.libssh2.org/libssh2_channel_read_ex.html</a> which is called in <a href="https://github.com/rkitover/net-ssh2/blob/master/SSH2.xs#L1854" class="external">https://github.com/rkitover/net-ssh2/blob/master/SSH2.xs#L1854</a> , which looks like the read implementation of NET::SSH2, we need special handling for LIBSSH2_ERROR_EAGAIN but every other error on <code>read()</code> is like a connection lost.</p>
<p>I'm not sure, why we do not handle connection lost here properly?</p>
openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2607982019-11-29T12:42:28Zcfconradcfamullaconrad@suse.com
<ul></ul><p>Agreed with okurz to use Worker <code>grenache-1:10</code> for debugging.</p>
<pre><code>systemctl mask openqa-worker@10
systemctl stop openqa-worker@10
</code></pre>
<p>Next time I would try with <code>--now --runtime</code></p>
<p>This job was "canceled" when stopping the worker <a href="https://openqa.suse.de/tests/3641819" class="external">https://openqa.suse.de/tests/3641819</a>.</p>
openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2608072019-11-29T13:46:03Zcfconradcfamullaconrad@suse.com
<ul></ul><p>As one run is quite time consuming I got a second ipmi host (tails)</p>
<pre><code>ssh grenache-1 -- systemctl mask --now --runtime openqa-worker@9
</code></pre> openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2608402019-11-29T16:47:12Zmkittlermarius.kittler@suse.com
<ul></ul><p><a class="user active user-mention" href="https://progress.opensuse.org/users/30028">@cfconrad</a> I'm the last who touched the error handling within that function to prevent the svirt backend's high CPU usage. Before my changes there was not error handling at all so I thought it would be a good idea to log the errors at least. Apparently that's not sufficient and even made things worse considering the huge log files we get now. It would likely be the best to remove the file descriptor from the select to give up on reading from that connection. We could also try to reconnect if that makes sense.</p>
openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2608492019-11-30T05:44:20Zcfconradcfamullaconrad@suse.com
<ul></ul><p><a class="user active user-mention" href="https://progress.opensuse.org/users/22072">@mkittler</a> yes, I think this will be a good idea.</p>
<p>I'm able to reproduce this behavior in my environment. Currently it was just 1 of 3 runs, while one run cost +2h :/</p>
openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2612182019-12-02T11:31:17Zcfconradcfamullaconrad@suse.com
<ul><li><strong>Status</strong> changed from <i>Workable</i> to <i>In Progress</i></li></ul><p><a href="https://github.com/os-autoinst/os-autoinst/pull/1298" class="external">https://github.com/os-autoinst/os-autoinst/pull/1298</a></p>
openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2614522019-12-03T10:12:20Zcfconradcfamullaconrad@suse.com
<ul></ul><p>Enabled tails, as there are waiting jobs for him:</p>
<pre><code>ssh grenache-1 -- sudo systemctl unmask --runtime openqa-worker@9
</code></pre>
<p>Keep <code>grenache-1:10</code> for some more debugging.</p>
openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2616772019-12-04T08:14:22Zwaynechen55wchen@suse.com
<ul></ul><p>cfconrad wrote:</p>
<blockquote>
<p>Enabled tails, as there are waiting jobs for him:</p>
<pre><code>ssh grenache-1 -- sudo systemctl unmask --runtime openqa-worker@9
</code></pre>
<p>Keep <code>grenache-1:10</code> for some more debugging.</p>
</blockquote>
<p>Anything special to grenache-1:10 ? Could you help share more info about this grenache-1:10 ?</p>
openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2618242019-12-04T15:17:26Zcfconradcfamullaconrad@suse.com
<ul></ul><p>No there is nothing special with <code>grenache-1:10</code>, but when I block this worker, I'm able to use the (ipmi config)[[https://gitlab.suse.de/openqa/salt-pillars-openqa/blob/master/openqa/workerconf.sls#L585]] in my worker.<br>
And then I'm actually able to tests this.</p>
openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2621212019-12-05T09:25:12Zokurzokurz@suse.com
<ul></ul><p><a href="https://github.com/os-autoinst/os-autoinst/pull/1298" class="external">https://github.com/os-autoinst/os-autoinst/pull/1298</a> is deployed to both o3 and osd now and it seems we do not have this problem showing up in logs anymore. Unless you plan further work here please resolve.</p>
openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2621392019-12-05T09:36:44Zokurzokurz@suse.com
<ul></ul><p>however we seem to have a follow-up problem which I assume to be related to your work as you are the only one working on consoles recently: There are some incompletes , e.g. <a href="https://openqa.suse.de/tests/3665709/file/autoinst-log.txt" class="external">https://openqa.suse.de/tests/3665709/file/autoinst-log.txt</a> showing:</p>
<pre><code>[2019-12-05T10:21:37.567 CET] [debug] /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/kernel/boot_ltp.pm:45 called opensusebasetest::select_serial_terminal
[2019-12-05T10:21:37.567 CET] [debug] <<< testapi::type_string(text='
')
[2019-12-05T10:21:37.568 CET] [debug] <<< consoles::serial_screen::type_string(cmd='backend_type_string', json_cmd_token='twSkAPBc', text='
')
[2019-12-05T10:21:37.568 CET] [debug] Backend process died, backend errors are reported below in the following lines:
Can't use an undefined value as a symbol reference at /usr/lib/os-autoinst/consoles/serial_screen.pm line 106.
</code></pre>
<p>as the ticket is still "Urgent" this also applies here :)</p>
openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2621452019-12-05T09:39:09Zokurzokurz@suse.com
<ul><li><strong>Subject</strong> changed from <i>Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of "svirt serial: unable to read: transport read (error code: -43)"</i> to <i>"(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log message</i></li></ul> openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2623312019-12-05T12:42:36Zcfconradcfamullaconrad@suse.com
<ul></ul><p>I also have created a ticket to it. As this is actually a regression of <a href="https://progress.opensuse.org/issues/55883" class="external">https://progress.opensuse.org/issues/55883</a> PR: <a href="https://github.com/os-autoinst/os-autoinst/pull/1274" class="external">https://github.com/os-autoinst/os-autoinst/pull/1274</a></p>
openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2623372019-12-05T12:43:00Zcfconradcfamullaconrad@suse.com
<ul></ul><p>The issue is fixed by this: <a href="https://github.com/os-autoinst/os-autoinst/pull/1313" class="external">https://github.com/os-autoinst/os-autoinst/pull/1313</a></p>
openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2623552019-12-05T12:44:50Zcfconradcfamullaconrad@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-5 priority-high3 closed" href="/issues/60725">action #60725</a>: [kernel]Can't use an undefined value as a symbol reference at /usr/lib/os-autoinst/consoles/serial_screen.pm line 106.</i> added</li></ul> openQA Project - action #60416: "(undefined value as a symbol reference.*serial_screen|svirt serial: unable to read)" Better feedback for "The console .* is not responding (half-open socket?)", very big log files with repetition of log messagehttps://progress.opensuse.org/issues/60416?journal_id=2625502019-12-05T15:38:41Zcfconradcfamullaconrad@suse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul><p>I would set this error now to resolved.<br>
The error from <a href="https://progress.opensuse.org/issues/60416#note-15" class="external">https://progress.opensuse.org/issues/60416#note-15</a> is not related to this. And I will track it in <a href="https://progress.opensuse.org/issues/60725" class="external">https://progress.opensuse.org/issues/60725</a></p>
<p>Enable worker <code>grenache-1:10</code></p>
<pre><code>ssh grenache-1 -- sudo systemctl unmask openqa-worker@10
ssh grenache-1 -- sudo systemctl start openqa-worker@10
</code></pre>