openSUSE Project Management Tool https://progress.opensuse.org/ 2019-12-09T09:47:22Z
openQA Project - action #60815: Broken SSH serial console (again)https://progress.opensuse.org/issues/60815?journal_id=2630302019-12-09T09:47:22Zcoolocoolo@suse.com
<ul></ul><p><code>git log 3973b078..91c754a2</code> covers quite a lot of commits; nothing obvious, but quite a few touch ssh, serial and rpc</p>
openQA Project - action #60815: Broken SSH serial console (again)https://progress.opensuse.org/issues/60815?journal_id=2630332019-12-09T09:47:49Zcoolocoolo@suse.com
<ul><li><strong>Project</strong> changed from <i>openQA Infrastructure</i> to <i>openQA Project</i></li><li><strong>Subject</strong> changed from <i>[OpenQA] Broken SSH serial console (again)</i> to <i>Broken SSH serial console (again)</i></li><li><strong>Category</strong> set to <i>Regressions/Crashes</i></li><li><strong>Assignee</strong> deleted (<del><i>okurz</i></del>)</li><li><strong>Target version</strong> set to <i>Ready</i></li></ul>
openQA Project - action #60815: Broken SSH serial console (again)https://progress.opensuse.org/issues/60815?journal_id=2630362019-12-09T09:55:11Zcoolocoolo@suse.com
<ul></ul><p><a href="https://openqa.suse.de/tests/3680889#step/update_kernel/9" class="external">https://openqa.suse.de/tests/3680889#step/update_kernel/9</a> contains a lot of text and most look fine - and then it ends with</p>
<pre><code>[ OK ]
[ OK ] Reached target Multi-User System.
[ OK ]
Starting Update UTMP about System Runlevel Changes...[ OK ] Started Update UTMP about System Runlevel Changes.
Welcometo SUSE Lux Enterprseerer 1 SP4 s3x) ernl 4.12.14-95.6-default (ttysclp0).
login:
</code></pre>
<p>This looks like two processes are writing to the console (file) independently of each other.</p>
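<p>To illustrate the hypothesis only (this is not taken from os-autoinst), here is a minimal sketch of what two uncoordinated writers can do to one file. The helper <code>write_alternately</code> and the sample strings are made up. When each writer keeps its own file offset, text is overwritten and characters go missing; in append mode nothing is lost, but the output is still interleaved.</p>
<pre><code>use strict;
use warnings;
use Fcntl qw(O_WRONLY O_APPEND);
use File::Temp qw(tempfile);

# Hypothetical helper: two writers with separate handles take turns
# writing unbuffered chunks to the same file; returns the file content.
sub write_alternately {
    my ($append, $chunks_a, $chunks_b) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    my $flags = O_WRONLY | ($append ? O_APPEND : 0);
    sysopen(my $writer_a, $path, $flags) or die "sysopen: $!";
    sysopen(my $writer_b, $path, $flags) or die "sysopen: $!";
    for my $i (0 .. $#$chunks_a) {
        syswrite($writer_a, $chunks_a->[$i]);
        syswrite($writer_b, $chunks_b->[$i]);
    }
    local $/;    # slurp mode: read the whole file at once
    return scalar readline($tmp);
}

my @login  = ("Welcome ", "to SUSE ");
my @status = ("[ OK ] ", "login: ");
print write_alternately(0, \@login, \@status), "\n";    # own offsets: text gets overwritten
print write_alternately(1, \@login, \@status), "\n";    # append mode: interleaved, nothing lost
</code></pre>
<p>With the sample input, the first call yields <code>[ OK ] login: E </code> and the second <code>Welcome [ OK ] to SUSE login: </code>.</p>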
openQA Project - action #60815: Broken SSH serial console (again)https://progress.opensuse.org/issues/60815?journal_id=2630902019-12-09T10:19:50Zmkittlermarius.kittler@suse.com
<ul></ul><p>I've just had a look at the Git log.</p>
<p><a class="user active user-mention" href="https://progress.opensuse.org/users/30028">@cfconrad</a> Maybe the change to stop the serial console in all error cases (besides <code>LIBSSH2_ERROR_EAGAIN</code>) is wrong after all?</p>
<pre><code> # read from SSH channel (receiving extended data channel as well via `$chan->ext_data('merge')`)
my $chan = $self->{serial_chan};
my $buffer;
- my $bytes_read = $chan->read($buffer, 4096);
- my $could_read_once = defined $bytes_read;
- while (defined $bytes_read) {
+ while (defined(my $bytes_read = $chan->read($buffer, 4096))) {
return 1 unless $bytes_read > 0;
print $buffer;
open(my $serial, '>>', $self->{serialfile});
@@ -1277,9 +1275,10 @@ sub check_ssh_serial {
}
my ($error_code, $error_name, $error_string) = $ssh->error;
- return 1 if $could_read_once && $error_code == LIBSSH2_ERROR_EAGAIN;
+ return 1 if $error_code == LIBSSH2_ERROR_EAGAIN;
- bmwqemu::diag("svirt serial: unable to read: $error_string (error code: $error_code)");
+ bmwqemu::diag("ssh serial: unable to read: $error_string (error code: $error_code) - closing connection");
+ $self->stop_ssh_serial();
return 1;
}
</code></pre>
<p>But <code>ssh serial: unable to read:</code> is not in the log, so if that's causing the problem it would have to be a logic bug earlier in the code (which I don't see).</p>
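<p>The pattern in the diff, reading until nothing more is available and treating the would-block case separately from real errors, can be sketched with a plain non-blocking socket pair instead of an SSH channel. This is an illustration under assumptions, not the os-autoinst code: <code>drain_available</code> and its status strings are hypothetical, and <code>sysread</code> with <code>$!{EAGAIN}</code> stands in for <code>$chan->read</code> and <code>LIBSSH2_ERROR_EAGAIN</code>.</p>
<pre><code>use strict;
use warnings;
use Errno;
use IO::Handle;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Hypothetical helper: read everything currently available from a
# non-blocking handle. Returns ($data, $status), where $status is
# 'again' (nothing more for now), 'eof' or 'error'.
sub drain_available {
    my ($fh) = @_;
    my $data = '';
    while (1) {
        my $bytes_read = sysread($fh, my $buffer, 4096);
        if (!defined $bytes_read) {
            # only the would-block case means "come back later"
            return ($data, 'again') if $!{EAGAIN} || $!{EWOULDBLOCK};
            return ($data, 'error');
        }
        return ($data, 'eof') if $bytes_read == 0;
        $data .= $buffer;
    }
}

socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC) or die "socketpair: $!";
$reader->blocking(0);
syswrite($writer, "login: ");
my ($data, $status) = drain_available($reader);
print "$status: '$data'\n";    # pending bytes are returned, then the read would block
close $writer;
($data, $status) = drain_available($reader);
print "$status\n";             # the peer is gone, so the next read reports end of file
</code></pre>
<p>Only a caller that sees <code>error</code> would have a reason to log and close the connection; <code>again</code> just means there is nothing to read at the moment.</p>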
<p>I'm also wondering about this change:</p>
<pre><code>- $self->{socket_fd} = $socket_fd;
+ $self->{fd_read} = $fd_read;
+ $self->{fd_write} = $fd_write // $fd_read;
</code></pre>
<p>Possibly assigning <code>$fd_read</code> to <code>$self->{fd_write}</code> looks wrong. I confused that with <code>$self->{select_read}</code> and <code>$self->{select_write}</code>.</p>
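<p>For reference, <code>//</code> is Perl's defined-or operator, so the fallback to <code>$fd_read</code> only happens when no separate write descriptor is passed. A minimal sketch, assuming a made-up helper <code>make_fds</code> and arbitrary descriptor numbers:</p>
<pre><code>use strict;
use warnings;

# Hypothetical helper mirroring the two assignments from the diff.
sub make_fds {
    my ($fd_read, $fd_write) = @_;
    return {
        fd_read  => $fd_read,
        fd_write => $fd_write // $fd_read,    # fall back only when undefined
    };
}

my $socket_like = make_fds(5);       # one bidirectional descriptor
my $pipe_like   = make_fds(5, 6);    # separate read and write ends
my $zero_case   = make_fds(3, 0);    # 0 is defined, so it is kept
printf "%d %d %d\n", $socket_like->{fd_write}, $pipe_like->{fd_write}, $zero_case->{fd_write};
</code></pre>
<p>This prints <code>5 6 0</code>. With <code>||</code> instead of <code>//</code>, the last case would wrongly fall back to 3, because 0 is false but defined.</p>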
openQA Project - action #60815: Broken SSH serial console (again)https://progress.opensuse.org/issues/60815?journal_id=2631202019-12-09T12:07:57Zcfconradcfamullaconrad@suse.com
<ul></ul><p>I think the issue is caused by <a href="https://github.com/os-autoinst/os-autoinst/pull/1298" class="external">https://github.com/os-autoinst/os-autoinst/pull/1298</a> and fixed with <a href="https://github.com/os-autoinst/os-autoinst/pull/1319" class="external">https://github.com/os-autoinst/os-autoinst/pull/1319</a></p>
openQA Project - action #60815: Broken SSH serial console (again)https://progress.opensuse.org/issues/60815?journal_id=2631472019-12-09T12:51:26Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>cfconrad</i></li><li><strong>Target version</strong> changed from <i>Ready</i> to <i>Current Sprint</i></li></ul><p>Now I'm feeling stupid for not seeing this obvious logic bug because I have actually suspected that there is a bug in exactly that place (see my last comment).</p>
<p><a class="user active user-mention" href="https://progress.opensuse.org/users/30028">@cfconrad</a> I'm assigning you since you've provided the fix.</p>
openQA Project - action #60815: Broken SSH serial console (again)https://progress.opensuse.org/issues/60815?journal_id=2631502019-12-09T13:10:53Zcfconradcfamullaconrad@suse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li></ul><blockquote>
<p>Now I'm feeling stupid for not seeing this obvious logic bug because I have actually suspected that there is a bug in exactly that place (see my last comment).</p>
</blockquote>
<p>I don't feel better, as I introduced it :/</p>
<p>Set it to Feedback as we are waiting for more results from the production instance. The PR was merged, thx!</p>
openQA Project - action #60815: Broken SSH serial console (again)https://progress.opensuse.org/issues/60815?journal_id=2632942019-12-10T07:46:41Zokurzokurz@suse.com
<ul></ul><p>I selectively installed the new version of os-autoinst on grenache-1.qa and retriggered the failure from the ticket description as <a href="https://openqa.suse.de/tests/3684656" class="external">https://openqa.suse.de/tests/3684656</a></p>
<p>EDIT: looks good. Have also installed the new os-autoinst on all other workers. We can now retrigger all jobs that failed for the same reason on osd.</p>
openQA Project - action #60815: Broken SSH serial console (again)https://progress.opensuse.org/issues/60815?journal_id=2633062019-12-10T08:11:47Zcfconradcfamullaconrad@suse.com
<ul></ul><p>+1, thx. Let me know if I can do something.</p>
openQA Project - action #60815: Broken SSH serial console (again)https://progress.opensuse.org/issues/60815?journal_id=2633332019-12-10T08:53:11Zokurzokurz@suse.com
<ul></ul><p>retrigger tests that fail because of this problem :)</p>
openQA Project - action #60815: Broken SSH serial console (again)https://progress.opensuse.org/issues/60815?journal_id=2633992019-12-10T10:03:08Zcfconradcfamullaconrad@suse.com
<ul></ul><p>OK, I spent some time walking through the web frontend to search for and retrigger tests.<br>
I didn't note all of them and I think there were some more, but these two are on the list:</p>
<p><a href="https://openqa.suse.de/t3684922" class="external">https://openqa.suse.de/t3684922</a> - install_ltp+sle+Server-DVD-Incidents-Kernel => uploading<br>
<a href="https://openqa.suse.de/t3684940" class="external">https://openqa.suse.de/t3684940</a> - install_ltp+sle+Server-DVD-Incidents-Kernel => running</p>
openQA Project - action #60815: Broken SSH serial console (again)https://progress.opensuse.org/issues/60815?journal_id=2638402019-12-11T11:09:04Zcfconradcfamullaconrad@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul>