action #42359
closed: [functional][y][sporadic] test fails in gnuhealth_install to write to /dev/ttyS0 "Input/output error" while serial getty seems to restart at the same time?
Description
Observation
openQA test in scenario opensuse-15.1-DVD-x86_64-gnuhealth@64bit fails in gnuhealth_install when trying to write to /dev/ttyS0 with "Input/output error", while the serial getty seems to restart at the same time.
https://openqa.opensuse.org/tests/771001/file/gnuhealth_install-journal.log shows the following lines at the time of the screenshot (20:20 - 20:22):
Oct 10 20:21:00 susetest su[1851]: pam_unix(su:session): session opened for user root by bernhard(uid=1000)
Oct 10 20:21:01 susetest systemd[1]: serial-getty@ttyS0.service: Service hold-off time over, scheduling restart.
Oct 10 20:21:01 susetest systemd[1]: Stopped Serial Getty on ttyS0.
Oct 10 20:21:01 susetest systemd[1]: Started Serial Getty on ttyS0.
Shouldn't the getty process have been masked before, e.g. by https://openqa.opensuse.org/tests/770994#step/system_prepare/1 ? Neither the parent job creating the image nor the downstream job calls "consoletest_setup", which looks OK judging by the name, because we do not want to run any pure "consoletests" in gnuhealth.
Good reading regarding how it works: http://0pointer.de/blog/projects/serial-console.html
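To check a downloaded journal log for the same getty restart pattern as in the excerpt above, a small filter can help. This is only a sketch: the function name getty_restarts is made up for illustration, and the patterns simply match the systemd messages quoted in this ticket for ttyS0.

```shell
# getty_restarts LOGFILE: print the journal lines that mention the serial
# getty on ttyS0 (hypothetical helper, not part of openQA).
getty_restarts() {
    # Match both the unit name and the unit description used by systemd.
    grep -E 'serial-getty@ttyS0\.service|Serial Getty on ttyS0' "$1"
}
# Example: getty_restarts gnuhealth_install-journal.log
```

Lines printed close to the timestamp of the failed write would support the suspicion that the getty restart and the "Input/output error" are related.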
Reproducible
Fails since (at least) Build 315.1 (current job)
Expected result
Last good: 314.2 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by okurz over 5 years ago
- Subject changed from [functional][u][sporadic] test fails in gnuhealth_install to write to /dev/ttyS0 "Input/output error" while serial getty seems to restart at the same time? to [functional][y][sporadic] test fails in gnuhealth_install to write to /dev/ttyS0 "Input/output error" while serial getty seems to restart at the same time?
- Status changed from New to Workable
- Assignee set to riafarov
- Target version set to Milestone 20
@riafarov could you take a look please because you were involved with masking the serial getty services?
Updated by riafarov over 5 years ago
- Assignee changed from riafarov to okurz
@okurz, stopping of serial-getty is done in consoletest_setup, which is scheduled neither in the given test suite nor in createhdd, so it's not related. So I guess we need to investigate further whether it's a product bug, and potentially make this change in system_prepare, which doesn't sound 100% right to me.
Updated by okurz over 5 years ago
- Due date set to 2018-11-20
- Assignee deleted (okurz)
Thank you for your explanation. I still suspect that some change altered the test schedule, so let's take a look soon.
Updated by okurz over 5 years ago
- Due date changed from 2018-11-20 to 2018-11-06
- Priority changed from Normal to Urgent
https://openqa.suse.de/tests/2174010#step/consoletest_setup/6 looks like the same symptoms, and there it already happens in consoletest_setup. Could it be that we have a product regression here?
Updated by okurz over 5 years ago
https://bugzilla.opensuse.org/show_bug.cgi?id=1112109 reported. We probably still need a workaround, though.
Updated by riafarov over 5 years ago
The issue has been there for 3 months now: https://openqa.opensuse.org/tests/714854#step/gnuhealth_install/7
The change where we set permissions on the serial device in the image creation job was introduced on the 9th of August, which doesn't match that occurrence, meaning we can exclude it as the culprit. Investigating further.
Updated by riafarov over 5 years ago
So, I found an easy way to reproduce the error:
- Start writing to ttyS0 in an infinite loop: while true; do echo bb > /dev/ttyS0 ; done
- Run systemctl stop serial-getty@ttyS0
"Input/output error" is shown for all attempts to write to the tty in the loop. Once the command is stopped and started again, it works fine again.
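The reproducer can also be written so that it counts failed writes instead of looping forever. This is a minimal sketch: the function name write_probe and its two parameters are hypothetical and chosen only for illustration.

```shell
# write_probe DEVICE COUNT: try COUNT writes to DEVICE and print how many
# of them failed (hypothetical helper for illustration only).
write_probe() {
    dev=$1
    count=$2
    failed=0
    i=0
    while [ "$i" -lt "$count" ]; do
        # The braces let us silence the shell's own error message; a failed
        # open or write still makes the command return non-zero.
        if ! { echo bb > "$dev"; } 2>/dev/null; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed"
}
# Example on a test machine: run "write_probe /dev/ttyS0 100000" in one
# shell and "systemctl stop serial-getty@ttyS0" in another while it runs.
```

A non-zero result after stopping the service would correspond to the "Input/output error" described above.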
Updated by riafarov over 5 years ago
- Priority changed from Urgent to High
So I've updated the bug with my findings. It's a regression, and hopefully we'll get some feedback there. As of now, the most popular solution on the web is to disable the getty service. The problem with this approach is that we rely on welcome messages to detect that the system has booted up. Also, for SLE 15 we failed before reaching the step where we stop the serial-getty service.
After discussion with @okurz, we came up with 2 potential scalable solutions:
- Static: Always disable serial-getty before writing to serial, and use ssh to detect that the system has booted up.
- Dynamic: Use the same approach as for activating ttys, but for serial devices. Before writing to a serial device, trigger some activation method. In our case it should set permissions and handle the serial-getty service, and potentially take more steps.
Other options are hacks, which we will try to avoid, as the issue is not new and is not that critical yet.
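The dynamic option could be sketched roughly as below. Everything here is an assumption for illustration: the function name serial_activation_cmds, the exact set of steps, and the chmod mode are not taken from the actual test code. The function only prints the commands, so the plan can be reviewed before anything is run.

```shell
# serial_activation_cmds [DEVICE]: print the commands a hypothetical
# "activate serial device" step might run before writing to /dev/DEVICE.
serial_activation_cmds() {
    dev=${1:-ttyS0}
    unit="serial-getty@${dev}.service"
    # Stop the getty so it cannot restart while a write is in flight,
    # and mask it so that systemd does not start it again.
    echo "systemctl stop $unit"
    echo "systemctl mask $unit"
    # Assumption: making the device world-writable is enough for the test
    # user; a real implementation might chown the device instead.
    echo "chmod 666 /dev/$dev"
}
# To apply on a disposable test machine, as root:
#   serial_activation_cmds ttyS0 | sh
```

Since this stops the getty, detecting that the system has booted would need a mechanism other than the welcome message on the serial console.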
Updated by riafarov over 5 years ago
- Status changed from Workable to Feedback
Updated by riafarov over 5 years ago
@zluo has a setup where he can reproduce this issue all the time. On Monday I will reuse it to figure out a better way to work around the bug.
Updated by zluo over 5 years ago
http://e13.suse.de/tests/9807 shows a successful test run, but other test runs failed...
Updated by riafarov over 5 years ago
- Due date changed from 2018-11-06 to 2018-11-20
https://openqa.opensuse.org/tests/788063#step/gnuhealth_install/6 still shows the same issue, but not X11.
Updated by okurz over 5 years ago
- Related to action #39575: [functional][u] Split consoletest_setup to smaller parts which serve single purpose per module added
Updated by okurz over 5 years ago
https://openqa.opensuse.org/tests/788063#step/gnuhealth_install/5 is the most recent failure. The fix was introduced in console/consoletest_setup, and that module is not triggered in the gnuhealth scenario. In #39575 we try to further split consoletest_setup, which should then allow us to find a better spot to trigger the serial port handling.
Updated by riafarov over 5 years ago
- Status changed from Feedback to In Progress
Updated by riafarov over 5 years ago
- Status changed from In Progress to Feedback
Updated by osukup over 5 years ago
looks like this breaks https://openqa.suse.de/tests/2263786#step/updates_packagekit_gpk/12 -- on SLE12GA and SLE12-SP1
Updated by riafarov over 5 years ago
- Due date changed from 2018-11-20 to 2018-12-04
The change was reverted. We need to reduce the scope in which the workaround is applied.
Updated by riafarov over 5 years ago
- Status changed from Feedback to In Progress
Updated by riafarov over 5 years ago
- Status changed from In Progress to Feedback
Updated by szarate over 5 years ago
Keeps happening: https://openqa.opensuse.org/tests/802983#comments
Updated by okurz over 5 years ago
- Target version changed from Milestone 20 to Milestone 21
Updated by riafarov over 5 years ago
Yes, the PR is still not merged. Also, I don't think we should label the job with the progress ticket, because it's a workaround for the bug.
Updated by riafarov over 5 years ago
- Due date changed from 2018-12-04 to 2018-12-18
Updated by okurz over 5 years ago
- Blocks action #44186: [functional][u] test fails in kdump_and_crash - Login prompt appeared in serial console output unexpectedly added
Updated by riafarov over 5 years ago
- Status changed from Feedback to Resolved