action #42359

closed

[functional][y][sporadic] test fails in gnuhealth_install to write to /dev/ttyS0 "Input/output error" while serial getty seems to restart at the same time?

Added by okurz over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Bugs in existing tests
Target version: SUSE QA - Milestone 21
Start date: 2018-10-11
Due date: 2018-12-18
% Done: 0%
Estimated time: 3.00 h
Difficulty:

Description

Observation

openQA test in scenario opensuse-15.1-DVD-x86_64-gnuhealth@64bit fails in
gnuhealth_install
trying to write to /dev/ttyS0 with "Input/output error" while serial getty seems to restart at the same time?
https://openqa.opensuse.org/tests/771001/file/gnuhealth_install-journal.log shows the following lines at the time of the screenshot (20:20 - 20:22):

Oct 10 20:21:00 susetest su[1851]: pam_unix(su:session): session opened for user root by bernhard(uid=1000)
Oct 10 20:21:01 susetest systemd[1]: serial-getty@ttyS0.service: Service hold-off time over, scheduling restart.
Oct 10 20:21:01 susetest systemd[1]: Stopped Serial Getty on ttyS0.
Oct 10 20:21:01 susetest systemd[1]: Started Serial Getty on ttyS0.

Shouldn't the getty process have been masked before, e.g. by https://openqa.opensuse.org/tests/770994#step/system_prepare/1 ? Neither the parent job creating the image nor the downstream job calls "consoletest_setup", which looks ok judging just by the name, because we do not want to run any pure "consoletests" in gnuhealth.

Good reading regarding how it works: http://0pointer.de/blog/projects/serial-console.html

Reproducible

Fails since (at least) Build 315.1 (current job)

Expected result

Last good: 314.2 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 2 (0 open, 2 closed)

Related to openQA Tests - action #39575: [functional][u] Split consoletest_setup to smaller parts which serve single purpose per module (Resolved, zluo, 2018-08-10)
Blocks openQA Tests - action #44186: [functional][u] test fails in kdump_and_crash - Login prompt appeared in serial console output unexpectedly (Rejected, okurz, 2018-11-21)

Actions #1

Updated by okurz over 5 years ago

  • Subject changed from [functional][u][sporadic] test fails in gnuhealth_install to write to /dev/ttyS0 "Input/output error" while serial getty seems to restart at the same time? to [functional][y][sporadic] test fails in gnuhealth_install to write to /dev/ttyS0 "Input/output error" while serial getty seems to restart at the same time?
  • Status changed from New to Workable
  • Assignee set to riafarov
  • Target version set to Milestone 20

@riafarov could you take a look please because you were involved with masking the serial getty services?

Actions #2

Updated by riafarov over 5 years ago

  • Assignee changed from riafarov to okurz

@okurz, stopping of serial-getty is done in consoletest_setup, which is scheduled neither in the given test suite nor in createhdd, so it's not related. So I guess we need to investigate further whether it's a product bug, and potentially make this change in system_prepare, which doesn't sound 100% right to me.

Actions #3

Updated by okurz over 5 years ago

  • Due date set to 2018-11-20
  • Assignee deleted (okurz)

Thank you for your explanation. I still suspect that some change altered the test schedule accordingly, so let's take a look soon.

Actions #4

Updated by okurz over 5 years ago

  • Due date changed from 2018-11-20 to 2018-11-06
  • Priority changed from Normal to Urgent

https://openqa.suse.de/tests/2174010#step/consoletest_setup/6 looks like the same symptoms and this is happening already in consoletest_setup. Could it be that we have a product regression here?

Actions #5

Updated by okurz over 5 years ago

https://bugzilla.opensuse.org/show_bug.cgi?id=1112109 reported. Probably we still need a workaround though

Actions #6

Updated by riafarov over 5 years ago

  • Assignee set to riafarov
Actions #7

Updated by riafarov over 5 years ago

The issue has been there for 3 months now: https://openqa.opensuse.org/tests/714854#step/gnuhealth_install/7
The change where we set permissions on the serial device in the image creation job was introduced on the 9th of August, which doesn't match that occurrence. This means we can exclude it as the culprit. Investigating further.

Actions #8

Updated by riafarov over 5 years ago

So, I found an easy way to reproduce the error:

  1. Start writing to ttyS0 in an infinite loop: while true; do echo bb > /dev/ttyS0 ; done
  2. Run systemctl stop serial-getty@ttyS0

"Input/output error" is then shown for all attempts to write to the tty in the loop. Once the command is stopped and started again, it works fine again.

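The reproduction from the comment above can be wrapped in a bounded loop that counts failed writes, which makes it easier to compare runs. This is only a minimal sketch: the function name count_failed_writes and the attempt count are hypothetical and do not come from the ticket or from the openQA test code.

```shell
# Hypothetical helper (not part of the openQA tests): write "bb" to a device
# a fixed number of times and print how many of those writes failed.
count_failed_writes() {
    device=$1      # e.g. /dev/ttyS0
    attempts=$2    # bounded, unlike the infinite loop in the comment above
    failed=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Both a failed open and a failed write (e.g. "Input/output error")
        # give a non-zero status; the error text itself is discarded here.
        if ! { echo bb > "$device"; } 2>/dev/null; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed"
}
```

To follow the steps above, one would run something like count_failed_writes /dev/ttyS0 1000 as root in one terminal and systemctl stop serial-getty@ttyS0 in another; a non-zero count would indicate that the error was hit.
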
Actions #9

Updated by riafarov over 5 years ago

  • Priority changed from Urgent to High

So I've updated the bug with my findings: it's a regression, and hopefully we'll get some feedback there. As of now, the most popular solution on the web is to disable the getty service. The problem with this approach is that we rely on the welcome messages to detect that the system has booted up. Also, for SLE 15 we failed before we reached the step where we stop the serial-getty service.
After discussion with @okurz, we came up with 2 potential scalable solutions:

  1. Static: Always disable serial-getty before writing to serial, and use ssh to detect that the system has booted up.
  2. Dynamic: Use the same approach as for activating ttys, but for serial devices. Before writing to a serial device, trigger some activation method. In our case it should set permissions and handle the serial-getty service, and potentially perform more steps.

Other options are hacks, which we will try to avoid as the issue is not new and is not that critical yet.
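
As an illustration of the "dynamic" option, the activation step could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function names activate_serial and write_serial are hypothetical, "handle serial-getty" is interpreted as stopping the unit, and the permission change is a guess at what "set permissions" would mean. It is not the workaround that was eventually merged.

```shell
# Hypothetical sketch of the "dynamic" option; names and steps are assumptions.
activate_serial() {
    device=$1    # e.g. /dev/ttyS0
    unit="serial-getty@$(basename "$device").service"
    # Assumed handling of serial-getty: stop the unit before any write starts.
    systemctl stop "$unit" || return 1
    # Assumed permission step: make the device writable for the test user.
    chmod a+w "$device" || return 1
}

write_serial() {
    device=$1
    shift
    # Activate first, then write the remaining arguments as one line.
    activate_serial "$device" || return 1
    echo "$*" > "$device"
}
```

A test would then call write_serial /dev/ttyS0 some marker text instead of redirecting to the device directly, so the activation always happens before the write.
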

Actions #10

Updated by riafarov over 5 years ago

  • Description updated (diff)
Actions #11

Updated by riafarov over 5 years ago

  • Status changed from Workable to Feedback
Actions #12

Updated by riafarov over 5 years ago

@zluo has a setup where he can reproduce this issue all the time. On Monday I will reuse it to figure out a better way to work around the bug.

Actions #13

Updated by zluo over 5 years ago

http://e13.suse.de/tests/9807 shows a successful test run, but other test runs failed...

Actions #14

Updated by riafarov over 5 years ago

  • Due date changed from 2018-11-06 to 2018-11-20
Actions #15

Updated by okurz over 5 years ago

  • Related to action #39575: [functional][u] Split consoletest_setup to smaller parts which serve single purpose per module added
Actions #16

Updated by okurz over 5 years ago

https://openqa.opensuse.org/tests/788063#step/gnuhealth_install/5 is the most recent failure. The fix was introduced to console/consoletest_setup and that module is not triggered in the gnuhealth scenario. In #39575 we try to further split consoletest_setup, which should then allow us to find a better spot to trigger the serial port handling.

Actions #17

Updated by riafarov over 5 years ago

  • Status changed from Feedback to In Progress
Actions #18

Updated by riafarov over 5 years ago

  • Estimated time set to 3.00 h
Actions #20

Updated by riafarov over 5 years ago

  • Status changed from In Progress to Feedback
Actions #21

Updated by osukup over 5 years ago

looks like this breaks https://openqa.suse.de/tests/2263786#step/updates_packagekit_gpk/12 -- on SLE12GA and SLE12-SP1

Actions #22

Updated by riafarov over 5 years ago

  • Due date changed from 2018-11-20 to 2018-12-04

The change was reverted; we need to reduce the scope in which the workaround is applied.

Actions #23

Updated by riafarov over 5 years ago

  • Status changed from Feedback to In Progress
Actions #24

Updated by riafarov over 5 years ago

  • Status changed from In Progress to Feedback
Actions #26

Updated by okurz over 5 years ago

  • Target version changed from Milestone 20 to Milestone 21
Actions #27

Updated by riafarov over 5 years ago

Yes, the PR is still not merged. Also, I don't think that we should label the job with the progress ticket, because it's a workaround for a bug.

Actions #28

Updated by riafarov over 5 years ago

  • Due date changed from 2018-12-04 to 2018-12-18
Actions #29

Updated by okurz over 5 years ago

  • Blocks action #44186: [functional][u] test fails in kdump_and_crash - Login prompt appeared in serial console output unexpectedly added
Actions #30

Updated by riafarov over 5 years ago

  • Status changed from Feedback to Resolved