action #30613: [functional][y][hard] script_output might include extra serial output - openQA Project (public) - openSUSE Project Management Tool

Actions

Copy link

#1

Updated by nicksinger almost 7 years ago

Blocks action #29646: [sle][functional][medium]test fails in zypper_lifecycle due to script_output is not always reliable added

Actions

Copy link

#2

Updated by okurz almost 7 years ago

Category set to Feature requests

I think there was something that mudler was working on but could not find it right now. For now script_output works as designed: It does not promise to return only what was expected so the test code needs to adapt correspondingly. We did this successfully in the past and I think it should be possible in the case you linked as well.

Actions

Copy link

#3

Updated by okurz almost 7 years ago

Blocks deleted (action #29646: [sle][functional][medium]test fails in zypper_lifecycle due to script_output is not always reliable)

Actions

Copy link

#4

Updated by okurz almost 7 years ago

Related to action #29646: [sle][functional][medium]test fails in zypper_lifecycle due to script_output is not always reliable added

Actions

Copy link

#5

Updated by nicksinger almost 7 years ago

Several reasons to object:

Hard to review
Confirms the general opinion directed to openQA: "it's not reliable after all"
Undocumented behavior
Wrongly documented: https://github.com/os-autoinst/openQA/blob/master/docs/WritingTests.asciidoc#using-text-consoles-and-the-serial-terminal ("However you may find it more convenient, readable or reliable to do it in the Perl test module.")

Setting it back as blocker.

Actions

Copy link

#6

Updated by nicksinger almost 7 years ago

Related to deleted (action #29646: [sle][functional][medium]test fails in zypper_lifecycle due to script_output is not always reliable)

Actions

Copy link

#7

Updated by nicksinger almost 7 years ago

Blocks action #29646: [sle][functional][medium]test fails in zypper_lifecycle due to script_output is not always reliable added

Actions

Copy link

#8

Updated by coolo almost 7 years ago

Subject changed from script_output generates unreliable script outputs to script_output might include extra serial output
Priority changed from High to Normal
Target version set to Ready

Actions

Copy link

#9

Updated by okurz almost 7 years ago

An alternative to the proposed solution using a dedicated serial line could be what we are already doing in test workarounds, e.g. see https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4327/files , that is
using special markers in the output that is looked for in the serial port output, e.g.

sub script_output_safe {
    my ($cmd, %args) = @_;
    my $out = script_output "echo '>>>'$($cmd)'<<<'";
    my $expected_output;
    if ($output =~ />>>(?<expected_output>.+)<<</) {
        $expected_output = $+{expected_output};
    }
    return $expected_output;
}

or maybe even more simple:

sub script_output_safe {
    my ($cmd, %args) = @_;
    my $out = script_output "echo '>>>'$($cmd)'<<<'";
    return $+ if $output =~ />>>(?<expected_output>.+)<<</;
}

Actions

Copy link

#10

Updated by okurz almost 7 years ago

Status changed from New to In Progress
Assignee set to okurz

https://github.com/os-autoinst/os-autoinst/pull/920

Actions

Copy link

#11

Updated by JERiveraMoya almost 7 years ago

Related to action #26112: [sle][functional] test fails in zypper_lifecycle - bash script fails added

Actions

Copy link

#12

Updated by riafarov almost 7 years ago

Blocks deleted (action #29646: [sle][functional][medium]test fails in zypper_lifecycle due to script_output is not always reliable)

Actions

Copy link

#13

Updated by riafarov almost 7 years ago

Related to action #29646: [sle][functional][medium]test fails in zypper_lifecycle due to script_output is not always reliable added

Actions

Copy link

#14

Updated by michalnowak almost 7 years ago

This one just blows my mind: https://openqa.suse.de/tests/1458433#step/apache_nss/37

# wait_serial expected: <html><h2>HelloLinux</h2></html>

 # Result:
 <html><h2>HelloLinux</h2></html>

Actions

Copy link

#16

Updated by riafarov almost 7 years ago

So, unfortunately now we have even more complicated things on xen pv: https://openqa.suse.de/tests/1483693#step/zypper_lifecycle/29 and https://openqa.suse.de/tests/1483677#step/zypper_lifecycle/42

For the first one, we can see output on the screen: https://openqa.suse.de/tests/1483693#step/zypper_lifecycle/27 but when we got it on serial device: https://openqa.suse.de/tests/1483693#step/zypper_lifecycle/28 letter 'e' is missing. So we it's not extra serial output, but seems that we cannot rely on serial device on xen pv.

Same happened here: https://openqa.suse.de/tests/1483677#step/zypper_lifecycle/41 , but then here https://openqa.suse.de/tests/1483677#step/zypper_lifecycle/42 where b is missing

Actions

Copy link

#17

Updated by michalnowak almost 7 years ago

Both of those cases are on Hyper-V not Xen PV. And that's kinda expected on Hyper-V, because on Hyper-V serial console is inherently unreliable, it's leaking characters, especially for strings longer that 16 characters. zypper_lifecycle is written in such a way that workarounding it for Hyper-V means to re-write it. I'll just disable it there: https://progress.opensuse.org/issues/32029.

Actions

Copy link

#18

Updated by rpalethorpe almost 7 years ago

This is solved by using the Serial Terminal Console (VIRTIO_CONSOLE) which uses a separate serial port in QEMU, but it is an entirely different mode of communicating. See the new docs here: https://github.com/os-autoinst/openQA/pull/1586

It would be quite reasonable to refactor the VNC console code so that it uses the virtio_screen.pm code to read emulated serial terminal output (serial0.txt). Then when the virtio serial port is available it could use that instead of the emulated serial port (where the kernel log usually goes). So input would still be sent over VNC, but the output would be sent to a separate serial port as suggested by Nick and would be processed more efficiently. We could also modify wait_serial so that the user can 'hint' which serial port they want, because sometimes they may want to read something sent to the kernel log.

After that is done, it would also be possible to send text input over the serial terminal console (the virtio port in QEMU), but have it graphically displayed in VNC by running some software on the SUT (possibly just screen). This would make VNC almost as fast as a pure serial terminal, but still allow needles to be used, which would be very useful with TUIs like Yast. Although it might be better to contribute to VNC by implementing copy & paste properly.

This should also apply to other consoles which are using SSH, IPMI or physical serial port(s).

Currently I am refactoring the script_output code to move it to distribution.pm to help fix a race condition.

Actions

Copy link

#19

Updated by riafarov almost 7 years ago

Is duplicate of action #26108: [sle][functional][u][sporadic] test fails in btrfs_send_receive - checksum of snapshot does not match original checksum added

Actions

Copy link

#20

Updated by riafarov almost 7 years ago

Is duplicate of deleted (action #26108: [sle][functional][u][sporadic] test fails in btrfs_send_receive - checksum of snapshot does not match original checksum)

Actions

Copy link

#21

Updated by riafarov almost 7 years ago

Related to action #26108: [sle][functional][u][sporadic] test fails in btrfs_send_receive - checksum of snapshot does not match original checksum added

Actions

Copy link

#22

Updated by okurz almost 7 years ago

Status changed from In Progress to Feedback

So my proposed change is live and deployed but does not prevent the problems as intended. In https://openqa.suse.de/tests/1483794/file/serial0.txt we can see

t9vXJ
c2066953541fa467dac88003fb3f9896  -
SCRIPT_FINISHEDt9vXJ-0-

showing the begin and end marker

but then we have

Welcome to SUSE Linux Enterprise Server 15 Beta7 (ppc64le) - Kernel 4.12.14-12.3-default (hvc0).

eth0: 10.0.2.15 fec0::14fb:1a86:faa9:82c4


susetest login: c2066953541fa467dac88003fb3f9896  -
SCRIPT_FINISHEDt9vXJ-0-

@rpalethorpe did some cool changes. How can we reevaluate the current situation? Everyone that observes problems in script_output please mention URLs here and describe the symptoms.

Actions

Copy link

#23

Updated by rpalethorpe almost 7 years ago

Even after my changes to script_output there still appears to be some issue on ARM with cat: https://openqa.suse.de/tests/1534756. We should see 'cat /tmp/ver_linux_before.txt' after calling 'cat - > /tmp/scriptORQ2Z.sh; echo ORQ2Z-$?-', but it is totally missing. There is probably some race condition here. Possibly we could wait for the SUT to echo the new line character.

Actions

Copy link

#26

Updated by okurz almost 7 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: textmode+role_textmode
https://openqa.suse.de/tests/1570712

Actions

Copy link

#27

Updated by okurz over 6 years ago

Related to action #33958: [sle][functional][u][fast] test fails in zypper_lifecycle - 'end of support' line not found added

Actions

Copy link

#28

Updated by okurz over 6 years ago

Another idea I am having: Why not just disable the getty service on the serial console which seems to cause most if not all of the problems now? We are not making use of it during the normal tests anyway, right? I would not consider it a workaround because you can say that script_output currently expects that the serial terminal used is expecting it to be reserved and not shared. We can extend the script_output documentation of course.

Actions

Copy link

#29

Updated by okurz over 6 years ago

Related to action #29637: [tools][kernel] test suddenly print login words -> script_output used wrong added

Actions

Copy link

#30

Updated by okurz over 6 years ago

Subject changed from script_output might include extra serial output to [functional][u]script_output might include extra serial output
Due date set to 2018-05-22
Status changed from Feedback to Workable
Assignee deleted (~~okurz~~)
Target version changed from Ready to Milestone 16

see #30613#note-28

Actions

Copy link

#31

Updated by riafarov over 6 years ago

Subject changed from [functional][u]script_output might include extra serial output to [functional][u][hard] script_output might include extra serial output

NOTE: ticket includes just trying out solution from comment #28 and not completely new solution.

Actions

Copy link

#32

Updated by okurz over 6 years ago

Due date changed from 2018-05-22 to 2018-06-05

I don't see it being done this sprint, let's delay it for one sprint.

Actions

Copy link

#33

Updated by SLindoMansilla over 6 years ago

Due date changed from 2018-06-05 to 2018-06-19

Not enough capacity during sprint 18

Actions

Copy link

#34

Updated by okurz over 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: gnome+proxy_SCC+allmodules
https://openqa.suse.de/tests/1748781

Actions

Copy link

#35

Updated by okurz over 6 years ago

Target version changed from Milestone 16 to Milestone 17

we are in M17 by now

Actions

Copy link

#36

Updated by okurz over 6 years ago

Target version changed from Milestone 17 to Milestone 17

Actions

Copy link

#37

Updated by mgriessmeier over 6 years ago

Subject changed from [functional][u][hard] script_output might include extra serial output to [functional][y][hard] script_output might include extra serial output
Due date changed from 2018-06-19 to 2018-07-03

revisit in planning to help everyone understand what's the next step to do

Actions

Copy link

#38

Updated by okurz over 6 years ago

Description updated (diff)

So we had been shifting this ticket around a lot :) Let's try one more sprint on [y] otherwise we shift to "future".

Actions

Copy link

#39

Updated by JERiveraMoya over 6 years ago

Estimated time set to 13.00 h

Actions

Copy link

#40

Updated by riafarov over 6 years ago

Status changed from Workable to In Progress
Assignee set to riafarov

Actions

Copy link

#41

Updated by riafarov over 6 years ago

Status changed from In Progress to Resolved

So idea with getty didn't work well, getty is not running against serial console and all what we see is duplicated output which we get for other ttys. So, I guess the only way to solve this issue completely is to use second serial device which will be dedicated for script_output.

@okurz, I suggest removing this ticket from the sprint or we can resolve it. Feel free to reschedule. I also believe we have mitigated issue already with workarounds, but it's still is the reason of some sporadic failures. So we can consider applying second suggestion, which will be tricky for some backends (e.g. ipmi).

Actions

Copy link

#42

Updated by riafarov over 6 years ago

Due date deleted (~~2018-07-03~~)
Status changed from Resolved to Workable
Assignee deleted (~~riafarov~~)

Actions

Copy link

#43

Updated by riafarov over 6 years ago

Due date set to 2018-07-03

Actions

Copy link

#44

Updated by okurz over 6 years ago

riafarov wrote:

So idea with getty didn't work well, getty is not running against serial console and all what we see is duplicated output which we get for other ttys

Can you reference tests you ran that support this expectation? I do not understand how "duplicated output for other ttys" could show up on the serial port.

I failed to find a recently failing job on this with a quick search. The one mentioned in #30613#note-34 is actually not this ticket, the failure here is not related to "unexpected serial output" so please do not label with a progress ticket. It looks like either the lifecycle data has not been updated or we need to update the test module to cope with EOL reports on the legacy module. I suggest to clarify in a bug report.

Actions

Copy link

#45

Updated by riafarov over 6 years ago

@okurz As you may remember why original idea didn't work, because when we set tty boot parameters we have, that serial device also is used as /dev/console and hence gets everything we also see on tty we use.

As for runs, getty service is started for each tty we activate e.g. getty@tty1, but there is no getty service started against our serial console (ttyS0 on qemu). So, please, let me know if you meant something else with the idea of disabling getty.

Actions

Copy link

#46

Updated by okurz over 6 years ago

hm, if there is not getty running on $serialdev what provides the login prompt then? Or is it really because of /dev/console? I doubt it. Btw, have we ever seen it on x86_64 or only aarch64+ppc64le?

Actions

Copy link

#47

Updated by riafarov over 6 years ago

Assignee set to riafarov

It mostly affects ppc64. We get login prompt for tty which we select, and we don't switch to ttyS0. Let me double check it. I've taken a look on my runs and there command failed.

Actions

Copy link

#48

Updated by riafarov over 6 years ago

So, here is the run I did: http://g226.suse.de/tests/1997#step/consoletest_setup/27 but as you can see on serial, it doesn't really help for ppc64le and we still get login prompt on serial:
http://g226.suse.de/tests/1997/file/serial0.txt

For x86 I got same number of "Welcome" messages on serial as without stopped getty@ttyS0, so my quick guess is that it's related to /dev/console, but obviously I can be wrong.

Actions

Copy link

#49

Updated by pvorel over 6 years ago

riafarov wrote:

It mostly affects ppc64. We get login prompt for tty which we select, and we don't switch to ttyS0. Let me double check it. I've taken a look on my runs and there command failed.

As you talk about ppc64, ttyS0 and /dev/console, this is probably not related, but just in case you don't know that:
So on ppc64 it's hvc0 instead of ttyS0.

There is actually comment in this ticket proving this:
Welcome to SUSE Linux Enterprise Server 15 Beta7 (ppc64le) - Kernel 4.12.14-12.3-default (hvc0).

OT: We had to do some quirks for LTP on ppc64 running with VIRTIO_CONSOLE we had to add hvc1 (#18980 https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/tests/kernel/install_ltp.pm#L259).

Actions

Copy link

#50

Updated by riafarov over 6 years ago

Status changed from Workable to Feedback

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5315

Actions

Copy link

#51

Updated by pvorel over 6 years ago

FYI: We suffer occasional problems with LTP since this commit:
b3ee11ed LTP: Merge ltp_setup_networking.pm into install_ltp.pm

https://openqa.opensuse.org/tests/698252#step/install_ltp/141
AFAIK we never had any problem on x86_64 before. I'll test your PR.

Actions

Copy link

#52

Updated by okurz over 6 years ago

But this LTP is not even using the serial device but the virtio based serial terminal. I hardly doubt that is related.

Actions

Copy link

#53

Updated by riafarov over 6 years ago

@pvorel Hmm, PR I've created hasn't been yet merged. Can it be related to workers performance?

Actions

Copy link

#54

Updated by riafarov over 6 years ago

Due date changed from 2018-07-03 to 2018-07-31

Actions

Copy link

#55

Updated by pvorel over 6 years ago

okurz wrote:

But this LTP is not even using the serial device but the virtio based serial terminal. I hardly doubt that is related.

OK, I'm sorry for unrelated comment.

riafarov wrote:

@pvorel Hmm, PR I've created hasn't been yet merged. Can it be related to workers performance?

I have no idea.

Actions

Copy link

#56

Updated by okurz over 6 years ago

So now I wonder if we still need any workarounds, I proposed https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5379 to remove workarounds. Would you like to test the PR?

Actions

Copy link

#57

Updated by riafarov over 6 years ago

@okurz, we have discussed a problem of changing scope of the ticket, in this case I would prefer resolving this one and having separate ticket for removing other workarounds. On top, we have only mitigated an issue, so other workarounds still may be useful.

Actions

Copy link

#58

Updated by riafarov over 6 years ago

Related to action #38417: [sle][functional][u][easy] Cleanup workarounds for serial output which are not required anymore added

Actions

Copy link

#59

Updated by riafarov over 6 years ago

Status changed from Feedback to Resolved

Actions

Copy link

#60

Updated by mgriessmeier over 6 years ago

Status changed from Resolved to In Progress

This needs to be rediscussed, since it broke s390 and svirt reboots due to masking the serialdev service in the creation job and not unmasking it again.
see e.g. https://openqa.suse.de/tests/1838736

we've reverted the PR with https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5427/

Actions

Copy link

#61

Updated by okurz over 6 years ago

Target version changed from Milestone 17 to Milestone 18

@riafarov feel free to unassign after we reopened now but please restate your explanation from the revert PR why we broke it

Actions

Copy link

#62

Updated by okurz over 6 years ago

Related to action #37027: [tw][desktop] Update create_hdd_* to make sure right permissions added

Actions

Copy link

#63

Updated by okurz over 6 years ago

Related to action #38480: [functional][u][fast][qam] job mau-extratests fails in new tests update_alternatives, aplay, openssl_alpn, curl_ipv6, wget_ipv6, gpg, shells added

Actions

Copy link

#64

Updated by riafarov over 6 years ago

Due date changed from 2018-07-31 to 2018-08-14

E.g. split this part to ensure_serial_console_permissions.

Actions

Copy link

#65

Updated by riafarov over 6 years ago

Related to deleted (action #38417: [sle][functional][u][easy] Cleanup workarounds for serial output which are not required anymore)

Actions

Copy link

#66

Updated by riafarov over 6 years ago

Blocks action #38417: [sle][functional][u][easy] Cleanup workarounds for serial output which are not required anymore added

Actions

Copy link

#67

Updated by riafarov over 6 years ago

Status changed from In Progress to Feedback

We are waiting when PR to split consoletest_setup will get merged and then make a decision where to add part to stop service. As mentioned above, we should have it enabled when reboot on svirt and zVM, as read serial to detect grub, welcome screens.

Actions

Copy link

#68

Updated by riafarov over 6 years ago

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5564

Actions

Copy link

#69

Updated by riafarov over 6 years ago

Related to action #39665: [functional][u][ipmi] test fails in system_prepare, /dev/sshserial is used instead of /dev/ttyS1 added

Actions

Copy link

#70

Updated by riafarov over 6 years ago

Due date changed from 2018-08-14 to 2018-08-28

Actions

Copy link

#71

Updated by riafarov over 6 years ago

Broke this one: https://openqa.suse.de/tests/1936687#step/consoletest_setup/29

Actions

Copy link

#72

Updated by riafarov over 6 years ago

Status changed from Feedback to Resolved

Failure is not related to the change, but to the worker performance, we have quite some sporadic failures on it. Here is the run with less jobs running on it: https://openqa.suse.de/tests/1945118#step/consoletest_setup/19

This issues will be addressed in separate tickets.

Actions

Copy link

#73

Updated by cfconrad over 6 years ago

Created a new ticket [1] for the cases where script_output() fail in "cat -" and never return to prompt.

Which was mentioned in note-23 [2] but seems to be a different issue.

[1] https://progress.opensuse.org/issues/40913
[2] https://progress.opensuse.org/issues/30613#note-23

Project

General

Profile

QA (public) » openQA Project (public)

Tags

Custom queries

action #30613

[functional][y][hard] script_output might include extra serial output

Problem¶

Example¶

Expected result¶

Suggestions¶

Updated by nicksinger almost 7 years ago

Updated by okurz almost 7 years ago

Updated by okurz almost 7 years ago

Updated by okurz almost 7 years ago

Updated by nicksinger almost 7 years ago

Updated by nicksinger almost 7 years ago

Updated by nicksinger almost 7 years ago

Updated by coolo almost 7 years ago

Updated by okurz almost 7 years ago

Updated by okurz almost 7 years ago

Updated by JERiveraMoya almost 7 years ago

Updated by riafarov almost 7 years ago

Updated by riafarov almost 7 years ago

Updated by michalnowak almost 7 years ago

Updated by riafarov almost 7 years ago

Updated by michalnowak almost 7 years ago

Updated by rpalethorpe almost 7 years ago

Updated by riafarov almost 7 years ago

Updated by riafarov almost 7 years ago

Updated by riafarov almost 7 years ago

Updated by okurz almost 7 years ago

Updated by rpalethorpe almost 7 years ago

Updated by okurz almost 7 years ago

Updated by okurz over 6 years ago

Updated by okurz over 6 years ago

Updated by okurz over 6 years ago

Updated by okurz over 6 years ago

Updated by riafarov over 6 years ago

Updated by okurz over 6 years ago

Updated by SLindoMansilla over 6 years ago

Updated by okurz over 6 years ago

Updated by okurz over 6 years ago

Updated by okurz over 6 years ago

Updated by mgriessmeier over 6 years ago

Updated by okurz over 6 years ago

Updated by JERiveraMoya over 6 years ago

Updated by riafarov over 6 years ago

Updated by riafarov over 6 years ago

Updated by riafarov over 6 years ago

Updated by riafarov over 6 years ago

Updated by okurz over 6 years ago

Updated by riafarov over 6 years ago

Updated by okurz over 6 years ago

Updated by riafarov over 6 years ago

Updated by riafarov over 6 years ago

Updated by pvorel over 6 years ago

Updated by riafarov over 6 years ago

Updated by pvorel over 6 years ago

Updated by okurz over 6 years ago

Updated by riafarov over 6 years ago

Updated by riafarov over 6 years ago

Updated by pvorel over 6 years ago

Updated by okurz over 6 years ago

Updated by riafarov over 6 years ago

Updated by riafarov over 6 years ago

Updated by riafarov over 6 years ago

Updated by mgriessmeier over 6 years ago

Updated by okurz over 6 years ago

Updated by okurz over 6 years ago

Updated by okurz over 6 years ago

Updated by riafarov over 6 years ago

Updated by riafarov over 6 years ago

Updated by riafarov over 6 years ago

Updated by riafarov over 6 years ago

Updated by riafarov over 6 years ago

Updated by riafarov over 6 years ago

Updated by riafarov over 6 years ago

Updated by riafarov over 6 years ago