action #64568

[functional][u] test fails in vnc_two_passwords - Test fails due to (apparent) timeout

Added by szarate 2 months ago. Updated 5 days ago.

Status:
In Progress
Priority:
Normal
Category:
Bugs in existing tests
Target version:
SUSE QA tests - Milestone 30
Start date:
2020-03-18
Due date:
% Done:
0%

Estimated time:
42.00 h
Difficulty:
Duration:

Description

Observation

It seems that wait_serial did not find a match in time, so the test failed. Maybe bump the timeout by 10 more seconds?

openQA test in scenario sle-15-SP2-Online-ppc64le-extra_tests_gnome_sdk@ppc64le fails in
vnc_two_passwords

Suggestions

  • This needs investigation. Probably a race condition (sub generate_vnc_events?)

Reproducible

Fails since (at least) Build 150.1

Expected result

Last good: 146.1 (or more recent)

Further details

Always latest result in this scenario: latest

History

#1 Updated by SLindoMansilla about 2 months ago

  • Description updated (diff)
  • Status changed from New to Workable
  • Target version set to Milestone 30
  • Estimated time set to 42.00 h

#2 Updated by zluo about 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to zluo

checking

#3 Updated by zluo about 2 months ago

https://openqa.suse.de/tests/4077630#step/vnc_two_passwords/18 shows that it looks different after

# Close xev 
send_key 'ctrl-c';

It still fails at command 'wc -l /tmp/xev_log | grep "0 "'

This is a sporadic issue at the moment: 1 failure in 51 test runs.

#4 Updated by zluo about 2 months ago

Add wait_still_screen after ctrl-c to give the next command a little more time at the prompt. I think typing the command can hit this issue when xev is not closed yet.

    # Close xev
    send_key 'ctrl-c';
    wait_still_screen;
    # Check if xev recorded events or not - RO/RW mode
    if ($opt->{change}) {
        assert_script_run '[ -s /tmp/xev_log ]';
    }
    else {
        my $timeout = 30;
        $timeout = 60 if is_ppc64le;
        assert_script_run 'wc -l /tmp/xev_log | grep "^0 "', $timeout;
    }
    save_screenshot;
    assert_script_run 'rm /tmp/xev_log';
}

test:

https://openqa.suse.de/tests/4079375#step/vnc_two_passwords/16
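
A side note on the check itself: the failing command quoted in #3 uses grep "0 ", while the snippet above uses grep "^0 ". The sketch below shows why the anchor matters. It feeds simulated `wc -l` output lines to both patterns, assuming the GNU coreutils format `<count> <file>` with no leading padding. The function names are hypothetical and only for illustration.

```shell
# Check as quoted in the failing command: unanchored pattern.
is_empty_unanchored() { printf '%s\n' "$1" | grep -q "0 "; }

# Check as written in the proposed snippet: pattern anchored at line start.
is_empty_anchored() { printf '%s\n' "$1" | grep -q "^0 "; }

# Simulated `wc -l /tmp/xev_log` output for logs with 0, 10 and 3 lines.
for line in '0 /tmp/xev_log' '10 /tmp/xev_log' '3 /tmp/xev_log'; do
    u=no
    a=no
    if is_empty_unanchored "$line"; then u=yes; fi
    if is_empty_anchored "$line"; then a=yes; fi
    echo "$line: unanchored=$u anchored=$a"
done
```

Only the 10-line case differs: the unanchored pattern accepts it as "empty" because "10 " ends in "0 ", while the anchored pattern rejects it.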

#5 Updated by zluo about 2 months ago

https://openqa.suse.de/tests/4079389#step/vnc_two_passwords/16 shows a performance issue, so this could also be a general problem.

#6 Updated by zluo about 2 months ago

"QEMU" : "ppc64",
"QEMUCPU" : "host",
"QEMUCPUS" : "1",
"QEMUMACHINE" : "usb=off",
"QEMUPORT" : 20022,
"QEMURAM" : "1536",
"QEMUTHREADS" : "1",
"QEMUVGA" : "std",
"QEMU_COMPRESS_QCOW2" : 1,

The settings above are used in production. Clearly more tests failed when I compare against the runs with a 60-second timeout.

#7 Updated by zluo about 2 months ago

https://openqa.suse.de/tests/4091217#step/vnc_two_passwords/25 shows that the xev log is not yet empty when assert_script_run 'wc -l /tmp/xev_log | grep "0 "'; is called.

This might be related to the issue that xev is hanging or not closed in time.

#8 Updated by zluo about 2 months ago

Repeat send_key ctrl-c and wait for xev to close in vnc_two_passwords.pm.

This helps now.

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/9946 updated.

#9 Updated by SLindoMansilla about 2 months ago

PR to improve synchronization point: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/10004
If ppc64le still fails, it could be:

Scenario A

ppc64le needs more time to close xev. In that case, the timeout of the wait_serial call can be increased. (Please avoid any kind of "sleep-like" instruction unless other options have been tried.)

Scenario B

Even after the process started directly by the xev command has stopped, a thread or sub-process could still be hung. In that case, ps -C cmd should be used to look for the guilty process, and the test should only continue after that process has exited. (Please avoid any kind of "sleep-like" instruction unless other options have been tried.)
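
For illustration, Scenario B could be sketched in shell roughly as follows. This is a minimal sketch, not the code from the PR: wait_for_exit is a hypothetical helper name, the 30-second default is an assumption, and it relies on procps `ps -C` returning a non-zero exit status when no process with that command name exists. The one-second pause is only the polling interval, so the loop continues as soon as the process is gone instead of sleeping for a fixed time.

```shell
# wait_for_exit NAME [TIMEOUT_SECONDS]  (hypothetical helper)
# Polls `ps -C NAME` until no process with that command name is left.
# Returns 0 once the process is gone, 1 if it is still there after the timeout.
wait_for_exit() {
    name=$1
    timeout=${2:-30}   # assumed default, not taken from the test code
    elapsed=0
    while ps -C "$name" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "$name still running after ${timeout}s" >&2
            return 1
        fi
        sleep 1        # polling interval only, bounded by the timeout above
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A call such as `wait_for_exit xev 60` would then serve as the synchronization point before the log is checked, and its non-zero return on timeout makes a hung process visible instead of silently continuing.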

#11 Updated by zluo about 1 month ago

  • Status changed from In Progress to Resolved

Checked the results and it is resolved, thanks @SLindoMansilla

#12 Updated by SLindoMansilla about 1 month ago

  • Status changed from Resolved to Workable
  • Assignee changed from zluo to SLindoMansilla

#13 Updated by SLindoMansilla about 1 month ago

I am not able to reproduce the new failure locally. We need more logs to investigate the source of the problem: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/10040

#14 Updated by SLindoMansilla 20 days ago

  • Status changed from Workable to In Progress

Die if vncviewer or xev didn't finish after stopping them: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/10218

#15 Updated by okurz 5 days ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: extra_tests_gnome@ppc64le-2g
https://openqa.suse.de/tests/4261368

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
