action #114532

closed

[qe-core][sles15sp4]test fails in tomcat with most failures at "WebSocketsTest"

Added by rfan1 over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version:
Start date: 2022-07-22
Due date:
% Done: 100%
Estimated time: 4.00 h
Difficulty:
Sprint: QE-Core: July Sprint (Jul 06 - Aug 03)

Description

Observation

openQA test in scenario sle-15-SP4-Server-DVD-Updates-x86_64-qam-regression-tomcat@64bit fails in tomcat

Test suite description

Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml.

Reproducible

Fails since (at least) Build 20220721-1

Expected result

Last good: 20220720-1 (or more recent)

Further details

Always latest result in this scenario: latest

#1

Updated by rfan1 over 2 years ago

This test has failed multiple times over the past several builds, and most of the failures seem to occur in the "WebSocketsTest" function.

I wonder whether there is a performance problem, but I can't understand why SP4 hits the issue more often.

#2

Updated by rfan1 over 2 years ago

  • Subject changed from [qe-core][sles15sp4]test fails in tomcat with test at "WebSocketsTest" to [qe-core][sles15sp4]test fails in tomcat with most failures at "WebSocketsTest"

#3

Updated by rfan1 over 2 years ago

Re-run with 4 GB of memory:
http://openqa.suse.de/t9197005

#4

Updated by rfan1 over 2 years ago

  • Tags set to bugbusters
  • Status changed from New to In Progress
  • Assignee set to rfan1
  • Target version set to QE-Core: Ready
  • Estimated time set to 4.00 h

More test runs with 4 GB of memory:

#for i in 1 2 3 4 5; do openqa-clone-job --from http://openqa.suse.de --host http://openqa.suse.de 9207213 _GROUP_ID=0 --skip-download --skip-chained-deps QEMURAM=4096 BUILD=tomcat; sleep 1; done
Created job #9207333: sle-15-SP4-Server-DVD-Updates-x86_64-Build20220723-1-qam-regression-tomcat@64bit -> http://openqa.suse.de/t9207333
Created job #9207334: sle-15-SP4-Server-DVD-Updates-x86_64-Build20220723-1-qam-regression-tomcat@64bit -> http://openqa.suse.de/t9207334
Created job #9207335: sle-15-SP4-Server-DVD-Updates-x86_64-Build20220723-1-qam-regression-tomcat@64bit -> http://openqa.suse.de/t9207335
Created job #9207336: sle-15-SP4-Server-DVD-Updates-x86_64-Build20220723-1-qam-regression-tomcat@64bit -> http://openqa.suse.de/t9207336
Created job #9207337: sle-15-SP4-Server-DVD-Updates-x86_64-Build20220723-1-qam-regression-tomcat@64bit -> http://openqa.suse.de/t9207337
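The rerun loop above can also be wrapped in a small reusable function. This is only a sketch: `clone_reruns` is a hypothetical helper, not part of openQA, and it reuses the openqa-clone-job options from the command above (with the double-dash form of `--skip-download`). It prints the command lines instead of running them, so the batch can be reviewed first.

```shell
# Hypothetical helper, not part of openQA: print COUNT openqa-clone-job
# command lines that rerun JOB_ID with the extra job settings given after it.
# Printing instead of executing lets you review the batch before submitting it.
clone_reruns() {
    local job_id="$1" count="$2" i=1
    shift 2  # the remaining arguments are extra settings, e.g. QEMURAM=4096
    while [ "$i" -le "$count" ]; do
        echo openqa-clone-job --from http://openqa.suse.de --host http://openqa.suse.de \
            "$job_id" _GROUP_ID=0 --skip-download --skip-chained-deps "$@"
        i=$((i + 1))
    done
}
```

For example, `clone_reruns 9207213 5 QEMURAM=4096 BUILD=tomcat | sh` would submit five clones like the loop above, but without the `sleep 1` between submissions.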
#5

Updated by rfan1 over 2 years ago

  • % Done changed from 0 to 20

Increasing the memory size and adjusting some timeout values helps a lot!
The failure ratio is now down to ~10%.
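To track the failure ratio of a batch of reruns like these, one option is to feed the job results into awk. A minimal sketch, assuming: `failure_ratio` is a hypothetical helper; the results (one per line, e.g. `passed`, `softfailed`, `failed`) are collected separately, for example from the openQA web UI; and every result other than `passed` or `softfailed` counts as a failure.

```shell
# Hypothetical helper: read one openQA job result per line on stdin and
# print the percentage of runs that did not pass (one decimal place).
# "passed" and "softfailed" count as passing; any other result is a failure.
failure_ratio() {
    awk '
        NF { total++ }                                          # ignore empty lines
        NF && $1 != "passed" && $1 != "softfailed" { failed++ }
        END {
            if (total == 0) { print "no results" > "/dev/stderr"; exit 1 }
            printf "%.1f\n", 100 * failed / total
        }'
}
```

For instance, the result reported in #8 (1 failure in 15 runs) would come out as 6.7.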

#6

Updated by geor over 2 years ago

In my opinion, the multiplayer drawboard example is prone to needle mismatches, because it requires sending Page Down and Page Up keys, and needle matching depends on screen size parameters, among other factors.
I believe that, for the sake of stability, we can safely delete the function that tests the drawboard example.
We already cover the rest of Tomcat's WebSocket examples, and in reality this drawboard example is not tested with multiple connections, as would ideally be the case.
So as far as coverage is concerned, we should be OK.

#7

Updated by rfan1 over 2 years ago

Thanks a lot, geor!
That explains why I couldn't fix the issue by increasing the memory size and timeout values: it is not a performance issue.
Let me try deleting this function and see.

#8

Updated by rfan1 over 2 years ago

#openqa-clone-job --from http://openqa.suse.de --host http://openqa.suse.de 9207213 _GROUP_ID=0 --skip-download --skip-chained-deps BUILD=tomcat_drawboard_4g_timeout QEMURAM=4096 CASEDIR=https://github.com/rfan1/os-autoinst-distri-opensuse.git#tomcat_multiplayer_drawboard

http://openqa.suse.de/tests/9208390#next_previous
Good news: in the past 15 runs, the test failed only once, and that failure seems to have been caused by a page-refresh issue.
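The CASEDIR value in the command above points openQA at a git repository plus a branch, in the form `<git URL>#<branch>`. As an illustration only, the hypothetical helper `split_casedir` below splits such a value into its two parts with POSIX parameter expansion; it is not how openQA itself parses the setting.

```shell
# Hypothetical helper: split a CASEDIR value of the form <git URL>#<branch>
# into its repository URL and branch; the branch is empty when there is no '#'.
split_casedir() {
    local casedir="$1"
    local repo="${casedir%%#*}"     # everything before the first '#'
    local branch=""
    if [ "$repo" != "$casedir" ]; then
        branch="${casedir#*#}"      # everything after the first '#'
    fi
    printf 'repo=%s\nbranch=%s\n' "$repo" "$branch"
}
```

Before submitting a clone, the two parts could then be used to check that the branch exists, e.g. with `git ls-remote --exit-code --heads <repo> <branch>`, which exits non-zero when no matching branch is found.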

#10

Updated by apappas over 2 years ago

  • Tags changed from bugbusters to bugbusters, qe-core-coverage

I am adding the qe-core-coverage tag. We use it to mark tickets that require us to reduce (perceived) coverage.

#11

Updated by rfan1 over 2 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 70 to 90

#12

Updated by rfan1 over 2 years ago

  • Status changed from Feedback to Resolved
  • % Done changed from 90 to 100

The new openQA run passes now:
https://openqa.suse.de/tests/9219325

#13

Updated by szarate over 2 years ago

  • Sprint set to QE-Core: July Sprint (Jul 06 - Aug 03)