action #59187: unstable/flaky test: ui/15-admin-workers.t - openQA Project (public) - openSUSE Project Management Tool

Actions

Copy link

action #59187

closed

unstable/flaky test: ui/15-admin-workers.t

Added by okurz over 5 years ago. Updated over 3 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

okurz

Category:

Regressions/Crashes

Target version:

Done

Start date:

2019-11-07

Due date:

% Done:

Estimated time:

Description

Observation¶

From https://circleci.com/gh/os-autoinst/openQA/4398

while :; do\
    [ $n -lt "3" ] || exit 1;\
    [ $n -eq 0 ] || echo Retrying...;\
    prove -l --harness TAP::Harness::JUnit t/ui/15-admin-workers.t && break;\
    n=$[$n+1];\
done
t/ui/15-admin-workers.t .. 6/? 
#   Failed test 'still on 99963'
#   at t/ui/15-admin-workers.t line 161.
#                   'All Tests
# Job Groups
# Logged in as Demo
# Worker localhost:1
# Host: localhost
# Instance: 1
# Alive: yes
# Websocket connection: Active
# Seen: about 2 hours ago
# Status: Working
# Properties
# Key Value
# JOBTOKEN token99963
# Previous jobs
# Show
# 10
# 25
# 50
# 100
# entries
# Test Result Finished
# opensuse-13.1-NET-x86_64-Build0091-kde@64bit not yet
# opensuse-Factory-staging_e-x86_64-Build87.5011-minimalx@32bit 0 about an hour ago
# Showing 1 to 2 of 2 entries
# Previous
# 1
# Next
# openQA is licensed GPL-2.0 - Version git-4.5.1528009330.e68ebe2b-79820978'
#     doesn't match '(?^:Worker status\nJob: 99963)'
t/ui/15-admin-workers.t .. 18/? # Looks like you failed 1 test of 19.
t/ui/15-admin-workers.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/19 subtests 

Test Summary Report
-------------------
t/ui/15-admin-workers.t (Wstat: 256 Tests: 19 Failed: 1)
  Failed test:  10
  Non-zero exit status: 1
Files=1, Tests=19, 28.0708 wallclock secs ( 0.04 usr  0.01 sys + 16.63 cusr  0.79 csys = 17.47 CPU)
Result: FAIL
Retrying...
t/ui/15-admin-workers.t .. 1/? 
    #   Failed test 'working 99961'
    #   at t/ui/15-admin-workers.t line 127.
    #                   ''
    #     doesn't match '(?^:Worker status\nJob: 99961)'
    # Looks like you failed 1 test of 15.
t/ui/15-admin-workers.t .. 6/? 
#   Failed test 'worker overview'
#   at t/ui/15-admin-workers.t line 139.
t/ui/15-admin-workers.t .. 7/? # Looks like you failed 1 test of 19.
t/ui/15-admin-workers.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/19 subtests 

Test Summary Report
-------------------
t/ui/15-admin-workers.t (Wstat: 256 Tests: 19 Failed: 1)
  Failed test:  6
  Non-zero exit status: 1
Files=1, Tests=19, 27.0958 wallclock secs ( 0.04 usr  0.00 sys + 16.08 cusr  0.66 csys = 16.78 CPU)
Result: FAIL
Retrying...
t/ui/15-admin-workers.t .. 10/? 
#   Failed test 'still on 99963'
#   at t/ui/15-admin-workers.t line 161.
#                   'All Tests
# Job Groups
# Logged in as Demo
# Worker localhost:1
# Host: localhost
# Instance: 1
# Alive: yes
# Websocket connection: Active
# Seen: about 2 hours ago
# Status: Working
# Properties
# Key Value
# JOBTOKEN token99963
# Previous jobs
# Show
# 10
# 25
# 50
# 100
# entries
# Test Result Finished
# opensuse-13.1-NET-x86_64-Build0091-kde@64bit not yet
# opensuse-Factory-staging_e-x86_64-Build87.5011-minimalx@32bit 0 about an hour ago
# Showing 1 to 2 of 2 entries
# Previous
# 1
# Next
# openQA is licensed GPL-2.0 - Version git-4.5.1528009330.e68ebe2b-79820978'
#     doesn't match '(?^:Worker status\nJob: 99963)'
# Looks like you failed 1 test of 19.
t/ui/15-admin-workers.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/19 subtests 

Test Summary Report
-------------------
t/ui/15-admin-workers.t (Wstat: 256 Tests: 19 Failed: 1)
  Failed test:  10
  Non-zero exit status: 1

so I see three retries where in the first and third there is full output corresponding to the worker page but not with the expected string "Worker status…". In the second example the page actually does not seem to return any results. The text "Worker status" comes from https://github.com/os-autoinst/openQA/blob/master/templates/admin/workers/worker_status.html.ep#L12

Actions

Copy link

Updated by okurz over 5 years ago

Status changed from New to In Progress
Assignee set to okurz

I can see that andriinikitin is also working on this already with https://github.com/os-autoinst/openQA/pull/2382 . I am currently calling for i in {1..100}; do echo "TRY $i" && make PROVE_ARGS="-v t/ui/15-admin-workers.t" test-with-database || break ; done and could not reproduce the problem within 51 runs, aborted early. Maybe it fails when I slow it down with nice -n 19

EDIT: 100 times with nice -n 19 and also not failed once. So it only fails in circle CI somehow? Or in our container. At least a single run of make CHECKSTYLE=0 PROVE_ARGS="t/ui/15-admin-workers.t" UITEST=1 prepare-and-launch-docker-to-run-tests-within executes t/ui/15-admin-workers.t just fine

Actions

Copy link

Updated by andriinikitin over 5 years ago

You can "retry with ssh" unstable job in circleci and try the same command there - I believe it will fail quickly.

Actions

Copy link

Updated by okurz over 5 years ago

Status changed from In Progress to New
Assignee deleted (~~okurz~~)

what Andrii wrote could be tried. Did not yet do it myself.

Actions

Copy link

Updated by okurz over 5 years ago

Status changed from New to Feedback
Assignee set to okurz
Target version set to Current Sprint

https://github.com/os-autoinst/openQA/pull/2485

EDIT: Does not seem to be enough.

So I restarted a failed circle CI test using SSH and in there did:

for i in {1..100}; do echo "TRY: $i" && make PROVE_ARGS="-v t/ui/15-admin-workers.t" test-with-database || break; done

which fails on the first or second retry.

With changes in three locations like:

diff --git a/t/ui/15-admin-workers.t b/t/ui/15-admin-workers.t
index 492a3acc5..9660dd67b 100644
--- a/t/ui/15-admin-workers.t
+++ b/t/ui/15-admin-workers.t
@@ -115,6 +115,8 @@ subtest 'worker overview' => sub {
     # check worker 1
     is($driver->find_element('tr#worker_1 .worker')->get_text(), 'localhost:1', 'localhost:1 shown');
     $driver->find_element('tr#worker_1 .help_popover')->click();
+    wait_for_element(selector => '.popover', description => 'worker status popover is displayed');
+    wait_for_ajax;
     like($driver->find_element('.popover')->get_text(), qr/Worker status\nJob: 99963/, 'on 99963');

     # close the popover and wait until it is actually gone (seems to have a very short animation)

this is more stable and I need both the wait_for_element and wait_for_ajax but still a failure after 18 runs:

TRY: 18
test -d /dev/shm/tpg && (pg_ctl -D /dev/shm/tpg -s status >&/dev/null || pg_ctl -D /dev/shm/tpg -s start) || ./t/test_postgresql /dev/shm/tpg
…
export GLOBIGNORE=""; prove -l -v t/ui/15-admin-workers.t
t/ui/15-admin-workers.t .. 
…
ok 8 - on worker 1
[debug] [eeb0ec09] Routing to controller "OpenQA::WebAPI::Controller::Admin::Workers" and action "previous_jobs_ajax"
ok 9 - table properties shown
[debug] [eeb0ec09] 200 OK (0.011117s, 89.952/s)
ok 10 - worker status popover is displayed
ok 11 - Wait for jQuery successful
not ok 12 - still on 99963

#   Failed test 'still on 99963'
#   at t/ui/15-admin-workers.t line 165.
#                   ''
#     doesn't match '(?^:Worker status\nJob: 99963)'

only after removing the third part about "still on 99963" which shouldn't actually be needed I could make the test stable.

-> https://github.com/os-autoinst/openQA/pull/2485

Actions

Copy link

Updated by okurz over 5 years ago

The PR or actually the commit is in master. Not sure if I did this by mistake or someone else though.

Actions

Copy link

Updated by okurz over 5 years ago

remove from list of unstable tests with https://github.com/os-autoinst/openQA/pull/2494

Actions

Copy link

Updated by okurz over 5 years ago

Status changed from Feedback to Resolved
Target version changed from Current Sprint to Done

Actions

Copy link

Updated by tinita over 3 years ago

I saw this failure just now:
https://app.circleci.com/pipelines/github/os-autoinst/openQA/8595/workflows/53935253-041c-428a-9855-069db2ee84dd/jobs/81201/steps

[16:35:41] t/ui/15-admin-workers.t .................... 5/? 
    #   Failed test 'on 99963'
    #   at t/ui/15-admin-workers.t line 119.
    #                   ''
    #     doesn't match '(?^:Worker status\nJob: 99963)'
    # Looks like you failed 1 test of 19.
[16:35:41] t/ui/15-admin-workers.t .................... 7/? 
#   Failed test 'worker overview'
#   at t/ui/15-admin-workers.t line 142.
[16:35:41] t/ui/15-admin-workers.t .................... 8/?
[16:35:41] t/ui/15-admin-workers.t .................... 15/?
[16:35:41] t/ui/15-admin-workers.t .................... 17/?
# Looks like you failed 1 test of 17.
[16:35:41] t/ui/15-admin-workers.t .................... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/17 subtests

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

QA (public) » openQA Project (public)

Tags

Custom queries

action #59187

unstable/flaky test: ui/15-admin-workers.t

Observation¶

Updated by okurz over 5 years ago

Updated by andriinikitin over 5 years ago

Updated by okurz over 5 years ago

Updated by okurz over 5 years ago

Updated by okurz over 5 years ago

Updated by okurz over 5 years ago

Updated by okurz over 5 years ago

Updated by tinita over 3 years ago