Project

General

Profile

Actions

action #102578

closed

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #98952: [epic] t/full-stack.t sporadically fails "clickElement: element not interactable" and other errors

[sporadic] t/full-stack.t Failed test 'Expected result for job 1 not found' size:M

Added by livdywan over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2021-09-21
Due date:
% Done:

0%

Estimated time:

Description

Observation

t/full-stack.t is becoming more and more of a problem, e.g. see https://app.circleci.com/pipelines/github/os-autoinst/openQA/8392/workflows/6748c60b-4966-4db6-806b-415da997d321/jobs/79281 from https://github.com/os-autoinst/openQA/pull/4360 . I suggest more people to collaborate on this ticket.

#   Failed test 'Expected result for job 1 not found'

Acceptance criteria

  • AC1: t/full-stack.t subtest 'schedule job' is stable

Suggestions

Actions #1

Updated by livdywan over 2 years ago

  • Copied from coordination #98952: [epic] t/full-stack.t sporadically fails "clickElement: element not interactable" and other errors added
Actions #2

Updated by livdywan over 2 years ago

  • Parent task set to #98952
Actions #3

Updated by livdywan over 2 years ago

Suggestions

  • a PR to expose worker log in the schedule job subtest

https://github.com/os-autoinst/openQA/pull/4363

Actions #4

Updated by livdywan over 2 years ago

  • Assignee deleted (livdywan)

https://github.com/os-autoinst/openQA/pull/4363

PR got merged. Might be best to come back to this once we see this on CircleCI or locally (I've not managed to produce a genuine failure with this fix yet)

Actions #5

Updated by okurz over 2 years ago

  • Status changed from Workable to New

Wasn't estimated, back to "New"

Actions #6

Updated by livdywan over 2 years ago

  • Subject changed from t/full-stack.t Failed test 'Expected result for job 1 not found' to t/full-stack.t Failed test 'Expected result for job 1 not found' [sporadic] size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #7

Updated by okurz over 2 years ago

  • Subject changed from t/full-stack.t Failed test 'Expected result for job 1 not found' [sporadic] size:M to [sporadic] t/full-stack.t Failed test 'Expected result for job 1 not found' [sporadic] size:M
  • Status changed from Workable to In Progress
  • Assignee set to okurz

I followed the suggestion locally with

$ git diff
diff --git a/t/full-stack.t b/t/full-stack.t
index 7b858e8b5..470eaaf08 100644
--- a/t/full-stack.t
+++ b/t/full-stack.t
@@ -218,6 +218,8 @@ subtest 'schedule job' => sub {
   or bail_with_log 1,
   'Job 1 produced the wrong results';

+  done_testing;exit;
+
 subtest 'clone job that crashes' => sub {
     client_call('-X POST jobs/1/restart', qr|test_url.+1.+tests.+2|, 'client returned new test_url for test 2');
     $driver->refresh();

and called FULLSTACK=1 count_fail_ratio prove -l t/full-stack.t and got

...
## Run 20
t/full-stack.t .. ok
All tests successful.
Files=1, Tests=31, 33 wallclock secs ( 0.07 usr  0.01 sys + 15.37 cusr  2.52 csys = 17.97 CPU)
Result: PASS
## count_fail_ratio: Run: 20. Fails: 0. Fail ratio 0%

so fail-rate is below 1/20 in my local instance. Will test longer and also on CI

Actions #8

Updated by okurz over 2 years ago

  • Subject changed from [sporadic] t/full-stack.t Failed test 'Expected result for job 1 not found' [sporadic] size:M to [sporadic] t/full-stack.t Failed test 'Expected result for job 1 not found' size:M
Actions #9

Updated by okurz over 2 years ago

Local results from FULLSTACK=1 runs=100 count_fail_ratio prove -l t/full-stack.t:

...
All tests successful.
Files=1, Tests=31, 33 wallclock secs ( 0.08 usr  0.01 sys + 15.59 cusr  2.52 csys = 18.20 CPU)
Result: PASS
## count_fail_ratio: Run: 100. Fails: 0. Fail ratio 0%

so fail-rate below 1%. The test in CI failed with another error in the 4th run (of max. 50): https://github.com/os-autoinst/openQA/pull/4373#issuecomment-979248913

Actions #10

Updated by okurz over 2 years ago

  • Status changed from In Progress to Workable
  • Assignee deleted (okurz)

Locally I am at "## count_fail_ratio: Run: 751. Fails: 0. Fail ratio 0%" so a fail rate of at least below 0.13 %. Within circleCI I could reproduce what looks like the error.
https://app.circleci.com/pipelines/github/os-autoinst/openQA/8463/workflows/a8295d0c-d4b5-4c59-8d38-204524a8bcec/jobs/79954
shows

Rerun 4 of 50 …
Calling retry hook ./tools/delete-coverdb-folder
Deleting cover_db_fullstack/
[14:21:16] t/full-stack.t .. 26/?     # Unexpected Javascript console errors: [
    #   {
    #     level     => "SEVERE",
    #     message   => "http://localhost:9526/tests/1/images/thumb/boot-1.png?1637850113525 - Failed to load resource: the server responded with a status of 404 (Not Found)",
    #     source    => "network",
    #     timestamp => 1637850113984,
    #   },
    #   {
    #     level     => "SEVERE",
    #     message   => "http://localhost:9526/tests/1/images/thumb/boot-2.png?1637850113525 - Failed to load resource: the server responded with a status of 404 (Not Found)",
    #     source    => "network",
    #     timestamp => 1637850113985,
    #   },
    # ]
    # full result panel contents:
    # State: running, started less than a minute ago
    # Scheduled product: job has not been created by posting an ISO
    # Assigned worker: f2681b748598:1

    #   Failed test 'Expected result not found'
    #   at /home/squamata/project/t/lib/OpenQA/Test/FullstackUtils.pm line 84.

    #   Failed test 'job 1 passed'
    #   at t/full-stack.t line 197.
[14:21:16] t/full-stack.t .. 27/?     # 0

    #   Failed test 'autoinst log file generated'
    #   at t/full-stack.t line 199.
    # Looks like you failed 3 tests of 5.

#   Failed test 'schedule job'
#   at t/full-stack.t line 218.
# unable to read /tmp/XKQPZFxlD7/full-stack.d/openqa/testresults/00000/00000001-tinycore-1-flavor-i386-Build1-core@coolone/autoinst-log.txt: Can't open file "/tmp/XKQPZFxlD7/full-stack.d/openqa/testresults/00000/00000001-tinycore-1-flavor-i386-Build1-core@coolone/autoinst-log.txt": No such file or directory at t/full-stack.t line 161.
# /tmp/XKQPZFxlD7/full-stack.d/openqa/testresults/../pool/1/worker-log.txt:
# [2021-11-25T14:21:29.415011Z] [debug] [pid:1692] Preparing Mojo::IOLoop::ReadWriteProcess::Session
# [2021-11-25T14:21:29.415915Z] [debug] [pid:1692] Job settings:
...
# [2021-11-25T14:21:29.416569Z] [debug] [pid:1692] Linked asset "/tmp/XKQPZFxlD7/full-stack.d/openqa/share/factory/iso/Core-7.2.iso" to "/tmp/XKQPZFxlD7/full-stack.d/openqa/pool/1/Core-7.2.iso"
# [2021-11-25T14:21:29.416889Z] [debug] [pid:1692] Symlinked from "/tmp/XKQPZFxlD7/full-stack.d/openqa/share/tests/tinycore" to "/tmp/XKQPZFxlD7/full-stack.d/openqa/pool/1/tinycore"
# [2021-11-25T14:21:29.417208Z] [info] [pid:1692] Preparing cgroup to start isotovideo
# [2021-11-25T14:21:29.418208Z] [warn] [pid:1692] Disabling cgroup usage because cgroup creation failed: mkdir /sys/fs/cgroup/systemd/docker: Permission denied at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 84.
# [2021-11-25T14:21:29.418338Z] [info] [pid:1692] You can define a custom slice with OPENQA_CGROUP_SLICE or indicating the base mount with MOJO_CGROUP_FS.
# [2021-11-25T14:21:29.418630Z] [info] [pid:1692] Starting isotovideo container
# [2021-11-25T14:21:29.420870Z] [debug] [pid:1692] Registered process:1699
# [2021-11-25T14:21:29.421030Z] [info] [pid:1699] 1699: WORKING 1
# [2021-11-25T14:21:29.421250Z] [info] [pid:1692] isotovideo has been started (PID: 1699)
# [2021-11-25T14:21:29.421762Z] [debug] [pid:1692] Running job 1 from http://localhost:9526: 00000001-tinycore-1-flavor-i386-Build1-core@coolone.
# [2021-11-25T14:21:29.423493Z] [debug] [pid:1692] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
# [2021-11-25T14:21:29.584343Z] [debug] [pid:1692] Upload concluded (no current module)
# [2021-11-25T14:21:31.665487Z] [debug] [pid:1692] Starting livelog
# [2021-11-25T14:21:31.667572Z] [debug] [pid:1692] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
# [2021-11-25T14:21:31.786098Z] [debug] [pid:1692] Upload concluded (no current module)
# [2021-11-25T14:21:32.033447Z] [debug] [pid:1692] Stopping livelog
# [2021-11-25T14:21:32.826565Z] [debug] [pid:1692] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
# [2021-11-25T14:21:32.986391Z] [debug] [pid:1692] Upload concluded (no current module)
# [2021-11-25T14:21:42.994183Z] [debug] [pid:1692] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
# [2021-11-25T14:21:43.090318Z] [debug] [pid:1692] Upload concluded (at boot)
# [2021-11-25T14:21:53.094533Z] [debug] [pid:1692] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
# [2021-11-25T14:21:53.394912Z] [debug] [pid:1927] Optimizing /tmp/XKQPZFxlD7/full-stack.d/openqa/pool/1/testresults/boot-1.png
# [2021-11-25T14:21:53.632212Z] [debug] [pid:1927] Uploading artefact boot-1.png as f4184468dbb276a03240907d0e7e500d
# [2021-11-25T14:21:53.865849Z] [debug] [pid:1692] Starting livelog
# [2021-11-25T14:21:53.867995Z] [debug] [pid:1692] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
# [2021-11-25T14:21:53.996201Z] [debug] [pid:1927] Optimizing /tmp/XKQPZFxlD7/full-stack.d/openqa/pool/1/testresults/.thumbs/boot-1.png
# [2021-11-25T14:21:54.026295Z] [debug] [pid:1927] Uploading artefact boot-1.png as f4184468dbb276a03240907d0e7e500d
Bailout called.  Further testing stopped:  Job 1 produced the wrong results
FAILED--Further testing stopped: Job 1 produced the wrong results

That's the one, right?

Can someone else please try to reproduce locally?

EDIT: 2021-11-26: Based on a suggestion of tinita I ran

make coverage FULLSTACK=1 RETRY=50 STABILITY_TEST=1 TESTS=t/full-stack.t

and encountered

Rerun 16 of 50 …
Calling retry hook ./tools/delete-coverdb-folder
Deleting cover_db/
t/full-stack.t .. 17/? Bailout called.  Further testing stopped:  clickElement: element not interactable at /home/okurz/local/os-autoinst/openQA/t/lib/OpenQA/SeleniumTest.pm:81
FAILED--Further testing stopped: clickElement: element not interactable at /home/okurz/local/os-autoinst/openQA/t/lib/OpenQA/SeleniumTest.pm:81
make[2]: *** [Makefile:196: test-unit-and-integration] Error 255

This does not look like the same and unfortunately there is no stack trace of where in t/full-stack.t that failed itself.

Trying with

diff --git a/t/lib/OpenQA/SeleniumTest.pm b/t/lib/OpenQA/SeleniumTest.pm
index e05f977ed..1ae0f2e11 100644
--- a/t/lib/OpenQA/SeleniumTest.pm
+++ b/t/lib/OpenQA/SeleniumTest.pm
@@ -16,6 +16,7 @@ our @EXPORT = qw(driver_missing check_driver_modules enable_timeout
   wait_until wait_until_element_gone wait_for_element
   element_prop element_prop_by_selector map_elements);

+use Carp;
 use Data::Dump 'pp';
 use IPC::Run qw(start);
 use Mojo::IOLoop::Server;
@@ -81,7 +82,7 @@ sub start_driver {
                 $err .= ' at ' . __FILE__ . ':' . __LINE__;    # uncoverable statement

                 # prevent aborting the complete test when interactively debugging
-                $INC{'perl5db.pl'} ? fail $err : BAIL_OUT($err);    # uncoverable statement
+                $INC{'perl5db.pl'} ? fail $err : confess($err);    # uncoverable statement
             },
         );

then I got

Rerun 2 of 50 …
Calling retry hook ./tools/delete-coverdb-folder
Deleting cover_db/
t/full-stack.t .. 31/?     # Unexpected Javascript console errors: [
    #   {
    #     level     => "SEVERE",
    #     message   => "http://localhost:9526/tests/1/images/thumb/boot-1.png - Failed to load resource: the server responded with a status of 404 (Not Found)",
    #     source    => "network",
    #     timestamp => 1637926799082,
    #   },
    # ]
    # full result panel contents:
    # State: running, started less than a minute ago
    # Scheduled product: job has not been created by posting an ISO
    # Assigned worker: linux-28d7:1

    #   Failed test 'Expected result not found'
    #   at /home/okurz/local/os-autoinst/openQA/t/lib/OpenQA/Test/FullstackUtils.pm line 84.

    #   Failed test 'job 1 passed'
    #   at t/full-stack.t line 197.
    # 0

    #   Failed test 'autoinst log file generated'
    #   at t/full-stack.t line 199.
    # Looks like you failed 3 tests of 6.
t/full-stack.t .. 32/? 
#   Failed test 'schedule job'
#   at t/full-stack.t line 218.
# unable to read /tmp/WlnajfL08s/full-stack.d/openqa/testresults/00000/00000001-tinycore-1-flavor-i386-Build1-core@coolone/autoinst-log.txt: Can't open file "/tmp/WlnajfL08s/full-stack.d/openqa/testresults/00000/00000001-tinycore-1-flavor-i386-Build1-core@coolone/autoinst-log.txt": No such file or directory at t/full-stack.t line 161.
# /tmp/WlnajfL08s/full-stack.d/openqa/testresults/../pool/1/worker-log.txt:
# [2021-11-26T12:39:40.799425+01:00] [debug] [pid:7029] Preparing Mojo::IOLoop::ReadWriteProcess::Session
# [2021-11-26T12:39:40.801345+01:00] [debug] [pid:7029] Job settings:
# [2021-11-26T12:39:40.801766+01:00] [debug] [pid:7029] 
#     ARCH=i386
#     BUILD=1
#     DISTRI=tinycore
#     FLAVOR=flavor
#     INTEGRATION_TESTS=1
#     ISO=Core-7.2.iso
#     JOBTOKEN=y3sRD950IaKP0uKl
#     LOG_LEVEL=debug
#     MACHINE=coolone
#     NAME=00000001-tinycore-1-flavor-i386-Build1-core@coolone
#     OPENQA_HOSTNAME=localhost:9526
#     OPENQA_URL=http://localhost:9526
#     PAUSE_AT=shutdown
#     PRJDIR=/tmp/WlnajfL08s/full-stack.d/openqa/share
#     PUBLISH_HDD_1=core-hdd.qcow2
#     QEMU=i386
#     QEMUPORT=20012
#     QEMU_NO_FDC_SET=1
#     QEMU_NO_TABLET=1
#     RETRY_DELAY=5
#     RETRY_DELAY_IF_WEBUI_BUSY=60
#     TEST=core
#     UEFI_PFLASH_VARS=/usr/share/qemu/ovmf-x86_64.bin
#     VERSION=1
#     VNC=91
#     WORKER_CLASS=qemu_i386,qemu_x86_64
#     WORKER_HOSTNAME=127.0.0.1
#     WORKER_ID=1
#     WORKER_INSTANCE=1
# [2021-11-26T12:39:40.803070+01:00] [debug] [pid:7029] Linked asset "/tmp/WlnajfL08s/full-stack.d/openqa/share/factory/iso/Core-7.2.iso" to "/tmp/WlnajfL08s/full-stack.d/openqa/pool/1/Core-7.2.iso"
# [2021-11-26T12:39:40.803850+01:00] [debug] [pid:7029] Symlinked from "/tmp/WlnajfL08s/full-stack.d/openqa/share/tests/tinycore" to "/tmp/WlnajfL08s/full-stack.d/openqa/pool/1/tinycore"
# [2021-11-26T12:39:40.804481+01:00] [info] [pid:7029] Preparing cgroup to start isotovideo
# [2021-11-26T12:39:40.806333+01:00] [warn] [pid:7029] Disabling cgroup usage because cgroup creation failed: mkdir /sys/fs/cgroup/systemd/user.slice/user-1000.slice/session-2.scope/1: Permission denied at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 84.
# [2021-11-26T12:39:40.806620+01:00] [info] [pid:7029] You can define a custom slice with OPENQA_CGROUP_SLICE or indicating the base mount with MOJO_CGROUP_FS.
# [2021-11-26T12:39:40.807416+01:00] [info] [pid:7029] Starting isotovideo container
# [2021-11-26T12:39:40.810056+01:00] [debug] [pid:7029] Registered process:7064
# [2021-11-26T12:39:40.810286+01:00] [info] [pid:7064] 7064: WORKING 1
# [2021-11-26T12:39:40.810724+01:00] [info] [pid:7029] isotovideo has been started (PID: 7064)
# [2021-11-26T12:39:40.811639+01:00] [debug] [pid:7029] Running job 1 from http://localhost:9526: 00000001-tinycore-1-flavor-i386-Build1-core@coolone.
# [2021-11-26T12:39:40.814680+01:00] [debug] [pid:7029] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
# [2021-11-26T12:39:40.968688+01:00] [debug] [pid:7029] Upload concluded (no current module)
# [2021-11-26T12:39:46.186887+01:00] [debug] [pid:7029] Starting livelog
# [2021-11-26T12:39:46.189172+01:00] [debug] [pid:7029] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
# [2021-11-26T12:39:46.457583+01:00] [debug] [pid:7029] Upload concluded (no current module)
# [2021-11-26T12:39:46.812095+01:00] [debug] [pid:7029] Stopping livelog
# [2021-11-26T12:39:47.459390+01:00] [debug] [pid:7029] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
# [2021-11-26T12:39:47.578622+01:00] [debug] [pid:7029] Upload concluded (no current module)
# [2021-11-26T12:39:57.591256+01:00] [debug] [pid:7029] Unable to read result-shutdown.json: Can't open file "/tmp/WlnajfL08s/full-stack.d/openqa/pool/1/testresults/result-shutdown.json": No such file or directory at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 132.
#       Mojo::File::slurp(Mojo::File=SCALAR(0x5559bd2b7740)) called at /home/okurz/local/os-autoinst/openQA/script/../lib/OpenQA/Worker/Job.pm line 1098
#       eval {...} called at /home/okurz/local/os-autoinst/openQA/script/../lib/OpenQA/Worker/Job.pm line 1098
#       OpenQA::Worker::Job::_read_json_file(OpenQA::Worker::Job=HASH(0x5559bd264330), "result-shutdown.json") called at /home/okurz/local/os-autoinst/openQA/script/../lib/OpenQA/Worker/Job.pm line 1154
#       OpenQA::Worker::Job::_read_module_result(OpenQA::Worker::Job=HASH(0x5559bd264330), "shutdown") called at /home/okurz/local/os-autoinst/openQA/script/../lib/OpenQA/Worker/Job.pm line 1116
#       OpenQA::Worker::Job::_read_result_file(OpenQA::Worker::Job=HASH(0x5559bd264330), "shutdown", ARRAY(0x5559bd126d78)) called at /home/okurz/local/os-autoinst/openQA/script/../lib/OpenQA/Worker/Job.pm line 786
#       OpenQA::Worker::Job::_upload_results_step_0_prepare(OpenQA::Worker::Job=HASH(0x5559bd264330), CODE(0x5559bd126c70)) called at /home/okurz/local/os-autoinst/openQA/script/../lib/OpenQA/Worker/Job.pm line 715
#       OpenQA::Worker::Job::_upload_results(OpenQA::Worker::Job=HASH(0x5559bd264330), CODE(0x5559bd126c70)) called at /home/okurz/local/os-autoinst/openQA/script/../lib/OpenQA/Worker/Job.pm line 903
#       OpenQA::Worker::Job::__ANON__(Mojo::IOLoop=HASH(0x5559baf466a0)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop.pm line 220
…
# [2021-11-26T12:39:58.926528+01:00] [debug] [pid:7172] Uploading artefact boot-2.png as e3b43b69c91fe4072dac1037f416c281
# [2021-11-26T12:39:58.992464+01:00] [debug] [pid:7172] Optimizing /tmp/WlnajfL08s/full-stack.d/openqa/pool/1/testresults/.thumbs/boot-2.png
# [2021-11-26T12:39:59.001311+01:00] [debug] [pid:7172] Uploading artefact boot-2.png as e3b43b69c91fe4072dac1037f416c281
# [2021-11-26T12:39:59.070054+01:00] [debug] [pid:7172] Optimizing /tmp/WlnajfL08s/full-stack.d/openqa/pool/1/testresults/boot-1.png
Bailout called.  Further testing stopped:  Job 1 produced the wrong results

so to me from the above logs it looks like the openQA job is about to conclude as there is still uploading and optimizing going on. This is also why autoinst-log.txt can not yet be in the testresults directory:

# unable to read /tmp/WlnajfL08s/full-stack.d/openqa/testresults/00000/00000001-tinycore-1-flavor-i386-Build1-core@coolone/autoinst-log.txt: Can't open file "/tmp/WlnajfL08s/full-stack.d/openqa/testresults/00000/00000001-tinycore-1-flavor-i386-Build1-core@coolone/autoinst-log.txt": No such file or directory
Actions #11

Updated by okurz over 2 years ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz

some ideas where I can continue.

Based on a suggestion of tinita I ran

make coverage FULLSTACK=1 RETRY=50 STABILITY_TEST=1 TESTS=t/full-stack.t

and encountered

Rerun 16 of 50 …
Calling retry hook ./tools/delete-coverdb-folder
Deleting cover_db/
t/full-stack.t .. 17/? Bailout called.  Further testing stopped:  clickElement: element not interactable at /home/okurz/local/os-autoinst/openQA/t/lib/OpenQA/SeleniumTest.pm:81
FAILED--Further testing stopped: clickElement: element not interactable at /home/okurz/local/os-autoinst/openQA/t/lib/OpenQA/SeleniumTest.pm:81
make[2]: *** [Makefile:196: test-unit-and-integration] Error 255

This does not look like the same and unfortunately there is no stack trace of where in t/full-stack.t that failed itself.

Trying with

diff --git a/t/lib/OpenQA/SeleniumTest.pm b/t/lib/OpenQA/SeleniumTest.pm
index e05f977ed..1ae0f2e11 100644
--- a/t/lib/OpenQA/SeleniumTest.pm
+++ b/t/lib/OpenQA/SeleniumTest.pm
@@ -16,6 +16,7 @@ our @EXPORT = qw(driver_missing check_driver_modules enable_timeout
   wait_until wait_until_element_gone wait_for_element
   element_prop element_prop_by_selector map_elements);

+use Carp;
 use Data::Dump 'pp';
 use IPC::Run qw(start);
 use Mojo::IOLoop::Server;
@@ -81,7 +82,7 @@ sub start_driver {
                 $err .= ' at ' . __FILE__ . ':' . __LINE__;    # uncoverable statement

                 # prevent aborting the complete test when interactively debugging
-                $INC{'perl5db.pl'} ? fail $err : BAIL_OUT($err);    # uncoverable statement
+                $INC{'perl5db.pl'} ? fail $err : confess($err);    # uncoverable statement
             },
         );

then I got

Rerun 2 of 50 …
Calling retry hook ./tools/delete-coverdb-folder
Deleting cover_db/
t/full-stack.t .. 31/?     # Unexpected Javascript console errors: [
    #   {
    #     level     => "SEVERE",
    #     message   => "http://localhost:9526/tests/1/images/thumb/boot-1.png - Failed to load resource: the server responded with a status of 404 (Not Found)",
    #     source    => "network",
    #     timestamp => 1637926799082,
    #   },
    # ]
    # full result panel contents:
    # State: running, started less than a minute ago
    # Scheduled product: job has not been created by posting an ISO
    # Assigned worker: linux-28d7:1

    #   Failed test 'Expected result not found'
    #   at /home/okurz/local/os-autoinst/openQA/t/lib/OpenQA/Test/FullstackUtils.pm line 84.

    #   Failed test 'job 1 passed'
    #   at t/full-stack.t line 197.
    # 0

    #   Failed test 'autoinst log file generated'
    #   at t/full-stack.t line 199.
    # Looks like you failed 3 tests of 6.
t/full-stack.t .. 32/? 
#   Failed test 'schedule job'
#   at t/full-stack.t line 218.
# unable to read /tmp/WlnajfL08s/full-stack.d/openqa/testresults/00000/00000001-tinycore-1-flavor-i386-Build1-core@coolone/autoinst-log.txt: Can't open file "/tmp/WlnajfL08s/full-stack.d/openqa/testresults/00000/00000001-tinycore-1-flavor-i386-Build1-core@coolone/autoinst-log.txt": No such file or directory at t/full-stack.t line 161.
# /tmp/WlnajfL08s/full-stack.d/openqa/testresults/../pool/1/worker-log.txt:
# [2021-11-26T12:39:40.799425+01:00] [debug] [pid:7029] Preparing Mojo::IOLoop::ReadWriteProcess::Session
# [2021-11-26T12:39:40.801345+01:00] [debug] [pid:7029] Job settings:
# [2021-11-26T12:39:40.801766+01:00] [debug] [pid:7029] 
#     ARCH=i386
#     BUILD=1
#     DISTRI=tinycore
#     FLAVOR=flavor
#     INTEGRATION_TESTS=1
#     ISO=Core-7.2.iso
#     JOBTOKEN=y3sRD950IaKP0uKl
#     LOG_LEVEL=debug
#     MACHINE=coolone
#     NAME=00000001-tinycore-1-flavor-i386-Build1-core@coolone
#     OPENQA_HOSTNAME=localhost:9526
#     OPENQA_URL=http://localhost:9526
#     PAUSE_AT=shutdown
#     PRJDIR=/tmp/WlnajfL08s/full-stack.d/openqa/share
#     PUBLISH_HDD_1=core-hdd.qcow2
#     QEMU=i386
#     QEMUPORT=20012
#     QEMU_NO_FDC_SET=1
#     QEMU_NO_TABLET=1
#     RETRY_DELAY=5
#     RETRY_DELAY_IF_WEBUI_BUSY=60
#     TEST=core
#     UEFI_PFLASH_VARS=/usr/share/qemu/ovmf-x86_64.bin
#     VERSION=1
#     VNC=91
#     WORKER_CLASS=qemu_i386,qemu_x86_64
#     WORKER_HOSTNAME=127.0.0.1
#     WORKER_ID=1
#     WORKER_INSTANCE=1
# [2021-11-26T12:39:40.803070+01:00] [debug] [pid:7029] Linked asset "/tmp/WlnajfL08s/full-stack.d/openqa/share/factory/iso/Core-7.2.iso" to "/tmp/WlnajfL08s/full-stack.d/openqa/pool/1/Core-7.2.iso"
# [2021-11-26T12:39:40.803850+01:00] [debug] [pid:7029] Symlinked from "/tmp/WlnajfL08s/full-stack.d/openqa/share/tests/tinycore" to "/tmp/WlnajfL08s/full-stack.d/openqa/pool/1/tinycore"
# [2021-11-26T12:39:40.804481+01:00] [info] [pid:7029] Preparing cgroup to start isotovideo
# [2021-11-26T12:39:40.806333+01:00] [warn] [pid:7029] Disabling cgroup usage because cgroup creation failed: mkdir /sys/fs/cgroup/systemd/user.slice/user-1000.slice/session-2.scope/1: Permission denied at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 84.
# [2021-11-26T12:39:40.806620+01:00] [info] [pid:7029] You can define a custom slice with OPENQA_CGROUP_SLICE or indicating the base mount with MOJO_CGROUP_FS.
# [2021-11-26T12:39:40.807416+01:00] [info] [pid:7029] Starting isotovideo container
# [2021-11-26T12:39:40.810056+01:00] [debug] [pid:7029] Registered process:7064
# [2021-11-26T12:39:40.810286+01:00] [info] [pid:7064] 7064: WORKING 1
# [2021-11-26T12:39:40.810724+01:00] [info] [pid:7029] isotovideo has been started (PID: 7064)
# [2021-11-26T12:39:40.811639+01:00] [debug] [pid:7029] Running job 1 from http://localhost:9526: 00000001-tinycore-1-flavor-i386-Build1-core@coolone.
# [2021-11-26T12:39:40.814680+01:00] [debug] [pid:7029] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
# [2021-11-26T12:39:40.968688+01:00] [debug] [pid:7029] Upload concluded (no current module)
# [2021-11-26T12:39:46.186887+01:00] [debug] [pid:7029] Starting livelog
# [2021-11-26T12:39:46.189172+01:00] [debug] [pid:7029] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
# [2021-11-26T12:39:46.457583+01:00] [debug] [pid:7029] Upload concluded (no current module)
# [2021-11-26T12:39:46.812095+01:00] [debug] [pid:7029] Stopping livelog
# [2021-11-26T12:39:47.459390+01:00] [debug] [pid:7029] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
# [2021-11-26T12:39:47.578622+01:00] [debug] [pid:7029] Upload concluded (no current module)
# [2021-11-26T12:39:57.591256+01:00] [debug] [pid:7029] Unable to read result-shutdown.json: Can't open file "/tmp/WlnajfL08s/full-stack.d/openqa/pool/1/testresults/result-shutdown.json": No such file or directory at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 132.
#       Mojo::File::slurp(Mojo::File=SCALAR(0x5559bd2b7740)) called at /home/okurz/local/os-autoinst/openQA/script/../lib/OpenQA/Worker/Job.pm line 1098
#       eval {...} called at /home/okurz/local/os-autoinst/openQA/script/../lib/OpenQA/Worker/Job.pm line 1098
#       OpenQA::Worker::Job::_read_json_file(OpenQA::Worker::Job=HASH(0x5559bd264330), "result-shutdown.json") called at /home/okurz/local/os-autoinst/openQA/script/../lib/OpenQA/Worker/Job.pm line 1154
#       OpenQA::Worker::Job::_read_module_result(OpenQA::Worker::Job=HASH(0x5559bd264330), "shutdown") called at /home/okurz/local/os-autoinst/openQA/script/../lib/OpenQA/Worker/Job.pm line 1116
#       OpenQA::Worker::Job::_read_result_file(OpenQA::Worker::Job=HASH(0x5559bd264330), "shutdown", ARRAY(0x5559bd126d78)) called at /home/okurz/local/os-autoinst/openQA/script/../lib/OpenQA/Worker/Job.pm line 786
#       OpenQA::Worker::Job::_upload_results_step_0_prepare(OpenQA::Worker::Job=HASH(0x5559bd264330), CODE(0x5559bd126c70)) called at /home/okurz/local/os-autoinst/openQA/script/../lib/OpenQA/Worker/Job.pm line 715
#       OpenQA::Worker::Job::_upload_results(OpenQA::Worker::Job=HASH(0x5559bd264330), CODE(0x5559bd126c70)) called at /home/okurz/local/os-autoinst/openQA/script/../lib/OpenQA/Worker/Job.pm line 903
#       OpenQA::Worker::Job::__ANON__(Mojo::IOLoop=HASH(0x5559baf466a0)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop.pm line 220
…
# [2021-11-26T12:39:58.926528+01:00] [debug] [pid:7172] Uploading artefact boot-2.png as e3b43b69c91fe4072dac1037f416c281
# [2021-11-26T12:39:58.992464+01:00] [debug] [pid:7172] Optimizing /tmp/WlnajfL08s/full-stack.d/openqa/pool/1/testresults/.thumbs/boot-2.png
# [2021-11-26T12:39:59.001311+01:00] [debug] [pid:7172] Uploading artefact boot-2.png as e3b43b69c91fe4072dac1037f416c281
# [2021-11-26T12:39:59.070054+01:00] [debug] [pid:7172] Optimizing /tmp/WlnajfL08s/full-stack.d/openqa/pool/1/testresults/boot-1.png
Bailout called.  Further testing stopped:  Job 1 produced the wrong results

so to me from the above logs it looks like the openQA job is about to conclude as there is still uploading and optimizing going on. This is also why autoinst-log.txt can not yet be in the testresults directory:

# unable to read /tmp/WlnajfL08s/full-stack.d/openqa/testresults/00000/00000001-tinycore-1-flavor-i386-Build1-core@coolone/autoinst-log.txt: Can't open file "/tmp/WlnajfL08s/full-stack.d/openqa/testresults/00000/00000001-tinycore-1-flavor-i386-Build1-core@coolone/autoinst-log.txt": No such file or directory

so I am proposing some improvements, see the updated https://github.com/os-autoinst/openQA/pull/4373

Actions #12

Updated by openqa_review over 2 years ago

  • Due date set to 2021-12-11

Setting due date based on mean cycle time of SUSE QE Tools

Actions #13

Updated by okurz over 2 years ago

  • Status changed from In Progress to Feedback

Updated https://github.com/os-autoinst/openQA/pull/4373 with more. That should fix our recent problems because we fail too early on javascript errors.

Actions #14

Updated by okurz over 2 years ago

Another improvement PR https://github.com/os-autoinst/openQA/pull/4379 , merged.

Actions #15

Updated by okurz over 2 years ago

  • Status changed from Feedback to In Progress
Actions #16

Updated by okurz over 2 years ago

  • Due date deleted (2021-12-11)
  • Status changed from In Progress to Resolved

I have not found further problems for the further specific mentioned subtest. I assume after #103422 we will find more

Actions #17

Updated by livdywan over 2 years ago

okurz wrote:

I have not found further problems for the further specific mentioned subtest. I assume after #103422 we will find more

Can you clarify how the failure in os-autoinst / 13-osutils.t is related here?

Actions #18

Updated by okurz over 2 years ago

cdywan wrote:

okurz wrote:

I have not found further problems for the further specific mentioned subtest. I assume after #103422 we will find more

Can you clarify how the failure in os-autoinst / 13-osutils.t is related here?

well, the mentioned error showed problems due to the update of perl-Mojo-IOLoop-ReadWriteProcess which in the meantime we have addressed with basically taking over the project, fixing the bugs, improving the tests, etc. And that can also impact the full-stack test.

Actions

Also available in: Atom PDF