action #59294

closed

[openqa-in-openqa] test fails in test_running - worker does not pick up jobs

Added by okurz about 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2019-11-11
Due date:
% Done: 0%
Estimated time:
Description

Observation

The openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install+publish@64bit-2G fails in test_running because no test is running: the job stays in the "scheduled" state.
https://openqa.opensuse.org/tests/1081970/file/test_running-openqa_services.log suggests that the worker no longer gets a valid default worker class. This could be an old issue, since the corresponding tests had been showing false positives for months, possibly dating back to the "worker rework".

Reproducible

Every time

Expected result

The job should run on the worker using a "default" fallback worker class.

Impact

This blocks http://jenkins.qa.suse.de/view/openQA-in-openQA/ and hence prevents any further automatic updates of the openQA and os-autoinst packages to openSUSE:Factory.

Further details

Always latest result in this scenario: latest

Actions #1

Updated by okurz about 5 years ago

  • Description updated (diff)
Actions #2

Updated by okurz about 5 years ago

  • Status changed from New to Resolved
  • Assignee set to okurz
  • Target version set to Current Sprint
git show
commit ff3c8a0 (HEAD -> fix/runner, okurz/fix/runner)
Author: Oliver Kurz <okurz@suse.de>
Date:   Wed Nov 13 20:01:23 2019 +0100

    [WIP] -- debugging why test does not start

diff --git a/tests/osautoinst/test_running.pm b/tests/osautoinst/test_running.pm
index 278eea5..525fbfa 100644
--- a/tests/osautoinst/test_running.pm
+++ b/tests/osautoinst/test_running.pm
@@ -3,6 +3,7 @@ use base "openQAcoretest";
 use testapi;

 sub run {
+    wait_serial 'CONTINUE', 86400;
     assert_script_run 'command -v ack >/dev/null || zypper --no-refresh -n in ack';
     assert_script_run 'ret=false; for i in {1..5} ; do openqa-client jobs state=running | ack --passthru --color running && ret=true && break ; sleep 30 ; done ; [ "$ret" = "true" ]', 300;
     save_screenshot;
git ci -m "[WIP] -- debugging why test does not start" -a && git push okurz -f && openqa-clone-job --within-instance https://openqa.opensuse.org --parental-inheritance --skip-chained-deps 1084488 BUILD= _GROUP=0 CASEDIR=https://github.com/okurz/os-autoinst-distri-openQA.git#fix/runner MAX_JOB_TIME=87400

-> https://openqa.opensuse.org/t1084509
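The one-line polling loop in test_running.pm retries `openqa-client jobs state=running` up to five times with a 30-second pause. A more readable sketch of the same retry pattern is shown here; the function name `retry` and its argument order are hypothetical, not part of openQA, and unlike the original loop it does not sleep after the final failed attempt.

```shell
# Hypothetical helper (not part of openQA): retry ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD until it succeeds or ATTEMPTS tries are used up.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    retry_attempts=$1
    retry_delay=$2
    shift 2
    retry_i=1
    while [ "$retry_i" -le "$retry_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if [ "$retry_i" -lt "$retry_attempts" ]; then
            sleep "$retry_delay"
        fi
        retry_i=$((retry_i + 1))
    done
    return 1
}

# Possible usage, mirroring the check in the test module (5 tries, 30 s apart):
#   retry 5 30 sh -c 'openqa-client jobs state=running | grep -q running'
```

The original test uses `ack --passthru` so that the full client output still lands in the log; `grep -q` in the usage comment only checks for a match and is a simplification.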

Debugged manually and found multiple issues:

  1. The worker complains that the file workers.ini is empty (it contains only commented-out content), which it "treats as error"
  2. The scheduler was never actually started
  3. Tests based on os-autoinst-distri-opensuse fail early on missing dependencies, so os-autoinst-distri-opensuse-deps is needed
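The first finding can be checked up front. The following is a minimal sketch, assuming that a line counts as content when it is neither blank nor starts with `#` or `;`; the helper name `ini_has_content` is hypothetical and this is not how the worker itself parses the file.

```shell
# Hypothetical helper (not part of openQA): succeeds if the given INI file
# contains at least one line that is neither blank nor a comment.
# Assumption: comment lines start with '#' or ';', optionally indented.
ini_has_content() {
    # -v selects lines that do NOT match the blank-or-comment pattern;
    # -q makes grep exit 0 as soon as one such line is found.
    grep -Eqv '^[[:space:]]*([#;].*)?$' "$1"
}

# Possible usage (the path is an assumption about the installed layout):
#   ini_has_content /etc/openqa/workers.ini || echo "workers.ini has no active settings"
```

A file that consists only of commented-out settings fails this check, which corresponds to the "empty" configuration the worker rejected.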

https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/52
