openQA-in-openQA tests always fail and results do not impact submission pipeline
https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=openqa&flavor=dev&machine=64bit-2G&test=openqa_from_containers&version=Tumbleweed#next_previous shows a long history of jobs that most often failed in various steps, but the result is completely ignored in http://jenkins.qa.suse.de/view/openQA-in-openQA/
- pipeline reliable and mostly green
- failures in tests prevent the submission of new packages
- In https://github.com/os-autoinst/scripts/blob/master/monitor-openqa_job#L26 JOB_ID holds only one job ID rather than all job IDs that were found
- Switching from openqa-cli would provide JSON output where we can easily handle multiple jobs, e.g. via jq (see also filter_id)
- Adjust or manually run Build command in http://jenkins.qa.suse.de/job/monitor-openQA_in_openQA-TW/configure
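A minimal sketch of monitoring every scheduled job instead of a single JOB_ID. It assumes that the trigger job writes one job ID per line into a file and that GET jobs/<id> on the openQA API returns a JSON object with job.state and job.result. The function names job_state, job_result and monitor_jobs are hypothetical and not taken from monitor-openqa_job, and counting softfailed as success is an assumption. The non-zero exit code is what would let a failing test block the submission.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual monitor-openqa_job implementation.
host="${host:-https://openqa.opensuse.org}"
sleep_time="${sleep_time:-10}"

# Assumes GET jobs/<id> returns {"job": {"state": "...", "result": "...", ...}}.
# jq -e makes the call fail if the field is null or the response is empty.
job_state() { openqa-cli api --host "$host" "jobs/$1" | jq -er .job.state; }
job_result() { openqa-cli api --host "$host" "jobs/$1" | jq -er .job.result; }

# Wait for every job ID listed (one per line) in file "$1", print each result
# and return 1 if any job did not end as passed or softfailed.
monitor_jobs() {
    local id state result rc=0
    # Read the IDs on fd 3 so commands inside the loop cannot consume them from stdin
    while read -r id <&3; do
        [[ -n $id ]] || continue
        while :; do
            state=$(job_state "$id") || { echo "Unable to query job $id" >&2; return 2; }
            # done and cancelled are final job states in openQA
            [[ $state == done || $state == cancelled ]] && break
            sleep "$sleep_time"
        done
        result=$(job_result "$id") || { echo "Unable to query job $id" >&2; return 2; }
        echo "Result of job $id: $result"
        [[ $result == passed || $result == softfailed ]] || rc=1
    done 3< "$1"
    return "$rc"
}
```

For example, monitor_jobs job_ids (with a hypothetical file name job_ids) prints one "Result of job <id>: <result>" line per job and only returns 0 if all of them passed. Note that the sketch waits indefinitely for unfinished jobs; a real script would likely want a timeout.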
- Priority changed from Normal to High
Hm, actually it seems that sometimes, or maybe always, the submission pipeline is indeed blocked.
I removed the schedule part from https://openqa.opensuse.org/admin/job_templates/24 to allow the important fix for perl-Mojolicious to be submitted:
- openqa_from_containers:
    testsuite: null
    settings:
      OPENQA_CONTAINERS: '1'
      OPENQA_FROM_GIT: '1'  # see: main.pm, avoid load_osautoinst_tests
    description: >-
      Maintainer: firstname.lastname@example.org
      Test for running openQA itself from containers. To be used with "openqa" distri.
I can see a failure in the 3 most recent worker tests:
# Test died: command 'docker logs openqa_worker 2>&1 | grep "API key and secret are needed" >/dev/null' failed at /var/lib/openqa/cache/openqa1-opensuse/tests/openqa/lib/utils.pm line 100.
Jane, Ivan and I discussed this together a bit; some notes from that:
- In the logs you (don't) find this:
  [debug] --------------------------
  [2021-03-11T10:29:03.624 CET] [debug] /tests/containers/worker.pm:10 called utils::wait_for_container_log -> lib/utils.pm:95
  $cmd log ... returns no logs
- Can we conditionally output all logs if the
groupmod GID '0' already exists
  - 0 is passed via groupmod -g 0 kvm, which may not be the GID of the kvm group
  - Shouldn't we do groupmod kvm with no ID?
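Outputting all logs only when the expected line never shows up could look like the helper below. This is a hypothetical shell sketch, not the Perl utils::wait_for_container_log from lib/utils.pm; the retries and delay parameters and their defaults are assumptions. It polls docker logs for the pattern and, only if the pattern never appears, prints the complete container log so that errors like the groupmod one end up in the job output.

```shell
#!/bin/bash
# Hypothetical shell version of a wait_for_container_log-style helper.
# Arguments: container name, grep pattern, number of attempts, delay in seconds.
wait_for_container_log() {
    local container=$1 pattern=$2 retries=${3:-30} delay=${4:-1} i
    for ((i = 1; i <= retries; i++)); do
        # 2>&1 because containers often log to stderr
        if docker logs "$container" 2>&1 | grep -q -- "$pattern"; then
            return 0
        fi
        sleep "$delay"
    done
    # Only on failure: dump the complete container log to stderr for debugging
    echo "'$pattern' not found in logs of $container after $retries attempts:" >&2
    docker logs "$container" >&2 2>&1
    return 1
}
```

For example, wait_for_container_log openqa_worker "API key and secret are needed" 30 1 mirrors the failing check from the worker test above.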
- Assignee changed from Xiaojing_liu to ilausuch
#8 Updated by Xiaojing_liu 7 months ago
- Due date changed from 2021-03-25 to 2021-04-01
Moving the due date because of Hack Week
The new PR has been merged. I tested it: if the error groupmod GID '0' already exists does not occur, the job passes. See an example: https://openqa.opensuse.org/tests/1672546#
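A guard against the groupmod GID '0' already exists error could check whether the requested GID is already taken before calling groupmod. This is a minimal sketch, not necessarily what the merged PR does; the function name adjust_kvm_gid and the assumption that the desired GID is passed as an argument are hypothetical. It relies on getent group accepting a numeric GID and printing name:password:GID:members.

```shell
#!/bin/bash
# Hypothetical sketch: only change the GID of the kvm group when that is safe.
# The desired GID is passed as the first argument; an empty value means
# "leave the kvm group alone".
adjust_kvm_gid() {
    local gid=${1:-} owner
    if [[ -z $gid ]]; then
        echo "No GID given, keeping the kvm group as it is" >&2
        return 0
    fi
    # Name of the group that already uses this GID, if any
    owner=$(getent group "$gid" | cut -d: -f1)
    if [[ $owner == kvm ]]; then
        return 0  # kvm already has this GID, nothing to do
    elif [[ -n $owner ]]; then
        # e.g. GID 0 belongs to root; groupmod would fail with "GID '0' already exists"
        echo "GID $gid already belongs to group $owner, not changing kvm" >&2
        return 0
    fi
    groupmod -g "$gid" kvm
}
```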
So once https://github.com/os-autoinst/openQA/pull/3787 is merged, we can add the test back.
ilausuch, are you going to add the test back?
- Due date changed from 2021-04-01 to 2021-04-09
https://github.com/os-autoinst/openQA/pull/3787 is under review
The PR got merged. What's the status of the openQA tests now? Could you please comment here on what, if anything, is still to be done, and update the status as needed?
I created a test to prove that this works now
Running this PR https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/65
I created this test with the env variable OPENQA_CONTAINERS=1
I found that it eventually fails in the same way as #90614. I am preparing the same solution: retrying when building the container images.
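Retrying the container image build could be as simple as the generic helper below. This is a sketch; the name retry, the attempt count and the delay are assumptions, and the solution from #90614 may use a different mechanism.

```shell
#!/bin/bash
# Hypothetical retry helper: run a command up to $1 times, waiting $2 seconds
# between attempts; returns 0 on the first success, 1 if all attempts fail.
retry() {
    local attempts=$1 delay=$2 n
    shift 2
    for ((n = 1; n <= attempts; n++)); do
        "$@" && return 0
        echo "Attempt $n/$attempts of '$*' failed" >&2
        # Do not sleep after the last failed attempt
        ((n < attempts)) && sleep "$delay"
    done
    return 1
}

# Example (hypothetical image name and build context):
# retry 3 30 docker build -t openqa_worker .
```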
I created this PR to check an alternative https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/66
Running test https://openqa.opensuse.org/tests/1700457
In a training session with Oliver and Christian we identified a problem that was affecting the container tests. This was the first time it failed: https://openqa.opensuse.org/tests/overview?distri=openqa&version=Tumbleweed&build=%3ATW.7835&groupid=24
We created a needle to solve that.
This is the running test with the new needle
- Status changed from In Progress to Resolved
I have tested the change from the PR locally against multiple jobs from o3 and it seems to work: e.g. if one of the jobs fails, the script exits with a non-zero return code.
I've also re-triggered the Jenkins job and it failed (as expected, since one of the previously triggered openQA jobs failed), leaving a comment on OBS, see:
- https://build.opensuse.org/project/show/devel:openQA#comment-1477717 (openQA-in-openQA test(s) failed (job IDs: 1802764), see https://openqa.opensuse.org/tests/overview?version=Tumbleweed&groupid=24)
Note that copying the file with the job IDs from trigger-openQA_in_openQA-TW works. It is currently not shown under http://jenkins.qa.suse.de/job/monitor-openQA_in_openQA-TW/ because there hasn't been a successful run since the change, and that page only shows the artifacts produced by the last successful run. The console log clearly shows that the expected jobs have been considered, though:
+ echo 'Result of job 1802764: failed'
+ echo 'Result of job 1802766: passed'
+ echo 'Result of job 1802766: passed'