action #120441
QA - coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability (closed)
QA - coordination #116623: [epic] Migration of SUSE Nbg based openQA+QA+QAM systems to new security zones
OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" size:meow
% Done: 100
Description
Observation
A few multi-machine parallel jobs began to fail in build #40.1. We suspect it is related to the worker security zone migration.
https://openqa.suse.de/tests/9949098/logfile?filename=autoinst-log.txt
[2022-11-14T04:46:20.162957+01:00] [debug] get_job_autoinst_url: No worker info for job 9949097 available.
[2022-11-14T04:46:25.037839+01:00] [debug] get_job_autoinst_url: No worker info for job 9949097 available.
[2022-11-14T04:46:28.025981+01:00] [debug] get_job_autoinst_url: No worker info for job 9949097 available.
Failing jobs:
https://openqa.suse.de/tests/9955076
https://openqa.suse.de/tests/9949098
https://openqa.suse.de/tests/9944079
https://openqa.suse.de/tests/9926626
Steps to reproduce
Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label
and call openqa-query-for-job-label poo#120441
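For example (a minimal sketch; the script may need additional environment variables or database access, see its source in os-autoinst/scripts for details):
$ curl -O https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label
$ chmod +x openqa-query-for-job-label
$ ./openqa-query-for-job-label poo#120441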
Acceptance criteria
- AC1: No more similar failures
Suggestions
- Use auto-review to find similar issues and check whether the problem can be reproduced
- Call openqa-label-known-issues to find and retrigger all unhandled occurrences (see the example command after this list)
- Review the code related to the error strings
- Investigate warnings visible in the logs. If not related, report them as separate tickets, e.g. to test maintainers
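As a minimal manual alternative to the openqa-label-known-issues script, a single unhandled job can be retriggered with openqa-cli (the job id below is one of the failures listed above; the script itself handles finding and retriggering in bulk, see the os-autoinst/scripts repository for its exact invocation):
$ openqa-cli api --host https://openqa.suse.de -X POST jobs/9949098/restart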
Updated by okurz about 2 years ago
- Priority changed from Normal to High
- Target version set to Ready
Updated by okurz about 2 years ago
- Related to coordination #116623: [epic] Migration of SUSE Nbg based openQA+QA+QAM systems to new security zones added
Updated by okurz about 2 years ago
- Related to deleted (coordination #116623: [epic] Migration of SUSE Nbg based openQA+QA+QAM systems to new security zones)
Updated by livdywan about 2 years ago
- Subject changed from OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" to OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" size:meow
- Description updated (diff)
- Status changed from New to Workable
Updated by xlai about 2 years ago
- Priority changed from High to Urgent
@okurz @cdywan This issue is blocking the VT team from testing the P1 tests (guest migration), which require two machines running parallel jobs. I suggest bumping the priority to Urgent.
Updated by okurz about 2 years ago
- Status changed from Workable to Feedback
- Assignee set to okurz
Traffic between .oqa.suse.de and .qa.suse.de is unblocked, see #120264#note-9. I retriggered a job; https://openqa.suse.de/tests/10003496 is scheduled.
However
https://openqa.suse.de/tests/9949098/logfile?filename=autoinst-log.txt
clearly states
2022-11-14 04:47:00 +0100 INFO qa_db_report set_user: Attempting to log in qadb_report to database qadb on dbproxy.suse.de
DBI connect('qadb:dbproxy.suse.de','qadb_report',...) failed: Access denied, this account is locked at /usr/share/qa/lib/db_common.pm line 131.
ERROR: could not connect to: DBI:mysql:qadb:dbproxy.suse.de:
Access denied, this account is locked
which does not look like it's related to a change in network config. Reported in #120789
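For reference, the account lock can be checked independently of the test run with a plain MySQL client login attempt (assuming the mysql client is installed and the qadb_report password from /usr/share/qa/lib/db_common.pm is at hand; a locked account should be rejected with the same error):
$ mysql -h dbproxy.suse.de -u qadb_report -p qadb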
Updated by okurz about 2 years ago
- Copied to action #120789: [virtualization] tests fail to upload to qadb on dbproxy.suse.de with "Access denied, this account is locked" added
Updated by Julie_CAO about 2 years ago
Thank you, @okurz. I'll monitor the tests.
About that 'qadb_report' failure, we were already aware of this issue. We have not handled it yet because we are busy with sle15sp5 beta1 and snapshot testing and lack working workers to verify changes.
Updated by Julie_CAO about 2 years ago
okurz wrote:
Traffic between .oqa.suse.de and .qa.suse.de is unblocked, see #120264#note-9. I retriggered a job; https://openqa.suse.de/tests/10003496 is scheduled.
The test failed with:
[2022-11-21T21:06:33.302647+01:00] [debug] init needles from sle/products/sle/needles
[2022-11-21T21:06:34.525093+01:00] [debug] loaded 11779 needles
Need variable WORKER_HOSTNAME at /usr/lib/os-autoinst/backend/ipmi.pm line 14.
backend::ipmi::new("backend::ipmi") called at /usr/lib/os-autoinst/backend/driver.pm line 34
backend::driver::new("backend::driver", "ipmi") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Backend.pm line 14
OpenQA::Isotovideo::Backend::new("OpenQA::Isotovideo::Backend") called at /usr/bin/isotovideo line 260
[2022-11-21T21:06:34.573393+01:00] [debug] stopping command server 10320 because test execution ended through exception
Updated by livdywan about 2 years ago
Julie_CAO wrote:
the test failed with
Need variable WORKER_HOSTNAME at /usr/lib/os-autoinst/backend/ipmi.pm line 14.
This is hitting #120261#note-29, which is being investigated at the moment. It seems an old version was not updated there, which led to the issue.
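For reference, WORKER_HOSTNAME normally comes from the worker configuration; on the worker host it can be pinned in /etc/openqa/workers.ini so that parallel jobs get a hostname their peers can reach. The value below is only an example and must match the worker's actual FQDN:
[global]
WORKER_HOSTNAME = worker2.oqa.suse.de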
Updated by livdywan about 2 years ago
- Related to action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meow added
Updated by mkittler about 2 years ago
@Julie_CAO This is a different issue and is handled by #120261. It should be good now, as worker2 was deployed this morning.
Updated by mkittler about 2 years ago
- Related to deleted (action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meow)
Updated by okurz almost 2 years ago
- Due date set to 2022-12-08
$ openqa-query-for-job-label poo#120441
9972605|2022-11-16 07:11:41|done|failed|virt-guest-migration-developing-from-developing-to-developing-kvm-src||worker2
9960985|2022-11-15 11:15:28|done|parallel_failed|virt-guest-migration-sles12sp5-from-sles12sp5-to-developing-kvm-src||worker2
9960984|2022-11-15 11:15:17|done|failed|virt-guest-migration-sles12sp5-from-sles12sp5-to-developing-kvm-dst||worker2
9951736|2022-11-14 19:17:30|done|failed|virt-guest-migration-sles15sp4-from-sles15sp4-to-developing-xen-dst||worker2
9955076|2022-11-14 15:10:40|done|failed|virt-guest-migration-developing-from-developing-to-developing-kvm-src||worker2
9949099|2022-11-14 07:51:44|done|failed|virt-guest-migration-sles12sp5-from-sles12sp5-to-developing-xen-dst||worker2
9949098|2022-11-14 03:51:03|done|failed|virt-guest-migration-sles15sp4-from-sles15sp4-to-developing-kvm-src||worker2
9949152|2022-11-14 01:56:10|done|failed|virt-guest-migration-developing-from-developing-to-developing-xen-dst||worker2
So there are no more recent mentions.
Let's await results from the currently scheduled https://openqa.suse.de/tests/10029577
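For reference, the state and result of that verification job can also be polled from the command line with openqa-cli (assuming the openQA client package is installed):
$ openqa-cli api --host https://openqa.suse.de jobs/10029577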
Updated by Julie_CAO almost 2 years ago
- Status changed from Feedback to Resolved
- % Done changed from 0 to 100
The issue should have been resolved, as the parallel tests in the latest build #50.1 did not hit this problem.
The scheduled test has not been run yet, so I cancelled it. Thank you for fixing.