action #120441
OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" size:meow
100%
Description
Observation
A few multi-machine parallel jobs began to fail in build #40.1. We suspect it is related to the worker security zone migration.
https://openqa.suse.de/tests/9949098/logfile?filename=autoinst-log.txt
[2022-11-14T04:46:20.162957+01:00] [debug] get_job_autoinst_url: No worker info for job 9949097 available.
[2022-11-14T04:46:25.037839+01:00] [debug] get_job_autoinst_url: No worker info for job 9949097 available.
[2022-11-14T04:46:28.025981+01:00] [debug] get_job_autoinst_url: No worker info for job 9949097 available.
failing jobs:
https://openqa.suse.de/tests/9955076
https://openqa.suse.de/tests/9949098
https://openqa.suse.de/tests/9944079
https://openqa.suse.de/tests/9926626
Steps to reproduce
Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label :
call openqa-query-for-job-label poo#120441
Acceptance criteria
- AC1: No more similar failures
Suggestions
- Use auto-review to find similar issues and to check whether the failure can be reproduced
- Call openqa-label-known-issues to find and retrigger all unhandled occurrences
- Review the code related to the error strings
- Investigate warnings visible in the logs. If they are not related, report them as separate tickets, e.g. to test maintainers
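As a starting point for reviewing occurrences of the error string, the following is a minimal sketch that counts how often the message from the Observation section appears per job ID in a log text. It is not part of auto-review or openqa-label-known-issues; the function name `count_missing_worker_info` is hypothetical, and the regular expression is derived only from the three log lines quoted above.

```python
import re
from collections import Counter

# Pattern derived from the log excerpt in the Observation section.
NO_WORKER_INFO = re.compile(
    r"get_job_autoinst_url: No worker info for job (\d+) available"
)


def count_missing_worker_info(log_text: str) -> Counter:
    """Count, per job ID, how often the 'No worker info' message occurs.

    Hypothetical helper for illustration; job IDs are kept as strings.
    """
    return Counter(NO_WORKER_INFO.findall(log_text))


# The three lines quoted in the ticket description.
sample = (
    "[2022-11-14T04:46:20.162957+01:00] [debug] get_job_autoinst_url: "
    "No worker info for job 9949097 available.\n"
    "[2022-11-14T04:46:25.037839+01:00] [debug] get_job_autoinst_url: "
    "No worker info for job 9949097 available.\n"
    "[2022-11-14T04:46:28.025981+01:00] [debug] get_job_autoinst_url: "
    "No worker info for job 9949097 available.\n"
)

for job_id, hits in count_missing_worker_info(sample).items():
    print(f"job {job_id}: {hits} occurrence(s)")
```

The same function could be applied to the contents of a downloaded autoinst-log.txt to see which parallel job the failing job was waiting for.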
History
#5
Updated by cdywan 3 months ago
- Subject changed from OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" to OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" size:meow
- Description updated (diff)
- Status changed from New to Workable
#7
Updated by okurz 3 months ago
- Status changed from Workable to Feedback
- Assignee set to okurz
Traffic between .oqa.suse.de and .qa.suse.de unblocked, see #120264#note-9. I retriggered a job, https://openqa.suse.de/tests/10003496 is scheduled.
However
https://openqa.suse.de/tests/9949098/logfile?filename=autoinst-log.txt
clearly states
2022-11-14 04:47:00 +0100 INFO qa_db_report set_user: Attempting to log in qadb_report to database qadb on dbproxy.suse.de
DBI connect('qadb:dbproxy.suse.de','qadb_report',...) failed: Access denied, this account is locked at /usr/share/qa/lib/db_common.pm line 131.
ERROR: could not connect to: DBI:mysql:qadb:dbproxy.suse.de: Access denied, this account is locked
which does not look like it's related to a change in network config. Reported in #120789.
#8
Updated by okurz 3 months ago
- Copied to action #120789: [virtualization] tests fail to upload to qadb on dbproxy.suse.de with "Access denied, this account is locked" added
#10
Updated by Julie_CAO 3 months ago
Traffic between .oqa.suse.de and .qa.suse.de unblocked, see #120264#note-9. I retriggered a job, https://openqa.suse.de/tests/10003496 is scheduled.
the test failed with
[2022-11-21T21:06:33.302647+01:00] [debug] init needles from sle/products/sle/needles
[2022-11-21T21:06:34.525093+01:00] [debug] loaded 11779 needles
Need variable WORKER_HOSTNAME at /usr/lib/os-autoinst/backend/ipmi.pm line 14.
	backend::ipmi::new("backend::ipmi") called at /usr/lib/os-autoinst/backend/driver.pm line 34
	backend::driver::new("backend::driver", "ipmi") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Backend.pm line 14
	OpenQA::Isotovideo::Backend::new("OpenQA::Isotovideo::Backend") called at /usr/bin/isotovideo line 260
[2022-11-21T21:06:34.573393+01:00] [debug] stopping command server 10320 because test execution ended through exception
#12
Updated by cdywan 3 months ago
Julie_CAO wrote:
the test failed with
Need variable WORKER_HOSTNAME at /usr/lib/os-autoinst/backend/ipmi.pm line 14.
This is hitting #120261#note-29, which is currently being investigated. It seems that an old version was not updated there, which led to the issue.
#13
Updated by cdywan 3 months ago
- Related to action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meow added
#15
Updated by mkittler 3 months ago
- Related to deleted (action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meow)
#16
Updated by okurz 2 months ago
- Due date set to 2022-12-08
$ openqa-query-for-job-label poo#120441
9972605|2022-11-16 07:11:41|done|failed|virt-guest-migration-developing-from-developing-to-developing-kvm-src||worker2
9960985|2022-11-15 11:15:28|done|parallel_failed|virt-guest-migration-sles12sp5-from-sles12sp5-to-developing-kvm-src||worker2
9960984|2022-11-15 11:15:17|done|failed|virt-guest-migration-sles12sp5-from-sles12sp5-to-developing-kvm-dst||worker2
9951736|2022-11-14 19:17:30|done|failed|virt-guest-migration-sles15sp4-from-sles15sp4-to-developing-xen-dst||worker2
9955076|2022-11-14 15:10:40|done|failed|virt-guest-migration-developing-from-developing-to-developing-kvm-src||worker2
9949099|2022-11-14 07:51:44|done|failed|virt-guest-migration-sles12sp5-from-sles12sp5-to-developing-xen-dst||worker2
9949098|2022-11-14 03:51:03|done|failed|virt-guest-migration-sles15sp4-from-sles15sp4-to-developing-kvm-src||worker2
9949152|2022-11-14 01:56:10|done|failed|virt-guest-migration-developing-from-developing-to-developing-xen-dst||worker2
So no more recent mentions.
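To check the "no more recent mentions" conclusion mechanically, the listing above can be parsed and the newest entry picked out. This is only a sketch: the column meanings (job ID, timestamp, state, result, test name, an empty field, worker) are assumptions read off the output shown here, not a documented format of openqa-query-for-job-label, and `LabeledJob` and `parse_rows` are hypothetical names.

```python
from datetime import datetime
from typing import List, NamedTuple


class LabeledJob(NamedTuple):
    # Field meanings are assumed from the output shown in this ticket.
    job_id: int
    timestamp: datetime
    state: str
    result: str
    test: str
    worker: str


def parse_rows(output: str) -> List[LabeledJob]:
    """Parse pipe-separated rows; lines without exactly 7 fields are skipped."""
    jobs = []
    for line in output.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 7:
            continue
        job_id, timestamp, state, result, test, _empty, worker = fields
        jobs.append(
            LabeledJob(
                int(job_id),
                datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S"),
                state,
                result,
                test,
                worker,
            )
        )
    return jobs


# Three rows copied from the listing in this comment.
SAMPLE = """
9972605|2022-11-16 07:11:41|done|failed|virt-guest-migration-developing-from-developing-to-developing-kvm-src||worker2
9960985|2022-11-15 11:15:28|done|parallel_failed|virt-guest-migration-sles12sp5-from-sles12sp5-to-developing-kvm-src||worker2
9949098|2022-11-14 03:51:03|done|failed|virt-guest-migration-sles15sp4-from-sles15sp4-to-developing-kvm-src||worker2
"""

jobs = parse_rows(SAMPLE)
latest = max(jobs, key=lambda job: job.timestamp)
print(f"latest labeled job: {latest.job_id} at {latest.timestamp}")
```

Feeding the full command output into `parse_rows` instead of `SAMPLE` would give the date of the most recent labeled job, which can then be compared against the date the traffic was unblocked.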
Let's await results from the currently scheduled https://openqa.suse.de/tests/10029577