action #120441

OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" size:meow

Added by Julie_CAO 3 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
okurz
Target version:
Ready
Start date:
2022-11-15
Due date:
2022-12-08
% Done:

100%

Estimated time:

Description

Observation

A few multi-machine parallel jobs began to fail in build #40.1. We suspect it is related to the worker security zone migration.

https://openqa.suse.de/tests/9949098/logfile?filename=autoinst-log.txt

[2022-11-14T04:46:20.162957+01:00] [debug] get_job_autoinst_url: No worker info for job 9949097 available.
[2022-11-14T04:46:25.037839+01:00] [debug] get_job_autoinst_url: No worker info for job 9949097 available.
[2022-11-14T04:46:28.025981+01:00] [debug] get_job_autoinst_url: No worker info for job 9949097 available.

failing jobs:
https://openqa.suse.de/tests/9955076
https://openqa.suse.de/tests/9949098
https://openqa.suse.de/tests/9944079
https://openqa.suse.de/tests/9926626

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label,
i.e. call openqa-query-for-job-label poo#120441
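The reproduction step above can be sketched as a small shell snippet. The download and invocation are left as comments since they need network access and the openQA client tooling; the snippet itself only assembles the command line (the URL and label are taken verbatim from this ticket):

```shell
# Fetch the helper script and query jobs labeled with this ticket.
script_url='https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label'
label='poo#120441'

# With network access one would run (not executed here):
#   curl -sSLO "$script_url" && chmod +x openqa-query-for-job-label
#   ./openqa-query-for-job-label "$label"
cmd="./openqa-query-for-job-label $label"
echo "$cmd"
```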

Acceptance criteria

  • AC1: No more similar failures

Suggestions

  • Use auto-review to find similar issues and check whether the problem can be reproduced
  • Call openqa-label-known-issues to find and retrigger all unhandled occurrences
  • Review the code related to the error strings
  • Investigate warnings visible in the logs. If not related, report them as separate tickets, e.g. to test maintainers
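As a minimal sketch of the first suggestion, one could grep downloaded autoinst-log.txt files for the known error string. The sample log content is inlined from the observation above so the snippet is self-contained:

```shell
# Count occurrences of the known error string in an autoinst log.
# The sample lines are copied from the observation in this ticket.
log=$(mktemp)
cat > "$log" <<'EOF'
[2022-11-14T04:46:20.162957+01:00] [debug] get_job_autoinst_url: No worker info for job 9949097 available.
[2022-11-14T04:46:25.037839+01:00] [debug] get_job_autoinst_url: No worker info for job 9949097 available.
EOF
matches=$(grep -c 'get_job_autoinst_url: No worker info' "$log")
echo "matching lines: $matches"
rm -f "$log"
```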

Related issues

Copied to openQA Tests - action #120789: [virtualization] tests fail to upload to qadb on dbproxy.suse.de with "Access denied, this account is locked" (Resolved)

History

#1 Updated by okurz 3 months ago

  • Priority changed from Normal to High
  • Target version set to Ready

#4 Updated by okurz 3 months ago

  • Parent task set to #116623

#5 Updated by cdywan 3 months ago

  • Subject changed from OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" to OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" size:meow
  • Description updated (diff)
  • Status changed from New to Workable

#6 Updated by xlai 3 months ago

  • Priority changed from High to Urgent

okurz cdywan This issue is blocking the VT team from testing the P1 tests, guest migration, which require two machines running parallel jobs. I suggest bumping the priority to Urgent.

#7 Updated by okurz 3 months ago

  • Status changed from Workable to Feedback
  • Assignee set to okurz

Traffic between .oqa.suse.de and .qa.suse.de unblocked, see #120264#note-9. I retriggered a job, https://openqa.suse.de/tests/10003496 is scheduled.

However
https://openqa.suse.de/tests/9949098/logfile?filename=autoinst-log.txt
clearly states

2022-11-14 04:47:00 +0100   INFO    qa_db_report    set_user: Attempting to log in qadb_report to database qadb on dbproxy.suse.de
DBI connect('qadb:dbproxy.suse.de','qadb_report',...) failed: Access denied, this account is locked at /usr/share/qa/lib/db_common.pm line 131.
ERROR: could not connect to: DBI:mysql:qadb:dbproxy.suse.de:
Access denied, this account is locked

which does not look like it's related to a change in network config. Reported in #120789

#8 Updated by okurz 3 months ago

  • Copied to action #120789: [virtualization] tests fail to upload to qadb on dbproxy.suse.de with "Access denied, this account is locked" added

#9 Updated by Julie_CAO 3 months ago

Thank you, okurz. I'll monitor the tests.

About the 'qadb_report' failure: we were aware of this issue. We have not handled it yet because we are busy with sle15sp5 Beta1 and snapshot testing, and we lack working workers to verify changes.

#10 Updated by Julie_CAO 3 months ago

okurz wrote:

Traffic between .oqa.suse.de and .qa.suse.de unblocked, see #120264#note-9. I retriggered a job, https://openqa.suse.de/tests/10003496 is scheduled.

The test failed with

[2022-11-21T21:06:33.302647+01:00] [debug] init needles from sle/products/sle/needles
[2022-11-21T21:06:34.525093+01:00] [debug] loaded 11779 needles
Need variable WORKER_HOSTNAME at /usr/lib/os-autoinst/backend/ipmi.pm line 14.
    backend::ipmi::new("backend::ipmi") called at /usr/lib/os-autoinst/backend/driver.pm line 34
    backend::driver::new("backend::driver", "ipmi") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Backend.pm line 14
    OpenQA::Isotovideo::Backend::new("OpenQA::Isotovideo::Backend") called at /usr/bin/isotovideo line 260
[2022-11-21T21:06:34.573393+01:00] [debug] stopping command server 10320 because test execution ended through exception
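The failure above means the backend did not receive WORKER_HOSTNAME, which backend/ipmi.pm requires. A minimal sketch of checking a worker's config for that setting, assuming the standard /etc/openqa/workers.ini location (a sample file with an illustrative hostname stands in here so the snippet is self-contained):

```shell
# Check that a workers.ini provides WORKER_HOSTNAME. A temp file
# stands in for /etc/openqa/workers.ini; the value is illustrative.
config=$(mktemp)
printf '[global]\nWORKER_HOSTNAME = worker2.oqa.suse.de\n' > "$config"

if grep -q '^WORKER_HOSTNAME' "$config"; then
    status="configured"
else
    status="missing"
fi
echo "WORKER_HOSTNAME: $status"
rm -f "$config"
```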

#12 Updated by cdywan 3 months ago

Julie_CAO wrote:

the test failed with

Need variable WORKER_HOSTNAME at /usr/lib/os-autoinst/backend/ipmi.pm line 14.

This is hitting #120261#note-29, which is being investigated at the moment. It seems an old version was not updated there, which led to the issue.

#13 Updated by cdywan 3 months ago

  • Related to action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meow added

#14 Updated by mkittler 3 months ago

@Julie_CAO This is a different issue and handled by #120261. It should be good now as worker2 was deployed this morning.

#15 Updated by mkittler 3 months ago

  • Related to deleted (action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meow)

#16 Updated by okurz 2 months ago

  • Due date set to 2022-12-08

$ openqa-query-for-job-label poo#120441
9972605|2022-11-16 07:11:41|done|failed|virt-guest-migration-developing-from-developing-to-developing-kvm-src||worker2
9960985|2022-11-15 11:15:28|done|parallel_failed|virt-guest-migration-sles12sp5-from-sles12sp5-to-developing-kvm-src||worker2
9960984|2022-11-15 11:15:17|done|failed|virt-guest-migration-sles12sp5-from-sles12sp5-to-developing-kvm-dst||worker2
9951736|2022-11-14 19:17:30|done|failed|virt-guest-migration-sles15sp4-from-sles15sp4-to-developing-xen-dst||worker2
9955076|2022-11-14 15:10:40|done|failed|virt-guest-migration-developing-from-developing-to-developing-kvm-src||worker2
9949099|2022-11-14 07:51:44|done|failed|virt-guest-migration-sles12sp5-from-sles12sp5-to-developing-xen-dst||worker2
9949098|2022-11-14 03:51:03|done|failed|virt-guest-migration-sles15sp4-from-sles15sp4-to-developing-kvm-src||worker2
9949152|2022-11-14 01:56:10|done|failed|virt-guest-migration-developing-from-developing-to-developing-xen-dst||worker2

So no more recent mentions.

Let's await results from currently scheduled https://openqa.suse.de/tests/10029577

#17 Updated by Julie_CAO 2 months ago

  • Status changed from Feedback to Resolved
  • % Done changed from 0 to 100

The issue appears to be resolved, as the parallel tests in the latest build #50.1 did not hit this problem.

The scheduled test has not run yet, so I cancelled it. Thank you for fixing this.
