action #91163
closed
Many jobs on OSD and o3 are incomplete because of auto_review:"backend died: missing input at /usr/lib/os-autoinst/bmwqemu.pm line 202"
Description
Observation
Many jobs on OSD and o3 have been incomplete since the last deployment. The reason given is "backend died: missing input at /usr/lib/os-autoinst/bmwqemu.pm line 202".
The log message showed:
[2021-04-15T02:47:43.747 CEST] [debug] led state 0 1 1 -261
Use of uninitialized value $message in scalar chomp at /usr/lib/os-autoinst/backend/qemu.pm line 155.
Use of uninitialized value $rt in numeric eq (==) at /usr/lib/os-autoinst/backend/qemu.pm line 156.
Use of uninitialized value $message in scalar chomp at /usr/lib/os-autoinst/backend/qemu.pm line 155.
Use of uninitialized value $rt in numeric eq (==) at /usr/lib/os-autoinst/backend/qemu.pm line 156.
[2021-04-15T02:47:43.873 CEST] [debug] Open vSwitch networking status:
[2021-04-15T02:47:43.874 CEST] [info] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
missing input at /usr/lib/os-autoinst/bmwqemu.pm line 202.
bmwqemu::diag(undef) called at /usr/lib/os-autoinst/backend/qemu.pm line 1068
backend::qemu::start_qemu(backend::qemu=HASH(0x55bee5a31e88)) called at /usr/lib/os-autoinst/backend/qemu.pm line 125
backend::qemu::do_start_vm(backend::qemu=HASH(0x55bee5a31e88)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 430
backend::baseclass::start_vm(backend::qemu=HASH(0x55bee5a31e88), undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 89
backend::baseclass::handle_command(backend::qemu=HASH(0x55bee5a31e88), HASH(0x55bee499fac0)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 616
backend::baseclass::check_socket(backend::qemu=HASH(0x55bee5a31e88), IO::Handle=GLOB(0x55bee49871b0)) called at /usr/lib/os-autoinst/backend/qemu.pm line 1183
backend::qemu::check_socket(backend::qemu=HASH(0x55bee5a31e88), IO::Handle=GLOB(0x55bee49871b0), 0) called at /usr/lib/os-autoinst/backend/baseclass.pm line 273
eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 190
backend::baseclass::run_capture_loop(backend::qemu=HASH(0x55bee5a31e88)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 146
backend::baseclass::run(backend::qemu=HASH(0x55bee5a31e88), 14, 17) called at /usr/lib/os-autoinst/backend/driver.pm line 86
backend::driver::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x55bee0316460)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x55bee0316460), CODE(0x55bee4157940)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 477
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x55bee0316460)) called at /usr/lib/os-autoinst/backend/driver.pm line 87
backend::driver::start(backend::driver=HASH(0x55bee0316508)) called at /usr/lib/os-autoinst/backend/driver.pm line 52
backend::driver::new("backend::driver", "qemu") called at /usr/bin/isotovideo line 225
main::init_backend() called at /usr/bin/isotovideo line 276
[2021-04-15T02:47:43.874 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
Use of uninitialized value $message in scalar chomp at /usr/lib/os-autoinst/backend/qemu.pm line 155.
Use of uninitialized value $rt in numeric eq (==) at /usr/lib/os-autoinst/backend/qemu.pm line 156.
[2021-04-15T02:47:44.922 CEST] [debug] Passing remaining frames to the video encoder
[2021-04-15T02:47:45.037 CEST] [debug] Waiting for video encoder to finalize the video
Not sure if it's a regression issue caused by https://github.com/os-autoinst/os-autoinst/pull/1641
Examples:
https://openqa.suse.de/tests/5823910#dependencies
https://openqa.opensuse.org/tests/1699452#
Updated by AdamWill over 3 years ago
Oh, damn, yes, it probably is. This is Perl's stupid "return the result of the last expression by default" thing. Before the PR, the last line of the function was $self->{dbus_object}->$fn(@args); so I think we were returning the result of that. Now the last line is the dbus disconnect call.
I'll send a PR with what ought to be the fix, and see if it's possible to make the tests test it :(
Note: as a quick workaround, this crash probably only happens if OVS_DEBUG is set, so unset it for now.
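To illustrate the implicit-return behaviour described above, here is a minimal sketch. It is not the actual os-autoinst code: remote_call, disconnect and the call_* subs are hypothetical stand-ins, and only the $rt and $message names are taken from the warnings in the log. It assumes the caller unpacks a (return code, message) pair.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the D-Bus method call and the cleanup step.
sub remote_call { return (0, "bridge status ok") }    # pretend result: (return code, message)
sub disconnect  { return }                            # pretend cleanup, returns an empty list

# Before: the remote call is the last statement, so its result is returned implicitly.
sub call_before { remote_call() }

# After adding cleanup: disconnect() is now the last statement, so callers get nothing back.
sub call_broken { remote_call(); disconnect() }

# Fix: capture the result first, clean up, then return it explicitly.
sub call_fixed {
    my @result = remote_call();
    disconnect();
    return @result;
}

my ($rt, $message) = call_broken();
print defined $message ? "broken: $message\n" : "broken: message is undef\n";

($rt, $message) = call_fixed();
print "fixed: rt=$rt message=$message\n";
```

With the broken variant both $rt and $message end up undefined, which would match the "Use of uninitialized value" warnings and the bmwqemu::diag(undef) frame in the backtrace.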
Updated by AdamWill over 3 years ago
https://github.com/os-autoinst/os-autoinst/pull/1644 sent. I tested it, was able to reproduce the bug, and confirmed that the PR fixes it. Very sorry for the trouble.
Updated by okurz over 3 years ago
- Subject changed from Many jobs on OSD and o3 are incomplete because of ' backend died: missing input at /usr/lib/os-autoinst/bmwqemu.pm line 202.' to Many jobs on OSD and o3 are incomplete because of auto_review:"backend died: missing input at /usr/lib/os-autoinst/bmwqemu.pm line 202"
- Category set to Regressions/Crashes
- Status changed from New to In Progress
- Assignee set to okurz
- Priority changed from High to Immediate
- Target version set to Ready
Xiaojing_liu is doing a rollback on OSD; I am doing a rollback on o3.
Updated by okurz over 3 years ago
- Priority changed from Immediate to High
- We rolled back the change on o3 and subsequently installed the new fixed version. On osd we also did a rollback for now.
- Added a new section https://progress.opensuse.org/projects/openqav3/wiki#Rollback-of-updates with description of how to rollback on o3.
- The automatic rollback on OSD failed because we only looked for the subpackage openQA-worker and missed openQA-common; fixed in https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/29
Realized the following open points:
- Xiaojing_liu does not yet have access to o3
- We do not have proper tests covering https://github.com/os-autoinst/os-autoinst/pull/1644 , e.g. see the notes from codecov about missing coverage in https://github.com/os-autoinst/os-autoinst/pull/1644/files
Updated by AdamWill over 3 years ago
Yeah, I did mention that in the PR. The tests never actually set up or mock a working dbus server and check the 'success' paths. They only check various different failure cases - the calls "really" failing because the service doesn't exist, and a mocked-up case of _dbus_do_call returning an error. It's not that easy with the current test setup to assert that we properly pass through the return values all the way to _dbus_call in 'normal operation'.
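As a rough idea of what a 'success path' check could look like, here is a sketch that swaps the low-level call for a canned success and asserts that the wrapper passes the values through. My::Backend is a hypothetical package and its method bodies are assumptions; only the sub names _dbus_do_call and _dbus_call come from the discussion above, and the real test setup may need more than this.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical package standing in for the backend; not the real os-autoinst code.
package My::Backend {
    sub new { return bless {}, shift }

    # Low-level call that would talk to D-Bus in real code.
    sub _dbus_do_call { die "no D-Bus service available in this sketch\n" }

    # Wrapper that must hand the low-level result back to its caller.
    sub _dbus_call {
        my ($self, $fn, @args) = @_;
        my @result = $self->_dbus_do_call($fn, @args);
        return @result;
    }
}

{
    no warnings 'redefine';
    # Replace the low-level call with a canned success for the duration of this block.
    local *My::Backend::_dbus_do_call = sub { return (0, "fake output") };

    my ($rt, $message) = My::Backend->new->_dbus_call('show');
    is $rt, 0, 'return code is passed through';
    is $message, 'fake output', 'message is passed through';
}

done_testing;
```

Localizing the glob keeps the override confined to the block, so the failure-case tests can still exercise the unmocked sub afterwards.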
Updated by okurz over 3 years ago
- Status changed from In Progress to Resolved
Right. But as I stated in the PR, I guess that's ok. This is also why we have extracted these _do_dbus_call methods because at least we could mock these for tests of the other code :)
I have invited Xiaojing_liu to have a user account on o3 and added that point to https://progress.opensuse.org/projects/qa/wiki#Onboarding-for-new-joiners