action #117925


generalhw workers running on Tumbleweed are currently broken (Tests on Raspberry Pi 2, 3, 4 on o3)

Added by ggardet_arm almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Urgent
Assignee: ggardet_arm
Category: Support
Target version: Ready
Start date: 2022-10-11
% Done: 0%
Description

generalhw workers running on Tumbleweed are currently broken.
The error message reported on the web UI for the broken workers is:

Unable to lock pool directory: /var/lib/openqa/pool/3 already locked
 at /usr/share/openqa/script/../lib/OpenQA/Worker.pm line 808.
    OpenQA::Worker::_lock_pool_directory(OpenQA::Worker=HASH(0xaaaad6bc3528)) called at /usr/share/openqa/script/../lib/OpenQA/Worker.pm line 794
    eval {...} called at /usr/share/openqa/script/../lib/OpenQA/Worker.pm line 794
    OpenQA::Worker::_setup_pool_directory(OpenQA::Worker=HASH(0xaaaad6bc3528)) called at /usr/share/openqa/script/../lib/OpenQA/Worker.pm line 665
    OpenQA::Worker::check_availability(OpenQA::Worker=HASH(0xaaaad6bc3528)) called at /usr/share/openqa/script/../lib/OpenQA/Worker.pm line 234
    OpenQA::Worker::status(OpenQA::Worker=HASH(0xaaaad6bc3528)) called at /usr/share/openqa/script/../lib/OpenQA/Worker/WebUIConnection.pm line 379
    OpenQA::Worker::WebUIConnection::send_status(OpenQA::Worker::WebUIConnection=HASH(0xaaaad6bca3c8)) called at /usr/share/openqa/script/../lib/OpenQA/Worker/WebUIConnection.pm line 204
    OpenQA::Worker::WebUIConnection::__ANON__(OpenQA::Client=HASH(0xaaaad6bc9f78), Mojo::Transaction::WebSocket=HASH(0xaaaad6c540e8)) called at /usr/lib/perl5/vendor_perl/5.36.0/Mojo/UserAgent.pm line 242
    Mojo::UserAgent::_finish(OpenQA::Client=HASH(0xaaaad6bc9f78), "d77db911300857c027dede0ec5211b3d") called at /usr/lib/perl5/vendor_perl/5.36.0/Mojo/UserAgent.pm line 276
    Mojo::UserAgent::_read(OpenQA::Client=HASH(0xaaaad6bc9f78), "d77db911300857c027dede0ec5211b3d", "HTTP/1.1 101 Switching Protocols\x{d}\x{a}date: Tue, 11 Oct 2022 06:0"...) called at /usr/lib/perl5/vendor_perl/5.36.0/Mojo/UserAgent.pm line 136
    Mojo::UserAgent::__ANON__(Mojo::IOLoop::Stream=HASH(0xaaaad6c54118)) called at /usr/lib/perl5/vendor_perl/5.36.0/Mojo/EventEmitter.pm line 15
    Mojo::EventEmitter::emit(Mojo::IOLoop::Stream=HASH(0xaaaad6c54118), "read", "HTTP/1.1 101 Switching Protocols\x{d}\x{a}date: Tue, 11 Oct 2022 06:0"...) called at /usr/lib/perl5/vendor_perl/5.36.0/Mojo/IOLoop/Stream.pm line 109
    Mojo::IOLoop::Stream::_read(Mojo::IOLoop::Stream=HASH(0xaaaad6c54118)) called at /usr/lib/perl5/vendor_perl/5.36.0/Mojo/IOLoop/Stream.pm line 57
    Mojo::IOLoop::Stream::__ANON__(Mojo::Reactor::Poll=HASH(0xaaaad3bdf3d8)) called at /usr/lib/perl5/vendor_perl/5.36.0/Mojo/Reactor/Poll.pm line 141
    eval {...} called at /usr/lib/perl5/vendor_perl/5.36.0/Mojo/Reactor/Poll.pm line 141
    Mojo::Reactor::Poll::_try(Mojo::Reactor::Poll=HASH(0xaaaad3bdf3d8), "I/O watcher", CODE(0xaaaad6bfa8d0), 0) called at /usr/lib/perl5/vendor_perl/5.36.0/Mojo/Reactor/Poll.pm line 60
    Mojo::Reactor::Poll::one_tick(Mojo::Reactor::Poll=HASH(0xaaaad3bdf3d8)) called at /usr/lib/perl5/vendor_perl/5.36.0/Mojo/Reactor/Poll.pm line 101
    Mojo::Reactor::Poll::start(Mojo::Reactor::Poll=HASH(0xaaaad3bdf3d8)) called at /usr/lib/perl5/vendor_perl/5.36.0/Mojo/IOLoop.pm line 134
    Mojo::IOLoop::start(Mojo::IOLoop=HASH(0xaaaad51925d8)) called at /usr/share/openqa/script/../lib/OpenQA/Worker.pm line 374
    OpenQA::Worker::exec(OpenQA::Worker=HASH(0xaaaad6bc3528)) called at /usr/share/openqa/script/worker line 125
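
For context, the trace shows the worker dying in OpenQA::Worker::_lock_pool_directory, i.e. it cannot take an exclusive lock on its pool directory because some other process already holds it. A minimal shell sketch of the same kind of check, assuming a flock-style lock on a file inside the pool directory (the lock file name .locked here is an assumption for illustration, not necessarily what openQA uses):

    # Try to grab the lock without blocking; failure means another
    # process (e.g. a second worker instance) already holds it.
    if flock -n /var/lib/openqa/pool/3/.locked true; then
        echo "pool directory 3 is free"
    else
        echo "pool directory 3 is already locked"
    fi
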
Actions #1

Updated by Guillaume_G almost 2 years ago

I tried downgrading perl-IO-Socket-SSL from 2.075 to 2.074, but it had no effect.

Actions #2

Updated by okurz almost 2 years ago

  • Category set to Support
  • Target version set to Ready

We need more information here. Since when does this happen? Where do you see it reproduced? The error message "Unable to lock pool directory: /var/lib/openqa/pool/3 already locked" means that something is already using the directory. Did a previous worker instance crash? Did you check the journal of the corresponding worker processes? If you can reproduce the message, could you please check open file handles on that directory with lsof /var/lib/openqa/pool/3?
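
Something like the following, run on the affected worker host, should answer those questions (the instance number 3 is taken from the pool path in the error message; adjust as needed):

    lsof /var/lib/openqa/pool/3                  # which process still holds the pool directory
    journalctl -u 'openqa-worker*@3.service'     # journal of all worker units for instance 3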

Actions #3

Updated by mkittler almost 2 years ago

Also make sure you don't start multiple different systemd services for the same worker instance at the same time (e.g. the auto-restart vs. the normal worker unit).
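
One way to spot such a conflict is to check which worker units are enabled and running for the same instance, for example (instance 3 assumed here to match the pool directory from the error):

    systemctl is-enabled openqa-worker@3.service openqa-worker-auto-restart@3.service
    systemctl list-units 'openqa-worker*'    # shows which worker units are actually active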

Actions #4

Updated by ggardet_arm almost 2 years ago

  • Status changed from New to Resolved

mkittler wrote:

Also make sure you don't start multiple different systemd services for the same worker instance at the same time (e.g. the auto-restart vs. the normal worker unit).

It looks like the problem was indeed that both openqa-worker@ and openqa-worker-auto-restart@ were enabled.
Thanks for your help!
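
For anyone hitting the same symptom, a sketch of the cleanup implied by this resolution, keeping only one unit variant per worker instance (instance 3 assumed, and which variant to keep is a local choice):

    # Drop the plain unit and keep the auto-restart variant (or vice versa):
    systemctl disable --now openqa-worker@3.service
    systemctl enable --now openqa-worker-auto-restart@3.service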

Actions #5

Updated by livdywan almost 2 years ago

  • Assignee set to ggardet_arm