action #122458

closed

O3 ipmi worker rebel:5 is broken size:M

Added by Julie_CAO over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2022-12-26
Due date:
% Done: 0%
Estimated time:
Description

Observation

In the O3 web UI, rebel:5 initially showed the status 'offline'. After I ran 'systemctl restart openqa-worker@5' on rebel, its status in the web UI changed to 'broken':

rebel:5    rebel    64bit-ipmi_rebel,64bit-ipmi-large-mem_rebel,64bit-ipmi-amd_rebel,blackbauhinia_rebel    x86_64    **Broken**     1    34

Here is the failure:

 Dec 26 08:52:57 rebel worker[24670]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/382
 Dec 26 08:52:57 rebel worker[6598]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/382 finished by remote side with code 10>
 Dec 26 08:52:57 rebel worker[24670]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 382
 Dec 26 08:52:57 rebel worker[24670]: [warn] Unable to lock pool directory: /var/lib/openqa/pool/5 already locked
 Dec 26 08:52:57 rebel worker[24670]:  at /usr/share/openqa/script/../lib/OpenQA/Worker.pm line 757.
 Dec 26 08:52:57 rebel worker[24670]:         OpenQA::Worker::_lock_pool_directory(OpenQA::Worker=HASH(0x560fa90f2828)) called at /usr/share/o>
 Dec 26 08:52:57 rebel worker[24670]:         eval {...} called at /usr/share/openqa/script/../lib/OpenQA/Worker.pm line 745
 ...
 Dec 26 08:52:57 rebel worker[24670]:         OpenQA::Worker::exec(OpenQA::Worker=HASH(0x560fa90f2828)) called at /usr/share/openqa/script/wor>
 Dec 26 08:52:57 rebel worker[24670]:  - checking again for web UI 'http://openqa1-opensuse' in 100.00 s
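
The warning "Unable to lock pool directory: /var/lib/openqa/pool/5 already locked" indicates that something, most likely another worker process, already held an exclusive lock on that pool directory when the restarted worker tried to take it. Below is a minimal Python sketch of that kind of exclusive, non-blocking directory lock. It is not openQA's actual implementation (that lives in Perl in OpenQA/Worker.pm); the function name try_lock_pool_directory and the lock file name ".locked" are hypothetical, chosen only for illustration.

```python
import errno
import fcntl
import os
import tempfile


def try_lock_pool_directory(pool_dir, lock_name=".locked"):
    """Try to take an exclusive, non-blocking lock on pool_dir (hypothetical helper).

    Returns the open lock file on success; the lock is held for as long as
    that file stays open. Returns None if the lock is already held elsewhere.
    The lock file name is an assumption made for this sketch.
    """
    os.makedirs(pool_dir, exist_ok=True)
    lock_file = open(os.path.join(pool_dir, lock_name), "a")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        lock_file.close()
        if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None  # someone else holds the lock: "already locked"
        raise
    return lock_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        pool = os.path.join(root, "5")
        holder = try_lock_pool_directory(pool)  # plays the role of a still-running worker
        print("first attempt locked:", holder is not None)
        print("second attempt locked:", try_lock_pool_directory(pool) is not None)
        holder.close()  # closing the file releases the flock
        again = try_lock_pool_directory(pool)
        print("after release locked:", again is not None)
        again.close()
```

In this sketch the lock is released automatically when the holder closes the file or exits, so a lock file left on disk does not by itself block anyone; only a live holder does. The two attempts in the demo conflict even within one process because flock locks belong to the open file, not to the process.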

Could you help fix this failure, or point me to how to fix it?

Acceptance criteria

  • AC1: The openQA worker instance rebel:5 passes openQA jobs

Suggestions

  • Log into the machine "rebel", which is part of the o3 infrastructure, and check the process table, the logs and the files in the pool directory. Try rebooting the machine and monitor openQA jobs on the instance. Look for crashes of isotovideo or the openQA worker and for any left-over lock files.
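
The log in the description contains lines from two worker PIDs (24670 and 6598), which hints that an older worker process may still have been running and holding the pool directory. One way to check the process table for that is to list which processes have files open under, or their working directory inside, /var/lib/openqa/pool/5. The sketch below is Linux-only and walks /proc/<pid>/fd and /proc/<pid>/cwd, similar in spirit to `lsof +D`; the function names processes_using and command_name are hypothetical, and it has to run as root to see other users' processes.

```python
import os
import sys


def processes_using(path):
    """Map PID -> open paths at or below `path`, including the process cwd (Linux /proc only).

    Processes whose /proc entries cannot be read (other users without root,
    or processes that exit mid-scan) are skipped silently.
    """
    root = os.path.realpath(path)
    found = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # only numeric /proc entries are processes
        proc_dir = os.path.join("/proc", entry)
        try:
            fd_dir = os.path.join(proc_dir, "fd")
            links = [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            continue
        links.append(os.path.join(proc_dir, "cwd"))  # a worker may chdir into its pool
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue  # fd closed or process gone since listdir
            if target == root or target.startswith(root + os.sep):
                found.setdefault(int(entry), []).append(target)
    return found


def command_name(pid):
    """Return the short command name from /proc/<pid>/comm, or '?' if unavailable."""
    try:
        with open(f"/proc/{pid}/comm") as f:
            return f.read().strip()
    except OSError:
        return "?"


if __name__ == "__main__":
    pool = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/openqa/pool/5"
    for pid, paths in sorted(processes_using(pool).items()):
        print(pid, command_name(pid), ", ".join(sorted(set(paths))))
```

If this reports a worker PID other than the one systemd currently tracks for openqa-worker@5, that leftover process is a likely candidate for holding the lock.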

Related issues: 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #123028: A/C broken in TAM lab size:M | Resolved | nicksinger | 2023-01-12
