action #44105

closed

if workercache dies, we get *tons* of incompletes

Added by coolo over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2018-11-21
Due date:
% Done: 0%
Estimated time:
Description

I guess if the workercache service is unavailable, the worker should stop accepting jobs. Otherwise it keeps picking up jobs that cannot run and produces a lot of incompletes really quickly.
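The idea above can be sketched as a simple gate in front of job acceptance. This is a minimal illustration in Python, not openQA's actual worker code: the function names `cache_service_available` and `accept_next_job` are hypothetical, and the assumption that the cache service can be probed with a plain TCP connection on a known host and port is mine.

```python
import socket


def cache_service_available(host, port, timeout=1.0):
    """Return True if a TCP connection to the cache service succeeds.

    Assumption: the service listens on a TCP port that the worker knows.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def accept_next_job(fetch_job, probe):
    """Fetch a job only when the probe reports the cache service as reachable.

    fetch_job and probe are hypothetical callables supplied by the caller.
    """
    if not probe():
        # Skip this round so the job is not grabbed and marked incomplete.
        return None
    return fetch_job()
```

With a gate like this, a worker whose cache service has died would simply stay idle until the probe succeeds again, instead of taking jobs it cannot run.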

● openqa-worker-cacheservice.service - OpenQA Worker Cache Service
   Loaded: loaded (/usr/lib/systemd/system/openqa-worker-cacheservice.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2018-11-21 03:33:35 CET; 4h 36min ago
  Process: 1962 ExecStart=/usr/share/openqa/script/openqa-workercache daemon -m production (code=exited, status=22)
 Main PID: 1962 (code=exited, status=22)

Nov 21 03:33:35 openqaworker4 openqa-workercache[1962]: [DEBUG] CACHE: removed /var/lib/openqa/cache/old/openSUSE-13.2-x86_64.qcow2
Nov 21 03:33:35 openqaworker4 openqa-workercache[1962]: [INFO] CACHE: Purging non registered /var/lib/openqa/cache/old/openSUSE-Tumbleweed-KDE-Live-x86_64-Snapshot20181113-Media.iso
Nov 21 03:33:35 openqaworker4 openqa-workercache[1962]: [ERROR] CACHE: Could not remove /var/lib/openqa/cache/old/openSUSE-Tumbleweed-KDE-Live-x86_64-Snapshot20181113-Media.iso
Nov 21 03:33:35 openqaworker4 openqa-workercache[1962]: [DEBUG] CACHE: removed /var/lib/openqa/cache/old/openSUSE-Tumbleweed-KDE-Live-x86_64-Snapshot20181113-Media.iso
Nov 21 03:33:35 openqaworker4 openqa-workercache[1962]: [DEBUG] CACHE: Health: Real size: 52798166016, Configured limit: 53687091200
Nov 21 03:33:35 openqaworker4 openqa-workercache[1962]: [INFO] OpenQA::Worker::Cache: Initialized with localhost at /var/lib/openqa/cache, current size is 52798166016
Nov 21 03:33:35 openqaworker4 openqa-workercache[1962]: Can't create listen socket: Address family for hostname not supported at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop.pm line 124.
Nov 21 03:33:35 openqaworker4 systemd[1]: openqa-worker-cacheservice.service: Main process exited, code=exited, status=22/n/a
Nov 21 03:33:35 openqaworker4 systemd[1]: openqa-worker-cacheservice.service: Unit entered failed state.
Nov 21 03:33:35 openqaworker4 systemd[1]: openqa-worker-cacheservice.service: Failed with result 'exit-code'.
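The log above shows the service exiting as soon as the listen socket cannot be created. One possible mitigation, shown here only as a hedged Python sketch and not taken from the openQA code base, is to retry the bind a few times before giving up. The helper name `listen_with_retry` and its parameters are hypothetical; `sleep` and `create` are injectable only so the behaviour can be exercised without waiting or touching the network.

```python
import socket
import time


def listen_with_retry(host, port, attempts=5, delay=1.0,
                      sleep=time.sleep, create=socket.create_server):
    """Try to create a listening socket, retrying on failure.

    Returns the listening socket, or re-raises the last error once
    all attempts are used up.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return create((host, port))
        except OSError as err:
            # socket.gaierror (name resolution failure) is a subclass of OSError.
            last_error = err
            if attempt < attempts:
                sleep(delay)
    raise last_error
```

Whether retrying inside the process or letting the service manager restart the unit is preferable depends on the deployment; the sketch only illustrates the first option.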

Related issues 3 (0 open, 3 closed)

Related to openQA Project - action #44162: Various tests stayed 'running' for ~ 4 hours or longer (Resolved, okurz, 2018-11-21)

Related to openQA Project - action #44693: Caching issue on new snapshots synced to o3 - no cache minion workers available (Resolved, okurz, 2018-12-04)

Related to openQA Project - action #62567: openqa services can fail when network is not up (yet) "Can't create listen socket: Address family for hostname not supported" (Resolved, okurz, 2020-01-17, 2020-03-06)
