action #75220

closed

all jobs run on openqaworker8 incomplete: "Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry

Added by okurz about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:
Description

Again, many jobs on openqaworker8 ended up incomplete due to a malformed SQLite database. Right now many jobs seem to be running fine, although I think no one changed anything. I just triggered an explicit pipeline run for "auto-review"; at the time of writing it is still running.
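
For illustration, the two error strings in the title ("database disk image is malformed" and "not a database") are what SQLite itself reports for a damaged file. A minimal sketch of checking a SQLite file for this kind of corruption, using Python's standard `sqlite3` module, could look like the following. The function name `sqlite_is_healthy` is hypothetical, and this is not how the openQA cache service or auto-review detects the problem; it only shows the generic SQLite-level check.

```python
import sqlite3
from urllib.parse import quote


def sqlite_is_healthy(path: str) -> bool:
    """Return True if SQLite reports the file at `path` as consistent.

    Hypothetical helper for illustration only, not part of openQA.
    """
    try:
        # Open read-only via a URI so the check never creates or modifies the file.
        conn = sqlite3.connect(f"file:{quote(path)}?mode=ro", uri=True)
    except sqlite3.Error:
        # For example, the file does not exist or cannot be opened.
        return False
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
        # A consistent database yields exactly one row containing "ok".
        return rows == [("ok",)]
    except sqlite3.DatabaseError:
        # Raised for errors such as "file is not a database" or
        # "database disk image is malformed".
        return False
    finally:
        conn.close()
```

Calling the function with the path of the suspect database file returns `False` for a missing, unreadable, or corrupted file. The actual location of the cache database on the worker is not stated in this ticket, so it is left to the caller.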


Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure (public) - action #78058: [Alerting] Incomplete jobs of last 24h alert - again many incompletes due to corrupted cache, on openqaworker8 (Resolved, okurz, 2020-11-16, 2020-11-18)

Copied from openQA Infrastructure (public) - action #73342: all jobs run on openqaworker8 incomplete: "Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry (Resolved, okurz, 2020-10-14, 2020-10-16)

Actions #1

Updated by okurz about 4 years ago

  • Copied from action #73342: all jobs run on openqaworker8 incomplete: "Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry added
Actions #2

Updated by okurz about 4 years ago

All incompletes were labeled with #67000 and retriggered, and auto-review passed this step. However, more severe problems have since piled up, and I have not seen the alert about "minion jobs" on that worker before: https://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker8/worker-dashboard-openqaworker8?from=1603455577704&to=1603521237406&fullscreen&panelId=65104

I will just reboot the machine and see what happens.

EDIT: The machine is back up but suffers from a missing or broken network connection, same as in #75016.

Actions #3

Updated by okurz about 4 years ago

  • Status changed from In Progress to Resolved

At least the cache file problem was resolved by rebooting, which reformatted the complete NVMe-based pool+cache partition.

Actions #4

Updated by okurz about 4 years ago

  • Related to action #78058: [Alerting] Incomplete jobs of last 24h alert - again many incompletes due to corrupted cache, on openqaworker8 added