action #75220

all jobs run on openqaworker8 incomplete: "Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry

Added by okurz 9 months ago. Updated 9 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:

Description

Again, many jobs on openqaworker8 are incomplete due to a malformed SQLite database. Right now, however, many jobs seem to be running fine again, although as far as I know no one has changed anything. I have just triggered an explicit pipeline for "auto-review"; at the time of writing it is still running.
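The quoted part of this ticket's title is a regular expression that auto-review matches against the failure reasons of incomplete jobs, and as I understand the convention, the `:retry` suffix additionally asks for matching jobs to be retriggered. Below is a minimal Python sketch of that idea, for illustration only: `parse_subject` and `matches` are hypothetical helpers, the sample failure reasons are made up, and the real auto-review tooling may extract and apply the pattern differently (possibly with a different regex engine).

```python
import re

# Ticket subject copied from this issue; the part in double quotes is the pattern.
SUBJECT = ('all jobs run on openqaworker8 incomplete: '
           '"Cache service status error from API: Minion job .*failed: '
           '.*(database disk image is malformed|not a database)":retry')


def parse_subject(subject):
    """Hypothetical helper: split '...: "<regex>"[:retry]' into (regex, retry)."""
    m = re.search(r'"(.*)"(:retry)?\s*$', subject)
    if m is None:
        return None, False
    return m.group(1), m.group(2) is not None


def matches(pattern, reason):
    """Hypothetical helper: True if the pattern occurs anywhere in the reason."""
    return re.search(pattern, reason) is not None


pattern, retry = parse_subject(SUBJECT)
# Made-up reasons, shaped like the error quoted in the title.
corrupt = ("Cache service status error from API: Minion job #42 failed: "
           "database disk image is malformed")
unrelated = "Cache service status error from API: Minion job #42 failed: timeout"
print(retry, matches(pattern, corrupt), matches(pattern, unrelated))
```

The `.*` segments keep the pattern independent of the Minion job ID and of any error prefix (for example a driver name) before the SQLite message, while the alternation covers both ways SQLite reports a corrupted database file.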


Related issues

Related to openQA Infrastructure - action #78058: [Alerting] Incomplete jobs of last 24h alert - again many incompletes due to corrupted cache, on openqaworker8 (Resolved, 2020-11-16 to 2020-11-18)

Copied from openQA Infrastructure - action #73342: all jobs run on openqaworker8 incomplete: "Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry (Resolved, 2020-10-14 to 2020-10-16)

History

#1 Updated by okurz 9 months ago

  • Copied from action #73342: all jobs run on openqaworker8 incomplete: "Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry added

#2 Updated by okurz 9 months ago

All incomplete jobs were labeled with #67000 and retriggered, and auto-review passed this step. But then more severe problems piled up, and I had not seen the "minion jobs" alert on that worker before: https://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker8/worker-dashboard-openqaworker8?from=1603455577704&to=1603521237406&fullscreen&panelId=65104

I will just reboot the machine and see what happens.

EDIT: The machine is back up but suffers from a missing or broken network connection, the same as in #75016.

#3 Updated by okurz 9 months ago

  • Status changed from In Progress to Resolved

At least the cache file problem was resolved by rebooting, which reformatted the complete NVMe-based pool+cache partition.

#4 Updated by okurz 8 months ago

  • Related to action #78058: [Alerting] Incomplete jobs of last 24h alert - again many incompletes due to corrupted cache, on openqaworker8 added
