action #73342
closed
all jobs run on openqaworker8 incomplete: "Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry
Description
Jobs run on openqaworker8 end up incomplete. The reason given is: Cache service status error from API: Minion job #46203 failed: Couldn't add download: DBD::SQLite::st execute failed: database disk image is malformed at /usr/share/openqa/script/../lib/OpenQA/CacheService/Model/Downloads.pm line 34. at /usr/share/openqa/script/../lib/OpenQA/CacheSer…
Please see more details on:
https://10.160.0.207/tests/4823712
https://10.160.0.207/tests/4821829
Updated by okurz almost 4 years ago
- Copied from action #73321: all jobs run on openqaworker8 incomplete:"Cache service status error from API: Minion job #46203 failed: Couldn't add download: DBD::SQLite::st execute failed: database disk image is malformed*" added
Updated by okurz almost 4 years ago
Copied the SQLite database file so that one can investigate further; see #73321 for this. To stop this worker from producing any further problematic jobs I ran:
systemctl stop openqa-worker* salt-minion telegraf
I see that the worker has not been rebooted for a long time, although I expected that to happen automatically. I will check package versions and reboot, which should clean out the pool and cache automatically anyway and fix the reported issue.
EDIT: Found out that many workers have not automatically rebooted for 31 days. Manually triggering the "auto-update" service on openqaworker8 and looking into the corresponding journal shows that there is an "interactive" patch pending, which seems to be a kernel upgrade. So I am trying with the flag "--non-interactive-include-reboot-patches" now:
systemctl daemon-reload; systemctl start auto-update; journalctl -f -u auto-update
This triggered the kernel upgrade and asked rebootmgr to reboot during the next maintenance window. For the above reasons I triggered a reboot right away instead of waiting.
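A worker that has silently skipped its automatic reboots can be spotted by comparing its uptime against a limit. The following is only a minimal sketch: the function names and the 7-day default limit are illustrative assumptions and are not taken from the salt states or from this ticket.

```shell
#!/bin/sh
# Convert an uptime given in seconds (optionally with decimals) to whole days.
uptime_days() {
    echo $(( ${1%%.*} / 86400 ))
}

# Succeed if the uptime in seconds ($1) exceeds the limit in days ($2).
# The default limit of 7 days is an assumption chosen for illustration.
overdue_for_reboot() {
    [ "$(uptime_days "$1")" -gt "${2:-7}" ]
}

# On a Linux host the first field of /proc/uptime is the uptime in seconds.
if [ -r /proc/uptime ]; then
    read -r secs _ < /proc/uptime
    if overdue_for_reboot "$secs"; then
        echo "up for $(uptime_days "$secs") days, reboot overdue"
    fi
fi
```

Run on each worker, such a check would have flagged the machines that had not rebooted for 31 days.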
Updated by okurz almost 4 years ago
- Due date set to 2020-10-16
- Status changed from In Progress to Feedback
The machine is rebooted, has a clean cache DB and is working on jobs fine for now. I will keep monitoring it for a while.
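One way to confirm that a cache database is intact, rather than inferring it from passing jobs, is SQLite's built-in `PRAGMA integrity_check`. The sketch below assumes the `sqlite3` command-line tool is installed; the function name `check_cache_db` is hypothetical.

```shell
#!/bin/sh
# Run SQLite's integrity check on the file given as $1.
# Returns 0 if healthy, 1 if sqlite3 reports a problem, 2 if the file is missing.
check_cache_db() {
    db=$1
    [ -f "$db" ] || { echo "missing: $db" >&2; return 2; }
    # sqlite3 exits non-zero for errors such as "file is not a database".
    result=$(sqlite3 "$db" 'PRAGMA integrity_check;' 2>&1) || {
        echo "corrupt: $result" >&2
        return 1
    }
    # A healthy database yields the single line "ok".
    [ "$result" = "ok" ] || { echo "corrupt: $result" >&2; return 1; }
    echo "ok: $db"
}
```

It could be called as `check_cache_db /var/lib/openqa/cache/cache.sqlite`, where the path is an assumption about the worker's cache location and should be adjusted to the actual setup.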
Updated by okurz almost 4 years ago
- Related to action #67000: Job incompletes due to malformed worker cache database disk image with auto_review:"Cache service status error.*(database disk image is malformed|Specified job ID is invalid).*":retry added
Updated by okurz almost 4 years ago
- Subject changed from all jobs run on openqaworker8 incomplete:"Cache service status error from API: Minion job #46203 failed: Couldn't add download: DBD::SQLite::st execute failed: database disk image is malformed*" to all jobs run on openqaworker8 incomplete: auto_review:"Cache service status error from API: Minion job .*failed: .*database disk image is malformed*":retry
Updated by okurz almost 4 years ago
- Subject changed from all jobs run on openqaworker8 incomplete: auto_review:"Cache service status error from API: Minion job .*failed: .*database disk image is malformed*":retry to all jobs run on openqaworker8 incomplete: auto_review:"Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry
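The expression in the subject can be tried against a failure reason before relying on it. This sketch uses `grep -E` as a stand-in for whatever regex engine the auto-review tooling actually uses, so it is only an approximation; the helper name `matches_auto_review` is made up for illustration, and the sample reason is the one from the description, shortened.

```shell
#!/bin/sh
# Expression from the ticket subject, without the auto_review:"..." wrapper.
pattern='Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)'

# Succeed if the job reason given as $1 matches the expression.
matches_auto_review() {
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Reason text taken from the ticket description (shortened).
reason="Cache service status error from API: Minion job #46203 failed: Couldn't add download: DBD::SQLite::st execute failed: database disk image is malformed"

if matches_auto_review "$reason"; then
    echo "reason matches, job would be labeled and retried"
fi
```

The alternation is what the last subject change added: it lets the same expression also cover reasons that mention "not a database" instead of "database disk image is malformed".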
Updated by okurz almost 4 years ago
- Priority changed from Urgent to Normal
I have monitored all openQA worker instances since the reboot, when the cache database was re-initialized, and so far I have seen a lot of passed jobs on all worker instances. So it seems that the problem of the corrupt cache database is resolved, as well as the problem that the machine(s) did not apply "interactive" patches: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/380
EDIT: The alerts for "incompletes" also seem to be related to this, so I paused the alerts for https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&panelId=17&fullscreen&edit&tab=alert and https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&panelId=14&fullscreen&edit&tab=alert for now until the situation is repaired.
Updated by okurz almost 4 years ago
- Status changed from Feedback to Resolved
merged, alerts reactivated, done.
Updated by okurz almost 4 years ago
- Subject changed from all jobs run on openqaworker8 incomplete: auto_review:"Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry to all jobs run on openqaworker8 incomplete: "Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry
Updated by okurz almost 4 years ago
- Copied to action #75220: all jobs run on openqaworker8 incomplete: "Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry added
Updated by okurz almost 4 years ago
- Related to action #78058: [Alerting] Incomplete jobs of last 24h alert - again many incompletes due to corrupted cache, on openqaworker8 added