action #73342
closed
all jobs run on openqaworker8 incomplete: "Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry
Added by okurz over 4 years ago.
Updated over 4 years ago.
Description
Jobs run on openqaworker8 end up incomplete. The reason given is: Cache service status error from API: Minion job #46203 failed: Couldn't add download: DBD::SQLite::st execute failed: database disk image is malformed at /usr/share/openqa/script/../lib/OpenQA/CacheService/Model/Downloads.pm line 34. at /usr/share/openqa/script/../lib/OpenQA/CacheSer…
Please see more details on:
https://10.160.0.207/tests/4823712
https://10.160.0.207/tests/4821829
- Copied from action #73321: all jobs run on openqaworker8 incomplete:"Cache service status error from API: Minion job #46203 failed: Couldn't add download: DBD::SQLite::st execute failed: database disk image is malformed*" added
I copied the SQLite database file so that it can be investigated further; see #73321 for details. To stop this worker from producing further incomplete jobs I ran:
systemctl stop openqa-worker* salt-minion telegraf
The worker has not been rebooted for a long time, although I expected that to happen automatically. I will check package versions and reboot, which should clean out the pool and cache anyway and fix the reported issue.
EDIT: Found out that many workers have not rebooted automatically for 31 days. Manually triggering the "auto-update" service on openqaworker8 and looking at the corresponding journal shows that an "interactive" patch is pending, which seems to be a kernel upgrade. So I am now trying with the flag "--non-interactive-include-reboot-patches":
systemctl daemon-reload; systemctl start auto-update ; journalctl -f -u auto-update
This triggered the kernel upgrade and asked rebootmgr to reboot during the next maintenance window. For the above reasons I triggered a reboot right away instead.
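The finding that workers had gone 31 days without a reboot could be turned into a simple check. The following is only a sketch: the function names and the 7-day threshold are assumptions made for illustration and are not part of the worker setup described in this ticket.

```python
def uptime_days(proc_uptime_text: str) -> float:
    """Convert the content of /proc/uptime into days since boot.

    The first field of /proc/uptime is the number of seconds since boot.
    """
    seconds_since_boot = float(proc_uptime_text.split()[0])
    return seconds_since_boot / 86400


def reboot_overdue(proc_uptime_text: str, max_days: float = 7.0) -> bool:
    """Return True if the host has been up longer than max_days.

    max_days=7.0 is an assumed threshold, not a value taken from this ticket.
    """
    return uptime_days(proc_uptime_text) > max_days
```

On a worker, the input would be the content of /proc/uptime, for example `reboot_overdue(open("/proc/uptime").read())`.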
- Due date set to 2020-10-16
- Status changed from In Progress to Feedback
The machine is rebooted, has a clean cache db and is working on jobs fine for now. Will keep monitoring over the next days.
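Whether a cache database (or the copy kept for investigation) is actually clean can be checked with SQLite's own integrity check. This is a minimal sketch using Python's standard `sqlite3` module; the function name is hypothetical and this is not how the openQA cache service itself verifies its database.

```python
import sqlite3


def cache_db_is_healthy(path: str) -> bool:
    """Return True if SQLite reports the file at `path` as consistent."""
    try:
        # Open read-only via a URI so the check never creates or modifies the file.
        conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    except sqlite3.Error:
        # For example: the file does not exist or cannot be opened.
        return False
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
        # A healthy database yields exactly one row containing "ok".
        return rows == [("ok",)]
    except sqlite3.DatabaseError:
        # Raised for errors such as "database disk image is malformed"
        # or "file is not a database".
        return False
    finally:
        conn.close()
```

The path to pass in depends on the worker configuration; `/var/lib/openqa/cache/cache.sqlite` is assumed here as a typical location and should be verified on the host.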
- Related to action #67000: Job incompletes due to malformed worker cache database disk image with auto_review:"Cache service status error.*(database disk image is malformed|Specified job ID is invalid).*":retry added
- Subject changed from all jobs run on openqaworker8 incomplete:"Cache service status error from API: Minion job #46203 failed: Couldn't add download: DBD::SQLite::st execute failed: database disk image is malformed*" to all jobs run on openqaworker8 incomplete: auto_review:"Cache service status error from API: Minion job .*failed: .*database disk image is malformed*":retry
- Subject changed from all jobs run on openqaworker8 incomplete: auto_review:"Cache service status error from API: Minion job .*failed: .*database disk image is malformed*":retry to all jobs run on openqaworker8 incomplete: auto_review:"Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry
- Priority changed from Urgent to Normal
- Status changed from Feedback to Resolved
merged, alerts reactivated, done.
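The auto_review pattern from the subject changes above can be tried against a job's reason text before relying on it. This sketch assumes the pattern is applied as an unanchored regular-expression search, which may differ from how the actual auto-review tooling evaluates it; the function name is hypothetical.

```python
import re

# Pattern copied from the final auto_review subject of this ticket.
AUTO_REVIEW_PATTERN = re.compile(
    r"Cache service status error from API: Minion job .*failed: "
    r".*(database disk image is malformed|not a database)"
)


def matches_auto_review(reason: str) -> bool:
    """Return True if the incomplete reason matches the ticket's pattern."""
    # Assumption: unanchored search, i.e. the pattern may match anywhere.
    return AUTO_REVIEW_PATTERN.search(reason) is not None
```

Passing the reason quoted in the description to `matches_auto_review` returns `True`, while an unrelated reason does not match.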
- Subject changed from all jobs run on openqaworker8 incomplete: auto_review:"Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry to all jobs run on openqaworker8 incomplete: "Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry
- Copied to action #75220: all jobs run on openqaworker8 incomplete: "Cache service status error from API: Minion job .*failed: .*(database disk image is malformed|not a database)":retry added
- Related to action #78058: [Alerting] Incomplete jobs of last 24h alert - again many incompletes due to corrupted cache, on openqaworker8 added