action #107074
error on openqaworker-arm-2 failing osd-deployment size:M (closed)
Description
Observation
https://gitlab.suse.de/openqa/osd-deployment/-/jobs/845929#L609 shows what looks like an rpm database problem:
( 9/12) Installing: libvirglrenderer0-0.6.0-4.9.1.aarch64 [.......
error: db4 error(-30986) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found
error: error(-30986) getting "System/Libraries" records from Group index: DB_PAGE_NOTFOUND: Requested page not found
error: libvirglrenderer0-0.6.0-4.9.1.aarch64: install failed
error: libvirglrenderer0-0.6.0-4.6.1.aarch64: erase skipped
error]
Installation of libvirglrenderer0-0.6.0-4.9.1.aarch64 failed:
Error: Subprocess failed. Error: RPM failed: Command exited with status 1.
Abort, retry, ignore? [a/r/i] (a): a
Problem occurred during or after installation or removal of packages:
Installation has been aborted as directed.
Please see the above error message for a hint.
Suggestions
- I think we already try to prune the rpm database if installation fails on a machine. Maybe we can extend that approach to cover cases like the one above as well. The existing recovery happens in https://gitlab.suse.de/openqa/osd-deployment/-/blob/master/.gitlab-ci.yml#L230, but only if we find the string "inconsistent rpm database" in the zypper log and only on zypper refresh. I suggest taking a look at the machine, trying to reproduce the problem, checking the zypper log, and researching on the web and on bugzilla.opensuse.org whether the problem is known.
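The recovery idea described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual content of .gitlab-ci.yml: the function name `run_with_rpmdb_recovery` and the variables `REBUILD_CMD` and `RPMDB_ERROR_PATTERN` are hypothetical. It runs an arbitrary command (so not only zypper refresh), and only if the command fails and its output contains the known error string does it rebuild the rpm database and retry once.

```shell
#!/bin/bash
# Hypothetical sketch: names and structure are illustrative, not taken from .gitlab-ci.yml.

# Command used to repair the database; overridable so the logic can be tried without root.
REBUILD_CMD=${REBUILD_CMD:-"rpm --rebuilddb"}
# Error string the existing recovery looks for, according to this ticket.
RPMDB_ERROR_PATTERN=${RPMDB_ERROR_PATTERN:-"inconsistent rpm database"}

run_with_rpmdb_recovery() {
    local log rc
    log=$(mktemp)
    # First attempt: capture the combined output so it can be inspected afterwards.
    rc=0
    "$@" >"$log" 2>&1 || rc=$?
    cat "$log"
    # Recover only on failure and only if the log shows the known error string.
    if [ "$rc" -ne 0 ] && grep -q -- "$RPMDB_ERROR_PATTERN" "$log"; then
        echo "rpm database problem detected, rebuilding and retrying once"
        $REBUILD_CMD
        rc=0
        "$@" || rc=$?
    fi
    rm -f "$log"
    return "$rc"
}
```

A call could look like `run_with_rpmdb_recovery zypper --non-interactive refresh`. Retrying only once keeps a persistently broken machine from looping forever.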
Updated by okurz over 2 years ago
- Description updated (diff)
- Priority changed from High to Urgent
Updated by okurz over 2 years ago
- Subject changed from error on openqaworker-arm-2 failing osd-deployment to error on openqaworker-arm-2 failing osd-deployment size:M
- Status changed from New to Workable
Updated by mkittler over 2 years ago
According to some Google results, rpm -v --rebuilddb will help in this case as well. Considering that arm workers are often crashing, it isn't a big surprise to find broken rpm databases at some point.
arm-2 has been offline since 10:09 AM, so I cannot log in to check whether the problem is still present and whether rebuilding the rpm db actually fixes it. However, I can extend the grep command in our salt repo.
Updated by mkittler over 2 years ago
arm-2 is back again after invoking a power cycle via ipmi. Not sure why our automatic recovery didn't do the trick. Maybe I was just too impatient. (The last related job was https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/849331 and it seems to work generally.)
It doesn't seem like the rpm database is currently broken on arm-2 (so I cannot reproduce the specific issue). So I'll just change the grep command.
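Extending the grep could look roughly like this. The actual change is not shown in this ticket, so the alternation below is an assumption built from the existing string and the DB_PAGE_NOTFOUND error quoted in the description; the names `RPMDB_ERROR_REGEX` and `rpmdb_looks_broken` are hypothetical.

```shell
# Hypothetical extended pattern: the existing string plus the error seen in this ticket.
RPMDB_ERROR_REGEX='inconsistent rpm database|DB_PAGE_NOTFOUND'

# Returns 0 if the given log file shows signs of a broken rpm database, 1 otherwise.
rpmdb_looks_broken() {
    grep -qE -- "$RPMDB_ERROR_REGEX" "$1"
}
```

Since the database is not currently broken on arm-2, such a pattern can only be checked against saved log lines, for example the ones from the failed deployment job.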
Updated by mkittler over 2 years ago
- Status changed from Workable to Feedback
Updated by mkittler over 2 years ago
- Status changed from Feedback to Resolved
The PR has been merged. I've tested the regex locally, which should be good enough.
Updated by okurz 6 months ago
- Related to action #159270: openqaworker-arm-1 is Unreachable size:S added