action #107932

closed

Handling broken RPM databases does not handle certain cases

Added by mkittler almost 3 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date: 2022-03-07
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Today OSD's deployment failed because the rpm database on arm-1 is broken again:

    Problem occurred during or after installation or removal of packages:
    Failed to cache rpm database (1).
    History:
     - 'rpmdb2solv' '-r' '/' '-D' '/usr/lib/sysimage/rpm' '-X' '-p' '/etc/products.d' '/var/cache/zypp/solv/@System/solv' '-o' '/var/cache/zypp/solv/@System/solvqST4AK'
       rpmdb2solv: inconsistent rpm database, key 2106 not found. run 'rpm --rebuilddb' to fix.

    Please see the above error message for a hint.
    ERROR: Minions returned with non-zero exit code

(from https://gitlab.suse.de/openqa/osd-deployment/-/jobs/870107)

We actually try to rebuild the rpm database automatically when we spot the corresponding error message, and it has indeed been logged:

2022-03-07 11:15:41 <2> openqaworker-arm-1(13721) [zypp] TargetImpl.cc(buildCache):1068   rpmdb2solv: inconsistent rpm database, key 2106 not found. run 'rpm --rebuilddb' to fix.
2022-03-07 11:15:41 <2> openqaworker-arm-1(13721) [zypp::exec] abstractspawnengine.cc(checkStatus):176 Pid 27052 exited with status 1
2022-03-07 11:15:41 <5> openqaworker-arm-1(13721) [zypp-core] Exception.cc(log):186 TargetImpl.cc(buildCache):1081 THROW:    Failed to cache rpm database (1).
…
2022-03-07 11:15:42 <1> openqaworker-arm-1(13721) [zypp::plugin++] PluginScript.cc(close):251 PluginScript[13786] /usr/lib/zypp/plugins/commit/zyppnotify -> [0] Disconnect
2022-03-07 11:15:42 <2> openqaworker-arm-1(13721) [zypp::exec] abstractspawnengine.cc(checkStatus):197 Pid 13777 was killed by signal 9 (Killed; Out of memory?)
2022-03-07 11:15:42 <5> openqaworker-arm-1(13721) [zypp-core] Exception.cc(log):186 solve-commit.cc(solve_and_commit):950 CAUGHT:   Failed to cache rpm database (1).
2022-03-07 11:15:42 <5> openqaworker-arm-1(13721) [zypp-core] Exception.cc(log):186 History:
2022-03-07 11:15:42 <5> openqaworker-arm-1(13721) [zypp-core] Exception.cc(log):186  - 'rpmdb2solv' '-r' '/' '-D' '/usr/lib/sysimage/rpm' '-X' '-p' '/etc/products.d' '/var/cache/zypp/solv/@System/solv' '-o' '/var/cache/zypp/solv/@System/solvqST4AK'
2022-03-07 11:15:42 <5> openqaworker-arm-1(13721) [zypp-core] Exception.cc(log):186    rpmdb2solv: inconsistent rpm database, key 2106 not found. run 'rpm --rebuilddb' to fix.
2022-03-07 11:15:42 <5> openqaworker-arm-1(13721) [zypp-core] Exception.cc(log):186
2022-03-07 11:15:42 <2> openqaworker-arm-1(13721) [zypper] Zypper.h(setExitCode):162 setExitCode 8
2022-03-07 11:15:42 <2> openqaworker-arm-1(13721) [zypper] Zypper.h(setExitCode):162 setExitCode 8
2022-03-07 11:15:42 <1> openqaworker-arm-1(13721) [zypper] Zypper.cc(doCommand):677 Done
2022-03-07 11:15:42 <1> openqaworker-arm-1(13721) [zypper] Zypper.cc(cleanup):729 START
2022-03-07 11:15:42 <1> openqaworker-arm-1(13721) [zypper] main.cc(~Bye):98 ===== Exiting main(8) =====

I assume the problem is that these log messages only appeared during the zypper call made in the deployment step itself, whereas we only check whether such errors were logged beforehand. (The log contains no messages earlier than the ones quoted above.)
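
To illustrate the gap described above, here is a minimal sketch of a wrapper that inspects the output of the failing call itself instead of relying on earlier log entries. It is not the code used in the deployment: the function names, the `runner` parameter and the retry-once policy are assumptions made for illustration. Only the error text and the `rpm --rebuilddb` hint come from the log quoted in this ticket.

```python
import re
import subprocess
from typing import Callable, Sequence, Tuple

# Pattern derived from the rpmdb2solv message quoted in this ticket.
BROKEN_RPMDB = re.compile(r"inconsistent rpm database, key \d+ not found")

# A runner takes a command and returns (exit code, combined output).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_command(cmd: Sequence[str]) -> Tuple[int, str]:
    """Run a command and return its exit code with stdout and stderr merged."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout


def needs_rebuild(output: str) -> bool:
    """Return True if the output contains the broken-rpmdb error message."""
    return BROKEN_RPMDB.search(output) is not None


def run_with_rpmdb_repair(
    cmd: Sequence[str], runner: Runner = run_command
) -> Tuple[int, str]:
    """Run cmd; if it fails with the rpmdb error, rebuild the db and retry once.

    Hypothetical helper: the real deployment may handle this differently.
    """
    code, output = runner(cmd)
    if code != 0 and needs_rebuild(output):
        rebuild_code, _ = runner(["rpm", "--rebuilddb"])
        if rebuild_code == 0:
            # Retry only once so a persistently broken database still fails.
            code, output = runner(cmd)
    return code, output
```

With such a wrapper, the check runs on the output of the very call that triggers the error, so a database that only breaks during the deployment step would still be caught. The command passed in (for example a zypper invocation) is a placeholder here; the actual command used by the pipeline is not shown in this ticket.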


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #131249: [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt-minion does not return, error message "No response" size:M | Resolved | okurz | 2023-06-22

Actions #1

Updated by mkittler almost 3 years ago

  • Assignee set to mkittler
Actions #2

Updated by mkittler almost 3 years ago

  • Status changed from New to Feedback
Actions #3

Updated by mkittler almost 3 years ago

  • Status changed from Feedback to Resolved

The SR has been merged.

Actions #4

Updated by tinita over 1 year ago

  • Status changed from Resolved to Feedback
Actions #5

Updated by okurz over 1 year ago

  • Related to action #131249: [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt-minion does not return, error message "No response" size:M added
Actions #6

Updated by okurz over 1 year ago

  • Status changed from Feedback to Resolved

I have now manually run rpm --rebuilddb on openqaworker-arm-1. To me, https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1659435#L4490 reads as if the database had already been repaired automatically and the error that failed the pipeline came from further up in the log. That error was caused by #128528, which has been resolved in the meantime. I think we are OK again.
