action #107074

closed

error on openqaworker-arm-2 failing osd-deployment size:M

Added by okurz over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
Start date:
2022-02-18
Due date:
% Done:

0%

Estimated time:

Description

Observation

https://gitlab.suse.de/openqa/osd-deployment/-/jobs/845929#L609 shows what looks like an rpm database problem:

    ( 9/12) Installing: libvirglrenderer0-0.6.0-4.9.1.aarch64 [.......
    error: db4 error(-30986) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found
    error: error(-30986) getting "System/Libraries" records from Group index: DB_PAGE_NOTFOUND: Requested page not found
    error: libvirglrenderer0-0.6.0-4.9.1.aarch64: install failed
    error: libvirglrenderer0-0.6.0-4.6.1.aarch64: erase skipped
    error]
    Installation of libvirglrenderer0-0.6.0-4.9.1.aarch64 failed:
    Error: Subprocess failed. Error: RPM failed: Command exited with status 1.
    Abort, retry, ignore? [a/r/i] (a): a
    Problem occurred during or after installation or removal of packages:
    Installation has been aborted as directed.
    Please see the above error message for a hint.

Suggestions

  • I think we already try to prune the rpm database if installation fails on a machine. Maybe we can extend that approach to cover cases like the one above as well. The existing recovery happens in https://gitlab.suse.de/openqa/osd-deployment/-/blob/master/.gitlab-ci.yml#L230, but only if we find the string "inconsistent rpm database" in the zypper log, and only on zypper refresh. I suggest taking a look at the machine, trying to reproduce the problem, checking the zypper log, and researching on the web and on bugzilla.opensuse.org whether the problem is already known.
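As an illustration of extending the existing check, here is a minimal Python sketch of a pattern that matches both the string the recovery already looks for and the Berkeley DB errors from the log above. The pattern and the function name rpmdb_looks_broken are hypothetical; this is not the regex that was actually merged into the deployment code.

```python
import re

# Hypothetical pattern: the existing recovery only looks for
# "inconsistent rpm database"; this also matches the db4 / DB_PAGE_NOTFOUND
# errors seen in the failing osd-deployment job.
# Written with [0-9] instead of \d so the same alternation also works with grep -E.
RPMDB_BROKEN = re.compile(
    r"inconsistent rpm database"
    r"|DB_PAGE_NOTFOUND"
    r"|db4 error\(-?[0-9]+\)"
)


def rpmdb_looks_broken(log_text: str) -> bool:
    """Return True if the zypper log contains a known rpm database error."""
    return RPMDB_BROKEN.search(log_text) is not None


if __name__ == "__main__":
    sample = 'error: error(-30986) getting "System/Libraries" records from Group index: DB_PAGE_NOTFOUND: Requested page not found'
    print(rpmdb_looks_broken(sample))  # True
```

Matching on several independent error strings keeps the recovery robust if zypper or rpm wording differs slightly between the two failure modes.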

Related issues: 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #159270: openqaworker-arm-1 is Unreachable size:S (Resolved, ybonatakis, 2024-04-19)

#1

Updated by okurz over 2 years ago

  • Description updated (diff)
  • Priority changed from High to Urgent
#3

Updated by okurz over 2 years ago

  • Subject changed from error on openqaworker-arm-2 failing osd-deployment to error on openqaworker-arm-2 failing osd-deployment size:M
  • Status changed from New to Workable
#4

Updated by mkittler over 2 years ago

  • Assignee set to mkittler
#5

Updated by mkittler over 2 years ago

According to some Google results, rpm -v --rebuilddb should help in this case as well. Considering that the arm workers crash often, it isn't a big surprise to find broken rpm databases at some point.

arm-2 has been offline since 10:09 AM, so I cannot log in to check whether the problem is still present and whether rebuilding the rpm database actually fixes it. However, I can extend the grep command in our salt repo.
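To make the intended recovery flow concrete (detect the corruption, rebuild the database, retry once), here is a hedged Python sketch. It assumes the guarded step is `zypper --non-interactive update`, which may differ from what osd-deployment and the salt states actually run; the function name update_with_rpmdb_recovery and the injectable runner are hypothetical and exist only so the logic can be exercised without touching a real system.

```python
import re
import subprocess
from typing import Callable, List

# Hypothetical: any output matching this is treated as a corrupted rpm database.
BROKEN_DB = re.compile(r"inconsistent rpm database|DB_PAGE_NOTFOUND")

Runner = Callable[[List[str]], subprocess.CompletedProcess]


def default_runner(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run a command, capturing stdout and stderr together as text."""
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)


def update_with_rpmdb_recovery(runner: Runner = default_runner) -> int:
    """Run the update; on rpm db corruption, rebuild the db and retry once.

    Returns the exit code of the last command that decided the outcome.
    """
    update = ["zypper", "--non-interactive", "update"]
    result = runner(update)
    if result.returncode != 0 and BROKEN_DB.search(result.stdout or ""):
        rebuild = runner(["rpm", "-v", "--rebuilddb"])
        if rebuild.returncode != 0:
            # Rebuilding failed, so retrying the update is pointless.
            return rebuild.returncode
        result = runner(update)
    return result.returncode
```

Failures that do not match the pattern are returned unchanged, so unrelated problems (for example repository errors) still surface instead of triggering a needless database rebuild.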

#6

Updated by mkittler over 2 years ago

arm-2 is back after I invoked a power cycle via IPMI. I'm not sure why our automatic recovery didn't do the trick; maybe I was just too impatient. (The last related job was https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/849331 and it seems to work in general.)

The rpm database doesn't currently seem to be broken on arm-2, so I cannot reproduce the specific issue. I'll just change the grep command.

#7

Updated by mkittler over 2 years ago

  • Status changed from Workable to Feedback
#8

Updated by mkittler over 2 years ago

  • Status changed from Feedback to Resolved

The PR has been merged. I've tested the regex locally, which should be good enough.

#9

Updated by okurz about 1 month ago

  • Related to action #159270: openqaworker-arm-1 is Unreachable size:S added