action #115484

closed

[alert] OSD deployment failed on 18.08.22 size:M

Added by mkittler over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2022-08-18
Due date: 2022-09-02
% Done: 0%
Estimated time:
Tags:

Description

Observation

Jobs in the pipeline failed, apparently because the salt command cannot be found:

++ echo '$ ssh $TARGET \ # collapsed multi-line command'
++ ssh openqa.suse.de 'set -xo pipefail;      sudo salt -C '\''G@roles:worker'\'' cmd.run '\''for i in {1..7}; do zypper -n dup --download-only --details && break || (echo Retry' after sleep '... && sleep 30); done'\'''
+ sudo salt -C G@roles:worker cmd.run 'for i in {1..7}; do zypper -n dup --download-only --details && break || (echo Retry after sleep ... && sleep 30); done'
sudo: salt: command not found
+++ kill %1
Cleaning up project directory and file based variables 00:03
Using docker image sha256:81cd3442a3502ceb7f516b7ce7a0f02634449600ddbb28a2d5c0e192564d0dcf for registry.opensuse.org/home/darix/apps/containers/gitlab-runner-helper:x86_64-latest with digest registry.opensuse.org/home/darix/apps/containers/gitlab-runner-helper@sha256:6dc8b8a636f4dde3ae54e781e487219ce27c5ee46b8376b495c2bc66050222ba ...
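The remote command wraps `zypper -n dup --download-only` in an inline retry loop. Under bash semantics, if all seven attempts fail, that loop still exits with status 0, because the last command it runs is the successful `(echo ... && sleep 30)` subshell. Below is a minimal sketch of a retry helper that reports exhausted attempts as a failure. The function name `retry` and its argument order are hypothetical and not part of the pipeline.

```shell
#!/bin/sh
# Hypothetical helper: retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Returns 0 on the first success and 1 if every attempt failed.
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Only sleep if another attempt will follow.
        if [ "$i" -lt "$attempts" ]; then
            echo "Retry after sleep ..." >&2
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Assumed usage, matching the pipeline's 7 attempts and 30 s delay:
# retry 7 30 zypper -n dup --download-only --details
```

With this helper, salt's `cmd.run` would see a non-zero return code on hosts where every download attempt failed, rather than a silent success.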

Acceptance criteria

  • AC1: salt can be found again
  • AC2: The cause of the issue is known
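AC1 can be checked by confirming that the salt tools are on the `PATH` again. Here is a minimal sketch. The helper name `require_cmds` is hypothetical, and running it on the OSD host (for example via the pipeline's `ssh $TARGET`) is an assumption about how one would wire it in.

```shell
#!/bin/sh
# Hypothetical helper: require_cmds NAME...
# Prints a message to stderr and returns 1 for the first NAME that
# cannot be resolved via `command -v`; returns 0 if all are found.
require_cmds() {
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "required command not found: $cmd" >&2
            return 1
        fi
    done
    return 0
}

# Assumed usage on the salt master before calling salt:
# require_cmds salt salt-master || exit 1
```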

Suggestions

Actions #1

Updated by livdywan over 2 years ago

  • Subject changed from [alert] OSD deployment failed on 18.08.22 to [alert] OSD deployment failed on 18.08.22 size:M
  • Description updated (diff)
  • Status changed from New to Workable

Marius mitigated the issue by installing salt{,-master}, but we don't know yet what caused it.

Actions #2

Updated by mkittler over 2 years ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler

Actions #3

Updated by mkittler over 2 years ago

/var/log/zypp/history:

2022-08-18 03:00:41|command|root@openqa|'zypper' '-n' '--non-interactive-include-reboot-patches' 'patch' '--replacefiles' '--auto-agree-with-licenses' '--force-resolution' '--download-in-advance'|
# 2022-08-18 03:00:46 salt-minion-3004-150400.8.8.1.x86_64 removed ok
# Additional rpm output:
# Removed /etc/systemd/system/multi-user.target.wants/salt-minion.service.
# warning: /etc/salt/minion saved as /etc/salt/minion.rpmsave
# 
2022-08-18 03:00:46|remove |salt-minion|3004-150400.8.8.1|x86_64||
2022-08-18 03:00:47|remove |salt-ssh|3004-150400.8.8.1|x86_64||
# 2022-08-18 03:00:54 salt-master-3004-150400.8.8.1.x86_64 removed ok
# Additional rpm output:
# Removed /etc/systemd/system/multi-user.target.wants/salt-master.service.
# warning: /etc/salt/master saved as /etc/salt/master.rpmsave
# 
2022-08-18 03:00:54|remove |salt-master|3004-150400.8.8.1|x86_64||
2022-08-18 03:00:56|remove |salt|3004-150400.8.8.1|x86_64||
2022-08-18 03:01:01|remove |python3-salt|3004-150400.8.8.1|x86_64||
2022-08-18 03:01:01|remove |python3-requests|2.24.0-1.24|noarch||
2022-08-18 03:01:01|remove |python3-py|1.8.1-5.6.1|noarch||
2022-08-18 03:01:01|patch  |openSUSE-SLE-15.4-2022-2119|1|noarch|repo-sle-update|important|recommended|applied|not-needed|
2022-08-18 03:01:01|patch  |openSUSE-SLE-15.4-2022-2831|1|noarch|repo-sle-update|moderate|security|needed|not-needed|
2022-08-18 03:01:01|patch  |openSUSE-SLE-15.4-2022-2304|1|noarch|repo-sle-update|important|security|applied|not-needed|
2022-08-18 11:55:43|command|root@openqa|'zypper' 'in' 'salt'|
2022-08-18 11:55:43|install|python3-py|1.8.1-5.6.1|noarch||repo-oss|31ddc63bc7178278d7d72abe6bdeb2a97c486cdd56234292f56a0fa1a562c324|
2022-08-18 11:55:43|install|python3-requests|2.24.0-1.24|noarch||repo-oss|89d71c20ba199d5ab01e24ca8d9d8ad8d64e95f16983c118399a7b4eb93f272c|
2022-08-18 11:55:46|install|python3-salt|3004-150400.8.8.1|x86_64||repo-sle-update|cd0f786c1fc1be720eaa4c98f371f9ec4dfb9d8f70426cfa1b9477f9665f3110|
2022-08-18 11:55:46|install|salt|3004-150400.8.8.1|x86_64|root@openqa|repo-sle-update|39a52e8809f6cf55c7add5408136e57b55a1cc799dca212c51d818496c5f2349|
2022-08-18 11:55:46|patch  |openSUSE-SLE-15.4-2022-2119|1|noarch|repo-sle-update|important|recommended|not-needed|applied|
2022-08-18 11:55:46|patch  |openSUSE-SLE-15.4-2022-2831|1|noarch|repo-sle-update|moderate|security|not-needed|needed|
2022-08-18 11:55:46|patch  |openSUSE-SLE-15.4-2022-2304|1|noarch|repo-sle-update|important|security|not-needed|applied|
2022-08-18 11:55:58|command|root@openqa|'zypper' 'in' 'salt-master'|
2022-08-18 11:56:01|install|salt-master|3004-150400.8.8.1|x86_64|root@openqa|repo-sle-update|8a88fab48d176667c908ee6e248d4a82a5089d7abbe2a1f3122894be52e8de7c|
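The history file is '|'-separated, with `#` comment lines carrying extra rpm output and a space-padded action column (`remove `, `patch  `). Below is a minimal sketch for listing removals from it, assuming that layout. The function name `removed_packages` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: removed_packages [HISTORY_FILE]
# Prints "timestamp name version" for every "remove" entry in a zypp
# history file (default: /var/log/zypp/history). Fields are separated
# by '|', comment lines start with '#', and the action column is padded
# with trailing spaces, hence the /^remove *$/ match.
removed_packages() {
    awk -F'|' '!/^#/ && $2 ~ /^remove *$/ { print $1, $3, $4 }' \
        "${1:-/var/log/zypp/history}"
}
```

For example, `removed_packages | grep '^2022-08-18 03:'` limits the output to the removals made by the 03:00 `zypper patch` run.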
Actions #4

Updated by mkittler over 2 years ago

Here is the more detailed log of the uninstallation:

2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131   normal: 28427, 79923 literals
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 pkg rule memory used: 1160 K
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 pkg rule creation took 155 ms
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133 job: blacklist providing retracted-patch-package()
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133 job: blacklist providing ptf()
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133 job: install patch:openSUSE-SLE-15.4-2022-2831-1.noarch
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133   - job Rule #52783:
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     patch:openSUSE-SLE-15.4-2022-2831-1.noarch [180057] (w1)
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133 job: install providing glibc
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133   - job Rule #52784:
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     glibc-2.31-150300.20.7.x86_64 [90300] (w1)
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     glibc-2.31-150300.26.5.x86_64 [171973] (w2)
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     glibc-2.31-150300.31.2.x86_64 [171978]
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     glibc-2.31-150300.37.1.x86_64 [171983]
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     glibc-2.31-150300.37.1.x86_64 [180578]I
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133 job: erase openQA-local-db-4.6.1660648257.0cc7a55-lp154.5211.5.noarch
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133   - job Rule #52785:
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     !openQA-local-db-4.6.1660648257.0cc7a55-lp154.5211.5.noarch [25] (w1)
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133 job: erase openQA-local-db-4.6.1639414134.aa9bed13e-bp154.1.64.noarch
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133   - job Rule #52786:
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     !openQA-local-db-4.6.1639414134.aa9bed13e-bp154.1.64.noarch [130809] (w1)
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 choice rule creation took 22 ms
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 49520 pkg rules, 2 * 1631 update rules, 4 job rules, 1 infarch rules, 0 dup rules, 2 choice rules, 0 best rules, 0 yumobs rules
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 0 black rules, 0 recommends rules, 0 repo priority rules
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 overall rule memory used: 1237 K
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 solver statistics: 0 learned rules, 7 unsolvable, 0 minimization steps
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 done solving.
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 solver took 21 ms
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 final solver statistics: 0 problems, 0 learned rules, 7 unsolvable
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 solver_solve took 255 ms
2022-08-18 03:00:39 <1> openqa(30384) [zypp::solver] SATResolver.cc(solving):556 ....Solver end
2022-08-18 03:00:39 <1> openqa(30384) [zypp::solver] SATResolver.cc(resolvePool):816 SATResolver::resolvePool() done. Ret:1
2022-08-18 03:00:39 <1> openqa(30384) [zypper] solve-commit.cc(solve_and_commit):675 got solution, showing summary
2022-08-18 03:00:39 <1> openqa(30384) [zypper] Summary.cc(readPool):103 Pool contains 66307 items.
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):104 Install summary:
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <install>   UNTu_(180057)patch:openSUSE-SLE-15.4-2022-2831-1.noarch(repo-sle-update)
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <uninstall> I_Ts_(181756)python3-py-1.8.1-5.6.1.noarch(@System)
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <uninstall> I_Ts_(181771)python3-requests-2.24.0-1.24.noarch(@System)
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <uninstall> I_Ts_(181773)python3-salt-3004-150400.8.8.1.x86_64(@System)
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <uninstall> I_Ts_(181849)salt-3004-150400.8.8.1.x86_64(@System)
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <uninstall> I_Ts_(181850)salt-master-3004-150400.8.8.1.x86_64(@System)
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <uninstall> I_Ts_(181851)salt-minion-3004-150400.8.8.1.x86_64(@System)
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <uninstall> I_Ts_(181852)salt-ssh-3004-150400.8.8.1.x86_64(@System)
2022-08-18 03:00:39 <1> openqa(30384) [zypper] Summary.cc(readPool):327 package update candidates: 5
2022-08-18 03:00:39 <1> openqa(30384) [zypper] Summary.cc(readPool):328 to be actually updated: 0
2022-08-18 03:00:39 <1> openqa(30384) [zypper] Summary.cc(readPool):327 product update candidates: 0
2022-08-18 03:00:39 <1> openqa(30384) [zypper] Summary.cc(readPool):328 to be actually updated: 0
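The solver's decision shows up in the `<uninstall>` lines of the install summary. Below is a sketch that extracts the affected package names from such a log, assuming the `Summary.cc(readPool)` line format shown here. The function name `planned_removals` is hypothetical. The default file `/var/log/zypper.log` (zypper's usual log location) is an assumption, since the comment does not name the file.

```shell
#!/bin/sh
# Hypothetical helper: planned_removals [LOGFILE]
# Prints the package names from "<uninstall>" summary lines such as
#   ... Summary.cc(readPool):144 <uninstall> I_Ts_(181849)salt-3004-150400.8.8.1.x86_64(@System)
# i.e. the text between the solvable id "(NNN)" and the repository "(@System)".
# In this basic regex, "(" and ")" are literal and "\(...\)" is the capture group.
planned_removals() {
    sed -n 's/.*<uninstall> [^(]*([0-9]*)\([^(]*\)(.*/\1/p' \
        "${1:-/var/log/zypper.log}"
}
```

To look at a single run, you could filter on its process ID first, e.g. `grep 'openqa(30384)' /var/log/zypper.log > run.log && planned_removals run.log`.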

I couldn't find any interesting clues (such as a repository download error) in the log entries preceding this.

Actions #5

Updated by mkittler over 2 years ago

  • Tags deleted (alert)
  • Assignee deleted (mkittler)
  • Target version deleted (Ready)

SR for removing --force-resolution: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/726
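According to zypper's documentation, `--force-resolution` makes the solver find a solution even if it is an aggressive one. That can include removing installed packages. Without the option, zypper should report the dependency problem instead of resolving it silently. As an extra safeguard that is not part of the SR, a wrapper could first inspect a dry run and refuse to continue if packages would be removed. Below is a minimal sketch. The function name `refuse_removals` is hypothetical, and it assumes zypper's English summary wording ("... going to be REMOVED:"), which is worth double-checking against the installed version.

```shell
#!/bin/sh
# Hypothetical guard: reads zypper's transaction summary on stdin and
# returns 1 if it announces removals, 0 otherwise.
# Assumes the English summary heading "... going to be REMOVED:".
refuse_removals() {
    if grep -q 'going to be REMOVED'; then
        echo "zypper would remove packages, refusing to continue" >&2
        return 1
    fi
    return 0
}

# Assumed usage (needs root; --dry-run only simulates the transaction):
# zypper -n patch --dry-run | refuse_removals && zypper -n patch
```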

Btw, this upstream issue might be related: https://github.com/openSUSE/zypper/issues/446
However, this time I couldn't spot any repository refresh errors in the logs. Unfortunately, journalctl -u auto-update.service only shows messages from May, not current ones.

Actions #6

Updated by mkittler over 2 years ago

  • Assignee set to mkittler
  • Target version set to Ready

Actions #7

Updated by mkittler over 2 years ago

  • Tags set to alert

Actions #8

Updated by mkittler over 2 years ago

I reinstalled all packages that had been accidentally removed and retried the deployment, which worked this time.

I'm not sure why only these few packages were uninstalled. Unfortunately, the logs only state which packages were removed, not why.

Actions #9

Updated by openqa_review over 2 years ago

  • Due date set to 2022-09-02

Setting due date based on mean cycle time of SUSE QE Tools

Actions #10

Updated by tinita over 2 years ago

Regarding the missing journal logs: we fixed this in #115208, so from today on we should have more logs again.

Actions #11

Updated by mkittler over 2 years ago

  • Status changed from In Progress to Resolved

The SR has been merged, and I think https://github.com/openSUSE/zypper/issues/446 is still good enough as the upstream bug report.
