action #115484

[alert] OSD deployment failed on 18.08.22 size:M

Added by mkittler about 1 month ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2022-08-18
Due date: 2022-09-02
% Done: 0%
Estimated time:
Tags:

Description

Observation

Jobs in the deployment pipeline failed, apparently because the salt command could not be found:

++ echo '$ ssh $TARGET \ # collapsed multi-line command'
++ ssh openqa.suse.de 'set -xo pipefail;      sudo salt -C '\''G@roles:worker'\'' cmd.run '\''for i in {1..7}; do zypper -n dup --download-only --details && break || (echo Retry' after sleep '... && sleep 30); done'\'''
+ sudo salt -C G@roles:worker cmd.run 'for i in {1..7}; do zypper -n dup --download-only --details && break || (echo Retry after sleep ... && sleep 30); done'
sudo: salt: command not found
+++ kill %1
Cleaning up project directory and file based variables 00:03
Using docker image sha256:81cd3442a3502ceb7f516b7ce7a0f02634449600ddbb28a2d5c0e192564d0dcf for registry.opensuse.org/home/darix/apps/containers/gitlab-runner-helper:x86_64-latest with digest registry.opensuse.org/home/darix/apps/containers/gitlab-runner-helper@sha256:6dc8b8a636f4dde3ae54e781e487219ce27c5ee46b8376b495c2bc66050222ba ...

Acceptance criteria

  • AC1: salt can be found again
  • AC2: The cause of the issue is known
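
A minimal sketch of how AC1 could be checked, assuming it runs directly on the host where salt is expected. The helper name `missing_commands` is hypothetical, and the lookup uses the PATH of the current user, which may differ from the PATH that sudo applies:

```python
import shutil


def missing_commands(names):
    """Return the entries of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Assumption: these are the executables the deployment relies on.
missing = missing_commands(["salt", "salt-master"])
print("missing:", ", ".join(missing) if missing else "none")
```

Running a check like this before the actual salt call would turn the failure into an explicit message rather than "command not found" in the middle of the job.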

Suggestions

History

#1 Updated by cdywan about 1 month ago

  • Subject changed from [alert] OSD deployment failed on 18.08.22 to [alert] OSD deployment failed on 18.08.22 size:M
  • Description updated (diff)
  • Status changed from New to Workable

Marius mitigated the issue by installing salt{,-master}, but we don't know yet what caused it.

#2 Updated by mkittler about 1 month ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler

#3 Updated by mkittler about 1 month ago

/var/log/zypp/history:

2022-08-18 03:00:41|command|root@openqa|'zypper' '-n' '--non-interactive-include-reboot-patches' 'patch' '--replacefiles' '--auto-agree-with-licenses' '--force-resolution' '--download-in-advance'|
# 2022-08-18 03:00:46 salt-minion-3004-150400.8.8.1.x86_64 removed ok
# Additional rpm output:
# Removed /etc/systemd/system/multi-user.target.wants/salt-minion.service.
# warning: /etc/salt/minion saved as /etc/salt/minion.rpmsave
# 
2022-08-18 03:00:46|remove |salt-minion|3004-150400.8.8.1|x86_64||
2022-08-18 03:00:47|remove |salt-ssh|3004-150400.8.8.1|x86_64||
# 2022-08-18 03:00:54 salt-master-3004-150400.8.8.1.x86_64 removed ok
# Additional rpm output:
# Removed /etc/systemd/system/multi-user.target.wants/salt-master.service.
# warning: /etc/salt/master saved as /etc/salt/master.rpmsave
# 
2022-08-18 03:00:54|remove |salt-master|3004-150400.8.8.1|x86_64||
2022-08-18 03:00:56|remove |salt|3004-150400.8.8.1|x86_64||
2022-08-18 03:01:01|remove |python3-salt|3004-150400.8.8.1|x86_64||
2022-08-18 03:01:01|remove |python3-requests|2.24.0-1.24|noarch||
2022-08-18 03:01:01|remove |python3-py|1.8.1-5.6.1|noarch||
2022-08-18 03:01:01|patch  |openSUSE-SLE-15.4-2022-2119|1|noarch|repo-sle-update|important|recommended|applied|not-needed|
2022-08-18 03:01:01|patch  |openSUSE-SLE-15.4-2022-2831|1|noarch|repo-sle-update|moderate|security|needed|not-needed|
2022-08-18 03:01:01|patch  |openSUSE-SLE-15.4-2022-2304|1|noarch|repo-sle-update|important|security|applied|not-needed|
2022-08-18 11:55:43|command|root@openqa|'zypper' 'in' 'salt'|
2022-08-18 11:55:43|install|python3-py|1.8.1-5.6.1|noarch||repo-oss|31ddc63bc7178278d7d72abe6bdeb2a97c486cdd56234292f56a0fa1a562c324|
2022-08-18 11:55:43|install|python3-requests|2.24.0-1.24|noarch||repo-oss|89d71c20ba199d5ab01e24ca8d9d8ad8d64e95f16983c118399a7b4eb93f272c|
2022-08-18 11:55:46|install|python3-salt|3004-150400.8.8.1|x86_64||repo-sle-update|cd0f786c1fc1be720eaa4c98f371f9ec4dfb9d8f70426cfa1b9477f9665f3110|
2022-08-18 11:55:46|install|salt|3004-150400.8.8.1|x86_64|root@openqa|repo-sle-update|39a52e8809f6cf55c7add5408136e57b55a1cc799dca212c51d818496c5f2349|
2022-08-18 11:55:46|patch  |openSUSE-SLE-15.4-2022-2119|1|noarch|repo-sle-update|important|recommended|not-needed|applied|
2022-08-18 11:55:46|patch  |openSUSE-SLE-15.4-2022-2831|1|noarch|repo-sle-update|moderate|security|not-needed|needed|
2022-08-18 11:55:46|patch  |openSUSE-SLE-15.4-2022-2304|1|noarch|repo-sle-update|important|security|not-needed|applied|
2022-08-18 11:55:58|command|root@openqa|'zypper' 'in' 'salt-master'|
2022-08-18 11:56:01|install|salt-master|3004-150400.8.8.1|x86_64|root@openqa|repo-sle-update|8a88fab48d176667c908ee6e248d4a82a5089d7abbe2a1f3122894be52e8de7c|
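
To see at a glance which command triggered which removals, a history excerpt like the one above could be grouped per command. This is only a rough sketch: the function name `removals_by_command` is made up for illustration, the field layout (timestamp, action, then name or user, then the command line) is inferred from the excerpt rather than from a format specification, and a command line that itself contains a `|` would not be handled. The sample lines are copied from the excerpt.

```python
from collections import defaultdict

# A few lines copied from the /var/log/zypp/history excerpt above.
SAMPLE = """\
2022-08-18 03:00:41|command|root@openqa|'zypper' '-n' '--non-interactive-include-reboot-patches' 'patch' '--replacefiles' '--auto-agree-with-licenses' '--force-resolution' '--download-in-advance'|
# 2022-08-18 03:00:46 salt-minion-3004-150400.8.8.1.x86_64 removed ok
2022-08-18 03:00:46|remove |salt-minion|3004-150400.8.8.1|x86_64||
2022-08-18 03:00:56|remove |salt|3004-150400.8.8.1|x86_64||
2022-08-18 11:55:43|command|root@openqa|'zypper' 'in' 'salt'|
2022-08-18 11:55:46|install|salt|3004-150400.8.8.1|x86_64|root@openqa|repo-sle-update|39a52e8809f6cf55c7add5408136e57b55a1cc799dca212c51d818496c5f2349|
"""


def removals_by_command(history_text):
    """Map each logged command to the package names removed after it."""
    removed = defaultdict(list)
    current = None  # most recent "command" entry seen so far
    for line in history_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and rpm output comments
        fields = line.split("|")
        if len(fields) < 4:
            continue  # not an entry in the assumed layout
        action = fields[1].strip()  # the action field is padded, e.g. "remove "
        if action == "command":
            current = f"{fields[0]} {fields[3]}"
        elif action == "remove" and current is not None:
            removed[current].append(fields[2])
    return dict(removed)


for command, packages in removals_by_command(SAMPLE).items():
    print(command, "->", ", ".join(packages))
```

Like the history file itself, this only shows what was removed and by which command, not why the solver chose to remove it.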

#4 Updated by mkittler about 1 month ago

Here is the more detailed log from the uninstallation:

2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131   normal: 28427, 79923 literals
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 pkg rule memory used: 1160 K
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 pkg rule creation took 155 ms
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133 job: blacklist providing retracted-patch-package()
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133 job: blacklist providing ptf()
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133 job: install patch:openSUSE-SLE-15.4-2022-2831-1.noarch
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133   - job Rule #52783:
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     patch:openSUSE-SLE-15.4-2022-2831-1.noarch [180057] (w1)
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133 job: install providing glibc
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133   - job Rule #52784:
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     glibc-2.31-150300.20.7.x86_64 [90300] (w1)
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     glibc-2.31-150300.26.5.x86_64 [171973] (w2)
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     glibc-2.31-150300.31.2.x86_64 [171978]
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     glibc-2.31-150300.37.1.x86_64 [171983]
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     glibc-2.31-150300.37.1.x86_64 [180578]I
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133 job: erase openQA-local-db-4.6.1660648257.0cc7a55-lp154.5211.5.noarch
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133   - job Rule #52785:
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     !openQA-local-db-4.6.1660648257.0cc7a55-lp154.5211.5.noarch [25] (w1)
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133 job: erase openQA-local-db-4.6.1639414134.aa9bed13e-bp154.1.64.noarch
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133   - job Rule #52786:
2022-08-18 03:00:39 <1> openqa(30384) [libsolv] PoolImpl.cc(logSat):133     !openQA-local-db-4.6.1639414134.aa9bed13e-bp154.1.64.noarch [130809] (w1)
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 choice rule creation took 22 ms
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 49520 pkg rules, 2 * 1631 update rules, 4 job rules, 1 infarch rules, 0 dup rules, 2 choice rules, 0 best rules, 0 yumobs rules
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 0 black rules, 0 recommends rules, 0 repo priority rules
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 overall rule memory used: 1237 K
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 solver statistics: 0 learned rules, 7 unsolvable, 0 minimization steps
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 done solving.
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 solver took 21 ms
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 final solver statistics: 0 problems, 0 learned rules, 7 unsolvable
2022-08-18 03:00:39 <1> openqa(30384) [libsolv++] PoolImpl.cc(logSat):131 solver_solve took 255 ms
2022-08-18 03:00:39 <1> openqa(30384) [zypp::solver] SATResolver.cc(solving):556 ....Solver end
2022-08-18 03:00:39 <1> openqa(30384) [zypp::solver] SATResolver.cc(resolvePool):816 SATResolver::resolvePool() done. Ret:1
2022-08-18 03:00:39 <1> openqa(30384) [zypper] solve-commit.cc(solve_and_commit):675 got solution, showing summary
2022-08-18 03:00:39 <1> openqa(30384) [zypper] Summary.cc(readPool):103 Pool contains 66307 items.
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):104 Install summary:
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <install>   UNTu_(180057)patch:openSUSE-SLE-15.4-2022-2831-1.noarch(repo-sle-update)
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <uninstall> I_Ts_(181756)python3-py-1.8.1-5.6.1.noarch(@System)
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <uninstall> I_Ts_(181771)python3-requests-2.24.0-1.24.noarch(@System)
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <uninstall> I_Ts_(181773)python3-salt-3004-150400.8.8.1.x86_64(@System)
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <uninstall> I_Ts_(181849)salt-3004-150400.8.8.1.x86_64(@System)
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <uninstall> I_Ts_(181850)salt-master-3004-150400.8.8.1.x86_64(@System)
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <uninstall> I_Ts_(181851)salt-minion-3004-150400.8.8.1.x86_64(@System)
2022-08-18 03:00:39 <1> openqa(30384) [zypper++] Summary.cc(readPool):144 <uninstall> I_Ts_(181852)salt-ssh-3004-150400.8.8.1.x86_64(@System)
2022-08-18 03:00:39 <1> openqa(30384) [zypper] Summary.cc(readPool):327 package update candidates: 5
2022-08-18 03:00:39 <1> openqa(30384) [zypper] Summary.cc(readPool):328 to be actually updated: 0
2022-08-18 03:00:39 <1> openqa(30384) [zypper] Summary.cc(readPool):327 product update candidates: 0
2022-08-18 03:00:39 <1> openqa(30384) [zypper] Summary.cc(readPool):328 to be actually updated: 0

I couldn't find any interesting clues (such as a repo download error) in the log entries preceding this.

#5 Updated by mkittler about 1 month ago

  • Tags deleted (alert)
  • Assignee deleted (mkittler)
  • Target version deleted (Ready)

SR for removing --force-resolution: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/726

Btw, maybe this upstream issue is related: https://github.com/openSUSE/zypper/issues/446
However, this time I couldn't spot any repository refresh errors in the logs. Unfortunately, journalctl -u auto-update.service only shows messages from May, not current ones.

#6 Updated by mkittler about 1 month ago

  • Assignee set to mkittler
  • Target version set to Ready

#7 Updated by mkittler about 1 month ago

  • Tags set to alert

#8 Updated by mkittler about 1 month ago

I reinstalled all packages that had been accidentally removed and retried the deployment, which now works.

Not sure why only these few packages were uninstalled. Unfortunately, the logs only state which packages were removed, not why.

#9 Updated by openqa_review about 1 month ago

  • Due date set to 2022-09-02

Setting due date based on mean cycle time of SUSE QE Tools

#10 Updated by tinita about 1 month ago

Regarding the missing journal logs: we fixed this in #115208 so from today on we should have more logs again.

#11 Updated by mkittler about 1 month ago

  • Status changed from In Progress to Resolved

The SR has been merged, and I think https://github.com/openSUSE/zypper/issues/446 is still good enough as the upstream bug.
