action #158020

closed

QA - coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

QA - coordination #129280: [epic] Move from SUSE NUE1 (Maxtorhof) to new NBG Datacenters

salt-states-openqa pipeline times out

Added by livdywan about 1 month ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2024-03-26
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2425611

          ID: SUSE:SLE-15-SP6:Update:BCI
    Function: cmd.run
        Name: su geekotest -c 'mkdir -p SUSE:SLE-15-SP6:Update:BCI && python3 script/sc
timeout: sending signal TERM to command 'ssh'
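The trailing "timeout: sending signal TERM to command 'ssh'" message in this excerpt is what GNU `timeout` prints in verbose mode when the wrapped command exceeds its time limit. How the pipeline actually wraps `ssh` is not shown in this ticket, so the following is only a minimal sketch of that behaviour. The helper name `run_with_limit` is hypothetical and `sleep` stands in for the real command.

```shell
# Hypothetical helper (not taken from the pipeline): run a command under a
# time limit and report when the limit was hit.
# GNU timeout exits with status 124 if it had to stop the command.
run_with_limit() {
    limit=$1
    shift
    # -v makes GNU timeout print "sending signal TERM to command ..." on stderr
    timeout -v "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "time limit of ${limit}s exceeded for: $1" >&2
    fi
    return "$status"
}
```

For example, `run_with_limit 1 sleep 5` returns 124 after about one second, while a command that finishes within the limit returns its own exit status.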

https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2425891

          ID: stop_and_disable_all_not_configured_workers
    Function: cmd.run
        Name: services=$(systemctl list-units --all 'openqa-worker-auto-restart@*.service' | sed -e '/.*openqa-worker-auto-restart@.*\.service.*/!d' -e 's|.*openqa-worker-auto-restart@\(.*\)\.service.*|\1|' | awk '{ if($0 > 16) print "openqa-worker-auto-restart@" $0 ".service openqa-reload-worker-auto-restart@" $0 ".path" }' | tr '\n' ' '); [ -z "$services" ] || systemctl disable --n
timeout: sending signal TERM to command 'ssh'
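For readers unfamiliar with the `stop_and_disable_all_not_configured_workers` command above: the visible part extracts the instance number from each `openqa-worker-auto-restart@N.service` unit and keeps only instances above the configured count (16 in this log). The sketch below is a simplified variation, not the state's actual code. The function name `extra_worker_units` and its threshold argument are hypothetical, and it reads the unit listing from stdin so it can be tried without systemd.

```shell
# Hypothetical helper: read `systemctl list-units` style lines on stdin and
# print the .service and .path units of every worker instance whose number
# is greater than the threshold given as the first argument.
extra_worker_units() {
    max=$1
    # Keep only worker unit lines and reduce each one to its instance number
    sed -n 's|.*openqa-worker-auto-restart@\(.*\)\.service.*|\1|p' |
        # "$0 + 0" forces a numeric comparison against the threshold
        awk -v max="$max" '$0 + 0 > max {
            print "openqa-worker-auto-restart@" $0 ".service"
            print "openqa-reload-worker-auto-restart@" $0 ".path"
        }'
}
```

On a worker host this could be fed with `systemctl list-units --all 'openqa-worker-auto-restart@*.service' | extra_worker_units 16` to review the candidates before disabling anything.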

Suggestions


Related issues 2 (0 open, 2 closed)

Related to QA - action #133748: Move of openqaworker-arm-1 to FC Basement size:M (Resolved, ybonatakis)

Copied to openQA Infrastructure - action #158041: grenache needs upgrade to 15.5 (Resolved, okurz, start 2024-03-26, due 2024-04-09)

Actions #2

Updated by livdywan about 1 month ago

  • Tags changed from infra, FC Basement, rpi, reactive work to infra, reactive work
  • Due date deleted (2024-04-08)
  • Parent task deleted (#129280)
Actions #3

Updated by livdywan about 1 month ago

  • Description updated (diff)
Actions #5

Updated by okurz about 1 month ago

Actions #6

Updated by okurz about 1 month ago

  • Status changed from New to In Progress
  • Assignee set to okurz
Actions #7

Updated by okurz about 1 month ago

Right now I see a problem with openqaworker-arm-1 in the pipelines.

From ps auxf:

root      8869  0.0  0.0 1009464 82404 ?       S    06:16   0:01      \_ /usr/bin/python3 /usr/bin/salt-minion
root      8870  0.0  0.0   4068   960 ?        S    06:16   0:00      |   \_ /bin/sh -c set -x; retry -r 3 -- zypper --no-refresh -n du
root      8871  0.0  0.0   4068  2652 ?        S    06:16   0:00      |       \_ /bin/sh -e /usr/local/bin/retry -r 3 -- zypper --no-re
root      8875  0.1  0.0 328508 233924 ?       Sl   06:16   0:34      |           \_ zypper --no-refresh -n dup --replacefiles
root      8901  0.0  0.0  17368  3792 ?        S    06:16   0:00      |               \_ /usr/bin/systemd-inhibit --what=sleep:shutdown
root      8902  0.0  0.0   2168   492 ?        S    06:16   0:00      |               |   \_ /usr/bin/cat
root      8903  0.0  0.0   4068  2768 ?        S    06:16   0:00      |               \_ /bin/bash /usr/lib/zypp/plugins/commit/btrfs-d
root      8909  0.0  0.0  17232  6844 ?        S    06:16   0:00      |               \_ /usr/lib/zypp/plugins/commit/snapper-zypp-plug
root      8910  0.0  0.0  15120 11204 ?        S    06:16   0:00      |               \_ /usr/bin/python3 /usr/lib/zypp/plugins/commit/
root      8916  0.0  0.0  39892 34340 ?        D    06:16   0:00      |               \_ rpm --root / --dbpath /usr/lib/sysimage/rpm -…

So rpm (PID 8916) has been stuck for a long time in uninterruptible sleep (state D). df also seems to be stuck in I/O wait. Triggering a reboot.
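The stuck process in the listing is the one with `D` in the STAT column. A minimal sketch for spotting such processes without reading the whole tree, assuming procps `ps`, is shown below. The helper name `filter_uninterruptible` is hypothetical, and it filters `ps -eo pid,stat,comm` output from stdin.

```shell
# Hypothetical helper: read `ps -eo pid,stat,comm` output on stdin and print
# the PID and command of every process in uninterruptible sleep (state D).
filter_uninterruptible() {
    # NR > 1 skips the header line; $2 is the STAT column
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}
```

On the affected host this would be used as `ps -eo pid,stat,comm | filter_uninterruptible`.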

Actions #8

Updated by okurz about 1 month ago

  • Related to action #133748: Move of openqaworker-arm-1 to FC Basement size:M added
Actions #9

Updated by okurz about 1 month ago

  • Parent task set to #129280

After the reboot I ran zypper dup to ensure a clean state and then sudo salt 'openqaworker-arm-1*' state.apply. I also had to unmask some services that seem to have been overlooked in
