action #105379

closed

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #80908: [epic] Continuous deployment (package upgrade or config update) without interrupting currently running openQA jobs

Continuous deployment of o3 workers - one worker first size:M

Added by okurz almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2022-01-24
Due date:
% Done: 0%
Estimated time:

Description

Acceptance criteria

  • AC1: o3 workers automatically deploy after every update to the package os-autoinst
  • AC2: No significant downtime due to updates

Suggestions

  • Just do this on one machine, extend to others in a later ticket
  • We can likely change our transactional workers to have a read-writable root partition while still doing a nightly transactional update and reboot. We effectively already do this on openqaworker7, which has a r/w root. So it is likely just a change in /etc/fstab while keeping the services transactional-update and rebootmgr in place. If you don't want to do that, just use one of the machines with r/w root
  • Try out zypper -n ref -r devel:openQA | grep -q 'is up to date' && zypper -n dup -r devel:openQA ||: in a systemd timer every 5 minutes
  • Include the change in github.com/os-autoinst/openQA/, updating or relating to https://github.com/os-autoinst/openQA/blob/master/systemd/opensuse/openqa-auto-update.service
  • Write down the exact commands being used so that we can extend the approach to other machines
  • Optional: Include the package openQA-worker
  • Optional: First try on one of our o3 workers, then extend to others

Related issues 4 (0 open, 4 closed)

  • Related to openQA Project - action #109851: os-autoinst was removed from o3 openqaworker7 (Resolved, mkittler, 2022-04-12)
  • Related to openQA Infrastructure - action #111758: o3 jobs exceeding MAX_SETUP_TIME auto_review:"(?s)openqaworker4.*timeout: setup exceeded MAX_SETUP_TIME":retry size:M (Resolved, favogt, 2022-05-30)
  • Copied to openQA Project - action #105885: Continuous deployment of o3 workers - all the other o3 workers size:M (Resolved, mkittler)
  • Copied to openQA Project - action #111989: Seems like o3 machines do not automatically reboot anymore, likely because we continuously call `zypper dup` so that the nightly upgrades don't find any changes? size:M (Resolved, okurz, 2022-01-24)

Actions #1

Updated by livdywan over 2 years ago

  • Subject changed from Continuous deployment of o3 workers to Continuous deployment of o3 workers size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #2

Updated by okurz over 2 years ago

  • Copied to action #105885: Continuous deployment of o3 workers - all the other o3 workers size:M added
Actions #3

Updated by okurz over 2 years ago

  • Subject changed from Continuous deployment of o3 workers size:M to Continuous deployment of o3 workers - one worker first size:M
Actions #4

Updated by mkittler over 2 years ago

  • Assignee set to mkittler
Actions #5

Updated by okurz over 2 years ago

  • Description updated (diff)
Actions #6

Updated by mkittler over 2 years ago

  • Status changed from Workable to In Progress
Actions #8

Updated by mkittler over 2 years ago

Created a PR and deployed it on openqaworker7:

git clone https://github.com/Martchus/openQA.git
cd openQA
git checkout origin/systemd
cp -v --target-directory=/usr/lib/systemd/system systemd/opensuse/openqa-cont*
cp -v --target-directory=/usr/share/openqa/script script/openqa-cont*
systemctl daemon-reload
systemctl start openqa-continuous-update.timer openqa-continuous-update.service
journalctl  --since '5 minutes ago' -fu openqa-continuous-update.service

It looks like it works:

Apr 11 16:56:09 openqaworker7 systemd[1]: Started Continuously deploys openQA, see https://progress.opensuse.org/issues/105379.
Apr 11 16:56:09 openqaworker7 openqa-continuous-update[31739]: devel:openQA looks good for Leap 15.3 (x86_64)
Apr 11 16:56:09 openqaworker7 openqa-continuous-update[31750]: Loading repository data...
Apr 11 16:56:09 openqaworker7 openqa-continuous-update[31750]: Reading installed packages...
Apr 11 16:56:09 openqaworker7 openqa-continuous-update[31750]: Computing distribution upgrade...
Apr 11 16:56:09 openqaworker7 openqa-continuous-update[31750]: The following 4 packages are going to be upgraded:
Apr 11 16:56:09 openqaworker7 openqa-continuous-update[31750]:   os-autoinst os-autoinst-devel os-autoinst-distri-opensuse-deps os-autoinst-openvswitch
Apr 11 16:56:09 openqaworker7 openqa-continuous-update[31750]: The following 55 packages are going to be REMOVED:
Apr 11 16:56:09 openqaworker7 openqa-continuous-update[31750]:   Mesa-libva acl at-spi2-atk-common at-spi2-atk-gtk2 brltty-driver-at-spi2 brltty-driver-brlapi brltty-driver-xwindow cmake-man cups-filters ethtool fuse htop ipmitool libaudiofile1 libblas3 libhwloc15 liblapack3 libmozjs-52 libqt5-qtimageformats monitoring-plugins-common monitoring-plugins-disk monitoring-plugins-load monitoring-plugins-ntp_time monitoring-plugins-procs monitoring-plugins-swap monitoring-plugins-users net-snmp nvme-cli openssh-askpass-gnome os-prober patch patterns-base-base patterns-base-minimal_base perl-SNMP python2-numpy python2-pycurl python3-numpy python3-pycurl python3-salt qemu-block-curl qemu-ksm qemu-ui-curses qemu-ui-gtk qemu-ui-spice-app salt salt-minion sensors tmux unzip-doc x3270 xbrlapi yast2-printer yast2-snapper yast2-vm zeromq-tools
Apr 11 16:56:09 openqaworker7 openqa-continuous-update[31750]: The following 2 patterns are going to be REMOVED:
Apr 11 16:56:09 openqaworker7 openqa-continuous-update[31750]:   base minimal_base
Apr 11 16:56:09 openqaworker7 openqa-continuous-update[31750]: 4 packages to upgrade, 55 to remove.
Apr 11 16:56:09 openqaworker7 openqa-continuous-update[31750]: Overall download size: 573.2 KiB. Already cached: 0 B. After the operation, 137.0 MiB will be freed.
Apr 11 16:56:09 openqaworker7 openqa-continuous-update[31750]: Continue? [y/n/v/...? shows all options] (y): y
Apr 11 16:56:14 openqaworker7 openqa-continuous-update[31750]: Retrieving package os-autoinst-4.6.1649676520.bb242777-lp153.1199.1.x86_64 (1/4), 330.3 KiB (832.1 KiB unpacked)
Apr 11 16:56:15 openqaworker7 openqa-continuous-update[31750]: Retrieving: os-autoinst-4.6.1649676520.bb242777-lp153.1199.1.x86_64.rpm [.done (45.0 KiB/s)]
Apr 11 16:56:15 openqaworker7 openqa-continuous-update[31750]: Retrieving package os-autoinst-devel-4.6.1649676520.bb242777-lp153.1199.1.x86_64 (2/4), 114.6 KiB (    0   B unpacked)
Apr 11 16:56:15 openqaworker7 openqa-continuous-update[31750]: Retrieving: os-autoinst-devel-4.6.1649676520.bb242777-lp153.1199.1.x86_64.rpm [done]
Apr 11 16:56:15 openqaworker7 openqa-continuous-update[31750]: Retrieving package os-autoinst-openvswitch-4.6.1649676520.bb242777-lp153.1199.1.x86_64 (3/4), 120.1 KiB (  9.6 KiB unpacked)
Apr 11 16:56:15 openqaworker7 openqa-continuous-update[31750]: Retrieving: os-autoinst-openvswitch-4.6.1649676520.bb242777-lp153.1199.1.x86_64.rpm [done]
Apr 11 16:56:15 openqaworker7 openqa-continuous-update[31750]: Retrieving package os-autoinst-distri-opensuse-deps-1.1649687466.791b743fdb-lp153.9703.1.noarch (4/4),   8.2 KiB (    0   B unpacked)
Apr 11 16:56:15 openqaworker7 openqa-continuous-update[31750]: Retrieving: os-autoinst-distri-opensuse-deps-1.1649687466.791b743fdb-lp153.9703.1.noarch.rpm [done]
Apr 11 16:56:20 openqaworker7 openqa-continuous-update[31750]: Checking for file conflicts: [.......done]
Apr 11 16:56:20 openqaworker7 [RPM][33367]: Transaction ID 62544194 started
Apr 11 16:56:20 openqaworker7 [RPM][33367]: erase Mesa-libva-20.2.4-57.12.x86_64: success
Apr 11 16:56:20 openqaworker7 [RPM][33367]: erase Mesa-libva-20.2.4-57.12.x86_64: success
Apr 11 16:56:20 openqaworker7 [RPM][33367]: Transaction ID 62544194 finished: 0
Apr 11 16:56:20 openqaworker7 openqa-continuous-update[31750]: ( 1/59) Removing Mesa-libva-20.2.4-57.12.x86_64 [.....done]
…
Apr 11 16:57:04 openqaworker7 openqa-continuous-update[31750]: (59/59) Installing: os-autoinst-distri-opensuse-deps-1.1649687466.791b743fdb-lp153.9703.1.noarch [......done]
Apr 11 16:57:13 openqaworker7 openqa-continuous-update[31750]: There are running programs which still use files and libraries deleted or updated by recent upgrades. They should be restarted to benefit from the latest updates. Run 'zypper ps -s' to list these programs.
Apr 11 16:57:13 openqaworker7 openqa-continuous-update[31750]:
Apr 11 16:57:13 openqaworker7 systemd[1]: openqa-continuous-update.service: Succeeded.
Apr 11 16:57:48 openqaworker7 systemd[1]: Started Continuously deploys openQA, see https://progress.opensuse.org/issues/105379.
Apr 11 16:57:48 openqaworker7 openqa-continuous-update[38600]: devel:openQA looks good for Leap 15.3 (x86_64)
Apr 11 16:57:48 openqaworker7 openqa-continuous-update[38611]: Loading repository data...
Apr 11 16:57:48 openqaworker7 openqa-continuous-update[38611]: Reading installed packages...
Apr 11 16:57:48 openqaworker7 openqa-continuous-update[38611]: Computing distribution upgrade...
Apr 11 16:57:49 openqaworker7 openqa-continuous-update[38611]: Nothing to do.
Apr 11 16:57:49 openqaworker7 systemd[1]: openqa-continuous-update.service: Succeeded.
Apr 11 17:08:37 openqaworker7 systemd[1]: Started Continuously deploys openQA, see https://progress.opensuse.org/issues/105379.
Apr 11 17:08:37 openqaworker7 openqa-continuous-update[3339]: devel:openQA looks good for Leap 15.3 (x86_64)
Apr 11 17:08:37 openqaworker7 openqa-continuous-update[3350]: Loading repository data...
Apr 11 17:08:37 openqaworker7 openqa-continuous-update[3350]: Reading installed packages...
Apr 11 17:08:37 openqaworker7 openqa-continuous-update[3350]: Computing distribution upgrade...
Apr 11 17:08:38 openqaworker7 openqa-continuous-update[3350]: Nothing to do.
Apr 11 17:08:38 openqaworker7 systemd[1]: openqa-continuous-update.service: Succeeded.
Actions #9

Updated by openqa_review over 2 years ago

  • Due date set to 2022-04-26

Setting due date based on mean cycle time of SUSE QE Tools

Actions #10

Updated by mkittler over 2 years ago

Looks like mounting / as rw filesystem is all that's needed so the system is no longer considered a transactional server:

openqaworker1:~ # zypper in tmux
This is a transactional-server, please use transactional-update to update or modify the system.
openqaworker1:~ # mount -o rw,remount /
openqaworker1:~ # zypper in tmux
Retrieving repository 'Providing openQA dependencies (openSUSE_Leap_15.3)' metadata…

So I suppose replacing / btrfs ro with / btrfs defaults in /etc/fstab would do the trick (1).

Not sure about overlay /etc overlay defaults,upperdir=…. It is likely best to keep it as-is.


I've just made the change (1) on openqaworker1. It persisted after rebooting, so I suppose that's all we need to do.
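
The change (1) could be scripted roughly as follows. This is only a sketch under assumptions: the function name make_root_rw is hypothetical, and it assumes the root entry in /etc/fstab has the file system type btrfs and exactly `ro` in the options field, as quoted above. It reads fstab content on stdin and writes the result to stdout, so nothing is modified in place.

```shell
#!/bin/sh
# Hypothetical helper: switch the "/" btrfs entry from "ro" to "defaults".
# Reads fstab content on stdin, writes the modified content to stdout.
# Assumption: the options field of the root entry is exactly "ro".
make_root_rw() {
    awk '
        # fstab fields: device, mount point, fs type, options, dump, pass
        $2 == "/" && $3 == "btrfs" && $4 == "ro" { $4 = "defaults" }
        { print }
    '
}
```

Possible usage: `make_root_rw < /etc/fstab > /tmp/fstab.new`, then review the diff before copying the file back. Note that awk rejoins the fields of the changed line with single spaces, so column alignment on that line is lost; all other lines pass through unchanged.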

Actions #11

Updated by mkittler over 2 years ago

Looks like zypper ref exited early, maybe related to using a pipe:

2022-04-12 00:05:16 <1> openqaworker7(31986) [zypp] RepoManager.cc(checkIfToRefreshMetadata):1093 repo has not changed
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypp-core] PathInfo.cc(touch):1243 touch /var/cache/zypp/raw/devel_openQA/repodata/repomd.xml
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper] refresh.cc(refreshRepository):162 calling buildCache
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypp] RepoManager.cc(buildCache):1301 devel_openQA is already cached.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypp] RepoManager.cc(buildCache):1306 devel_openQA cache is up to date with metadata.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypp::satpool] PoolImpl.cc(setDirty):255 _createRepo devel_openQA
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypp::satpool] PoolImpl.cc(setDirty):255 _addSolv devel_openQA
2022-04-12 00:05:16 <1> openqaworker7(31986) [libsolv++] PoolImpl.cc(logSat):127 repo_add_solv took 0 ms
2022-04-12 00:05:16 <1> openqaworker7(31986) [libsolv++] PoolImpl.cc(logSat):127 repo size: 51 solvables
2022-04-12 00:05:16 <1> openqaworker7(31986) [libsolv++] PoolImpl.cc(logSat):127 repo memory used: 6 K incore, 7 K idarray
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypp] Repository.cc(addSolv):336 sat::repo(devel_openQA){prio 0.0, size 37} after adding /var/cache/zypp/solv/devel_openQA/solv
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypp::satpool] PoolImpl.cc(setDirty):255 setRepoInfo devel_openQA
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypp] Repository.cc(setInfo):288 sat::repo(devel_openQA){prio -95.2, size 37}
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):223 devel_openQA_Modules(#) not specified, skipping.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):243 repo-backports-debug-update(#) not specified, skipping.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):223 repo-backports-update(#) not specified, skipping.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):243 repo-debug(#) not specified, skipping.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):243 repo-debug-non-oss(#) not specified, skipping.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):243 repo-debug-update(#) not specified, skipping.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):243 repo-debug-update-non-oss(#) not specified, skipping.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):223 repo-non-oss(#) not specified, skipping.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):223 repo-oss(#) not specified, skipping.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):243 repo-sle-debug-update(#) not specified, skipping.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):223 repo-sle-update(#) not specified, skipping.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):243 repo-source(#) not specified, skipping.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):243 repo-source-non-oss(#) not specified, skipping.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):223 repo-update(#) not specified, skipping.
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper++] refresh.cc(refreshRepositories):223 repo-update-non-oss(#) not specified, skipping.
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(testPipe):72 FD(1) pipe is broken
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 Exiting on SIGPIPE...
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [hd]: (-3) /usr/lib64/libzypp.so.1722 : zypp::dumpBacktrace(std::ostream&)+0x39 [0x7ff235d2e6b9]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [hd]: (-2) zypper : signal_nopipe(int)+0x73 [0x55ac377b3173]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [hd]: (-1) /lib64/libc.so.6 : +0x4ad70 [0x7ff234782d70]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 vvvvvvvvvv----------------------------------------
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (0) /lib64/libc.so.6 : __write+0x49 [0x7ff23483eb47]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (1) /lib64/libc.so.6 : _IO_file_write+0x2f [0x7ff2347cadcd]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (2) /lib64/libc.so.6 : +0x9202f [0x7ff2347ca02f]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (3) /lib64/libc.so.6 : _IO_do_write+0x1b [0x7ff2347cbf89]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (4) /lib64/libc.so.6 : _IO_file_sync+0xba [0x7ff2347c9e28]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (5) /lib64/libc.so.6 : fflush+0x84 [0x7ff2347bd5f2]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (6) /usr/lib64/libstdc++.so.6 : std::ostream::flush()+0x23 [0x7ff234e7aa53]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (7) zypper : OutNormal::info(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Out::Verbosity, zypp::base::Flags<Out::TypeBit>)+0x114 [0x55ac378ff1c4]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (8) zypper : RefreshRepoCmd::refreshRepositories(Zypper&, zypp::base::Flags<RefreshRepoCmd::RefreshFlagsBits>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)+0x976 [0x55ac3788cdf6]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (9) zypper : RefreshRepoCmd::execute(Zypper&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)+0x374 [0x55ac3788e254]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (10) zypper : ZypperBaseCommand::run(Zypper&)+0x153 [0x55ac3783c303]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (11) zypper : Zypper::doCommand(int, char**, int)+0xd10 [0x55ac377dc620]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (12) zypper : Zypper::main(int, char**)+0x49 [0x55ac377b0549]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (13) zypper : main+0x467 [0x55ac377af7d7]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (14) /lib64/libc.so.6 : __libc_start_main+0xef [0x7ff23476d2bd]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] main.cc(signal_nopipe):88 [bt]: (15) zypper : _start+0x2a [0x55ac377b28fa]
2022-04-12 00:05:16 <2> openqaworker7(31986) [zypper] Zypper.h(setExitCode):162 setExitCode 0
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper] Zypper.cc(doCommand):677 Done
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper] Zypper.cc(cleanup):729 START
2022-04-12 00:05:16 <1> openqaworker7(31986) [zypper] main.cc(~Bye):98 ===== Exiting main(0) =====

Due to that it removed os-autoinst:

    Apr 12 03:47:02 openqaworker7 openqa-continuous-update[3847]: devel:openQA looks good for Leap 15.3 (x86_64)
    Apr 12 03:47:02 openqaworker7 openqa-continuous-update[3858]: Loading repository data...
    Apr 12 03:47:02 openqaworker7 openqa-continuous-update[3858]: Reading installed packages...
    Apr 12 03:47:03 openqaworker7 openqa-continuous-update[3858]: Computing distribution upgrade...
    Apr 12 03:47:03 openqaworker7 openqa-continuous-update[3858]: The following 4 packages are going to be REMOVED:
    Apr 12 03:47:03 openqaworker7 openqa-continuous-update[3858]:   os-autoinst os-autoinst-devel os-autoinst-openvswitch yast2-snapper
    Apr 12 03:47:03 openqaworker7 openqa-continuous-update[3858]: 4 packages to remove.
    Apr 12 03:47:03 openqaworker7 openqa-continuous-update[3858]: After the operation, 948.8 KiB will be freed.
    Apr 12 03:47:03 openqaworker7 openqa-continuous-update[3858]: Continue? [y/n/v/...? shows all options] (y): y
    Apr 12 03:47:15 openqaworker7 [RPM][5651]: Transaction ID 6254da23 started
    Apr 12 03:47:22 openqaworker7 [RPM][5651]: erase os-autoinst-4.6.1649692679.6d936fdc-lp153.1200.1.x86_64: success
    Apr 12 03:47:22 openqaworker7 [RPM][5651]: erase os-autoinst-4.6.1649692679.6d936fdc-lp153.1200.1.x86_64: success
    Apr 12 03:47:22 openqaworker7 [RPM][5651]: Transaction ID 6254da23 finished: 0
    Apr 12 03:47:22 openqaworker7 openqa-continuous-update[3858]: (1/4) Removing os-autoinst-4.6.1649692679.6d936fdc-lp153.1200.1.x86_64 [.....done]
    Apr 12 03:47:22 openqaworker7 [RPM][5654]: Transaction ID 6254da2a started
    Apr 12 03:47:22 openqaworker7 [RPM][5654]: erase os-autoinst-devel-4.6.1649692679.6d936fdc-lp153.1200.1.x86_64: success
    Apr 12 03:47:22 openqaworker7 [RPM][5654]: erase os-autoinst-devel-4.6.1649692679.6d936fdc-lp153.1200.1.x86_64: success
    Apr 12 03:47:22 openqaworker7 [RPM][5654]: Transaction ID 6254da2a finished: 0
    Apr 12 03:47:22 openqaworker7 openqa-continuous-update[3858]: (2/4) Removing os-autoinst-devel-4.6.1649692679.6d936fdc-lp153.1200.1.x86_64 [.....done]
    Apr 12 03:47:22 openqaworker7 [RPM][5655]: Transaction ID 6254da2a started
    Apr 12 03:47:22 openqaworker7 [RPM][5655]: erase os-autoinst-openvswitch-4.6.1649692679.6d936fdc-lp153.1200.1.x86_64: success
    Apr 12 03:47:22 openqaworker7 openqa-continuous-update[3858]: (3/4) Removing os-autoinst-openvswitch-4.6.1649692679.6d936fdc-lp153.1200.1.x86_64 [...
    Apr 12 03:47:22 openqaworker7 openqa-continuous-update[3858]: Removed /etc/systemd/system/multi-user.target.wants/os-autoinst-openvswitch.service.
    Apr 12 03:47:22 openqaworker7 [RPM][5655]: erase os-autoinst-openvswitch-4.6.1649692679.6d936fdc-lp153.1200.1.x86_64: success
    Apr 12 03:47:22 openqaworker7 [RPM][5655]: Transaction ID 6254da2a finished: 0
    Apr 12 03:47:22 openqaworker7 openqa-continuous-update[3858]: ..done]
    Apr 12 03:47:22 openqaworker7 [RPM][5699]: Transaction ID 6254da2a started
    Apr 12 03:47:26 openqaworker7 [RPM][5699]: erase yast2-snapper-4.2.0-1.152.x86_64: success
    Apr 12 03:47:26 openqaworker7 [RPM][5699]: erase yast2-snapper-4.2.0-1.152.x86_64: success
    Apr 12 03:47:26 openqaworker7 [RPM][5699]: Transaction ID 6254da2a finished: 0
    Apr 12 03:47:26 openqaworker7 openqa-continuous-update[3858]: (4/4) Removing yast2-snapper-4.2.0-1.152.x86_64 [.....done]

According to DimStar, -r limits the dependency resolution to just that repository, so we must not use it; it is better to simply upgrade everything.


For now I've disabled the timer on openqaworker7 again.
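
Regarding the early exit of zypper ref noted at the top of this comment: one plausible reading of the "pipe is broken" lines is that `grep -q` exits on its first match and closes the pipe while zypper is still writing. A minimal sketch that sidesteps this by capturing the complete output before searching it; the function name output_contains is hypothetical and this is not necessarily what the PR does.

```shell
#!/bin/sh
# Hypothetical helper: run a command, capture all of its output, then check
# whether the output contains a fixed string. Because the output is captured
# first, the command never writes into a pipe that a reader has closed.
# Returns 0 if found, 1 if not found, 2 if the command itself failed.
output_contains() {
    needle=$1
    shift
    out=$("$@" 2>&1) || return 2
    case $out in
        *"$needle"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Possible usage: `output_contains 'is up to date' zypper -n ref -r devel:openQA`, with the exit status deciding whether to continue.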

Actions #12

Updated by mkittler over 2 years ago

  • Related to action #109851: os-autoinst was removed from o3 openqaworker7 added
Actions #13

Updated by mkittler over 2 years ago

Note that @fvogt created echo "requires:openQA-worker" > /etc/zypp/systemCheck.d/openqa.check on openqaworker7 (see https://doc.opensuse.org/projects/libzypp/HEAD/group__ZyppConfig.html). However, it wouldn't have helped here because os-autoinst was still installed, at least some version. Thus the dependencies of openQA-worker were not broken (as far as rpm/zypper is concerned) and I also had to force-install os-autoinst again¹ to restore the files missing on disk.


¹ This is the version which was still installed:

No update candidate for 'os-autoinst-4.6.1649718105.f9aef8f1-lp153.1201.1.x86_64'. The highest available version is already installed.
Actions #14

Updated by mkittler over 2 years ago

Since the timer/service didn't do any harm so far I'll leave it enabled. I also don't expect any further problems considering our main mistake was using -r and that's fixed.
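
As an extra safety net on top of dropping -r, the output of a dry run could be inspected before the real upgrade. The following is a sketch only, not part of the deployed service: the function name would_remove is hypothetical, and it assumes the plain zypper output layout seen in the logs above, where a "going to be REMOVED:" line is followed by indented lines listing package names.

```shell
#!/bin/sh
# Hypothetical helper: read zypper output on stdin and return 0 if the
# package named in $1 is listed in a "going to be REMOVED:" section.
# Assumption: package lists are space-indented lines directly after the header.
would_remove() {
    awk -v pkg="$1" '
        /going to be REMOVED:$/ { in_removed = 1; next }
        in_removed && /^ +[^ ]/ {
            # indented line inside a REMOVED section: compare each name
            for (i = 1; i <= NF; i++) if ($i == pkg) found = 1
            next
        }
        { in_removed = 0 }   # any other line ends the section
        END { exit (found ? 0 : 1) }
    '
}
```

Possible usage: `zypper -n dup --dry-run | would_remove os-autoinst && echo "refusing to upgrade"`. Matching on exact package names keeps os-autoinst-devel from being mistaken for os-autoinst.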

Actions #15

Updated by mkittler over 2 years ago

  • Status changed from In Progress to Feedback

Following up on the log from yesterday, everything still looks good and the fail/incomplete rate of openqaworker7 is also normal:

openqa=> with finished as (select result, t_finished, host from jobs left join workers on jobs.assigned_worker_id = workers.id where result != 'none') select host, round(count(*) filter (where result='failed' or result='incomplete') * 100. / count(*), 2)::numeric(5,2)::float as ratio_failed_by_host, count(*) total from finished where t_finished >= '2022-04-13' group by host order by ratio_failed_by_host desc;
      host       | ratio_failed_by_host | total 
-----------------+----------------------+-------
 power8          |                44.24 |   269
 siodtw01        |                   40 |    15
 openqa-aarch64  |                36.59 |    82
 openqaworker4   |                 27.4 |   500
 openqaworker1   |                27.31 |   553
 imagetester     |                24.53 |    53
 openqaworker7   |                20.62 |   417
 ip-10-252-32-98 |                19.35 |    62
 ip-10-252-32-90 |                18.75 |    16
                 |                    0 |   660

I have also checked whether updates cause forceful restarts of openqa-worker-auto-restart@.service units, but that doesn't seem to be the case. So running jobs are not disturbed and thus AC2 is fulfilled. (This is actually no surprise considering that it is taken care of in openQA.spec, so those units are only reloaded. We only need to make sure that openqa-worker.target is not used.)


I'm keeping the timer running during the Easter vacation and will resolve the ticket if no further problems show up by Wednesday.

Then we can extend the approach to other workers. As mentioned in #105379#note-10 we can easily reconfigure the transactional workers (already done for openqaworker1).

Actions #16

Updated by mkittler over 2 years ago

  • Status changed from Feedback to Resolved

I've checked the log of openqaworker7 again and the statistics still look good:

openqa=> with finished as (select result, t_finished, host from jobs left join workers on jobs.assigned_worker_id = workers.id where result != 'none') select host, round(count(*) filter (where result='failed' or result='incomplete') * 100. / count(*), 2)::numeric(5,2)::float as ratio_failed_by_host, count(*) total from finished where t_finished >= '2022-04-13' group by host order by ratio_failed_by_host desc;
          host           | ratio_failed_by_host | total 
-------------------------+----------------------+-------
 localhost               |                  100 |    18
 siodtw01                |                85.95 |   121
 oss-cobbler-03          |                84.21 |    19
 openqa-aarch64          |                49.07 |  1506
 power8                  |                43.54 |   379
 openqaworker1_container |                42.37 |    59
 ip-10-252-32-90         |                37.11 |   194
 ip-10-252-32-98         |                36.49 |   962
 openqaworker4           |                28.07 |  2451
 openqaworker1           |                25.71 |  2497
 imagetester             |                22.84 |   232
 openqaworker7           |                19.38 |  1904
                         |                    0 |  1447
(13 rows)

So I'm marking the ticket as resolved as it was only about the first worker.

Actions #17

Updated by okurz over 2 years ago

https://github.com/os-autoinst/openQA/pull/4602 was the pull request that we had for the systemd service extension.

Actions #18

Updated by okurz over 2 years ago

  • Due date deleted (2022-04-26)
Actions #19

Updated by okurz over 2 years ago

  • Related to action #111758: o3 jobs exceeding MAX_SETUP_TIME auto_review:"(?s)openqaworker4.*timeout: setup exceeded MAX_SETUP_TIME":retry size:M added
Actions #20

Updated by okurz over 2 years ago

  • Copied to action #111989: Seems like o3 machines do not automatically reboot anymore, likely because we continuously call `zypper dup` so that the nightly upgrades don't find any changes? size:M added