
action #105885

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #80908: [epic] Continuous deployment (package upgrade or config update) without interrupting currently running openQA jobs

Continuous deployment of o3 workers - all the other o3 workers size:M

Added by okurz 5 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Acceptance criteria

  • AC1: All o3 workers automatically deploy after every update to os-autoinst or openQA-worker

Suggestions

  • Since the automatic deployment has already been done on openqaworker7 and #105379 contains enough information to apply the approach to the other o3 workers, we suggest continuing here
  • Ensure the root filesystem is mounted read-write, i.e. mount -o rw,remount /
  • Enable the timer for "openqa-continuous-update.service", i.e. systemctl enable --now openqa-continuous-update.timer
  • To verify, call systemctl status openqa-continuous-update and check the results, e.g. via journalctl -e -u openqa-continuous-update, looking for any unforeseen errors
  • Monitor the state of the systems after some hours
  • Monitor again on the next day
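The per-worker steps above could be combined into a small helper script. A minimal sketch; the DRY_RUN guard is our addition (so the commands are only printed by default), while the package, service, and timer names are taken from this ticket:

```shell
#!/bin/sh
# Sketch of the per-worker setup steps from the suggestions above.
# DRY_RUN (our addition) defaults to 1 so the script only prints the
# commands; set DRY_RUN= and run as root on a worker to execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ -n "$DRY_RUN" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run mount -o rw,remount /                                  # make / writable
run zypper -n in openQA-continuous-update                  # install the updater
run systemctl enable --now openqa-continuous-update.timer  # enable the timer
run systemctl status openqa-continuous-update              # verify the service
run journalctl -e -u openqa-continuous-update              # check for errors
```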

Related issues

Copied from openQA Project - action #105379: Continuous deployment of o3 workers - one worker first size:M (Resolved, 2022-01-24)

Copied to openQA Project - action #111377: Continuous deployment of osd workers - similar as on o3 size:M (Rejected, 2022-05-20)

History

#1 Updated by okurz 5 months ago

  • Copied from action #105379: Continuous deployment of o3 workers - one worker first size:M added

#2 Updated by mkittler 2 months ago

  • Target version changed from future to Ready

Since it has been done on openqaworker7 and #105379 contains enough information to apply the approach to other o3 workers, I suggest we continue here.

#3 Updated by cdywan about 2 months ago

  • Subject changed from Continuous deployment of o3 workers - all the other o3 workers to Continuous deployment of o3 workers - all the other o3 workers size:m
  • Description updated (diff)
  • Status changed from New to Workable

#4 Updated by mkittler about 1 month ago

  • Assignee set to mkittler

#5 Updated by mkittler about 1 month ago

Looks like just setting the mount option to rw won't cut it. What I've stated in #105379#note-10 is true, but when one wants to actually install a package one runs into:

error: can't create transaction lock on /usr/lib/sysimage/rpm/.rpm.lock (Read-only file system)

or just:

openqaworker1:~ # touch /foo
touch: cannot touch '/foo': Read-only file system

Using mount -o rw,remount / doesn't help either. It is not clear on which levels the file system is still set to be read-only.
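One way to see which level still enforces read-only is to compare the mount option with the btrfs subvolume flag, since either can make / read-only independently. A small sketch (the output wording is ours; the btrfs call is guarded because the tool may not be installed everywhere):

```shell
#!/bin/sh
# Sketch: check read-only state on two levels for the root filesystem.
# Level 1: the mount option reported by findmnt.
opts=$(findmnt -no OPTIONS /)
case ",$opts," in
  *,ro,*) echo "/ is mounted read-only" ;;
  *)      echo "/ is mounted read-write" ;;
esac
# Level 2: the btrfs subvolume property, a separate setting.
if command -v btrfs >/dev/null 2>&1; then
  btrfs property get -ts / ro 2>/dev/null || echo "ro property not readable"
fi
```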

#6 Updated by mkittler about 1 month ago

Looks like it is also read-only on btrfs-level but one can simply make it read-write:

openqaworker1:~ # btrfs property get -ts / ro
ro=true
openqaworker1:~ # btrfs property set -ts / ro false
openqaworker1:~ # touch /foo
openqaworker1:~ # rm /foo
openqaworker1:~ # zypper in openQA-continuous-update 
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following NEW package is going to be installed:
  openQA-continuous-update

1 new package to install.
Overall download size: 0 B. Already in cache: 341.9 KiB. After the operation, additional 1.3 KiB will be used.
Continue? [y/n/v/...? shows all options] (y): 
In cache openQA-continuous-update-4.6.1652868008.418a4ec-lp153.4993.1.noarch.rpm (1/1), 341.9 KiB (1.3 KiB unpacked)
…
(1/1) Installieren: openQA-continuous-update-4.6.1652868008.418a4ec-lp153.4993.1.noarch
openqaworker1:~ # systemctl enable --now openqa-continuous-update.timer 
Created symlink /etc/systemd/system/timers.target.wants/openqa-continuous-update.timer → /usr/lib/systemd/system/openqa-continuous-update.timer.

I've just invoked openqaworker1:~ # systemctl start transactional-update.service to see whether the setup persists after rebooting the system via the transactional setup. Unfortunately there is currently nothing to update:

May 18 14:06:46 openqaworker1 transactional-update[30392]: Calling zypper --no-cd dup
May 18 14:06:52 openqaworker1 transactional-update[30392]: zypper: nothing to update
May 18 14:06:52 openqaworker1 transactional-update[30392]: Removing snapshot #1336...
May 18 14:06:52 openqaworker1 transactional-update[31633]: 2022-05-18 14:06:52 tukit 3.6.2 started
May 18 14:06:52 openqaworker1 transactional-update[31633]: 2022-05-18 14:06:52 Options: abort 1336
May 18 14:06:53 openqaworker1 transactional-update[31633]: 2022-05-18 14:06:53 Discarding snapshot 1336.
May 18 14:06:53 openqaworker1 transactional-update[31633]: 2022-05-18 14:06:53 Transaction completed.
May 18 14:06:53 openqaworker1 transactional-update[30392]: transactional-update finished
May 18 14:06:53 openqaworker1 systemd[1]: transactional-update.service: Succeeded.
May 18 14:06:53 openqaworker1 systemd[1]: Finished Update the system.

So I'm waiting for some updates to test whether it actually works (before applying the btrfs changes on all other workers).

#7 Updated by mkittler about 1 month ago

  • Status changed from Workable to In Progress

#8 Updated by mkittler about 1 month ago

Now there were some updates. I also rebooted the system. The root filesystem is still read-write so I suppose it worked. It also doesn't look like there are any other read-only sub volumes left. So now I could apply the config on all other o3 workers (openqaworker1 and openqaworker7 have been handled).
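Whether any read-only subvolumes remain can be checked with the readonly filter of btrfs subvolume list. A sketch (the output wording is ours, and the call is guarded because the btrfs tools may not be installed on the machine running the check):

```shell
#!/bin/sh
# Sketch: count remaining read-only btrfs subvolumes under /.
# `btrfs subvolume list -r` restricts the listing to readonly subvolumes,
# so an empty result (count 0) means everything is writable.
if command -v btrfs >/dev/null 2>&1; then
  count=$(btrfs subvolume list -r / 2>/dev/null | wc -l)
  echo "read-only subvolumes: $count"
else
  echo "read-only subvolumes: unknown (btrfs tools not installed)"
fi
```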

#9 Updated by mkittler about 1 month ago

  • Status changed from In Progress to Feedback

Edited /etc/fstab and executed mount -o rw,remount / && btrfs property set -ts / ro false && zypper ref && zypper -n in openQA-continuous-update && systemctl enable --now openqa-continuous-update.timer on all o3 workers mentioned on https://progress.opensuse.org/projects/openqav3/wiki/#Manual-command-execution-on-o3-workers. (Of course I skipped the file system changes on hosts that were not using a read-only btrfs anyway.)
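A sketch of how that rollout over all workers could look. The host list is an assumption based on the hosts checked in the resolution comment of this ticket, and root SSH access is assumed, so the actual remote call is left commented out:

```shell
#!/bin/sh
# Sketch: apply the one-liner from above on every o3 worker.
# HOSTS is an assumption based on the check loop later in this ticket.
HOSTS="openqaworker1 openqaworker4 openqaworker7 imagetester rebel"
CMD='mount -o rw,remount / && btrfs property set -ts / ro false \
  && zypper ref && zypper -n in openQA-continuous-update \
  && systemctl enable --now openqa-continuous-update.timer'
for h in $HOSTS; do
  echo "deploying on $h"
  # ssh "root@$h" "$CMD"   # requires root SSH access to the workers
done
```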


Monitor again on the next day

To have a comparison, currently the fail/incomplete rate looks like this:

openqa=> with finished as (select result, t_finished from jobs) select (extract(YEAR from t_finished)) as year, (extract(MONTH from t_finished)) as month, (extract(DAY from t_finished)) as day, round(count(*) filter (where result = 'failed' or result = 'incomplete') * 100. / count(*), 2)::numeric(5,2)::float as ratio_of_all_failures_or_incompletes, count(*) total from finished where t_finished >= '2022-05-10' group by year, month, day order by year, month, day asc;
 year | month | day | ratio_of_all_failures_or_incompletes | total 
------+-------+-----+--------------------------------------+-------
 2022 |     5 |  10 |                                 39.6 |  2326
 2022 |     5 |  11 |                                59.21 |   983
 2022 |     5 |  12 |                                66.65 |  4893
 2022 |     5 |  13 |                                43.92 |  1152
 2022 |     5 |  14 |                                31.12 |  1401
 2022 |     5 |  15 |                                47.43 |  1908
 2022 |     5 |  16 |                                33.58 |  2543
 2022 |     5 |  17 |                                29.71 |  2198
 2022 |     5 |  18 |                                28.85 |  1234
(9 rows)
openqa=> with finished as (select result, t_finished from jobs) select (extract(YEAR from t_finished)) as year, (extract(MONTH from t_finished)) as month, (extract(DAY from t_finished)) as day, round(count(*) filter (where result = 'incomplete') * 100. / count(*), 2)::numeric(5,2)::float as ratio_of_all_incompletes, count(*) total from finished where t_finished >= '2022-05-10' group by year, month, day order by year, month, day asc;
 year | month | day | ratio_of_all_incompletes | total 
------+-------+-----+--------------------------+-------
 2022 |     5 |  10 |                     6.88 |  2326
 2022 |     5 |  11 |                    13.22 |   983
 2022 |     5 |  12 |                    59.13 |  4893
 2022 |     5 |  13 |                     7.47 |  1152
 2022 |     5 |  14 |                     7.21 |  1401
 2022 |     5 |  15 |                     2.52 |  1908
 2022 |     5 |  16 |                     5.86 |  2543
 2022 |     5 |  17 |                     3.69 |  2198
 2022 |     5 |  18 |                     7.53 |  1235
(9 rows)

#10 Updated by mkittler about 1 month ago

The journal looks good on the x86_64 workers. It seems like the repo wasn't reachable for some time but that didn't lead to any problems (like the service being stuck). The rate of failures/incompletes also didn't change significantly.

Apparently it didn't work on aarch64 and power8 at first. I looked into it, and it now works on these hosts as well: it was just a differently configured repo name.

#11 Updated by okurz about 1 month ago

  • Subject changed from Continuous deployment of o3 workers - all the other o3 workers size:m to Continuous deployment of o3 workers - all the other o3 workers size:M

#12 Updated by okurz about 1 month ago

  • Copied to action #111377: Continuous deployment of osd workers - similar as on o3 size:M added

#13 Updated by okurz about 1 month ago

  • Status changed from Feedback to Resolved

I checked

for i in openqaworker1 openqaworker4 openqaworker7 imagetester rebel; do echo $i && ssh root@$i "systemctl status openqa-continuous-update.service; rpm -q --changelog openQA | head; rpm -q --changelog os-autoinst | head" ; done

and all looks good.

I updated https://progress.opensuse.org/projects/openqav3/wiki/Wiki#Recurring-automatic-update-of-openQA-workers

Considering this done.
