action #157981: Upgrade osd webUI host to openSUSE Leap 15.6 size:S - openQA Project (public) - openSUSE Project Management Tool


closed

coordination #157969: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.6

Upgrade osd webUI host to openSUSE Leap 15.6 size:S

Added by okurz about 1 year ago. Updated 7 months ago.

Status:

Resolved

Priority:

Normal

Assignee:

nicksinger

Category:

Organisational

Target version:

Ready

Start date:

Due date:

% Done:

Estimated time:

Tags:

infra

Description

Motivation

Need to upgrade machines before EOL of Leap 15.5 and have a consistent environment

Acceptance criteria

AC1: osd webui host runs a cleanly upgraded openSUSE Leap 15.6 (no failed systemd services, no leftover .rpmnew files, etc.)
AC2: The openQA database runs the default version of PostgreSQL in current Leap
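AC1 can be checked mechanically. A minimal sketch, assuming the leftover files carry the suffixes rpm normally writes (.rpmnew, .rpmsave, .rpmorig) and that /etc is the directory of interest; the function names are made up for illustration and are not part of any openQA tooling:

```shell
#!/bin/sh
# List leftover rpm config files below a directory (default: /etc).
# rpm writes *.rpmnew, *.rpmsave or *.rpmorig next to a config file
# when a package update meets a locally changed version of it.
find_rpm_leftovers() {
    dir="${1:-/etc}"
    find "$dir" -type f \
        \( -name '*.rpmnew' -o -name '*.rpmsave' -o -name '*.rpmorig' \) \
        2>/dev/null | sort
}

# Print the number of units systemd reports as failed.
# Needs a running systemd, so it is only defined here, not called.
count_failed_units() {
    systemctl list-units --state=failed --no-legend --plain | grep -c .
}
```

On the upgraded host, an empty result from find_rpm_leftovers and a count of 0 from count_failed_units would cover the two examples named in AC1; the "etc." in AC1 is not covered by this sketch.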

Suggestions

Read https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades
Reserve some time when the instance is executing only a few or no openQA test jobs
After the upgrade, reboot and check that everything works as expected
Consider upgrading PostgreSQL according to https://open.qa/docs/#_migrating_postgresql_database_on_opensuse

Further details

If we lose access to the machine we need the help of EngineeringInfrastructure, as only they have access to the VM

Related issues 5 (1 open — 4 closed)


Updated by okurz about 1 year ago

Copied from action #130594: Upgrade osd webUI host to openSUSE Leap 15.5 added


Updated by okurz about 1 year ago

Subject changed from Upgrade osd webUI host to openSUSE Leap 15.5 to Upgrade osd webUI host to openSUSE Leap 15.6
Description updated (diff)
Assignee deleted (~~okurz~~)
Target version changed from Ready to future


Updated by okurz about 1 year ago

Target version changed from future to Tools - Next


Updated by okurz 10 months ago

Status changed from New to In Progress
Assignee set to okurz

In preparation of the upgrade I am already migrating postgres to 16:

# Versions to migrate from and to
oldver=15 newver=16
zypper in postgresql$newver-server postgresql$newver-contrib
# Create the new cluster next to the old one
sudo -u postgres /usr/lib/postgresql$newver/bin/initdb --encoding=UTF8 --locale=en_US.UTF-8 --lc-collate=C --lc-ctype=en_US.UTF-8 --lc-messages=C --lc-monetary=C --lc-numeric=C --lc-time=C -D /var/lib/pgsql/data.$newver
# Carry over local configuration changes by hand
sudo -u postgres vimdiff /var/lib/pgsql/data.$oldver/postgresql.conf /var/lib/pgsql/data.$newver/postgresql.conf
# Dry run first (--check); only if it passes: stop services, upgrade, switch the data symlink, restart, verify
sudo -u postgres /usr/lib/postgresql$newver/bin/pg_upgrade --check --link \
  --old-bindir=/usr/lib/postgresql$oldver/bin --new-bindir=/usr/lib/postgresql$newver/bin \
  --old-datadir=/var/lib/pgsql/data.$oldver --new-datadir=/var/lib/pgsql/data.$newver &&
systemctl stop openqa-webui openqa-scheduler openqa-livehandler openqa-gru postgresql &&
sudo -u postgres /usr/lib/postgresql$newver/bin/pg_upgrade --link \
  --old-bindir=/usr/lib/postgresql$oldver/bin --new-bindir=/usr/lib/postgresql$newver/bin \
  --old-datadir=/var/lib/pgsql/data.$oldver --new-datadir=/var/lib/pgsql/data.$newver &&
ln --force --no-dereference --relative --symbolic /var/lib/pgsql/data.$newver /var/lib/pgsql/data &&
systemctl start postgresql openqa-webui openqa-scheduler openqa-livehandler openqa-gru &&
sudo -u geekotest psql -c 'select version();' openqa
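The final select version(); is a manual check. If it should be scripted, the major version can be pulled out of that string. A rough sketch only: the function name pg_major_version is hypothetical, and it assumes the version string starts with "PostgreSQL <major>", as PostgreSQL's version() output does:

```shell
#!/bin/sh
# Extract the major version number from the text returned by
# "select version();", e.g. a string starting with "PostgreSQL 16.".
# Prints nothing if the input does not look like a PostgreSQL version string.
pg_major_version() {
    printf '%s\n' "$1" | sed -n 's/^ *PostgreSQL \([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

It could be used as pg_major_version "$(sudo -u geekotest psql -At -c 'select version();' openqa)" and the result compared against $newver. The -At flags (unaligned, tuples only) are an addition to the command above so that psql prints only the value.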


Updated by okurz 10 months ago · Edited

Preparation is done. I want to continue with the migration after EOB.

EDIT (2024-07-18 19:27Z): Done. Running pgsql 16 now. zypper se --installed-only postgres showed that we also had postgresql13 installed. I removed that but kept postgresql15 for now. We should delete the old data directory after some days without problems.


Updated by okurz 10 months ago

Status changed from In Progress to New
Assignee deleted (~~okurz~~)


Updated by livdywan 10 months ago

Subject changed from Upgrade osd webUI host to openSUSE Leap 15.6 to Upgrade osd webUI host to openSUSE Leap 15.6 size:S
Description updated (diff)
Status changed from New to Workable


Updated by okurz 7 months ago

Target version changed from Tools - Next to Ready


Updated by okurz 7 months ago

Status changed from Workable to Blocked
Assignee set to okurz

Blocked by #157978


#10

Updated by okurz 7 months ago

Status changed from Blocked to Workable
Assignee deleted (~~okurz~~)

o3 webUI upgrade done. No relevant problems encountered.


#11

Updated by tinita 7 months ago

Status changed from Workable to In Progress
Assignee set to nicksinger


#12

Updated by nicksinger 7 months ago

Status changed from In Progress to Feedback

Upgrade conducted. After running zypper dup, both openqa-webui and openqa-gru failed with:

openqa:~ # systemctl status --failed
× openqa-gru.service - The openQA daemon for various background tasks like cleanup and saving needles
     Loaded: loaded (/usr/lib/systemd/system/openqa-gru.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/openqa-gru.service.d
             └─30-openqa-hook-timeout.conf, override.conf
     Active: failed (Result: exit-code) since Wed 2024-10-16 18:10:59 UTC; 6min ago
   Duration: 2ms
   Main PID: 7394 (code=exited, status=217/USER)

Oct 16 18:10:59 openqa systemd[1]: openqa-gru.service: Scheduled restart job, restart counter is at 5.
Oct 16 18:10:59 openqa systemd[1]: Stopped The openQA daemon for various background tasks like cleanup and saving needles.
Oct 16 18:10:59 openqa systemd[1]: openqa-gru.service: Start request repeated too quickly.
Oct 16 18:10:59 openqa systemd[1]: openqa-gru.service: Failed with result 'exit-code'.
Oct 16 18:10:59 openqa systemd[1]: Failed to start The openQA daemon for various background tasks like cleanup and saving needles.

× openqa-webui.service - The openQA web UI
     Loaded: loaded (/usr/lib/systemd/system/openqa-webui.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/openqa-webui.service.d
             └─30-openqa-webui-hook-timeout.conf, storage.conf
     Active: failed (Result: exit-code) since Wed 2024-10-16 18:10:58 UTC; 6min ago
   Duration: 3ms
   Main PID: 7357 (code=exited, status=217/USER)

Oct 16 18:10:58 openqa systemd[1]: Started The openQA web UI.
Oct 16 18:10:58 openqa (i-daemon)[7357]: openqa-webui.service: Failed to determine user credentials: No such process
Oct 16 18:10:58 openqa (i-daemon)[7357]: openqa-webui.service: Failed at step USER spawning /usr/share/openqa/script/openqa-webui-daemon: No such process
Oct 16 18:10:58 openqa systemd[1]: openqa-webui.service: Main process exited, code=exited, status=217/USER
Oct 16 18:10:58 openqa systemd[1]: openqa-webui.service: Failed with result 'exit-code'.
Oct 16 18:14:54 openqa systemd[1]: openqa-webui.service: Unit cannot be reloaded because it is inactive.

But after a reboot all issues went away and systemd reports "State: running".
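For reference, systemd documents exit status 217/USER as a failure to determine or change user credentials, which matches the "Failed to determine user credentials" line in the log. Why a reboot cleared it is not established here. If it shows up again, one thing worth checking before rebooting is whether the account the unit runs as resolves at all. A minimal sketch with hypothetical helper names:

```shell
#!/bin/sh
# Return 0 if the given account name resolves in the system's user database.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Print the User= setting of a systemd unit.
# Needs a running systemd, so it is only defined here, not called.
unit_user() {
    systemctl show --property=User --value "$1"
}
```

For example, user_exists "$(unit_user openqa-webui.service)" || echo "service user does not resolve". This only covers one possible cause and is not a confirmed diagnosis of the failure above.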

I will monitor some jobs over the evening and will resolve tomorrow if no problems arise.


#13

Updated by okurz 7 months ago

Related to action #168337: [tools]test fails in bootloader_zkvm - auto_review:"qemu-img.*Failed to get shared.*No locks available" added


#14

Updated by nicksinger 7 months ago

Status changed from Feedback to In Progress

https://suse.slack.com/archives/C02CANHLANP/p1729129131475009 was reported, which I looked into and caused https://progress.opensuse.org/issues/168358
Now checking again whether the additional restart of the nfs-server might already have fixed the original problem, so that we can remove the transient workaround (nolock option on the nfs client) applied by @okurz.


#15

Updated by nicksinger 7 months ago

nicksinger wrote in #note-14:

https://suse.slack.com/archives/C02CANHLANP/p1729129131475009 was reported, which I looked into and caused https://progress.opensuse.org/issues/168358
Now checking again whether the additional restart of the nfs-server might already have fixed the original problem, so that we can remove the transient workaround (nolock option on the nfs client) applied by @okurz.

We can't. But I found that adding nolock apparently also adds local_lock=all, which was enough on zl12 to make it work again, and I like it more than disabling locking completely (despite not knowing the possible problems). It might also give me a hint on what fails to write remotely, why, and how to debug/fix it.
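To see which locking mode a client actually ended up with, the effective mount options can be inspected. A small sketch under assumptions: mount_option_value is a made-up helper, and the option string is expected in the comma-separated form that findmnt or /proc/mounts prints:

```shell
#!/bin/sh
# Print the value of a key=value option from a comma-separated
# mount option string; prints nothing if the key is absent.
mount_option_value() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^$2=//p" | head -n 1
}
```

Usage could look like mount_option_value "$(findmnt -n -o OPTIONS /path/to/nfs/mount)" local_lock, where the mount path is a placeholder for the NFS share on the worker. According to nfs(5), local_lock accepts none, flock, posix or all.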


#16

Updated by openqa_review 7 months ago

Due date set to 2024-11-01

Setting due date based on mean cycle time of SUSE QE Tools


#17

Updated by jbaier_cz 7 months ago

Related to action #168544: [alert] Failed systemd services alert: check-for-kernel-crash, kdump-notify added


#18

Updated by nicksinger 7 months ago

Status changed from In Progress to Resolved

I consider the upgrade itself done. See linked issues for the related NFS issues and how they got addressed. Consider reopening them instead of this one.


#19

Updated by okurz 7 months ago

Due date deleted (~~2024-11-01~~)


#20

Updated by okurz 7 months ago

Copied to action #168721: OSD openqa.ini grossly incomplete added


#21

Updated by okurz about 2 months ago

Copied to action #180728: Upgrade osd webUI host to openSUSE Leap 16.0 added
