action #99198

coordination #99183: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui, to openSUSE Leap 15.3

Upgrade osd webUI host to openSUSE Leap 15.3 size:M

Added by okurz 4 months ago. Updated about 2 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

  • Need to upgrade machines before EOL of Leap 15.2 and have a consistent environment

Acceptance criteria

  • AC1: osd webui host runs a clean upgraded openSUSE Leap 15.3 (no failed systemd services, no left over .rpm-new files, etc.)

Suggestions

Further details

  • If we lose access to the machine, we need the help of EngineeringInfrastructure, as only they have access to the VM

Related issues

Related to openQA Project - action #103771: Retry on rsync errors like "exit code 5" instead of failing the job (which then retriggers) (In Progress, 2021-12-09)

Copied from openQA Infrastructure - action #75244: Upgrade osd webUI host to openSUSE Leap 15.2 (Resolved, 2020-10-24)

History

#1 Updated by okurz 4 months ago

  • Copied from action #75244: Upgrade osd webUI host to openSUSE Leap 15.2 added

#2 Updated by okurz 4 months ago

  • Subject changed from Upgrade osd webUI host to openSUSE Leap 15.2 to Upgrade osd webUI host to openSUSE Leap 15.3
  • Description updated (diff)
  • Assignee deleted (cdywan)
  • Priority changed from High to Normal
  • Start date deleted (2020-10-24)

#3 Updated by okurz 4 months ago

  • Priority changed from Normal to Low

#4 Updated by okurz 4 months ago

  • Target version changed from Ready to future

#5 Updated by okurz about 2 months ago

  • Priority changed from Low to Normal

#6 Updated by okurz about 2 months ago

  • Priority changed from Normal to High
  • Target version changed from future to Ready

#7 Updated by cdywan about 2 months ago

  • Subject changed from Upgrade osd webUI host to openSUSE Leap 15.3 to Upgrade osd webUI host to openSUSE Leap 15.3 size:M
  • Status changed from New to Workable

#8 Updated by okurz about 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz

I guess I will upgrade OSD now. I have not done any of the other 15.3 upgrade tasks myself, only supported them, so I think I can take this one. https://openqa.suse.de/tests/ shows that OSD is not that busy right now. The latest state of os-autoinst+openQA was deployed yesterday in https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/271128 . The current uptime of OSD is 11 days. I am following https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades

One minor change I needed to apply manually: in /etc/zypp/repos.d/NON_Public_infrastructure.repo I replaced openSUSE_Leap_$releasever with $releasever, as the repo paths changed here.
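The manual edit could be scripted roughly as follows. This is a minimal sketch, not what was actually run: the helper name fix_repo_path is hypothetical, the baseurl in the demo is made up, and it assumes the placeholder appears literally as openSUSE_Leap_$releasever in the repo file. Trying it on a copy first seems prudent.

```shell
#!/bin/sh
# Hypothetical helper: replace the literal text "openSUSE_Leap_$releasever"
# with "$releasever" in the given repo file.
fix_repo_path() {
    file=$1
    # Single quotes keep $releasever literal, so the shell does not expand it.
    sed 's/openSUSE_Leap_\$releasever/$releasever/g' "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Demo on a scratch file; this baseurl is an invented example, not the real one.
demo=$(mktemp)
printf 'baseurl=http://example.invalid/repo/openSUSE_Leap_$releasever/\n' > "$demo"
fix_repo_path "$demo"
cat "$demo"
rm -f "$demo"
```

For the real file, the call would be fix_repo_path /etc/zypp/repos.d/NON_Public_infrastructure.repo, run as root.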

Then continued with

zypper --releasever=$new -n ref && zypper --releasever=$new -n --no-refresh dup --auto-agree-with-licenses --replacefiles --download-in-advance --details
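The command above uses $new without defining it in this comment; presumably it holds the target release. A small sketch, assuming new=15.3 (inferred from the ticket subject, not stated here) and a hypothetical helper print_dup_command, that only prints the command line for review instead of running it:

```shell
#!/bin/sh
# Assumption: $new is the target Leap release; 15.3 is inferred from the ticket subject.
new=15.3

# Hypothetical dry-run helper: print the upgrade command line without executing it.
print_dup_command() {
    rel=$1
    printf '%s\n' "zypper --releasever=$rel -n ref && zypper --releasever=$rel -n --no-refresh dup --auto-agree-with-licenses --replacefiles --download-in-advance --details"
}

print_dup_command "$new"
```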

Due to recent events I am a bit careful regarding changes in package versions, but for example I see
"perl-Mojo-IOLoop-ReadWriteProcess 0.28-lp152.2.1 -> 0.28-bp153.1.12", so that all looks good.

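The check on the package version amounts to comparing the upstream version while ignoring the distribution-specific release suffix. A minimal sketch of that comparison, using hypothetical helper names and the two version strings quoted above:

```shell
#!/bin/sh
# Hypothetical helper: strip the release part from an RPM "version-release" string.
upstream_version() {
    # Remove the shortest suffix starting at the last "-", i.e. the release.
    printf '%s\n' "${1%-*}"
}

# Succeeds when both strings share the same upstream version.
same_upstream() {
    [ "$(upstream_version "$1")" = "$(upstream_version "$2")" ]
}

if same_upstream 0.28-lp152.2.1 0.28-bp153.1.12; then
    echo "upstream version unchanged: $(upstream_version 0.28-lp152.2.1)"
else
    echo "upstream version changed"
fi
```

Here only the release part differs (lp152.2.1 versus bp153.1.12), so the upstream version 0.28 is the same before and after.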
Then I hit https://bugzilla.suse.com/show_bug.cgi?id=1192740 so I applied the workaround and ran

zypper --releasever=$new -n in suse-module-tools && zypper --releasever=$new -n ref && zypper --releasever=$new -n --no-refresh dup --auto-agree-with-licenses --replacefiles --download-in-advance --details

One systemd service failed, openqa-enqueue-asset-cleanup.service. The journal says

Dec 09 14:00:00 openqa systemd[10389]: openqa-enqueue-asset-cleanup.service: Failed to determine user credentials: No such process
Dec 09 14:00:00 openqa systemd[10389]: openqa-enqueue-asset-cleanup.service: Failed at step USER spawning /usr/share/openqa/script>

Rebooted.
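The "Failed at step USER" message suggests that systemd could not resolve the account the unit runs as at that moment, possibly because user lookups were disturbed mid-upgrade. That is an interpretation, not something the log confirms. A minimal sketch for checking whether an account resolves; the helper name user_resolves is hypothetical, and the account the service actually uses is not named in this ticket, so root stands in for the demo:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given account name can be resolved.
user_resolves() {
    id -u "$1" >/dev/null 2>&1
}

# Demo with root; substitute the account configured for the failing unit.
for account in root; do
    if user_resolves "$account"; then
        echo "$account: resolves"
    else
        echo "$account: cannot be resolved"
    fi
done
```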

#9 Updated by okurz about 2 months ago

  • Status changed from In Progress to Resolved

Sent a notification in internal chat #eng-testing about the upgrade and the resolution.

All good after reboot. No failed services. Did sudo salt -C 'G@roles:webui' state.apply test=True, no failures. Did sudo salt -C 'G@roles:webui' state.apply. All good as well.
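For the AC1 check on leftover package config files, a sketch along these lines might help. Note that rpm names such files with the suffixes .rpmnew, .rpmsave and .rpmorig (the ".rpm-new" in AC1 is loose wording). The helper name list_rpm_leftovers is hypothetical, and the demo runs on a scratch directory rather than /etc:

```shell
#!/bin/sh
# Hypothetical helper: list leftover rpm config files below a directory.
list_rpm_leftovers() {
    find "$1" -type f \( -name '*.rpmnew' -o -name '*.rpmsave' -o -name '*.rpmorig' \) 2>/dev/null
}

# Demo on a scratch directory with one leftover and one regular file.
scratch=$(mktemp -d)
touch "$scratch/example.conf" "$scratch/example.conf.rpmnew"
list_rpm_leftovers "$scratch"
rm -rf "$scratch"

# On the real host one would run: list_rpm_leftovers /etc
# Failed services can be listed separately with: systemctl --failed
```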

#10 Updated by cdywan about 2 months ago

I assume this is related? It alerted only very briefly and went back to OK just as I was about to save this comment:

openqa_minion_workers.max_max 2.400
  • It alerted again at 18.56 and is ongoing with 2.600. However, https://openqa.suse.de/minion/workers shows 2 workers, not none, albeit idle. No errors are visible in sudo journalctl -fu openqa-gru.service.
  • I ran sudo systemctl restart openqa-gru anyway, and now one of the idle minion workers is gone and not coming back.

#11 Updated by okurz about 2 months ago

  • Related to action #103771: Retry on rsync errors like "exit code 5" instead of failing the job (which then retriggers) added
