tickets #130519

closed

Leap 15.5 upgrade diary

Added by crameleon over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 2023-06-07
Due date:
% Done: 90%
Estimated time:

Description

Let's use this issue to comment on the 15.5 upgrade in our infrastructure.


Checklist

  • Repair Cachet package / PHP dependencies
  • Repair Matomo package / PHP dependencies
  • Clean up repositories on provo-mirror
  • Repair MariaDB on status2
  • Repair Percona Monitoring package
  • Repair Icinga update reporting for 15.5

Related issues 1 (0 open, 1 closed)

Related to openQA Project (public) - coordination #130582: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.5 (Resolved, okurz, 2023-06-09)

Actions #1

Updated by crameleon over 1 year ago

  • Private changed from Yes to No
Actions #2

Updated by crameleon over 1 year ago

paste.i.o.o

-> no redis service was enabled, causing paste-sidekiq to fail, zypper log revealed:

# 2023-06-07 19:23:05 redis7-7.0.8-150500.1.2.x86_64.rpm installed ok
# Additional rpm output:
# See /usr/share/doc/packages/redis/README.SUSE to continue
# Removed /etc/systemd/system/multi-user.target.wants/redis@default.service.
# Removed /etc/systemd/system/redis.target.wants/redis@default.service.

-> solved using systemctl enable --now redis@default
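
Because the same symptom shows up on several hosts, it may help to list every unit whose enablement symlink was removed during the upgrade. The following is only a minimal sketch: it assumes the rpm output ends up in /var/log/zypp/history in the "# Removed ..." form quoted above, and the function name removed_units is made up for illustration.

```shell
# List units whose enablement symlinks were removed, based on
# "# Removed /etc/systemd/system/<target>.wants/<unit>." lines read from stdin.
# Hypothetical helper; the line format is taken from the log excerpt above.
removed_units() {
    sed -n 's|^# Removed /etc/systemd/system/[^/]*\.wants/\(.*\)\.$|\1|p' | sort -u
}

# Example (not executed here):
#   removed_units < /var/log/zypp/history
```

The list should be reviewed before re-enabling anything with systemctl enable --now, since a unit may also have been disabled on purpose.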

matomo.i.o.o:

-> dependency issues with the matomo packages
-> upgraded manually by using the "keep obsolete ... matomo*..." option
-> issue with redis being disabled, zypper log revealed:

# 2023-06-07 20:42:55 redis7-7.0.8-150500.1.2.x86_64.rpm installed ok
# Additional rpm output:
# See /usr/share/doc/packages/redis/README.SUSE to continue
# Removed /etc/systemd/system/redis.target.wants/redis@matomo.service.
# Removed /etc/systemd/system/multi-user.target.wants/redis@matomo.service.

-> solved using systemctl enable --now redis@matomo

Edit: Matomo package now upgraded, PHP7 replaced with PHP8, very minor dependency issue resolved via https://build.opensuse.org/request/show/1092016. To-do, unrelated to this upgrade project: replace Prefork with Event + FPM.

etherpad.i.o.o:

-> could not find the new signing key to import; manually reinstalling it revealed:

Problem: the to be installed openSUSE-build-key-1.0-lp154.3.11.1.noarch conflicts with 'suse-build-key' provided by the installed suse-build-key-12.0-150000.8.31.1.noarch
 Solution 1: deinstallation of suse-build-key-12.0-150000.8.31.1.noarch
 Solution 2: do not install openSUSE-build-key-1.0-lp154.3.11.1.noarch

-> solved using zypper rm suse-build-key && zypper in openSUSE-build-key

limesurvey.i.o.o:

-> php-fpm service not enabled (my fault, switched from prefork to event+fpm the other day)
-> solved using systemctl enable --now php-fpm

moodle.i.o.o:

-> same issue with redis@moodle, not going to repeat myself

Actions #3

Updated by crameleon over 1 year ago

Actions #4

Updated by crameleon over 1 year ago

anna.i.o.o:

-> nothing to report

elsa.i.o.o:

-> nothing to report

minnie.i.o.o:

-> nothing to report

backup.i.o.o:

-> nothing to report

elections2.i.o.o:

-> can someone validate this works as intended?

jenkins-agent.i.o.o:

-> nothing to report

lnt.i.o.o:

-> nothing to report

mx-test.i.o.o:

-> nothing to report

minio.i.o.o:

-> nothing to report

metrics.i.o.o:

-> nothing to report

narwal{4,5,6,7}.i.o.o:

-> nothing to report

mybackup.i.o.o:

-> nothing to report

Actions #5

Updated by crameleon over 1 year ago

new-forum.i.o.o:

-> nothing to report

nue-ns{1,2}.i.o.o:

-> nothing to report

Actions #6

Updated by crameleon over 1 year ago

opi-proxy.i.o.o:

-> nothing to report

status2.i.o.o:

-> php7/8 dependency issues with Cachet:

Problem: the to be installed Cachet-config-apache-2.5.1-lp155.3.4.noarch requires 'php > 8', but this requirement cannot be provided
  not installable providers: php8-8.0.28-150400.4.31.1.x86_64[repo-oss]
 Solution 1: Following actions will be done:
  deinstallation of php7-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-zlib-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-cli-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-ctype-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-curl-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-dom-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-fileinfo-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-gd-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-iconv-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-intl-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-mbstring-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-mysql-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-openssl-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-pdo-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-phar-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-tokenizer-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-xmlreader-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-xmlwriter-7.4.33-150400.4.22.1.x86_64
  deinstallation of php7-zip-7.4.33-150400.4.22.1.x86_64
  deinstallation of apache2-mod_php7-7.4.33-150400.4.22.1.x86_64
 Solution 2: deinstallation of Cachet-config-apache-2.5.1-lp154.2.1.noarch
 Solution 3: keep obsolete Cachet-config-apache-2.5.1-lp154.2.1.noarch
 Solution 4: break Cachet-config-apache-2.5.1-lp155.3.4.noarch by ignoring some of its dependencies

-> upgraded using the "keep obsolete ..." option, who wants to repair the package?
-> MariaDB startup issue:

status2 (provo):~ # systemctl status mariadb
× mariadb.service - MariaDB database server
     Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Wed 2023-06-07 21:52:24 UTC; 29s ago
       Docs: man:mysqld(8)
             https://mariadb.com/kb/en/library/systemd/
    Process: 1529 ExecStartPre=/usr/lib/mysql/mysql-systemd-helper install (code=exited, status=0/SUCCESS)
    Process: 1540 ExecStartPre=/usr/lib/mysql/mysql-systemd-helper upgrade (code=exited, status=1/FAILURE)

Jun 07 21:51:23 status2 mysql-systemd-helper[1540]: Running protected MySQL...
Jun 07 21:51:23 status2 mysql-systemd-helper[1540]: Waiting for MySQL to start
Jun 07 21:51:24 status2 mysql-systemd-helper[1555]: 2023-06-07 21:51:24 0 [Warning] Can't create test file /var/lib/mysql/status2.lower-test
Jun 07 21:51:24 status2 mysql-systemd-helper[1555]: [90B blob data]
Jun 07 21:51:24 status2 mysql-systemd-helper[1555]: 2023-06-07 21:51:24 0 [ERROR] Aborting
Jun 07 21:52:24 status2 mysql-systemd-helper[1540]: MySQL is still dead
Jun 07 21:52:24 status2 mysql-systemd-helper[1540]: MySQL didn't start, can't continue
Jun 07 21:52:24 status2 systemd[1]: mariadb.service: Control process exited, code=exited, status=1/FAILURE
Jun 07 21:52:24 status2 systemd[1]: mariadb.service: Failed with result 'exit-code'.
Jun 07 21:52:24 status2 systemd[1]: Failed to start MariaDB database server.

There are no AppArmor denials for the mentioned /var/lib/mysql/status2.lower-test file. The directory is readable and writable by mysql. After moving the existing data directory away and creating a new one owned by mysql:mysql with 750 permissions, the result is similar:

Jun 10 00:59:52 status2 mysql-systemd-helper[17958]: Creating MySQL privilege database...
Jun 10 00:59:52 status2 mysql-systemd-helper[17993]: 2023-06-10  0:59:52 0 [Warning] Can't create test file /var/lib/mysql/status2.lower-test
Jun 10 00:59:52 status2 mysql-systemd-helper[17993]: /usr/sbin/mariadbd: Cannot change uid/gid (errno: 1)
Jun 10 00:59:52 status2 mysql-systemd-helper[17993]: 2023-06-10  0:59:52 0 [ERROR] Aborting
Jun 10 00:59:52 status2 mysql-systemd-helper[17992]: cat: write error: Broken pipe
Jun 10 00:59:52 status2 mysql-systemd-helper[17991]: cat: write error: Broken pipe
Jun 10 00:59:52 status2 mysql-systemd-helper[17994]: cat: write error: Broken pipe
Jun 10 00:59:52 status2 mysql-systemd-helper[17965]: Installation of system tables failed!  Examine the logs

Now, what is this stray AppArmor profile about?

type=AVC msg=audit(1686358792.586:112623): apparmor="DENIED" operation="capable" profile="/usr/sbin/mariadbd" pid=17993 comm="mariadbd" capability=6  capname="setgid"

rpm -Vf /etc/apparmor.d/usr.sbin.mariadbd
file /etc/apparmor.d/usr.sbin.mariadbd is not owned by any package

After adding

  capability setgid,
  capability setuid,

to this AppArmor profile, the service still complains about the lower-test file, but proceeds to start up as expected. I put the original data directory back in place.
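
For similar cases, the missing capability rules can be read straight from the audit records. This is a rough sketch, assuming denials appear in the type=AVC format quoted above; denied_capabilities is a hypothetical helper name, and the audit log location mentioned in the comment is an assumption.

```shell
# Turn AppArmor "capable" denials read from stdin into
# "<profile> capability <name>," lines, one per distinct denial.
# Hypothetical helper; the record format is taken from the AVC line above.
denied_capabilities() {
    sed -n 's/.*apparmor="DENIED" operation="capable" profile="\([^"]*\)".*capname="\([^"]*\)".*/\1 capability \2,/p' | sort -u
}

# Example (not executed here; log path is an assumption):
#   denied_capabilities < /var/log/audit/audit.log
```

After editing a profile, it can be reloaded with apparmor_parser -r /etc/apparmor.d/usr.sbin.mariadbd before restarting the service.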

provo-ns.i.o.o:

-> nothing to report

rpmlint.i.o.o:

-> nothing to report

Actions #7

Updated by crameleon over 1 year ago

tsp.i.o.o:

-> nothing to report

monitor.i.o.o:

-> ClamAV package conflict, same issue as reported in https://progress.opensuse.org/issues/129187

File /usr/sbin/clamav-milter
  from install of
     clamav-0.103.8-150000.3.44.1.x86_64 (repo-oss)
  conflicts with file from install of
     clamav-milter-0.103.8-lp155.2.4.x86_64 (openSUSE:infrastructure)

-> resolved using zypper rm clamav-milter, keeping the clamav-milter binary from the clamav package in OSS - if someone thinks the version in o:i is better, feel free to replace it by installing both clamav and clamav-milter from o:i

-> systemd-sysctl fails:

crameleon@monitor:/home/crameleon> sudo systemctl status systemd-sysctl
[sudo] password for crameleon:
× systemd-sysctl.service - Apply Kernel Variables
     Loaded: loaded (/usr/lib/systemd/system/systemd-sysctl.service; static)
    Drop-In: /usr/lib/systemd/system/systemd-sysctl.service.d
             └─50-kernel-uname_r.conf
     Active: failed (Result: exit-code) since Wed 2023-06-07 22:18:57 UTC; 3min 56s ago
       Docs: man:systemd-sysctl.service(8)
             man:sysctl.d(5)
    Process: 463 ExecStart=/usr/lib/systemd/systemd-sysctl (code=exited, status=1/FAILURE)
   Main PID: 463 (code=exited, status=1/FAILURE)

Jun 07 22:18:57 monitor systemd-sysctl[463]: Couldn't write '0' to 'net/ipv4/conf/external/log_martians', ignoring: No such file or directory
Jun 07 22:18:57 monitor systemd-sysctl[463]: Couldn't write '0' to 'net/bridge/bridge-nf-call-arptables', ignoring: No such file or directory
Jun 07 22:18:57 monitor systemd-sysctl[463]: Couldn't write '0' to 'net/bridge/bridge-nf-call-ip6tables', ignoring: No such file or directory
Jun 07 22:18:57 monitor systemd-sysctl[463]: Couldn't write '0' to 'net/bridge/bridge-nf-call-iptables', ignoring: No such file or directory
Jun 07 22:18:57 monitor systemd-sysctl[463]: Couldn't write '0' to 'net/bridge/bridge-nf-filter-pppoe-tagged', ignoring: No such file or directory
Jun 07 22:18:57 monitor systemd-sysctl[463]: Couldn't write '0' to 'net/bridge/bridge-nf-filter-vlan-tagged', ignoring: No such file or directory
Jun 07 22:18:57 monitor systemd-sysctl[463]: Couldn't write '1' to 'net/ipv4/tcp_tw_recycle', ignoring: No such file or directory
Jun 07 22:18:57 monitor systemd[1]: systemd-sysctl.service: Main process exited, code=exited, status=1/FAILURE
Jun 07 22:18:57 monitor systemd[1]: systemd-sysctl.service: Failed with result 'exit-code'.
Jun 07 22:18:57 monitor systemd[1]: Failed to start Apply Kernel Variables.

-> unsure if that was already there before the upgrade, not investigating for now
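
If someone picks this up later, a first step could be to collect the keys that could not be written and look for the files that set them. A minimal sketch, assuming journal lines in the format quoted above; failed_sysctl_keys is a made-up name.

```shell
# Print the sysctl keys (in /proc/sys path form) that systemd-sysctl could not
# write, given journal lines on stdin in the "Couldn't write ..." format above.
# Hypothetical helper for illustration only.
failed_sysctl_keys() {
    sed -n "s/.*Couldn't write '[^']*' to '\([^']*\)', ignoring: No such file or directory.*/\1/p" | sort -u
}

# Example (not executed here):
#   journalctl -b -u systemd-sysctl | failed_sysctl_keys
#   grep -rn tcp_tw_recycle /etc/sysctl.conf /etc/sysctl.d /usr/lib/sysctl.d
```

As far as I know, net.ipv4.tcp_tw_recycle no longer exists in current kernels, and the net/bridge keys are only present while the br_netfilter module is loaded, so stale configuration is a plausible explanation, though not a confirmed one.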

Actions #8

Updated by crameleon over 1 year ago

provo-gate.i.o.o:

-> nothing to report

provo-mirror.i.o.o:

-> first update attempt failed due to no free disk space, same issue as reported in https://progress.opensuse.org/issues/129814, deleted archives from /var/log to proceed

-> second attempt failed, I proceeded with a manual upgrade and opted for the following vendor changes as a resolution:

  debuginfod-client   obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
  debuginfod-profile  obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
  elfutils            obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
  elfutils-lang       obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
  libdebuginfod1      obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
  libdw1              obs://build.opensuse.org/openSUSE -> obs://build.opensuse.org/Base:System
  rsync               obs://build.suse.de/home:david.anes:1204538 -> SUSE LLC <https://www.suse.com/>

-> do we need those devel and home repositories there?
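
To answer that question, it may help to see which installed packages still come from home or devel projects. A small sketch, assuming the vendor string contains the project name as in the list above; packages_from_vendor is a hypothetical helper.

```shell
# Read "name<TAB>vendor" lines from stdin and print the names of packages
# whose vendor string contains the given fragment (for example "home:").
# Hypothetical helper for illustration only.
packages_from_vendor() {
    awk -F '\t' -v frag="$1" 'index($2, frag) { print $1 }'
}

# Example (not executed here):
#   rpm -qa --qf '%{NAME}\t%{VENDOR}\n' | packages_from_vendor 'home:'
```

Combined with zypper lr -u, this should show whether a repository is still needed by anything that is installed.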

provo-proxy1.i.o.o:

-> nginx module issue:

provo-proxy1:~ # systemctl status nginx
× nginx.service - The nginx HTTP and reverse proxy server
     Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Wed 2023-06-07 22:32:39 UTC; 1min 32s ago
    Process: 1662 ExecStartPre=/usr/sbin/nginx -t (code=exited, status=1/FAILURE)

Jun 07 22:32:39 provo-proxy1 systemd[1]: Starting The nginx HTTP and reverse proxy server...
Jun 07 22:32:39 provo-proxy1 nginx[1662]: nginx: [emerg] dlopen() "/usr//lib64/nginx/modules/ngx_http_headers_more_filter_module.so" failed (/usr//lib64/nginx/modules/ngx_http_headers_more_filte>
Jun 07 22:32:39 provo-proxy1 nginx[1662]: nginx: configuration file /etc/nginx/nginx.conf test failed
Jun 07 22:32:39 provo-proxy1 systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 07 22:32:39 provo-proxy1 systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 07 22:32:39 provo-proxy1 systemd[1]: Failed to start The nginx HTTP and reverse proxy server.
provo-proxy1:~ # nginx -t
nginx: [emerg] dlopen() "/usr//lib64/nginx/modules/ngx_http_headers_more_filter_module.so" failed (/usr//lib64/nginx/modules/ngx_http_headers_more_filter_module.so: cannot open shared object file: No such file or directory) in /etc/nginx/nginx.conf:4
nginx: configuration file /etc/nginx/nginx.conf test failed

-> commented out from /etc/nginx/nginx.conf as the name does not sound like a crucial module for production:

4: # load_module lib64/nginx/modules/ngx_http_headers_more_filter_module.so;
42: # more_clear_headers 'Server';
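
For the other proxies, a check like the following could be run before the reboot to spot modules that disappear with the upgrade. It is only a sketch: the function name is made up, and it assumes relative load_module paths are resolved against /usr, as the dlopen() path in the error above suggests.

```shell
# Print load_module targets from an nginx config that do not exist on disk.
# $1: config file
# $2: prefix for relative module paths (assumed to be /usr, see the error above)
# Commented-out load_module lines are ignored. Hypothetical helper.
missing_nginx_modules() {
    conf=$1
    prefix=${2:-/usr}
    sed -n 's/^[[:space:]]*load_module[[:space:]][[:space:]]*\([^;]*\);.*/\1/p' "$conf" |
    while IFS= read -r mod; do
        case $mod in
            /*) path=$mod ;;
            *)  path=$prefix/$mod ;;
        esac
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}

# Example (not executed here):
#   missing_nginx_modules /etc/nginx/nginx.conf
```

Note that more_clear_headers 'Server'; hides the Server response header, so with the directive commented out the proxy reveals that header again until the module is restored.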
Actions #9

Updated by crameleon over 1 year ago

ci-container.i.o.o:

-> nothing to report

Actions #10

Updated by crameleon over 1 year ago

jenkins.i.o.o:

-> accepted the following vendor change to proceed:

gettext-runtime  obs://build.opensuse.org/devel:tools -> SUSE LLC <https://www.suse.com/>

ipx-narwal1.i.o.o:

-> nothing to report

login3.i.o.o:

-> nothing to report

Actions #11

Updated by crameleon over 1 year ago

kubic.i.o.o:

-> nothing to report

scar.i.o.o:

-> nothing to report

Actions #12

Updated by crameleon over 1 year ago

Some machines have repositories installed which are not yet available for 15.5. I reached out to the respective OBS project maintainers for this.

Actions #13

Updated by pjessen over 1 year ago

crameleon wrote:

  • mx-test.i.o.o: -> nothing to report

Very good, a highly important machine 😀

  • provo-mirror.i.o.o:

-> second attempt failed, I proceeded with a manual upgrade and opted for the following vendor changes as a resolution:

debuginfod-client obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
debuginfod-profile obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
elfutils obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
elfutils-lang obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
libdebuginfod1 obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
libdw1 obs://build.opensuse.org/openSUSE -> obs://build.opensuse.org/Base:System
rsync obs://build.suse.de/home:david.anes:1204538 -> SUSE LLC https://www.suse.com/

-> do we need those devel and home repositories there?

I think we ought to consider moving this debuginfod stuff to a local machine, I'm not really sure why it is running in provo.

→ Good idea with running this diary.

Actions #14

Updated by crameleon over 1 year ago

  • Checklist item Repair Cachet package / PHP dependencies added
  • Checklist item Repair Matomo package / PHP dependencies added
  • Checklist item Clean up repositories on provo-mirror added

As suggested, I am adding tasks for follow-up to-dos.

Actions #15

Updated by crameleon over 1 year ago

  • Checklist item Repair MariaDB on status2 added
Actions #16

Updated by crameleon over 1 year ago

water{,3,4}.i.o.o:

-> nothing to report

pagure01.i.o.o:

-> same issue with redis@default

Actions #17

Updated by crameleon over 1 year ago

I think we ought to consider moving this debuginfod stuff to a local machine, I'm not really sure why it is running in provo.

Aren't the listed ones just client packages?

Actions #18

Updated by crameleon over 1 year ago

community2.i.o.o:

-> nothing to report

svn.i.o.o:

-> nothing to report after getting 15.5 added to Apache:Modules

Actions #19

Updated by crameleon over 1 year ago

mx4.i.o.o:

-> nrpe did not start:

× nrpe.service - Nagios Remote Plugin Executor
     Loaded: loaded (/usr/lib/systemd/system/nrpe.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Thu 2023-06-08 10:11:45 UTC; 29s ago
       Docs: http://www.nagios.org/documentation
    Process: 1404 ExecStart=/usr/sbin/nrpe -c /etc/nrpe.cfg -f (code=exited, status=1/FAILURE)
    Process: 2319 ExecStopPost=/bin/rm -f /run/nrpe/nrpe.pid (code=exited, status=0/SUCCESS)
   Main PID: 1404 (code=exited, status=1/FAILURE)

Jun 08 10:11:43 mx4 systemd[1]: Started Nagios Remote Plugin Executor.
Jun 08 10:11:45 mx4 nrpe[1404]: Starting up daemon
Jun 08 10:11:45 mx4 nrpe[1404]: Bind to port 5666 on 0.0.0.0 failed: Address already in use.
Jun 08 10:11:45 mx4 nrpe[1404]: Bind to port 5666 on :: failed: Address already in use.
Jun 08 10:11:45 mx4 nrpe[1404]: Cannot bind to any address.
Jun 08 10:11:45 mx4 systemd[1]: nrpe.service: Main process exited, code=exited, status=1/FAILURE
Jun 08 10:11:45 mx4 systemd[1]: nrpe.service: Failed with result 'exit-code'.

-> solved by disabling xinetd which did not seem to be used for anything else:

sh-4.4# systemctl disable --now xinetd
Removed /etc/systemd/system/multi-user.target.wants/xinetd.service.
sh-4.4# systemctl restart nrpe
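
To confirm which process holds the port before disabling anything, the socket list can be filtered. A rough sketch, assuming the column layout printed by ss -H -ltnp (state, queues, local address, peer address, process); listeners_on_port is a hypothetical helper.

```shell
# Read "ss -H -ltnp" style lines from stdin and print the local address and
# owning process of every listener on the given port.
# Hypothetical helper; the column layout is an assumption about ss output.
listeners_on_port() {
    awk -v port="$1" '$4 ~ (":" port "$") { print $4, $6 }'
}

# Example (not executed here, needs root to see process names):
#   ss -H -ltnp | listeners_on_port 5666
```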

nala2.i.o.o:

-> nothing to report

Actions #20

Updated by crameleon over 1 year ago

slimhat.i.o.o:

-> some cleanup of useless packages (unused firmware, fonts, SAMBA servers)
-> nothing to report

Actions #21

Updated by pjessen over 1 year ago

crameleon wrote:

I think we ought to consider moving this debuginfod stuff to a local machine, I'm not really sure why it is running in provo.

Aren't the listed ones just client packages?

afaik, provo is running the service:

host debuginfod.opensuse.org
debuginfod.opensuse.org is an alias for proxy-prv.opensuse.org.
proxy-prv.opensuse.org has address 91.193.113.65
proxy-prv.opensuse.org has IPv6 address 2a07:de40:401::65

Actions #22

Updated by pjessen over 1 year ago

crameleon wrote:

mx4.i.o.o:

-> nrpe did not start:

This ought to have been a problem on mx3 too. mx[12] have no nrpe running - not sure why mx[34] would have.
On mx12, xinetd is used for starting nrpe, check_mk (No idea what that is) and csync2 (no idea what that is).

Actions #23

Updated by crameleon over 1 year ago

jekyll.i.o.o:

-> nothing to report after getting 15.5 added to o:i:jekyll

Actions #24

Updated by crameleon over 1 year ago

This ought to have been a problem on mx3 too. mx[12] have no nrpe running - not sure why mx[34] would have.

It's needed for our monitoring (Icinga/Nagios).

On mx12, xinetd is used for starting nrpe, check_mk (No idea what that is) and csync2 (no idea what that is).

check_mk is another monitoring thing, not sure how it works together with nrpe though, and csync2 is used to synchronize files between cluster nodes.

Personally I think xinetd should be dropped in favor of systemd sockets, but we use it on many systems and I found some of the systemd socket units bundled with our packages to be faulty, hence that project stalled.

Actions #25

Updated by pjessen over 1 year ago

crameleon wrote:

This ought to have been a problem on mx3 too. mx[12] have no nrpe running - not sure why mx[34] would have.

It's needed for our monitoring (Icinga/Nagios).

Sure - I guess mx34 were created from the template, but have not yet had a highstate applied. (I know I haven't done it).

On mx12, xinetd is used for starting nrpe, check_mk (No idea what that is) and csync2 (no idea what that is).

check_mk is another monitoring thing, not sure how it works together with nrpe though, and csync2 is used to synchronize files between cluster nodes.

It is either something default, or maybe something Lars installed before I took over mx12.

Personally I think xinetd should be dropped in favor of systemd sockets, but we use it on many systems and I found some of the systemd socket units bundled with our packages to be faulty, hence that project stalled.

Either one is good for me, but it would be semi-nice to standardize on one.

Actions #26

Updated by crameleon over 1 year ago

discourse01.i.o.o:

-> stopped the upgrade, it wants to remove two Discourse related packages which seem important?

The following 9 packages are going to be REMOVED:
  ... discourse-plugin-openid-connect discourse-plugin-rss-polling ...

Edit: eventually upgraded to 15.5 and, because the packages we require are not built for 15.5, upgraded further to Tumbleweed.

Actions #27

Updated by crameleon over 1 year ago

For daffy1, daffy2 and ci-opensuse I filed https://jira.suse.com/browse/ENGINFRA-2334.

Actions #28

Updated by luc14n0 over 1 year ago

crameleon wrote:

provo-mirror.i.o.o:

...

-> second attempt failed, I proceeded with a manual upgrade and opted for the following vendor changes as a resolution:

  debuginfod-client   obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
  debuginfod-profile  obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
  elfutils            obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
  elfutils-lang       obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
  libdebuginfod1      obs://build.opensuse.org/home:marxin -> obs://build.opensuse.org/Base:System
  libdw1              obs://build.opensuse.org/openSUSE -> obs://build.opensuse.org/Base:System
  rsync               obs://build.suse.de/home:david.anes:1204538 -> SUSE LLC <https://www.suse.com/>

-> do we need those devel and home repositories there?

I do know that marxin (Martin Liška) was on notice, and I haven't seen them on #opensuse-factory for a while now. I can't say whether they have already left SUSE, but it seems so. They were one of the GCC maintainers there. So, this vendor change seems reasonable to me.

Actions #29

Updated by pjessen over 1 year ago

luc14n0 wrote:

I do know that marxin (Martin Liška) was on notice, and I haven't seen them on #opensuse-factory for a while now. I can't say whether they have already left SUSE, but it seems so. They were one of the GCC maintainers there. So, this vendor change seems reasonable to me.

I exchanged emails with Martin only about a month ago, about debuginfod in provo.

Actions #30

Updated by livdywan over 1 year ago

  • Related to coordination #130582: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.5 added
Actions #31

Updated by crameleon over 1 year ago

mx{1,2,3}.i.o.o:

-> Accepted the following vendor change on mx1 and 2 to proceed:

postsrsd  obs://build.opensuse.org/server:mail -> openSUSE

-> same issue with redis@rspamd

galera{1,2,3}.i.o.o:

-> set DISABLE_RESTART_ON_UPDATE in /etc/sysconfig/services to "yes" on all nodes as a safety measure, then upgraded and rebooted one by one
-> galera1 and 2 joined and synced immediately after their restart, on galera3 the MySQL migration timed out, solved by modifying /usr/lib/mysql/mysql-systemd-helper to wait 960 instead of 60 seconds in mysql_wait
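
For the next cluster, the sysconfig change could be scripted instead of edited by hand on every node. A minimal sketch with a made-up function name; it relies on GNU sed -i and assumes the value contains no "|" or "&" characters.

```shell
# Set KEY="value" in a sysconfig-style file: replace an existing assignment,
# or append one if the key is not present yet.
# Hypothetical helper; assumes GNU sed and a value without "|" or "&".
set_sysconfig_var() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=\"${value}\"|" "$file"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}

# Example (not executed here):
#   set_sysconfig_var /etc/sysconfig/services DISABLE_RESTART_ON_UPDATE yes
```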

Actions #32

Updated by crameleon over 1 year ago

mirrordb{1,2}.i.o.o:

-> Aborted upgrade, who knows about this ip4r package? I'd prefer to keep the distribution postgresql-server package over the suggested one from server:database. Maybe @anikitin ?

Problem: the to be installed postgresql13-ip4r-2.4.1+git1.5f9ce88-lp155.12.1.x86_64 requires 'postgresql13-server = 13.11', but this requirement cannot be provided
Problem: the installed postgresql13-server-13.11-150200.5.40.1.x86_64 requires 'postgresql = 13.11', but this requirement cannot be provided

Problem: the to be installed postgresql13-ip4r-2.4.1+git1.5f9ce88-lp155.12.1.x86_64 requires 'postgresql13-server = 13.11', but this requirement cannot be provided
  deleted providers: postgresql13-server-13.11-150200.5.40.1.x86_64
not installable providers: postgresql13-server-13.11-lp155.74.1.x86_64[server_database_postgresql]
 Solution 1: deinstallation of postgresql13-ip4r-2.4.1+git1.5f9ce88-lp154.12.1.x86_64
 Solution 2: install postgresql13-server-13.11-lp155.74.1.x86_64 from vendor obs://build.opensuse.org/server:database
  replacing postgresql13-server-13.11-150200.5.40.1.x86_64 from vendor SUSE LLC <https://www.suse.com/>
 Solution 3: keep obsolete postgresql13-server-13.11-150200.5.40.1.x86_64
 Solution 4: break postgresql13-ip4r-2.4.1+git1.5f9ce88-lp155.12.1.x86_64 by ignoring some of its dependencies

Choose from above solutions by number or skip, retry or cancel [1/2/3/4/s/r/c/d/?] (c): c

Edit: now successfully upgraded in a second attempt - https://progress.opensuse.org/issues/130519#note-48

Actions #33

Updated by crameleon over 1 year ago

ipx-galera{1,2,3}.i.o.o:

-> Aborted upgrade:

Problem: nothing provides 'python-boto' needed by the to be installed percona-nagios-plugins-1.1.8-lp155.3.7.noarch
Problem: the to be installed mariadb-galera-10.11.3-lp155.2.1.x86_64 requires 'mariadb = 10.11.3', but this requirement cannot be provided
Problem: the installed mariadb-10.11.2-lp154.3.2.x86_64 requires 'mariadb-errormessages = 10.11.2', but this requirement cannot be provided

Problem: nothing provides 'python-boto' needed by the to be installed percona-nagios-plugins-1.1.8-lp155.3.7.noarch
 Solution 1: deinstallation of percona-nagios-plugins-1.1.8-lp154.3.37.noarch
 Solution 2: keep obsolete percona-nagios-plugins-1.1.8-lp154.3.37.noarch
 Solution 3: break percona-nagios-plugins-1.1.8-lp155.3.7.noarch by ignoring some of its dependencies

Choose from above solutions by number or skip, retry or cancel [1/2/3/s/r/c/d/?] (c): c

-> Submitted https://build.opensuse.org/request/show/1091843, however this should be considered a workaround, as the upstream project is EOL and the repository at https://github.com/percona/percona-monitoring-plugins archived. Best would be to switch to some other MySQL monitoring integration for Nagios.
-> The mariadb-errormessages problem should be solvable by a vendor change of the various mariadb related packages to SUSE LLC.

Actions #34

Updated by crameleon over 1 year ago

  • Checklist item Repair Percona Monitoring package added
Actions #35

Updated by crameleon over 1 year ago

  • Checklist item Repair MariaDB on status2 set to Done
Actions #36

Updated by pjessen over 1 year ago

crameleon wrote:

  • mx{1,2}.i.o.o:

-> Accepted the following vendor change to proceed:
postsrsd obs://build.opensuse.org/server:mail -> openSUSE

No prob.

-> same issue with redis@rspamd

I am 99% certain that is not being used. A left over from the first install.

Actions #37

Updated by pjessen over 1 year ago

crameleon wrote:

mirrordb{1,2}.i.o.o:

-> Aborted upgrade, who knows about this ip4r package? I'd prefer to keep the distribution postgresql-server package over the suggested one from server:database.

My 2 cents: ip4r is some postgres extension, for storing ip addresses in tables. It was used by mirrorbrain, for the pfx2asn table for instance. What seems to be the difference between the distribution postgresql and the one from server:database ?

Actions #38

Updated by crameleon over 1 year ago

difference between the distribution postgresql and the one from server:database

The distribution package is officially supported, whilst server:database is a development project. However, PostgreSQL 13 is dropped from 15.5, so upgrading to PostgreSQL 15 would be a good idea; that would need support from this ip4r extension.

Actions #39

Updated by pjessen over 1 year ago

crameleon wrote:

difference between the distribution postgresql and the one from server:database

The distribution package is officially supported, whilst server:database is a development project.

That I did know :-)

However, PostgreSQL 13 is dropped from 15.5, so upgrading to PostgreSQL 15 would be a good idea; that would need support from this ip4r extension.

I guess that extension doesn't come in a version for postgresql 15 ?

Actions #40

Updated by crameleon over 1 year ago

I guess that extension doesn't come in a version for postgresql 15 ?

It says the upcoming release will already support PostgreSQL 16 (https://github.com/RhodiumToad/ip4r), and it seems people are using the current version built against PostgreSQL 15 (https://build.opensuse.org/package/show/server:database:postgresql/postgresql-ip4r), hence I would assume an upgrade should "just work".

Actions #41

Updated by crameleon over 1 year ago

One interesting feature of my OS upgrade marathon is that all upgraded machines report

Updates CRITICAL : discontinued OS Release openSUSE 15.5; no updates available

in our Icinga. However, the source code of the installed check plugin matches the one in Lars' repository, which lists 15.5 as supported:

https://github.com/lrupp/monitoring-plugins-zypper/blob/8796e3efde7068c344a2e42b2339b38372b965db/check_zypper.pl#L98

Edit: some debugging revealed the code is not the same on all machines; it seems there are different versions installed, and machines with too old a version do not have the 15.5 support in their monitoring plugin yet. I'll try to update the plugin on all machines.

Edit 2: Let's try to get monitoring-plugins-zypper updated in stock 15.5 first: https://bugzilla.opensuse.org/show_bug.cgi?id=1212196

Actions #42

Updated by crameleon over 1 year ago

  • Checklist item Repair Icinga update reporting for 15.5 added
Actions #43

Updated by pjessen over 1 year ago

@crameleon : Wrt ip4r for postgresql, I simply don't know if it is being used anymore. I don't know how much is mirrorbrain and how much is mirrorcache.
mirrorbrain had an option for directing clients to the nearest mirror by network proximity, and used the pfx2asn table to look up ASNs.
AFAIK, this is not supported by mirrorcache 🙁. The pfx2asn table is populated by a daily cronjob. If e.g. "mb new" is still used to create new mirrors, that table is also still used. Maybe also cf. #61789

Actions #44

Updated by crameleon over 1 year ago

  • Checklist item Repair Matomo package / PHP dependencies set to Done
Actions #45

Updated by crameleon over 1 year ago

osc-collab.i.o.o:

-> nothing to report

Actions #46

Updated by crameleon over 1 year ago

mirrorcache-us-db.i.o.o:

-> Nothing to report.

mirrorcache-us.i.o.o:

-> Accepted the following vendor changes to proceed:

  perl-SQL-Abstract-Classic  obs://build.opensuse.org/home:andriinikitin -> openSUSE
  perl-XString               obs://build.opensuse.org/devel:languages:perl -> openSUSE

-> Accepted overwrites of various files from a conflicting and orphaned Perl Date (DateTime?) package.

Actions #47

Updated by crameleon over 1 year ago

olaf.i.o.o:

-> Nothing to report.

Actions #48

Updated by crameleon over 1 year ago

mirrordb{1,2}.i.o.o (take 2):

-> Removed old MirrorBrain databases
-> Upgraded PostgreSQL to version 15
-> Removed Postgres ip4r extension
-> The repmgr package from server:database now seems to require the PostgreSQL 15 version shipped in server:database. I preferred to keep the distribution PostgreSQL server and accepted various "keep obsolete" prompts for Postgres and its dependencies. It seems to work fine; for future updates we might want to lower the version requirement in the package.

Actions #49

Updated by crameleon over 1 year ago

ipx-galera{1,2,3}.i.o.o:

-> Accepted vendor change:

  python3-mysqlclient  obs://build.opensuse.org/server:database -> SUSE LLC <https://www.suse.com/>

-> Minor issue with ipx-galera3 (had to increase the timeout in /usr/lib/mysql/mysql-systemd-helper)

Actions #50

Updated by crameleon over 1 year ago

mirrorcache{,2}.i.o.o:

-> Accepted the following vendor changes:

  perl-SQL-Abstract-Classic  obs://build.opensuse.org/home:andriinikitin -> openSUSE
  perl-XString               obs://build.opensuse.org/devel:languages:perl -> openSUSE

-> Accepted overwrites of various files from a conflicting and orphaned Perl DateTime package.

Actions #51

Updated by crameleon over 1 year ago

nala.i.o.o:

-> During the installation of kernel packages, some warnings were produced:

depmod: WARNING: /lib/modules/5.14.21-150400.24.69-default/kernel/drivers/xxx/pci/xxx/xxx.ko.zst needs unknown symbol xxx

nala (mirrordb):~ # grep depmod /var/log/zypp/history|grep WARNING|wc -l
26605

Did not investigate further, upgrade seems to have succeeded.
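
Should someone want to look into it later, a first step might be to group the warnings by kernel module tree. The example warning above points at a 150400 (15.4-era) kernel, so the bulk may come from old kernels still installed, but that is only a guess. The function name `depmod_warnings_by_kernel` is illustrative.

```shell
# Hypothetical helper: count depmod warnings per /lib/modules/<kernel> tree.
# Takes the zypp history file as optional first argument.
depmod_warnings_by_kernel() {
    history_file="${1:-/var/log/zypp/history}"
    # Extract the kernel version directory from each warning line,
    # then count occurrences and print "count kernel-version".
    sed -nE 's#.*depmod: WARNING: /lib/modules/([^/]+)/.*#\1#p' "$history_file" \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}
```

If nearly all warnings belong to a kernel that is no longer the running one, purging the old kernel packages would likely make them go away.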

Actions #52

Updated by crameleon over 1 year ago

riesling3.i.o.o:

-> Nothing to report.

Actions #53

Updated by crameleon over 1 year ago

mirrorcache-stats.i.o.o:

-> Nothing to report.

Actions #54

Updated by crameleon over 1 year ago

mirrorcache-backstage.i.o.o:

-> Nothing to report.

Actions #55

Updated by crameleon about 1 year ago

I opened tickets with the respective maintainers for

nuka.i.o.o: https://progress.opensuse.org/issues/139139
obsreview.i.o.o: https://progress.opensuse.org/issues/151121

For the machines without specific maintainers, I manually identified

ipx-proxy1.i.o.o
stonehat.i.o.o

to be missing. These were not enrolled in Salt and hence did not appear in the scan.

Actions #56

Updated by crameleon about 1 year ago

ipx-proxy1.i.o.o:

-> Nothing to report.

stonehat.i.o.o:

Not sure where to start, machine had an uptime of 473 days, graceful shutdown for VMs on host reboots was not configured, lots of useless packages were installed. After purging about 600 packages (audio utilities, X11 and dependencies, fonts, languages, firmware for imaginary hardware) I upgraded the OS. After the reboot multiple services didn't come back up:

stonehat:~ # systemctl --failed
  UNIT                       LOAD   ACTIVE SUB    DESCRIPTION
● dnsmasq.service            loaded failed failed DNS caching server.
● getty@ttyS1.service        loaded failed failed Getty on ttyS1
● libvirt-guests.service     loaded failed failed Suspend/Resume Running libvirt Guests
● nftables-early.service     loaded failed failed nftables early rules
● serial-getty@ttyS1.service loaded failed failed Serial Getty on ttyS1
● wg-quick@wg0.service       loaded failed failed WireGuard via wg-quick(8) for wg0

-> dnsmasq: complained about the non-existing wg_siterouting interface; I added a systemd override with a "Wants" statement on the wg-quick@wg_siterouting service, however dnsmasq still did not come back up, complaining the address was already in use - indeed, a BIND server runs on the same listener -> disabled
-> {serial-,}getty@ttyS1 -> devices do not seem to exist (SOL console still worked without them) -> disabled
-> libvirt-guests -> (configured before my reboot), failed to bring ipx-status1 back up:

error: Cannot access storage file '/dev/bcraid6/ipx-status1': No such file or directory

not sure what the issue is with this LVM setup:

stonehat:~ # virsh start ipx-status1
error: Failed to start domain 'ipx-status1'
error: Cannot access storage file '/dev/bcraid6/ipx-status1': No such file or directory
stonehat:~ # virsh dumpxml ipx-status1|grep bcraid
      <source dev='/dev/bcraid6/ipx-status1-root'/>
stonehat:~ # ls -l /dev/bcraid6/ipx-status1-root
lrwxrwxrwx 1 root root 8 Nov 20 12:07 /dev/bcraid6/ipx-status1-root -> ../dm-23
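
Since the path in the error message differs from the one grep found in the domain XML, it may help to list every block device the domain references and check each one. A minimal sketch, assuming the XML uses `<source dev='...'/>` elements as in the output above; `report_disk_sources` is a made-up name.

```shell
# Hypothetical helper: reads libvirt domain XML on stdin and reports
# whether each referenced block device path exists on the host.
report_disk_sources() {
    sed -nE "s/.*<source dev='([^']+)'.*/\1/p" | while IFS= read -r dev; do
        if [ -e "$dev" ]; then
            echo "present $dev"
        else
            echo "missing $dev"
        fi
    done
}

# Assumed usage:
#   virsh dumpxml ipx-status1 | report_disk_sources
```

This only checks the persistent definition that `virsh dumpxml` prints; it does not explain where a path that is absent from the XML would come from.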

-> complained in IRC

-> nftables-early: complains about "private" interface not yet existing -> changed "iif" to "iifname" in /etc/nftables.conf; BUT before the reboot firewalld was managing the nftables rules on this machine - indeed, firewalld tried to load as well, but failed -> disabled/stopped nftables{-early}, restarted firewalld
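
For context on the "iif" change: `iif` and `oif` are resolved to an interface index when the ruleset is loaded, so the interface must already exist, whereas `iifname` and `oifname` compare the name string per packet and also load when the interface is absent. A small sketch for spotting remaining index-bound matches in a ruleset file; the function name is illustrative.

```shell
# Hypothetical helper: list lines using iif/oif (index-bound) rather than
# iifname/oifname. Takes the ruleset file as optional first argument.
find_index_bound_matches() {
    conf="${1:-/etc/nftables.conf}"
    # The trailing whitespace requirement keeps "iifname" and "oifname" out.
    grep -nE '(^|[[:space:]])(iif|oif)[[:space:]]' "$conf"
}
```

Matches on interfaces that always exist at load time, such as `lo`, are harmless and can be left alone.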

-> wg-quick@wg0: complains about

Nov 20 12:07:35 stonehat wg-quick[8788]: Name or service not known: `downloadcontent.opensuse.org.:51123'

presumably there were no DNS servers reachable at that point during boot -> manually restarted
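
If the endpoint name really cannot be resolved that early, one workaround could be to wait until resolution works before starting the tunnel. This is only a sketch under that assumption; `wait_for_dns` is a made-up helper, and where to hook it in (a wrapper script, a unit override) is left open.

```shell
# Hypothetical helper: poll until a hostname resolves or attempts run out.
# Usage: wait_for_dns HOST [ATTEMPTS]; returns 0 on success, 1 otherwise.
wait_for_dns() {
    host="$1"
    attempts="${2:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # getent consults the system resolver configuration (nsswitch).
        if getent hosts "$host" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Assumed usage:
#   wait_for_dns downloadcontent.opensuse.org 60 && systemctl restart wg-quick@wg0
```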

Speaking of DNS servers, how many named listeners does a hypervisor need?

stonehat:~ # ss -tulpn|grep :53|wc -l
512
Actions #57

Updated by pjessen about 1 year ago

crameleon wrote in #note-56:

stonehat.i.o.o:

Speaking of DNS servers, how many named listeners does a hypervisor need?

Lots.

stonehat:~ # ss -tulpn|grep :53|wc -l
512

On my main nameserver, I have 101. I see typically 4 per address, including link-local and old deprecated ipv6 addresses. Depending on the network setup for the virtual hosts, that could add a lot of interfaces.
If you specify which addresses to listen on, the number can be reduced to just those.

Actions #58

Updated by pjessen about 1 year ago · Edited

pjessen wrote in #note-57:

On my main nameserver, I have 101. I see typically 4 per address, including link-local and old deprecated ipv6 addresses. Depending on the network setup for the virtual hosts, that could add a lot of interfaces.

For instance, I have a small xen server with 18 hosts, total of 28 virtual interfaces. That would mean minimum 4x28 = 112 listeners just for the LL addresses :-)

Actions #59

Updated by crameleon about 1 year ago · Edited

That's ridiculous. Our new PowerDNS servers open one listener per address they are configured to listen on. Also, a hypervisor should not role play a DNS server. A hypervisor should do one job, hosting VMs.

Actions #60

Updated by pjessen about 1 year ago

crameleon wrote in #note-59:

That's ridiculous. Our new PowerDNS servers open one listener per address they are configured to listen on.

So will bind - as I said "If you specify which addresses to listen on, the number can be reduced to just those.".
The issue is no doubt that named is configured like this:

listen-on port 53 { any; };
listen-on-v6 { any; };

I don't know why bind seems to open four UDP listeners per address. Possibly four threads?
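
To check the "four per address" observation on a given host, the listeners could be counted per protocol and local address. A minimal sketch, assuming `ss` output without a header line (the local address being the fifth column); `count_dns_listeners` is an illustrative name. Unlike a plain `grep :53`, it only matches port 53 exactly.

```shell
# Hypothetical helper: reads `ss -H -tuln` output on stdin and prints
# "count protocol local-address:53", most frequent first.
count_dns_listeners() {
    awk '
        $5 ~ /:53$/ { key = $1 " " $5; n[key]++ }
        END         { for (k in n) print n[k], k }
    ' | sort -k1,1nr -k2
}

# Assumed usage:
#   ss -H -tuln | count_dns_listeners
```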

Actions #62

Updated by crameleon about 1 year ago

  • Checklist item Repair Icinga update reporting for 15.5 set to Done

Update for monitoring-plugins-zypper backported.

Actions #63

Updated by crameleon about 1 year ago

  • Checklist item Repair Cachet package / PHP dependencies set to Done

Status machines sorted by @lrupp - thank you.

Actions #64

Updated by crameleon about 1 year ago

  • Checklist item Clean up repositories on provo-mirror set to Done

Devel repositories including packages installed from them removed. System relevant packages vendor changed to OSS.

Actions #65

Updated by crameleon about 1 year ago

The issue is no doubt that named is configured like this:

I see, so one could reconfigure Bind. Let's hope we can ditch it before it becomes necessary.

Actions #66

Updated by crameleon about 1 year ago

  • Status changed from In Progress to Resolved
  • Assignee set to crameleon
  • % Done changed from 0 to 90

Only blocker now is nuka.i.o.o where we wait for input in https://progress.opensuse.org/issues/139139. Let's conclude this ticket here already anyways.
Maybe someone knows how to add this "related issue" thing to make the other issue related to this one.
