action #43976

closed

bring o3 to same OS (or salt) version as workers, e.g. openSUSE Leap 15.0

Added by okurz about 6 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
Start date:
2018-11-15
Due date:
2019-07-05
% Done:

0%

Estimated time:

Description


Related issues 2 (0 open, 2 closed)

Copied from openQA Infrastructure (public) - action #43937: align o3 workers (done: "imagetester" and) "power8" with the others which are currently "transactional-update" hosts (Resolved, okurz, 2018-11-15)
Copied to openQA Infrastructure (public) - action #54137: Upgrade osd to a supported Leap version (from 42.3) (Resolved, okurz, 2019-07-11)

Actions #1

Updated by okurz about 6 years ago

  • Copied from action #43937: align o3 workers (done: "imagetester" and) "power8" with the others which are currently "transactional-update" hosts added
Actions #2

Updated by nicksinger about 6 years ago

  • Status changed from New to Workable
Actions #3

Updated by okurz over 5 years ago

  • Priority changed from Normal to Urgent

With Leap 42.3 reaching EOL right about now, this is becoming urgent.

Talked with kbabioch, achernikov, tbro, and more: openqa.o.o apparently runs as a VM on the same virtualization host/cluster that also runs many other crucial services, e.g. imap.suse.de, and therefore I cannot get access. I fear we need to either ask someone from Engineering Infrastructure to do the upgrade themselves or convince them to be on standby during normal work hours for emergency handling in case an upgrade goes wrong.

Actions #4

Updated by okurz over 5 years ago

Created https://infra.nue.suse.com/SelfService/Display.html?id=141458&results=cb59420ced341dafe1db38d6a434bb75 to get in contact with anyone who can provide support, e.g. during the reboot of the machine.

Actions #5

Updated by okurz over 5 years ago

  • Due date set to 2019-07-05
  • Status changed from Workable to Feedback
  • Assignee set to okurz
Actions #6

Updated by okurz over 5 years ago

No answer from mcaj yet regarding any VM snapshot or such :(

Experimenting with an upgrade in a Leap 42.3 container:

Copied over /etc/ from o3 with `rsync -aHP o3:/etc/ ~/local/tmp/o3_etc/`, then started the container:

docker run -it --rm -v /home/okurz/local/tmp/o3_etc:/opt/o3_etc:ro registry.opensuse.org/opensuse/leap:42.3

In there:

zypper -n in postgresql96-server
LEAP_VERSION=42.3
zypper ar -f -p 95 obs://devel:openQA/openSUSE_Leap_$LEAP_VERSION devel-openQA
zypper ar -f -p 90 obs://devel:openQA:Leap:$LEAP_VERSION/openSUSE_Leap_$LEAP_VERSION devel-openQA-perl-modules
zypper --gpg-auto-import-keys ref && zypper -n in openQA-local-db
DIR=/tmp/pg
mkdir -p $DIR
sudo chown geekotest $DIR
sudo -u geekotest initdb --auth-local=peer -N $DIR -U geekotest
echo "listen_addresses=''" >> $DIR/postgresql.conf
echo "unix_socket_directories='$DIR'" >> $DIR/postgresql.conf
echo "fsync=off" >> $DIR/postgresql.conf
echo "full_page_writes=off" >> $DIR/postgresql.conf
sudo -u geekotest pg_ctl -D $DIR -l logfile start
sudo -u geekotest createdb -h $DIR openqa
sudo -u geekotest pg_restore -h $DIR -d openqa /opt/o3_etc/openqa/2019-07-08.dump
cat > /etc/openqa/database.ini <<EOF
[production]
dsn = DBI:Pg:dbname=openqa;host=/tmp/pg
EOF

sudo -u geekotest /usr/share/openqa/script/openqa daemon -m production
zypper -n in curl
curl http://localhost:9526
curl http://localhost:9526/tests
curl http://localhost:9526/tests/latest
curl http://localhost:9526/tests/latest > /tmp/ref.html
zypper -n in postgresql10-server postgresql10-contrib
sudo -u geekotest pg_ctl -D $DIR -l logfile stop
mv /tmp/pg /tmp/pg96
mkdir $DIR
chown geekotest $DIR
sudo -u geekotest initdb --auth-local=peer -N $DIR -U geekotest
sudo -u geekotest pg_upgrade --old-bindir=/usr/lib/postgresql96/bin --new-bindir=/usr/lib/postgresql10/bin --old-datadir=/tmp/pg96/ --new-datadir=/tmp/pg/
sudo -u geekotest pg_ctl -D $DIR -l logfile start
sudo -u geekotest psql -h $DIR -d openqa
sudo -u geekotest "/usr/lib/postgresql10/bin/vacuumdb" --all --analyze-in-stages -h $DIR
sudo -u geekotest ./delete_old_cluster.sh
sudo -u geekotest /usr/share/openqa/script/openqa daemon -m production
bg
curl http://localhost:9526/tests/latest > /tmp/new.html
diff /tmp/ref.html /tmp/new.html
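
The before/after comparison at the end of the session above could be wrapped in a small helper. This is only a sketch: `compare_pages` is a hypothetical name, not part of openQA, and it assumes both pages were already saved to files as done with the two `curl ... > /tmp/*.html` calls. Pages that embed timestamps or counters may differ even when the upgrade went fine, so a "different" result still needs a manual look at the diff.

```shell
# Hypothetical helper: compare two saved pages and report the result.
# Returns 0 if identical, 1 if they differ, 2 if a file is missing.
compare_pages() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "missing input file" >&2
        return 2
    fi
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}

# Usage, with the file names from the session above:
# compare_pages /tmp/ref.html /tmp/new.html || diff -u /tmp/ref.html /tmp/new.html
```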

However, one has to be careful about roles and encoding when creating the database. This is what I did for a migration on my notebook:

export LANG=en_IE.utf8
export LC_ALL=en_IE.utf8
sudo -u postgres initdb --auth-local=peer /var/lib/pgsql/data
cd /tmp
sudo -u postgres pg_upgrade --old-bindir=/usr/lib/postgresql96/bin --new-bindir=/usr/lib/postgresql10/bin --old-datadir=/var/lib/pgsql/data.96/ --new-datadir=/var/lib/pgsql/data/
sudo -u postgres ./analyze_new_cluster.sh 
systemctl start postgresql
systemctl status postgresql
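
The reason for exporting `LANG` and `LC_ALL` first is that `initdb` takes its default locale and encoding from the environment, and `pg_upgrade` expects the new cluster to be compatible with the old one. A minimal sketch of a pre-flight check follows; `is_utf8_locale` is a hypothetical helper, and it assumes the old cluster was created with a UTF-8 locale, which has to be verified for the system at hand.

```shell
# Hypothetical pre-flight check: does a locale name select UTF-8?
# Accepts spellings such as en_IE.utf8, en_US.UTF-8, C.UTF-8.
is_utf8_locale() {
    case $1 in
        *.[Uu][Tt][Ff]8|*.[Uu][Tt][Ff]-8) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: refuse to create the new cluster under a non-UTF-8 locale.
# is_utf8_locale "${LC_ALL:-$LANG}" || { echo "set a UTF-8 locale first" >&2; exit 1; }
```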
Actions #7

Updated by okurz over 5 years ago

  • Status changed from Feedback to In Progress

Still no response from mcaj, so I went ahead and started the upgrade of o3.

Actions #8

Updated by okurz over 5 years ago

  • Status changed from In Progress to Feedback

What I did as preparations:

  • Check all config files already needing an update:
for i in $(find /etc/ -name '*.rpm*') ; do vimdiff ${i%.rpm*} $i; done
find /etc/ -name '*.rpm*' | grep -v 'rpm-utils' | xargs rm
  • Update the repos:
cd /etc/zypp/repos.d/
sed -i -e 's/42\.3/$releasever/g' *
zypper mr -p 90 -r 3
mv devel_openQA_Leap{_42.3,}.repo 
zypper --releasever=15.1 ref
sudo -u geekotest /opt/openqa-scripts/dump-psql && zypper -n --releasever=15.1 dup --auto-agree-with-licenses --replacefiles --download-in-advance
rpmconfigcheck 
for i in $(cat /var/adm/rpmconfigcheck) ; do vimdiff ${i%.rpm*} $i ; done
for i in $(cat /var/adm/rpmconfigcheck) ; do rm $i ; done
vim /etc/default/grub
# delete the "SLES 12" distributor
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
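
Two details in the commands above are easy to get wrong: `${i%.rpm*}` strips the `.rpmnew`/`.rpmsave` suffix to get the path of the active config file, and the single quotes around the sed expression keep `$releasever` literal so that zypper expands it later, not the shell. The sketch below shows both in isolation. The function names and the sample URL are made up for illustration, and the sed expression is used as a filter here whereas the original applies it in place with `-i`.

```shell
# Map an rpm leftover (foo.rpmnew, foo.rpmsave, ...) to the active config file.
active_config() {
    printf '%s\n' "${1%.rpm*}"
}

# Replace the hard-coded release with zypper's $releasever variable.
# Single quotes matter: the shell must not expand $releasever itself.
to_releasever() {
    sed -e 's/42\.3/$releasever/g'
}

# Demonstration on sample input (path and URL are illustrative only).
active_config /etc/example.conf.rpmnew
printf 'baseurl=http://example.invalid/openSUSE_Leap_42.3/\n' | to_releasever
```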

At this point the server came up fine and showed no problems by itself, e.g. `systemctl --failed` also showed nothing wrong. However, two openQA jobs that were started showed that they could not reach the rsync server. I guess the deprecation of xinetd could be involved? Wasn't there something like this? Anyway, fixed it with:

systemctl enable --now rsyncd

as the same service is also running on osd. Tested with:

sudo systemctl status rsyncd
rsync rsync://localhost

Tests were retriggered and seem to run fine. No high load on o3 for now. I should monitor
https://openqa.opensuse.org/tests/?&resultfilter=Failed&resultfilter=Incomplete&resultfilter=timeout_exceeded
for the next few hours.

Actions #9

Updated by okurz over 5 years ago

  • Status changed from Feedback to Resolved

Dimstar mentioned that he installed "python2-cmdline" to fix the factory news submission. It seems that this was previously included in the python base package. No further negative feedback received, so I consider this done.

Actions #10

Updated by okurz over 5 years ago

  • Copied to action #54137: Upgrade osd to a supported Leap version (from 42.3) added
Actions #11

Updated by okurz over 5 years ago

Also upgraded PostgreSQL now with:

DIR=/var/lib/pgsql/data.10
sudo -u postgres mkdir $DIR
sudo -u postgres initdb --auth-local=peer -N $DIR -U postgres
sudo -u geekotest /opt/openqa-scripts/dump-psql
systemctl stop openqa-webui openqa-websockets openqa-scheduler openqa-gru postgresql
sudo -u postgres pg_upgrade --old-bindir=/usr/lib/postgresql96/bin --new-bindir=/usr/lib/postgresql10/bin --old-datadir=/var/lib/pgsql/data/ --new-datadir=$DIR --link --check
sudo -u postgres pg_upgrade --old-bindir=/usr/lib/postgresql96/bin --new-bindir=/usr/lib/postgresql10/bin --old-datadir=/var/lib/pgsql/data/ --new-datadir=$DIR --link
mv /var/lib/pgsql/data /var/lib/pgsql/data.96
ln -s data.10 /var/lib/pgsql/data
systemctl start postgresql
sudo -u postgres ./analyze_new_cluster.sh 
sudo rm delete_old_cluster.sh analyze_new_cluster.sh
systemctl start openqa-webui openqa-websockets openqa-scheduler openqa-gru
zypper rm -U postgresql96 postgresql96-contrib postgresql96-server
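
The session above runs `pg_upgrade` twice with identical arguments, first with `--check`, which only verifies the clusters without changing data, and then for real. A sketch of a wrapper that keeps the two invocations in sync follows; `check_then_run` is a hypothetical helper and not part of PostgreSQL or openQA.

```shell
# Hypothetical wrapper: run a command with --check appended first and
# only run it for real when that dry run succeeds.
check_then_run() {
    if ! "$@" --check; then
        echo "pre-flight check failed, not running: $*" >&2
        return 1
    fi
    "$@"
}

# Usage with the invocation from above (not executed here):
# check_then_run sudo -u postgres pg_upgrade \
#     --old-bindir=/usr/lib/postgresql96/bin --new-bindir=/usr/lib/postgresql10/bin \
#     --old-datadir=/var/lib/pgsql/data/ --new-datadir="$DIR" --link
```

The wrapper stops before the real run if the check fails, so the old cluster stays untouched in that case.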