openSUSE Project Management Tool
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4491032021-09-24T11:11:17Zokurzokurz@suse.com
<ul><li><strong>Copied from</strong> <i><a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" href="/issues/73189">action #73189</a>: Upgrade o3 workers to openSUSE Leap 15.2 after openqa-aarch64 already done</i> added</li></ul> openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4491962021-09-24T11:30:16Zokurzokurz@suse.com
<ul><li><strong>Subject</strong> changed from <i>Upgrade o3 workers to openSUSE Leap 15.2 after openqa-aarch64 already done</i> to <i>Upgrade o3 workers to openSUSE Leap 15.3</i></li><li><strong>Description</strong> updated (<a title="View differences" href="/journals/449196/diff?detail_id=425934">diff</a>)</li><li><strong>Assignee</strong> deleted (<del><i>livdywan</i></del>)</li><li><strong>Priority</strong> changed from <i>High</i> to <i>Normal</i></li></ul> openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4511522021-09-30T09:51:58Zlivdywanliv.dywan@suse.com
<ul><li><strong>Subject</strong> changed from <i>Upgrade o3 workers to openSUSE Leap 15.3</i> to <i>Upgrade o3 workers to openSUSE Leap 15.3 size:M</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>Workable</i></li></ul> openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4582742021-10-25T08:42:53Zggardet_armguillaume.gardet@arm.com
<ul></ul><p><code>oss-cobbler-03</code> is a new remote aarch64 machine for aarch64 (and aarch32), already based on <strong>Leap 15.3</strong>. <br>
This is an Ampere eMag machine with 32 cores, 125G of RAM and 480G of SSD storage. Currently, it runs 7 workers without problems.</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4583042021-10-25T09:10:01Zokurzokurz@suse.com
<ul></ul><p>Cool :)</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4584662021-10-25T12:26:34Zmkittlermarius.kittler@suse.com
<ul><li><strong>Assignee</strong> set to <i>mkittler</i></li></ul> openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4585472021-10-25T14:59:36Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Workable</i> to <i>Feedback</i></li></ul><p>Not sure how to upgrade the transactional workers. I cannot directly follow commands on <a href="https://progress.opensuse.org/projects/openqav3/wiki/Wiki#Distribution-upgrades" class="external">https://progress.opensuse.org/projects/openqav3/wiki/Wiki#Distribution-upgrades</a>. If I try to run them in <code>transactional-update shell</code> the commands don't work because no root certificates exist in that environment. Just enter <code>openqaworker1:~ # transactional-update shell</code> and find that <code>/var/lib/ca-certificates/pem</code> is empty. The problem is also reproducible on <code>openqaworker4</code> and possibly all other workers which are transactional servers. Note that <code>/etc/ssl/certs</code> is a symlink to <code>/var/lib/ca-certificates/pem</code>.</p>
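<p>A minimal sketch of how the missing certificates could be checked from a script, assuming a POSIX shell. The helper name <code>ca_dir_is_empty</code> is hypothetical; only the path comes from the comment above.</p>

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the given directory
# does not exist or contains no entries at all.
ca_dir_is_empty() {
    dir=$1
    [ -d "$dir" ] || return 0
    [ -z "$(ls -A "$dir" 2>/dev/null)" ]
}

# Intended to be run inside `transactional-update shell`.
# /etc/ssl/certs is a symlink to this directory, so checking it is enough.
if ca_dir_is_empty /var/lib/ca-certificates/pem; then
    echo "no root certificates found: HTTPS repositories will likely fail"
else
    echo "root certificates present"
fi
```

<p>Running the same check on the host and inside the transactional shell should show whether only the snapshot environment is affected.</p>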
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4585682021-10-25T16:07:44Zmkittlermarius.kittler@suse.com
<ul></ul><p>I've been upgrading power8 and everything seemed to work well. However, it didn't come back and I cannot even reach it via <code>ipmitool -I lanplus -C 3 -H openqaworker-power8-ipmi.suse.de -U ADMIN -P ADMIN sol activate</code> now.</p>
<p>I had to create an Infra ticket regarding power8: <a href="https://sd.suse.com/servicedesk/customer/portal/1/SD-64563" class="external">https://sd.suse.com/servicedesk/customer/portal/1/SD-64563</a></p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4587492021-10-26T11:44:35Zmkittlermarius.kittler@suse.com
<ul></ul><p>@ffogt helped recover <code>power8</code>:</p>
<blockquote>
<p><a href="https://bugzilla.suse.com/show_bug.cgi?id=1174166" class="external">https://bugzilla.suse.com/show_bug.cgi?id=1174166</a><br>
I used the petitboot environment to chroot into the leap install and replaced kernel-kvmsmall with kernel-default<br>
It's up!<br>
nfs-client was not installed, also probably because of kernel-kvmsmall<br>
I just watched it boot after a power reset, which needs some patience</p>
</blockquote>
<p>Apparently the ipmi power commands and sol session worked at some point after all. It works for me now as well. I'll respond in the Infra ticket.</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4587752021-10-26T13:41:30Zmkittlermarius.kittler@suse.com
<ul></ul><p>I've now upgraded <code>power8</code> and <code>openqaworker7</code>. Both generally work now. So aarch64, openqaworker1, openqaworker4, imagetester and rebel still remain.</p>
<hr>
<p>Note that there's a failing service on <code>openqaworker7</code> but it has been failing in that way at least since <code>Sep 05 03:46:26</code> (which is almost as far as the logs go back):</p>
<pre><code>openqaworker7:~ # systemctl status snapper-cleanup.service
● snapper-cleanup.service - Daily Cleanup of Snapper Snapshots
Loaded: loaded (/usr/lib/systemd/system/snapper-cleanup.service; static)
Active: failed (Result: exit-code) since Tue 2021-10-26 15:34:21 CEST; 2min 30s ago
TriggeredBy: ● snapper-cleanup.timer
Docs: man:snapper(8)
man:snapper-configs(5)
Process: 23457 ExecStart=/usr/lib/snapper/systemd-helper --cleanup (code=exited, status=1/FAILURE)
Main PID: 23457 (code=exited, status=1/FAILURE)
Okt 26 15:34:20 openqaworker7 systemd[1]: Started Daily Cleanup of Snapper Snapshots.
Okt 26 15:34:20 openqaworker7 systemd-helper[23457]: running cleanup for 'root'.
Okt 26 15:34:20 openqaworker7 systemd-helper[23457]: running number cleanup for 'root'.
Okt 26 15:34:20 openqaworker7 systemd-helper[23457]: Deleting snapshot failed.
Okt 26 15:34:20 openqaworker7 systemd-helper[23457]: number cleanup for 'root' failed.
Okt 26 15:34:20 openqaworker7 systemd-helper[23457]: running timeline cleanup for 'root'.
Okt 26 15:34:20 openqaworker7 systemd-helper[23457]: running empty-pre-post cleanup for 'root'.
Okt 26 15:34:21 openqaworker7 systemd[1]: snapper-cleanup.service: Main process exited, code=exited, status=1/FAILURE
Okt 26 15:34:21 openqaworker7 systemd[1]: snapper-cleanup.service: Failed with result 'exit-code'.
</code></pre> openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4590032021-10-27T11:37:01Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>In Progress</i></li></ul><p>As mentioned by <a class="user active user-mention" href="https://progress.opensuse.org/users/32207">@andriinikitin</a> the redirection to https can be avoided via <code>sed -i 's,download.opensuse.org,mirrorcache.opensuse.org,g' /etc/zypp/repos.d/*.repo</code>. That seems to work in the root-certificate-less environment of <code>transactional-update shell</code>. So I'm upgrading the remaining workers now.</p>
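<p>The substitution can be tried on a scratch copy before touching <code>/etc/zypp/repos.d</code>. The following is only a sketch: the helper name <code>switch_repos_to_mirrorcache</code> and the sample repo file are made up for illustration, and GNU <code>sed</code> is assumed for <code>-i</code>.</p>

```shell
#!/bin/sh
# Hypothetical helper: apply the host substitution from the comment above
# to every *.repo file in the given directory (GNU sed assumed for -i).
switch_repos_to_mirrorcache() {
    repo_dir=$1
    sed -i 's,download.opensuse.org,mirrorcache.opensuse.org,g' "$repo_dir"/*.repo
}

# Dry run on a scratch directory with a made-up repo definition.
scratch=$(mktemp -d)
printf '[repo-oss]\nbaseurl=http://download.opensuse.org/distribution/leap/15.3/repo/oss/\n' \
    > "$scratch/repo-oss.repo"
switch_repos_to_mirrorcache "$scratch"
grep baseurl "$scratch/repo-oss.repo"
rm -rf "$scratch"
```

<p>On a worker the helper would be called with <code>/etc/zypp/repos.d</code> from within <code>transactional-update shell</code>.</p>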
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4590312021-10-27T12:52:03Zmkittlermarius.kittler@suse.com
<ul></ul><p>The following units failed on openqaworker1 after a reboot under Leap 15.3:</p>
<pre><code>openqaworker1:~ # systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● container-openqaworker1_container_102.service loaded failed failed Podman container-openqaworker1_container_102.service
● container-openqaworker1_container_103.service loaded failed failed Podman container-openqaworker1_container_103.service
</code></pre>
<p>I suspect these are leftovers from experimenting with a containerized setup so I disabled them for now.</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4590882021-10-27T14:53:55Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li></ul><p>All workers (openqaworker1, openqaworker4, openqaworker7, power8, rebel, imagetester, aarch64) are now running Leap 15.3. Apart from the (unimportant) failing services mentioned in the comments, it looks good so far.</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4594302021-10-28T12:29:39Zokurzokurz@suse.com
<ul></ul><p>A whole lot of openSUSE Tumbleweed tests have been failing after the upgrade. Dimstar and fvogt narrowed down the problem to a qemu-seabios update. It was reported in <a href="https://bugzilla.suse.com/show_bug.cgi?id=1192115" class="external">https://bugzilla.suse.com/show_bug.cgi?id=1192115</a>.</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4594422021-10-28T12:49:37Zfavogtfvogt@suse.com
<ul></ul><p>Until the issue is fixed, the qemu-seabios package should be downgraded on the openQA workers for x86, like this:</p>
<pre><code>zypper in --oldpackage https://download.opensuse.org/repositories/openSUSE:/Leap:/15.2:/Update/standard/noarch/qemu-seabios-1.12.1+-lp152.9.20.1.noarch.rpm
zypper al https://download.opensuse.org/repositories/openSUSE:/Leap:/15.2:/Update/standard/noarch/qemu-seabios-1.12.1+-lp152.9.20.1.noarch.rpm
</code></pre>
<p>(On transactional systems, use <code>transactional-update pkg in</code> for the install, followed by <code>zypper al</code>.)</p>
<p>I did this on ow7, but on the transactional systems I copied the bios files and used bind mounts instead, to avoid reboots which set back the test states.</p>
<p>I also noticed that os-autoinst uses <code>usb-ehci</code> as controller by default while <code>qemu-xhci</code> has various advantages, and opened a PR to switch to xhci: <a href="https://github.com/os-autoinst/os-autoinst/pull/1838" class="external">https://github.com/os-autoinst/os-autoinst/pull/1838</a>. Incidentally, this also appears to work around the bios issue, so the package downgrade could be omitted if qemu-xhci is used.</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4595052021-10-28T16:50:16Zdheidlerdheidler@suse.com
<ul></ul><p>See <a href="https://progress.opensuse.org/projects/openqav3/wiki/Wiki#o3-s390-workers" class="external">https://progress.opensuse.org/projects/openqav3/wiki/Wiki#o3-s390-workers</a></p>
<p>actually they were productive.</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4606542021-11-02T11:30:20Zmkittlermarius.kittler@suse.com
<ul></ul><p>I'm aware of the issues and also noted it in <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Upgrade arm3 to Leap 15.3 and compare failure rate size:M (Resolved)" href="https://progress.opensuse.org/issues/101265#note-9">#101265#note-9</a>.</p>
<p>Is there anything left to do for o3 workers (since <a class="user active user-mention" href="https://progress.opensuse.org/users/20030">@favogt</a> already took care of them)? And maybe we can merge <a href="https://github.com/os-autoinst/os-autoinst/pull/1838" class="external">https://github.com/os-autoinst/os-autoinst/pull/1838</a> despite missing test coverage?</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4606812021-11-02T12:54:45Zfavogtfvogt@suse.com
<ul></ul><p>Due to <a href="https://bugzilla.opensuse.org/show_bug.cgi?id=1192126" class="external">https://bugzilla.opensuse.org/show_bug.cgi?id=1192126</a>, <code>qemu-ovmf-x86_64</code> had to be downgraded to the Leap 15.2 version as well.</p>
<blockquote>
<p>And maybe we can merge <a href="https://github.com/os-autoinst/os-autoinst/pull/1838" class="external">https://github.com/os-autoinst/os-autoinst/pull/1838</a> despite missing test coverage?</p>
</blockquote>
<p>In theory we could probably revert the downgrade now that it uses XHCI, but staying on the older seabios for a bit longer won't hurt I'd say.</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4611012021-11-03T14:47:52Zmkittlermarius.kittler@suse.com
<ul></ul><p>I've just checked two transactional workers and both have 1.14.0_0_g155821a-103.2 installed. Not sure about the bind mount but it looks like the workers are already back to normal anyways.</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4611022021-11-03T14:59:26Zfavogtfvogt@suse.com
<ul></ul><p>mkittler wrote:</p>
<blockquote>
<p>I've just checked two transactional workers and both have 1.14.0_0_g155821a-103.2 installed. Not sure about the bind mount but it looks like the workers are already back to normal anyways.</p>
</blockquote>
<p>Looks like there are no zypper locks defined. Either they got deleted somehow or I added them incorrectly without noticing.</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4612482021-11-04T10:17:31Zmkittlermarius.kittler@suse.com
<ul></ul><p>Then I'll leave the systems as they are except for the non-transactional worker openqaworker7 where I removed the lock.</p>
<p>I suppose this ticket can be considered resolved - or were there any other problems?</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4617022021-11-08T09:54:01Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>Looks like nothing else came up so I'm closing this issue.</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4618522021-11-08T14:19:12Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Resolved</i> to <i>Feedback</i></li></ul><p>The PR fixes only the seabios issue so I'm downgrading the transactional workers now to cover the ovmf issue as well:</p>
<pre><code>transactional-update shell
zypper in --oldpackage https://download.opensuse.org/update/leap/15.2/oss/noarch/qemu-ovmf-x86_64-201911-lp152.6.17.1.noarch.rpm
zypper al qemu-ovmf
exit
reboot
</code></pre>
<p>So far I've done this only on openqaworker1 which is currently rebooting.</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4618762021-11-08T15:10:23Zmkittlermarius.kittler@suse.com
<ul></ul><p>openqaworker1 is up again and the lock and package are in place. I've also checked the other workers but they all had the package and lock still in place. (Except for aarch64 but I suppose only the x86_64 workers are relevant here. And rebel doesn't have the package installed at all.)</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4622032021-11-09T12:53:06Zmkittlermarius.kittler@suse.com
<ul></ul><p>It didn't work because it should have been <code>zypper al qemu-ovmf-x86_64</code>. So I downgraded the package again. Judging by the <a href="https://openqa.opensuse.org/tests/2022946#next_previous" class="external">job's history</a>, the <code>grub_test</code> module generally works with the downgrade. I'll check again tomorrow whether the downgraded package survived the nightly update.</p>
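<p>A lock only protects a package if its name matches, which is why a lock on <code>qemu-ovmf</code> did not cover <code>qemu-ovmf-x86_64</code>. The sketch below illustrates that with a hypothetical helper <code>has_zypper_lock</code>. It assumes that zypper stores locks in <code>/etc/zypp/locks</code> as blocks containing a <code>solvable_name:</code> line, and it only checks for exact names (glob locks are not handled). On a real worker, <code>zypper ll</code> remains the usual way to list locks.</p>

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the locks file contains an entry whose
# solvable_name is exactly the given package name.
# Assumption: locks live in /etc/zypp/locks with "solvable_name: NAME" lines.
has_zypper_lock() {
    name=$1
    locks_file=${2:-/etc/zypp/locks}
    [ -r "$locks_file" ] || return 1
    grep -qxF "solvable_name: $name" "$locks_file"
}

# Demo on a scratch file that mimics the lock added with `zypper al qemu-ovmf`.
scratch=$(mktemp)
printf '\ntype: package\nmatch_type: glob\ncase_sensitive: on\nsolvable_name: qemu-ovmf\n' \
    > "$scratch"
if has_zypper_lock qemu-ovmf-x86_64 "$scratch"; then
    echo "qemu-ovmf-x86_64 is locked"
else
    echo "qemu-ovmf-x86_64 is NOT locked"
fi
rm -f "$scratch"
```

<p>Such a check could be run after the nightly transactional update to see whether the lock is still in place.</p>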
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=4626022021-11-10T10:37:09Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>The downgrade survived the nightly transactional update. Also further test runs (conducted on different workers) worked.</p>
openQA Infrastructure - action #99189: Upgrade o3 workers to openSUSE Leap 15.3 size:Mhttps://progress.opensuse.org/issues/99189?journal_id=5245552022-05-31T18:19:53Zokurzokurz@suse.com
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-4 status-3 priority-4 priority-default closed child" href="/issues/111863">action #111863</a>: Upgrade o3 workers to openSUSE Leap 15.4 size:M</i> added</li></ul>