A big number of tests fail with networking (all workers) due to SLE libslirp0 update
This is just a sample test; many more are affected across the board.
The SUT seems not to receive DHCP leases, resulting in all kinds of test issues.
Rerunning the jobs may or may not help.
openQA test in scenario opensuse-Tumbleweed-XFCE-Live-x86_64-xfce-live@64bit fails in
Fails since (at least) Build 20220421
Last good: 20220420 (or more recent)
- Look into mitigations for both o3 and osd
- Ensure root cause issue is fixed
- Conduct lessons learned
- Wait until libslirp0 is fixed
- Remove package lock on all o3+osd machines
- Install updated libslirp0
- Confirm tests are still working fine
- Revert https://github.com/os-autoinst/os-autoinst/pull/2042
Always latest result in this scenario: latest
So far I don't know what the cause is. I consider an os-autoinst update unlikely, as jobs after the latest hotfix were fine and there was no change after that, right? Maybe a qemu update, a failing service or a configuration change?
I have seen a dnsmasq update on ariel.
That sounds suspicious. Let me check the dnsmasq logs.
I tried reverting that yesterday on ariel, but it did not make ow7 better (ow1 and ow4 still worked yesterday evening).
Overnight it got worse and now all jobs fail.
I rolled back openqaworker7 to the state of 2022-04-21 now and am triggering tests on that machine.
It is possible, though, that the update was applied again.
openqa-clone-job --within-instance https://openqa.opensuse.org/tests/2310741 _GROUP=0 TEST=autotest_investigate_poo110196 _SKIP_POST_FAIL_HOOKS=1 WORKER_CLASS=7
Created job #2310798: opensuse-Tumbleweed-DVD-x86_64-autoyast_multi_btrfs:investigate:last_good_tests_and_build:e8e5f0b966a518c15c11a4fc0b03489d3dafec9b+20220420@64bit -> https://openqa.opensuse.org/t2310798
Also reverted dnsmasq on o3
2021-12-06 20:07:30|install|dnsmasq|2.86-7.17.1|x86_64||repo-sle-update|561e2500f84e107c73091df9d0ac94bc8188bb75e32f290a822eacfe0fd0eeed|
2022-04-22 19:10:31|install|dnsmasq|2.86-15018.104.22.168|x86_64||repo-sle-update|5f5da91359421f64fe90696907f125cfcb8780824eadd2eac49c221bbbd780be|
2022-04-22 20:15:13|command|root@ariel|'zypper' 'in' '--oldpackage' 'dnsmasq-2.86-7.17.1'|
2022-04-22 20:15:15|install|dnsmasq|2.86-7.17.1|x86_64|root@ariel|repo-sle-update|561e2500f84e107c73091df9d0ac94bc8188bb75e32f290a822eacfe0fd0eeed|
2022-04-23 00:00:19|install|dnsmasq|2.86-15022.214.171.124|x86_64||repo-sle-update|5f5da91359421f64fe90696907f125cfcb8780824eadd2eac49c221bbbd780be|
and triggered a new job.
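History entries like the ones above can be pulled out per package with a small filter. This is a minimal sketch, assuming the pipe-separated layout shown (date|action|name|version|...) and that comment lines in /var/log/zypp/history start with `#`; the helper name `pkg_history` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: list install events for one package from a zypp
# history file. Assumes the pipe-separated format
# date|action|name|version|arch|requested_by|repo|checksum|
pkg_history() {
    local pkg=$1 file=${2:-/var/log/zypp/history}
    # Skip comment lines and "command" entries; print date and version
    # of every install of the requested package.
    awk -F'|' -v p="$pkg" \
        '!/^#/ && $2 == "install" && $3 == p { print $1, $4 }' \
        "$file"
}
```

For example, `pkg_history dnsmasq` on ariel would list each dnsmasq install with its date and version, which makes the back-and-forth between the two versions easy to spot.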
Disabled automatic updates on openqaworker7 with
systemctl disable --now openqa-continuous-update.timer
and installed the old os-autoinst version with
zypper in --force /var/cache/zypp/packages/devel_openQA/x86_64/os-autoinst-4.6.1650537502.22e982ce-lp153.1209.1.x86_64.rpm
Triggered another job.
The previous job takes a while because it clones a different git version of the test distribution, so this time I used a non-git job as template.
openqa-clone-job --within-instance https://openqa.opensuse.org/tests/2310665 _GROUP=0 TEST=autotest_investigate_poo110196 _SKIP_POST_FAIL_HOOKS=1 WORKER_CLASS=openqaworker7
Created job #2310801: opensuse-Tumbleweed-DVD-x86_64-Build20220421-autoyast_multi_btrfs@64bit -> https://openqa.opensuse.org/t2310801
I think the rollback on openqaworker7 was not effective, as I found some package updates installed again. In /var/log/zypp/history I found that libslirp0 had also been updated, and reverted it with
zypper -n in --oldpackage libslirp0-4.3.1-1.51
and triggered another test
openqa-clone-job --within-instance https://openqa.opensuse.org/tests/2310759 _GROUP=0 TEST=textmode_investigate_poo110196 _SKIP_POST_FAIL_HOOKS=1 WORKER_CLASS=openqaworker7 SCHEDULE=tests/installation/bootloader,tests/installation/welcome,tests/installation/online_repos,tests/installation/installation_mode
Created job #2310803: opensuse-Staging:I-Staging-DVD-x86_64-BuildI.420.1-textmode@64bit -> https://openqa.opensuse.org/t2310803
That test passed.
So libslirp0 is at fault.
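Spotting libslirp0 among everything updated since the last good state can be sketched as another filter over the same history file. This is a minimal sketch, assuming the pipe-separated format of /var/log/zypp/history; `installed_since` is a hypothetical helper, and it relies on the `YYYY-MM-DD HH:MM:SS` timestamps sorting correctly as plain strings:

```shell
#!/bin/bash
# Hypothetical helper: list every package installed at or after a given
# timestamp, i.e. the candidates for a revert-and-retest cycle.
installed_since() {
    local since=$1 file=${2:-/var/log/zypp/history}
    # ($1 "") forces a string comparison; ISO-like timestamps sort
    # lexicographically, so a date prefix such as "2022-04-21" works too.
    awk -F'|' -v s="$since" \
        '!/^#/ && $2 == "install" && ($1 "") >= s { print $1, $3, $4 }' \
        "$file"
}
```

For example, `installed_since '2022-04-21'` on a worker would list every package installed from the first failing build date on, each of which could then be reverted and retested one at a time.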
The problematic change from the changelog:
* Wed Feb 23 2022 email@example.com
- security update
- added patches
  fix CVE-2021-3592 [bsc#1187364], invalid pointer initialization may lead to information disclosure (bootp)
  + libslirp-CVE-2021-3592.patch
  fix CVE-2021-3594 [bsc#1187367], invalid pointer initialization may lead to information disclosure (udp)
  + libslirp-CVE-2021-3594.patch
  fix CVE-2021-3595 [bsc#1187366], invalid pointer initialization may lead to information disclosure (tftp)
  + libslirp-CVE-2021-3595.patch
Pinning all our machines to the previous version, libslirp0-4.3.1-1.51:
for i in openqaworker1 openqaworker4 openqaworker7 power8 imagetester rebel; do echo $i && ssh root@$i "transactional-update run /bin/sh -c 'zypper -n in --oldpackage libslirp0-4.3.1-1.51 && zypper al libslirp0' && reboot || zypper -n in --oldpackage libslirp0-4.3.1-1.51 && zypper al libslirp0" ; done
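The one-liner above relies on `a && b || c && d`, which the shell groups as `((a && b) || c) && d`, so the trailing `zypper al libslirp0` runs whenever either branch succeeds, including right after `reboot` on transactional hosts. A minimal sketch that picks one variant per host explicitly; it assumes root SSH access and treats a host as transactional if `transactional-update` is in its PATH, and `pin_command`/`pin_host` are hypothetical names:

```shell
#!/bin/bash
# Sketch only: pin libslirp0 to the known-good version from this ticket,
# choosing the install method per host instead of relying on && / ||
# grouping. Assumption: a host is transactional if it has
# `transactional-update` in its PATH.
GOOD_PKG="libslirp0-4.3.1-1.51"

# Print the remote command for a host type ("transactional" or "regular").
pin_command() {
    local pin="zypper -n in --oldpackage $GOOD_PKG && zypper al libslirp0"
    if [ "$1" = transactional ]; then
        # Same inner commands as the original one-liner, run in a new
        # snapshot that only becomes active after the reboot.
        echo "transactional-update run /bin/sh -c '$pin' && reboot"
    else
        echo "$pin"
    fi
}

# Detect the host type, then run exactly one of the two variants.
pin_host() {
    local host=$1 kind=regular
    if ssh "root@$host" 'command -v transactional-update >/dev/null'; then
        kind=transactional
    fi
    echo "$host: $kind"
    ssh "root@$host" "$(pin_command "$kind")"
}
```

Usage would be `for i in openqaworker1 openqaworker4 openqaworker7 power8 imagetester rebel; do pin_host "$i"; done`. Keeping the command construction in a separate function makes it possible to inspect what would run on each host type before touching any machine.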
There is already a bug report, https://bugzilla.opensuse.org/show_bug.cgi?id=1198773, which seems to fit.
- Project changed from openQA Tests to openQA Project
- Subject changed from A big number of tests fail with networking (all workers) to A big number of tests fail with networking (all workers) due to SLE libslirp0 update
- Description updated (diff)
- Category deleted (Bugs in existing tests)
It should be ok for o3 now. I pinned the old package version but I have not restarted all affected tests.
After applying the package lock, I re-enabled the timer on w7 with
systemctl enable --now openqa-continuous-update.timer
and checked that the service runs fine with
systemctl start openqa-continuous-update && journalctl -f -u openqa-continuous-update
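Following the journal with `journalctl -f` is interactive. A non-interactive check could confirm that the timer is enabled and that the last service run succeeded. This is a minimal sketch; the unit names come from this ticket, while the helper name `check_continuous_update` is hypothetical:

```shell
#!/bin/bash
# Sketch: check that the continuous update timer is enabled and that the
# last run of the service it triggers finished successfully.
check_continuous_update() {
    local timer=openqa-continuous-update.timer
    local service=openqa-continuous-update.service
    if [ "$(systemctl is-enabled "$timer")" != enabled ]; then
        echo "$timer is not enabled" >&2
        return 1
    fi
    # The "Result" property is "success" unless the last run failed,
    # e.g. "exit-code" for a non-zero exit status.
    local result
    result=$(systemctl show -p Result --value "$service")
    if [ "$result" != success ]; then
        echo "$service last result: $result" >&2
        return 1
    fi
    echo "ok"
}
```

Because it returns a proper exit status, the same check can be run over SSH or salt across many workers without anyone watching the journal.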
I also saw that OSD has the same update installed everywhere, but I have not seen related problems in jobs there.
Yeah, OK, it is broken the same way, see https://openqa.suse.de/tests/8615929#step/scc_registration/5
Rolled back on OSD workers with
sudo salt --no-color --state-output=changes -C 'G@roles:worker' cmd.run 'zypper -n in --oldpackage libslirp0-4.3.1-1.51 && zypper al libslirp0'
This did not work on openqaworker14 and openqaworker15 (see #104970), so there I ran manually
sudo zypper -n in --oldpackage libslirp0-4.3.1-1.51 && sudo zypper al libslirp0
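Since the salt run did not take effect on every worker, a quick per-host check of the lock can help. This is a minimal sketch, assuming that locks created with `zypper al NAME` are stored in /etc/zypp/locks as a `solvable_name: NAME` line (`zypper locks` shows the same information as a table); `has_lock` is a hypothetical helper:

```shell
#!/bin/bash
# Sketch: report whether a package lock exists, by looking for an exact
# "solvable_name: NAME" line in the zypp locks file (assumed format).
has_lock() {
    local pkg=$1 file=${2:-/etc/zypp/locks}
    # -x matches whole lines only, so "libslirp0" does not match a lock
    # on "libslirp0-devel"; a missing file counts as "no lock".
    grep -qx "solvable_name: $pkg" "$file" 2>/dev/null
}
```

Across the OSD workers the equivalent check could be run through salt, e.g. `sudo salt -C 'G@roles:worker' cmd.run 'grep -x "solvable_name: libslirp0" /etc/zypp/locks'`, and any worker returning an error would still need the manual rollback.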
- Category set to Concrete Bugs
https://build.suse.de/request/show/266342 shows that zluo approved the update with the test report https://qam.suse.de/testreports/SUSE:Maintenance:23007:266342/log, which explicitly states that the reviewer did no testing. That is obviously bad and could have been done better: the purpose of libslirp0 is explicitly networking for qemu, and that was not tested.
#14 Updated by okurz about 2 months ago
- Tags set to reactive work
On openqaworker7 I removed the lock and installed the current libslirp0 with
zypper rl libslirp0 && zypper -n in libslirp0
and triggered a verification job with
openqa-clone-job --within-instance https://openqa.opensuse.org/tests/2327458 _GROUP=0 TEST=textmode_investigate_poo110196 _SKIP_POST_FAIL_HOOKS=1 WORKER_CLASS=openqaworker7 SCHEDULE=tests/installation/bootloader,tests/installation/welcome,tests/installation/online_repos,tests/installation/installation_mode
#15 Updated by okurz about 2 months ago
https://openqa.opensuse.org/tests/2328321# is fine, so I ran
for i in openqaworker1 openqaworker4 openqaworker7 power8 imagetester rebel; do echo $i && ssh root@$i "zypper rl libslirp0 && transactional-update run /bin/sh -c 'zypper -n in libslirp0' && reboot || zypper -n in libslirp0" ; done
and for OSD
sudo salt --no-color --state-output=changes -C 'G@roles:worker' cmd.run 'zypper rl libslirp0 && zypper -n in libslirp0'