action #115292
closed[qem][qe-core]test fails in zypper_lifecycle due to "zypper refresh" timeout
100%
Description
Observation¶
openQA test in scenario sle-15-SP4-Server-DVD-Updates-x86_64-qam-allpatterns+addons@64bit fails in
zypper_lifecycle
Test suite description¶
Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. All modules defined in SCC_ADDONS
#20201124: added chrony-pool-openSUSE due to 15SP1 QU5 failure https://openqa.suse.de/tests/5060854
Reproducible¶
Fails since (at least) Build 20220813-1 (current job)
Expected result¶
Last good: 20220812-1 (or more recent)
Further details¶
Always latest result in this scenario: latest
Updated by rfan1 over 2 years ago
This sporadic issue can be seen in past few builds.
Within QEM tests, we have many patches to be tested in a single job, and then "zypper refresh" command may need more time to download and preparse the metadata of packages.
I have checked the passed case and take one as an example:
https://openqa.suse.de/tests/9324926/logfile?filename=autoinst-log.txt
[2022-08-13T05:32:31.585448+02:00] [debug] <<< testapi::type_string(string="zypper refresh", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-08-13T05:32:32.059275+02:00] [debug] tests/console/zypper_lifecycle.pm:71 called testapi::script_run
[2022-08-13T05:32:32.059556+02:00] [debug] <<< testapi::type_string(string="; echo nD07C-\$?- > /dev/ttyS0\n", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-08-13T05:32:33.173642+02:00] [debug] tests/console/zypper_lifecycle.pm:71 called testapi::script_run
[2022-08-13T05:32:33.173951+02:00] [debug] <<< testapi::wait_serial(timeout=300, expect_not_found=0, no_regex=0, regexp=qr/nD07C-\d+-/u, buffer_size=undef, quiet=undef, record_output=undef)
[2022-08-13T05:37:14.794120+02:00] [debug] >>> testapi::wait_serial: (?^u:nD07C-\d+-): ok
It used ~5 minutes to complete this operation, so we can increase this value to make the test more stably.
Updated by rfan1 over 2 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 20
- Estimated time set to 3.00 h
Set new timeout=420 and see:
# openqa-clone-job --from http://openqa.suse.de --host http://openqa.suse.de 9332230 _GROUP_ID=0 -skip-download --skip-chained-deps BUILD=115292 CASEDIR=https://github.com/rfan1/os-autoinst-distri-opensuse.git#zypper_lifecycle_timeout TEST=poo115292 RETRY=0
Created job #9332685: sle-15-SP4-Server-DVD-Updates-x86_64-Build20220814-1-qam-allpatterns+addons@64bit -> http://openqa.suse.de/t9332685
Updated by rfan1 over 2 years ago
Well, the case passed within 6 minutes, so it is safe to increase the value to 7 minutes.
https://openqa.suse.de/tests/9332685/logfile?filename=autoinst-log.txt
[2022-08-15T14:25:45.788613+02:00] [debug] <<< testapi::script_run(cmd="zypper refresh", die_on_timeout=1, quiet=undef, timeout=420, output="")
[2022-08-15T14:25:45.788726+02:00] [debug] tests/console/zypper_lifecycle.pm:71 called testapi::script_run
[2022-08-15T14:25:45.788851+02:00] [debug] <<< testapi::type_string(string="zypper refresh", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-08-15T14:25:46.263632+02:00] [debug] tests/console/zypper_lifecycle.pm:71 called testapi::script_run
[2022-08-15T14:25:46.263907+02:00] [debug] <<< testapi::type_string(string="; echo ymXzQ-\$?- > /dev/ttyS0\n", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-08-15T14:25:47.381768+02:00] [debug] tests/console/zypper_lifecycle.pm:71 called testapi::script_run
[2022-08-15T14:25:47.382057+02:00] [debug] <<< testapi::wait_serial(no_regex=0, record_output=undef, timeout=420, buffer_size=undef, quiet=undef, expect_not_found=0, regexp=qr/ymXzQ-\d+-/u)
[2022-08-15T14:31:12.278336+02:00] [debug] >>> testapi::wait_serial: (?^u:ymXzQ-\d+-): ok
Updated by rfan1 over 2 years ago
As Michael Andres's kindly suggestion:
OTOH the zypper_lifecycle test enforces being slow:
select_console 'root-console';
# First we'd make sure that we have a clean zypper cache env and all dirs have
# 0755 and all files have 0644 pemmission.
# For some reason the system will change the permission on /var/cache/zypp/{solv,raw}
# files. this cause the zypper lifecycle failed when building cache for non-root user.
script_run('rm -fr /var/cache/zypp/solv /var/cache/zypp/solv');
script_run('rm -fr /var/cache/zypp/raw /var/cache/zypp/raw');
That's IMO a strange idea, especially in presence of 272 repos to be refreshed. Removing the raw cache enforces all files to be downloaded again without being able to reuse any unchanged files. There are several 'zypper ref' and also install commands in previous test. So the caches are almost certainly up-to date - until zypper_lifecycle throws away everything.
I'm not sure about the rationale, but 'the system will change the permission' only if root used a restrictive umask, and the tests should know. But even if you are afraid of having messed up the permissions in a previous test, then chmod -R u+rwX,og+rX /var/cache/zypp should be sufficient
Let me try to enhance current logic as well.
Updated by rfan1 over 2 years ago
- % Done changed from 20 to 70
Updated by rfan1 over 2 years ago
- Status changed from In Progress to Feedback
- % Done changed from 70 to 90
Updated by rfan1 over 2 years ago
- Status changed from Feedback to Resolved
- % Done changed from 90 to 100
Tests passed in the latest openQA run.