Project

General

Profile

Actions

action #115292

closed

[qem][qe-core]test fails in zypper_lifecycle due to "zypper refresh" timeout

Added by rfan1 over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Bugs in existing tests
Target version:
Start date:
2022-08-15
Due date:
% Done:

100%

Estimated time:
3.00 h
Difficulty:

Description

Observation

openQA test in scenario sle-15-SP4-Server-DVD-Updates-x86_64-qam-allpatterns+addons@64bit fails in
zypper_lifecycle

Test suite description

Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. All modules defined in SCC_ADDONS
#20201124: added chrony-pool-openSUSE due to 15SP1 QU5 failure https://openqa.suse.de/tests/5060854

Reproducible

Fails since (at least) Build 20220813-1 (current job)

Expected result

Last good: 20220812-1 (or more recent)

Further details

Always latest result in this scenario: latest

Actions #1

Updated by rfan1 over 2 years ago

This sporadic issue can be seen in past few builds.

Within QEM tests, we have many patches to be tested in a single job, and then "zypper refresh" command may need more time to download and preparse the metadata of packages.

I have checked the passed case and take one as an example:

https://openqa.suse.de/tests/9324926/logfile?filename=autoinst-log.txt

[2022-08-13T05:32:31.585448+02:00] [debug] <<< testapi::type_string(string="zypper refresh", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-08-13T05:32:32.059275+02:00] [debug] tests/console/zypper_lifecycle.pm:71 called testapi::script_run
[2022-08-13T05:32:32.059556+02:00] [debug] <<< testapi::type_string(string="; echo nD07C-\$?- > /dev/ttyS0\n", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-08-13T05:32:33.173642+02:00] [debug] tests/console/zypper_lifecycle.pm:71 called testapi::script_run
[2022-08-13T05:32:33.173951+02:00] [debug] <<< testapi::wait_serial(timeout=300, expect_not_found=0, no_regex=0, regexp=qr/nD07C-\d+-/u, buffer_size=undef, quiet=undef, record_output=undef)
[2022-08-13T05:37:14.794120+02:00] [debug] >>> testapi::wait_serial: (?^u:nD07C-\d+-): ok

It used ~5 minutes to complete this operation, so we can increase this value to make the test more stably.

Actions #2

Updated by rfan1 over 2 years ago

  • Assignee set to rfan1
Actions #3

Updated by rfan1 over 2 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 20
  • Estimated time set to 3.00 h

Set new timeout=420 and see:

# openqa-clone-job  --from http://openqa.suse.de --host http://openqa.suse.de 9332230  _GROUP_ID=0 -skip-download --skip-chained-deps BUILD=115292  CASEDIR=https://github.com/rfan1/os-autoinst-distri-opensuse.git#zypper_lifecycle_timeout TEST=poo115292 RETRY=0
Created job #9332685: sle-15-SP4-Server-DVD-Updates-x86_64-Build20220814-1-qam-allpatterns+addons@64bit -> http://openqa.suse.de/t9332685
Actions #4

Updated by rfan1 over 2 years ago

Well, the case passed within 6 minutes, so it is safe to increase the value to 7 minutes.

https://openqa.suse.de/tests/9332685/logfile?filename=autoinst-log.txt

[2022-08-15T14:25:45.788613+02:00] [debug] <<< testapi::script_run(cmd="zypper refresh", die_on_timeout=1, quiet=undef, timeout=420, output="")
[2022-08-15T14:25:45.788726+02:00] [debug] tests/console/zypper_lifecycle.pm:71 called testapi::script_run
[2022-08-15T14:25:45.788851+02:00] [debug] <<< testapi::type_string(string="zypper refresh", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-08-15T14:25:46.263632+02:00] [debug] tests/console/zypper_lifecycle.pm:71 called testapi::script_run
[2022-08-15T14:25:46.263907+02:00] [debug] <<< testapi::type_string(string="; echo ymXzQ-\$?- > /dev/ttyS0\n", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-08-15T14:25:47.381768+02:00] [debug] tests/console/zypper_lifecycle.pm:71 called testapi::script_run
[2022-08-15T14:25:47.382057+02:00] [debug] <<< testapi::wait_serial(no_regex=0, record_output=undef, timeout=420, buffer_size=undef, quiet=undef, expect_not_found=0, regexp=qr/ymXzQ-\d+-/u)
[2022-08-15T14:31:12.278336+02:00] [debug] >>> testapi::wait_serial: (?^u:ymXzQ-\d+-): ok
Actions #5

Updated by rfan1 over 2 years ago

As Michael Andres's kindly suggestion:

OTOH the zypper_lifecycle test enforces being slow:
     select_console 'root-console';
     # First we'd make sure that we have a clean zypper cache env and all dirs have
     # 0755 and all files have 0644 pemmission.
     # For some reason the system will change the permission on /var/cache/zypp/{solv,raw}
     # files. this cause the zypper lifecycle failed when building cache for non-root user.
     script_run('rm -fr /var/cache/zypp/solv /var/cache/zypp/solv');
     script_run('rm -fr /var/cache/zypp/raw /var/cache/zypp/raw');
That's IMO a strange idea, especially in presence of 272 repos to be refreshed. Removing the raw cache enforces all files to be downloaded again without being able to reuse any unchanged files. There are several 'zypper ref' and also install commands in previous test. So the caches are almost certainly up-to date - until zypper_lifecycle throws away everything.
I'm not sure about the rationale, but 'the system will change the permission' only if root used a restrictive umask, and the tests should know. But even if you are afraid of having messed up the permissions in a previous test, then chmod -R u+rwX,og+rX /var/cache/zypp should be sufficient

Let me try to enhance current logic as well.

Actions #7

Updated by rfan1 over 2 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 70 to 90
Actions #8

Updated by rfan1 over 2 years ago

  • Status changed from Feedback to Resolved
  • % Done changed from 90 to 100

Tests passed in the latest openQA run.

Actions

Also available in: Atom PDF