action #125804

closed

coordination #121876: [epic] Handle openQA review failures in Yam squad - SLE 15 SP5

[sporadic] Increase timeout execution in the Continuous migration scenarios

Added by JERiveraMoya about 1 year ago. Updated 11 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2023-03-11
Due date:
% Done: 0%
Estimated time:

Description

Motivation

Increase the execution timeout in the following scenario:
offline_sles15sp1_sles15sp4_sles15sp5_media_all_full_s390x_ph0@s390x-kvm-sle12 fails in
svirt_upload_assets
Also check the history of the other scenarios, or estimate if there is no history, to decide in which other test suites the timeout should be increased.

Acceptance criteria

AC1: Scenarios in Continuous migration have enough time to run

Actions #1

Updated by JERiveraMoya about 1 year ago

  • Project changed from openQA Tests to qe-yam
  • Subject changed from test fails in svirt_upload_assets to Increase timeout execution in the Continuous migration scenarios
  • Description updated (diff)
  • Category deleted (Bugs in existing tests)
  • Status changed from New to Workable
  • Priority changed from Normal to High
  • Target version set to Current
Actions #2

Updated by hjluo about 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to hjluo
Actions #3

Updated by hjluo about 1 year ago

Set MAX_JOB_TIME=7200 for offline_sles15sp1_sles15sp4_sles15sp5_media_all_full_s390x_ph0

and for
https://openqa.suse.de/t107305401
https://openqa.suse.de/t107305402

Actions #4

Updated by hjluo about 1 year ago

For now, we can focus on just the ARM and s390x-kvm-sle12 cases.

Actions #5

Updated by JERiveraMoya about 1 year ago

The links are not available. Could you please paste the MR?

Actions #6

Updated by hjluo about 1 year ago

  • For the Aarch64 flavor Continuous-Migration-SLE15SP5 we've set MAX_JOB_TIME=14400, and no case has failed due to the timeout since then.
  • In 88.1, no case failed because its execution time exceeded MAX_JOB_TIME.
Actions #7

Updated by hjluo about 1 year ago

Actions #9

Updated by hjluo about 1 year ago

Actions #10

Updated by JERiveraMoya about 1 year ago

  • Priority changed from High to Normal
Actions #11

Updated by hjluo about 1 year ago

Now in build 93.2 (AKA PublicRC-202304), all cases (including s390x) finished within MAX_JOB_TIME.

Actions #12

Updated by hjluo about 1 year ago

  • Status changed from In Progress to Resolved

Marking it as resolved now.

Actions #13

Updated by openqa_review about 1 year ago

  • Status changed from Resolved to Feedback

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: online_sles15sp3_sles15sp4_sles15sp5_scc_all_full_yast_
https://openqa.suse.de/tests/11153512#step/svirt_upload_assets/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #14

Updated by JERiveraMoya about 1 year ago

  • Status changed from Feedback to In Progress

@hjluo, could you please take a look? The ticket was reopened automatically because that job is using it as a label.

Actions #15

Updated by hjluo 12 months ago

OK. I'll take a look at the details.

Actions #16

Updated by JERiveraMoya 12 months ago

hjluo wrote:

OK. I'll take a look at the details.

Last one passed, but can you run it 5-10 times to see if the timeout is enough?

Actions #17

Updated by JERiveraMoya 12 months ago

  • Subject changed from Increase timeout execution in the Continuous migration scenarios to [sporadic] Increase timeout execution in the Continuous migration scenarios
Actions #18

Updated by hjluo 12 months ago

  • The job that fails now is online_sles15sp3_sles15sp4_sles15sp5_scc_all_full_yast_s390x_ph0_1, and I've run 20 instances with MAX_JOB_TIME=14400 to check it.

  • The scenario this ticket was opened for, offline_sles15sp1_sles15sp4_sles15sp5_media_all_full_s390x_ph0, was blocked by bug 1210196.

Actions #19

Updated by hjluo 12 months ago

According to the result page, 5 out of 20 cases passed. The rest failed at svirt_upload_assets, but not because of the timeout: some cases ran for less than 14400 seconds.
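
The reasoning above (a job that ends before MAX_JOB_TIME cannot have failed because of the job timeout) can be sketched as a small check. This is only an illustration: the function name `failure_kind` and the sample durations are hypothetical and are not taken from the jobs in this ticket.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a failed job by comparing its elapsed
# run time (seconds) against MAX_JOB_TIME (seconds).
failure_kind() {
    local elapsed=$1 max_job_time=$2
    if [ "$elapsed" -ge "$max_job_time" ]; then
        echo "timeout"
    else
        echo "other"
    fi
}

MAX_JOB_TIME=14400   # 4 hours, the value used in this ticket
failure_kind 9000 "$MAX_JOB_TIME"    # made-up duration below the limit
failure_kind 14400 "$MAX_JOB_TIME"   # made-up duration at the limit
```

A result of "other" only rules out the job timeout. It does not say why svirt_upload_assets failed.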

Actions #20

Updated by hjluo 12 months ago

Ran with a branch to extend the timeout:

for i in {1..5}; do
    bash -x ./hj-tools/hj-branch.sh -a hjluo -b svirt_upload -j 11153512 -s "_GROUP=0 MAX_JOB_TIME=14400  TEST=online_sles15sp3_sles15sp4_sles15sp5_scc_all_full_yast_s390x_ph0_${i} _SKIP_POST_FAIL_HOOKS= PUBLISH_HDD_1=SLES-15-SP5-s390x-Build101.1-15SP3-15SP4-ph0_${i}.qcow2 PUBLISH_PFLASH_VARS=SLES-15-SP5-s390x-Build101.1-15SP3-15SP4-ph0-uefi-vars-${i}.qcow2"
done

The running_jobs
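
To see what the loop submits, the settings that vary per iteration can be printed without calling hj-branch.sh (a private helper script whose behavior is not shown in this ticket). A dry-run sketch: the function name `job_settings` is hypothetical, and only the three settings that depend on the loop index are reproduced.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: print the settings that differ between iterations.
job_settings() {
    local i=$1
    local base="SLES-15-SP5-s390x-Build101.1-15SP3-15SP4-ph0"
    echo "TEST=online_sles15sp3_sles15sp4_sles15sp5_scc_all_full_yast_s390x_ph0_${i}"
    echo "PUBLISH_HDD_1=${base}_${i}.qcow2"
    echo "PUBLISH_PFLASH_VARS=${base}-uefi-vars-${i}.qcow2"
}

for i in 1 2 3 4 5; do
    job_settings "$i"
done
```

Each iteration gets its own test name and its own published disk and UEFI vars file names, presumably so that the five runs do not overwrite each other's assets.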

Actions #21

Updated by hjluo 12 months ago

The VR is now blocked by the ERRICSSON repo issue, which the PR owner is fixing: https://suse.slack.com/archives/C02D16TCP99/p1685687709108189. It currently blocks all zypper_patch runs.

Actions #22

Updated by hjluo 12 months ago

Now using the writeback cache option to speed up the qemu-img convert step.

'writeback' uses the page cache, considering the write complete when the data is in the page cache, and reading data from the page cache. This is likely to give the best performance but is also likely to give inconsistent performance and cause trouble for other applications.
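
For illustration, `qemu-img convert` accepts a destination cache mode through its `-t` option. The sketch below only prints the command instead of running it. The function name `convert_cmd` and the file names are placeholders, and the actual change in the PR may pass the option differently.

```shell
#!/usr/bin/env bash
# Build (and print) a qemu-img convert command that uses the writeback
# cache mode for the destination image. File names are placeholders.
convert_cmd() {
    local src=$1 dst=$2
    # -t writeback: destination cache mode; -O qcow2: output format
    echo "qemu-img convert -t writeback -O qcow2 $src $dst"
}

convert_cmd input.qcow2 output.qcow2
# To run the conversion for real, call qemu-img directly instead of echoing.
```
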

reference:

  • 1 qemu: Make disk image conversion dramatically faster
  • 2 how to improve qcow performance?
Actions #23

Updated by hjluo 12 months ago

PR: 17222

Actions #24

Updated by hjluo 12 months ago

PR was merged.

Actions #25

Updated by JERiveraMoya 11 months ago

  • Status changed from In Progress to Resolved