action #125804
Closed coordination #121876: [epic] Handle openQA review failures in Yam squad - SLE 15 SP5
[sporadic] Increase execution timeout in the Continuous migration scenarios
Added by JERiveraMoya almost 2 years ago. Updated over 1 year ago.
Description
Motivation
Increase the execution timeout in the following scenario:
offline_sles15sp1_sles15sp4_sles15sp5_media_all_full_s390x_ph0@s390x-kvm-sle12 fails in svirt_upload_assets
and check the history of the other scenarios (or estimate it if there is no history) to determine in which other test suites the timeout should also be increased.
Acceptance criteria
AC1: Scenarios in Continuous migration have enough time to run
Updated by JERiveraMoya almost 2 years ago
- Project changed from openQA Tests (public) to qe-yam
- Subject changed from test fails in svirt_upload_assets to Increase execution timeout in the Continuous migration scenarios
- Description updated (diff)
- Category deleted (Bugs in existing tests)
- Status changed from New to Workable
- Priority changed from Normal to High
- Target version set to Current
Updated by hjluo almost 2 years ago
- Status changed from Workable to In Progress
- Assignee set to hjluo
Updated by hjluo over 1 year ago
Set MAX_JOB_TIME=7200 for offline_sles15sp1_sles15sp4_sles15sp5_media_all_full_s390x_ph0
and for:
https://openqa.suse.de/t107305401
https://openqa.suse.de/t107305402
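For reference, a minimal sketch of how such a verification run could be retriggered with the raised timeout, assuming openqa-clone-job is available and reusing the job id from the links above; the _GROUP=0 override keeps the clone out of the job group statistics and the BUILD label is purely hypothetical:
# clone one of the jobs above with the raised timeout (BUILD label is hypothetical)
openqa-clone-job --within-instance https://openqa.suse.de 107305401 \
  _GROUP=0 MAX_JOB_TIME=7200 BUILD=poo125804_max_job_time_check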
Updated by hjluo over 1 year ago
Now we can just focus on the ARM and s390x-kvm-sle12 cases.
Updated by JERiveraMoya over 1 year ago
The links are not available, could you please paste the MR?
Updated by hjluo over 1 year ago
- For the aarch64 flavor Continuous-Migration-SLE15SP5 we've set MAX_JOB_TIME=14400, and no case has failed due to timeout since then.
- In build 88.1 no cases failed with execution time > MAX_JOB_TIME.
Updated by hjluo over 1 year ago
- For the latest 90.1 build, some s390x cases failed in svirt_upload_assets.
- The reason is that the qcow files are 13~15 GB and the worker was slow; after a rerun both passed within 2 hours.
- Tests in 90.1 after the rerun
Updated by hjluo over 1 year ago
Now in build 93.2 (aka PublicRC-202304), all cases (including s390x) finish within MAX_JOB_TIME.
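As a side note, a hedged sketch of how this could be double-checked from the API, assuming openqa-cli and jq are installed: jobs killed by MAX_JOB_TIME get the result timeout_exceeded, so an empty list for the build means no case ran over the limit (additional filters such as groupid or flavor could be added to narrow it down):
# list any jobs of build 93.2 that were killed for exceeding MAX_JOB_TIME
openqa-cli api --host https://openqa.suse.de jobs build=93.2 result=timeout_exceeded \
  | jq -r '.jobs[] | [.id, .test, .result] | @tsv'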
Updated by hjluo over 1 year ago
- Status changed from In Progress to Resolved
Marking it as resolved now.
Updated by openqa_review over 1 year ago
- Status changed from Resolved to Feedback
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: online_sles15sp3_sles15sp4_sles15sp5_scc_all_full_yast_
https://openqa.suse.de/tests/11153512#step/svirt_upload_assets/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by JERiveraMoya over 1 year ago
- Status changed from Feedback to In Progress
@hjluo, could you please take a look? The ticket was reopened automatically because that job is using it as a label.
Updated by JERiveraMoya over 1 year ago
hjluo wrote:
OK. I'll take a look at the details.
Last one passed, but can you run it 5-10 times to see if the timeout is enough?
Updated by JERiveraMoya over 1 year ago
- Subject changed from Increase execution timeout in the Continuous migration scenarios to [sporadic] Increase execution timeout in the Continuous migration scenarios
Updated by hjluo over 1 year ago
Now the failed one is online_sles15sp3_sles15sp4_sles15sp5_scc_all_full_yast_s390x_ph0_1 and I've run 20 instances with MAX_JOB_TIME=14400 to check it.
The one this ticket was opened for, offline_sles15sp1_sles15sp4_sles15sp5_media_all_full_s390x_ph0, was blocked by bug 1210196.
Updated by hjluo over 1 year ago
From the result page, 5 out of 20 cases passed; the rest failed at svirt_upload_assets, but not due to timeout, as some cases ran for less than 14400 seconds.
Updated by hjluo over 1 year ago
Run with a branch to extend the timeout:
for i in {1..5}; do
  bash -x ./hj-tools/hj-branch.sh -a hjluo -b svirt_upload -j 11153512 -s "_GROUP=0 MAX_JOB_TIME=14400 TEST=online_sles15sp3_sles15sp4_sles15sp5_scc_all_full_yast_s390x_ph0_${i} _SKIP_POST_FAIL_HOOKS= PUBLISH_HDD_1=SLES-15-SP5-s390x-Build101.1-15SP3-15SP4-ph0_${i}.qcow2 PUBLISH_PFLASH_VARS=SLES-15-SP5-s390x-Build101.1-15SP3-15SP4-ph0-uefi-vars-${i}.qcow2"
done
The running jobs
Updated by hjluo over 1 year ago
Now the VR is blocked by the ERICSSON repo issue; the PR owner is fixing it (https://suse.slack.com/archives/C02D16TCP99/p1685687709108189). It currently blocks all zypper_patch.
Updated by hjluo over 1 year ago
Now using the writeback option to speed up the qemu-img convert.
'writeback' uses the page cache, considering the write complete when the data is in the page cache, and reading data from the page cache. This is likely to give the best performance but is also likely to give inconsistent performance and cause trouble for other applications.
reference:
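For illustration, a hedged example of how the writeback cache mode can be passed to qemu-img when converting the disk image to be published; the file names are hypothetical, -t sets the cache mode for the destination and -T for the source:
# convert the exported disk with writeback caching on the destination (hypothetical paths)
qemu-img convert -p -O qcow2 -t writeback -T none \
  sle15sp5-migrated-disk.raw SLES-15-SP5-s390x-migrated.qcow2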
Updated by JERiveraMoya over 1 year ago
- Status changed from In Progress to Resolved