action #120429

closed

coordination #121876: [epic] Handle openQA review failures in Yam squad - SLE 15 SP5

Increase timeout when checking if YaST logs can be uploaded

Added by tinawang123 over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2022-11-15
Due date:
% Done: 0%
Estimated time:

Description

Motivation

Three jobs failed:
https://openqa.suse.de/tests/9951916#step/logs_from_installation_system/6
https://openqa.suse.de/tests/9951915
https://openqa.suse.de/tests/9951914

According to the screenshot (https://openqa.suse.de/tests/9951916#step/logs_from_installation_system/4), the ping command received a response, but the step still timed out.
We should increase the timeout when checking whether logs can be uploaded; otherwise, the test fails on s390x KVM.
Check the code: [can_upload_logs](https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/lib/network_utils.pm#L72)
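
A minimal Python sketch of a timeout-bounded reachability check may help frame the proposal. It is not the real `can_upload_logs`, which is a Perl helper in `lib/network_utils.pm` whose body is not reproduced in this ticket. The helper name `can_reach` and its parameters are hypothetical. The sketch separates two outcomes: the probe command failing, and the probe not finishing within the allowed time.

```python
import subprocess
from typing import Sequence


def can_reach(cmd: Sequence[str], timeout: float) -> bool:
    """Hypothetical probe runner, e.g. cmd = ["ping", "-c", "1", host].

    Returns True only if the command exits with status 0 within `timeout`
    seconds. A timeout or a missing binary is treated as "not reachable".
    """
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        # The probe did not finish in time.
        return False
    except FileNotFoundError:
        # The probe binary (for example ping) is not installed.
        return False
    return result.returncode == 0
```

For example, `can_reach(["ping", "-c", "1", "worker2.oqa.suse.de"], timeout=60)` waits at most 60 seconds. A larger `timeout` only helps when the probe itself is slow. It does not help when the probe finishes but its result never reaches the caller, which is what later comments on this ticket conclude.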

Acceptance criteria

AC1: Increase timeout when checking if YaST logs can be uploaded

Actions #1

Updated by JERiveraMoya over 1 year ago

  • Subject changed from test fails in logs_from_installation_system - command 'ping -c 1 worker2.oqa.suse.de' timed out to Increase timeout when checking if YaST logs can be uploaded
  • Description updated
  • Priority changed from Normal to High
  • Target version set to Current
Actions #2

Updated by JERiveraMoya over 1 year ago

  • Description updated
Actions #3

Updated by coolgw over 1 year ago

I wonder whether /dev/ttysclp0 is still working.

Actions #4

Updated by JERiveraMoya over 1 year ago

  • Tags deleted (qe-yast-refinement)
  • Status changed from New to Workable
Actions #5

Updated by geor over 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to geor
Actions #7

Updated by JERiveraMoya over 1 year ago

  • Parent task set to #121876
Actions #8

Updated by geor over 1 year ago

PR closed.
After several iterations, we can conclude that increasing the timeout does not resolve the issue.
The cause of the occasional failure is script_run, which does not always capture the return code of the ping command. Ping can return successfully within the first 10 seconds, yet script_run still times out after 60 seconds (or whatever its timeout is set to).
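
A simplified Python model shows why a longer timeout cannot fix this. It assumes that script_run works roughly as follows: the command is followed by an `echo` of a unique marker plus `$?` to the serial device (on s390x possibly /dev/ttysclp0, the device questioned in comment #3), and the test then scans the serial output for that marker until a timeout expires. The marker format `<marker>-<rc>-`, the function `wait_for_marker`, and the event lists are hypothetical. The timings are illustrative and were not taken from the failing jobs.

```python
import re
from typing import Iterable, Optional, Tuple

# One serial-console event: (seconds since the command was typed, line text).
Event = Tuple[float, str]


def wait_for_marker(events: Iterable[Event], marker: str, timeout: float) -> Optional[int]:
    """Return the exit code carried by a '<marker>-<rc>-' line, or None on timeout.

    Simplified model, not os-autoinst code: events are assumed to be in
    time order, and the caller stops scanning once `timeout` has passed.
    """
    pattern = re.compile(re.escape(marker) + r"-(\d+)-")
    for t, line in events:
        if t > timeout:
            break  # the caller has already given up
        match = pattern.search(line)
        if match:
            return int(match.group(1))
    return None


# ping finished after about 10 s and its marker reached the serial console.
delivered = [(9.8, "1 packets transmitted, 1 received"), (10.0, "abc123-0-")]
# Same run, but the marker line never arrived on the serial console.
lost = [(9.8, "1 packets transmitted, 1 received")]

print(wait_for_marker(delivered, "abc123", timeout=60))  # 0
print(wait_for_marker(lost, "abc123", timeout=60))       # None
print(wait_for_marker(lost, "abc123", timeout=600))      # None: a longer timeout does not help
```

In this model, the command's success and the caller's view of that success are decoupled. Once the marker line is lost, every timeout value ends in the same failure.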

A new approach is needed that addresses the sporadic failure of script_run's underlying mechanism to retrieve ping's return code.
As discussed in the daily, we will not create a new ticket, to avoid clutter; instead, we will work on the newly uncovered issue in this ticket.

Actions #9

Updated by openqa_review over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles12sp4_ltss_pscc_sdk_all_full
https://openqa.suse.de/tests/10219982#step/logs_from_installation_system/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #10

Updated by geor over 1 year ago

It turns out that this is not related to the ping command. Rather, there are cases where openQA does not capture the return code of a shell command.
For instance, here we can see that an assert_script_run("ls") call timed out even though the ls command returned successfully.
This issue, where the return code of a shell command is not captured, appears only sporadically and, given its nature, is not easy to debug.

Actions #11

Updated by geor over 1 year ago

  • Status changed from In Progress to Closed
Actions #12

Updated by geor over 1 year ago

Actions #13

Updated by openqa_review over 1 year ago

  • Status changed from Closed to Feedback

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles12sp5_pscc_sdk-asmm-contm-lgm-tcm-wsm-pcm_all_full
https://openqa.suse.de/tests/10256078#step/logs_from_installation_system/1

Actions #14

Updated by geor over 1 year ago

  • Status changed from Feedback to Resolved
Actions #15

Updated by openqa_review about 1 year ago

  • Status changed from Resolved to Feedback

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles12sp5_media_sdk-asmm-contm-lgm-tcm-wsm-pcm_all_full
https://openqa.suse.de/tests/10448596#step/logs_from_installation_system/1

Actions #16

Updated by geor about 1 year ago

  • Status changed from Feedback to Resolved