
action #120429

coordination #121876: [epic] Handle openQA review failures in Yam squad

Increase timeout when checking if YaST logs can be uploaded

Added by tinawang123 3 months ago. Updated 19 days ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2022-11-15
Due date:
% Done: 0%
Estimated time:
Description

Motivation

Three jobs failed:
https://openqa.suse.de/tests/9951916#step/logs_from_installation_system/6
https://openqa.suse.de/tests/9951915
https://openqa.suse.de/tests/9951914

According to the screenshot at https://openqa.suse.de/tests/9951916#step/logs_from_installation_system/4, the ping command received a response, but the check still timed out.
We should increase the timeout used when checking whether logs can be uploaded. Otherwise the test fails on s390x KVM.
Check code: [can_upload_logs](https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/lib/network_utils.pm#L72)
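The following is a minimal sketch of a reachability check with an explicit timeout. It assumes a Linux host where `ping` is available. The helper names `run_check` and `probe_host`, and the three-way `"ok"`/`"failed"`/`"timeout"` result, are hypothetical. They do not reflect the actual Perl implementation of `can_upload_logs`. The idea is to report a timeout separately from a failed ping, so that a hung check can be told apart from a host that is genuinely unreachable.

```python
import subprocess
from typing import List


def run_check(cmd: List[str], timeout_s: float) -> str:
    """Run cmd and classify the outcome as 'ok', 'failed' or 'timeout' (hypothetical helper)."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired once timeout_s has elapsed.
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    # A finished command is judged only by its exit status.
    return "ok" if result.returncode == 0 else "failed"


def probe_host(host: str, timeout_s: float = 60.0) -> str:
    # Hypothetical wrapper around the 'ping -c 1 <host>' check named in the original ticket subject.
    return run_check(["ping", "-c", "1", host], timeout_s)
```

For example, `probe_host("worker2.oqa.suse.de", timeout_s=120)` raises the budget for the host from the ticket subject. A `"timeout"` result still means the exit status was never observed, which is different from ping reporting an error.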

Acceptance criteria

AC1: Increase timeout when checking if YaST logs can be uploaded

History

#1 Updated by JERiveraMoya 3 months ago

  • Subject changed from test fails in logs_from_installation_system - command 'ping -c 1 worker2.oqa.suse.de' timed out to Increase timeout when checking if YaST logs can be uploaded
  • Description updated (diff)
  • Priority changed from Normal to High
  • Target version set to Current

#2 Updated by JERiveraMoya 3 months ago

  • Description updated (diff)

#3 Updated by coolgw 3 months ago

I wonder whether /dev/ttysclp0 is still working.

#4 Updated by JERiveraMoya 2 months ago

  • Tags deleted (qe-yast-refinement)
  • Status changed from New to Workable

#5 Updated by geor 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to geor

#7 Updated by JERiveraMoya about 2 months ago

  • Parent task set to #121876

#8 Updated by geor about 2 months ago

PR closed.
After several iterations, we can conclude that increasing the timeout does not resolve the issue.
The culprit for the occasional failure is script_run, which does not always capture the return code of the ping command. Ping can return successfully within the first 10 seconds while script_run still times out after 60 seconds (or whatever timeout it was given).

A new approach is needed that addresses the sporadic inability of script_run's underlying mechanism to capture ping's return code.
As discussed in the daily, we will not create a new ticket, to avoid clutter. Instead, we will handle the newly found issue in this ticket.

#9 Updated by openqa_review about 1 month ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles12sp4_ltss_pscc_sdk_all_full
https://openqa.suse.de/tests/10219982#step/logs_from_installation_system/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

#10 Updated by geor about 1 month ago

So it turns out that this is not related to the ping command. Rather, there are cases where openQA does not capture the return code of a shell command.
For instance, here we can see that an assert_script_run("ls") command timed out even though ls had returned successfully.
This issue appears only sporadically and, given its nature, is not easy to debug.
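As far as we understand os-autoinst, script_run does not read the exit status directly. It types the command followed by an `echo` of a unique marker plus `$?`, redirected to the serial device (/dev/ttysclp0 on s390x). It then waits for that marker on the serial console. If the marker line never arrives, or arrives garbled, the wait runs until the timeout even though the command itself finished. The sketch below models only the wrapping and parsing side. `wrap_command`, `parse_exit_code`, `new_marker` and the exact marker format are simplified, hypothetical stand-ins, not the actual os-autoinst code.

```python
import re
import uuid
from typing import Optional


def new_marker() -> str:
    # Random marker so that stale output from earlier commands cannot match (hypothetical format).
    return uuid.uuid4().hex[:8]


def wrap_command(cmd: str, marker: str, serial_dev: str) -> str:
    # Append an echo that writes '<marker>-<exit status>-' to the serial device.
    # The shell expands $? to the exit status of cmd when this line runs.
    return f"{cmd}; echo {marker}-$?- > {serial_dev}"


def parse_exit_code(serial_output: str, marker: str) -> Optional[int]:
    # Search the serial log for '<marker>-<digits>-'.
    # None means the marker was never seen, so the exit status is unknown.
    match = re.search(rf"{re.escape(marker)}-(\d+)-", serial_output)
    return int(match.group(1)) if match else None
```

With marker `abc`, `wrap_command("ls", "abc", "/dev/ttysclp0")` yields `ls; echo abc-$?- > /dev/ttysclp0`. `parse_exit_code("abc-0-", "abc")` returns `0`. If the serial log never contains the marker, `parse_exit_code` returns `None` regardless of how `ls` actually exited. A caller in that situation can only wait until its timeout, which is consistent with the finding in #8 that a longer timeout does not help.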

#11 Updated by geor about 1 month ago

  • Status changed from In Progress to Closed

#12 Updated by geor about 1 month ago

#13 Updated by openqa_review 20 days ago

  • Status changed from Closed to Feedback

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles12sp5_pscc_sdk-asmm-contm-lgm-tcm-wsm-pcm_all_full
https://openqa.suse.de/tests/10256078#step/logs_from_installation_system/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

#14 Updated by geor 19 days ago

  • Status changed from Feedback to Resolved
