action #122608

closed

QA - coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

QA - coordination #116623: [epic] Migration of SUSE Nbg based openQA+QA+QAM systems to new security zones

coordination #122650: [epic] Fix firewall block and improve error reporting when test fails in curl log upload

exit code of shell command not received by script_run

Added by geor over 1 year ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2023-01-02
Due date:
% Done: 0%
Estimated time:

Description

Occasionally, script_run (and assert_script_run) does not receive the exit code of a shell command even though the command has already exited, which results in a timeout.
This has been observed on s390x-kvm, where the ping command returns but script_run still times out.
It is also evident in another example, where assert_script_run times out although the ls command has already returned.
The issue is very sporadic and hard to debug.
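To make the failure mode concrete, here is a minimal Python sketch of how such an exit-code handshake can work. It assumes, without claiming to reproduce the actual os-autoinst Perl code, that script_run types the command followed by `echo <marker>-$?-` redirected to the serial device and then waits for `<marker>-<digits>-` on the serial stream. The helpers `wrap_command` and `wait_for_exit_code`, the `read_chunk` callback and the `ttyS0` default are hypothetical.

```python
import re
import time
from typing import Callable, Optional


def wrap_command(cmd: str, marker: str, serialdev: str = "ttyS0") -> str:
    """Append an exit-code marker to `cmd` (hypothetical helper).

    The shell expands $? after `cmd` finishes, so the serial device receives
    e.g. "UWHKD-0-" on success or "UWHKD-1-" on failure.
    """
    return f"{cmd} ; echo {marker}-$?- > /dev/{serialdev}"


def wait_for_exit_code(
    read_chunk: Callable[[], str],
    marker: str,
    timeout: float,
    poll_interval: float = 0.01,
) -> Optional[int]:
    """Poll serial output until `<marker>-<digits>-` appears or `timeout` expires.

    Returns the exit code, or None on timeout. None is also what happens when
    the command finished long ago but the marker never reaches the worker.
    """
    pattern = re.compile(re.escape(marker) + r"-(\d+)-")
    buffer = ""  # accumulate output: the marker may be split across chunks
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        buffer += read_chunk()  # returns "" when nothing new has arrived
        match = pattern.search(buffer)
        if match:
            return int(match.group(1))
        time.sleep(poll_interval)
    return None
```

Because the exit code travels only through the serial channel in this sketch, a marker lost on the way back cannot be told apart from a slow command: both end in `None` after the full timeout, no matter how quickly the command itself finished.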


Related issues: 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #122656: Ask SUSE-IT network admins to *not* block this traffic which we need for tests regarding s390x within SUSE network size:M (Resolved, okurz, 2023-01-03)

#1

Updated by okurz over 1 year ago

  • Category set to Support
  • Status changed from New to Feedback
  • Assignee set to okurz
  • Target version set to Ready

Could it be that the command actually takes so long that the test runs into a timeout and just after that the command shows up on the command line? Does the corresponding exit code marker actually show up in the serial log file?
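The question above can be checked after the fact by searching the saved serial log for the marker. This is a minimal Python sketch, assuming the job's serial output is available as a text file (for example `serial0.txt`, an assumed file name) and that the marker has the `<marker>-<exit code>-` shape; `exit_codes_for_marker` and the command-line invocation are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import List


def exit_codes_for_marker(serial_text: str, marker: str) -> List[int]:
    """Return every exit code reported for `marker` in the serial text.

    An empty list means the marker never arrived on the serial line.
    """
    pattern = re.compile(re.escape(marker) + r"-(\d+)-")
    return [int(code) for code in pattern.findall(serial_text)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_marker.py serial0.txt UWHKD
    log_path, marker = sys.argv[1], sys.argv[2]
    text = Path(log_path).read_text(errors="replace")
    codes = exit_codes_for_marker(text, marker)
    if codes:
        print(f"{marker}: exit code(s) {codes}")
    else:
        print(f"{marker}: marker not found; exit code never reached the serial log")
```

An empty list means the exit code never reached the worker, even if the screenshot shows the command as finished.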

#2

Updated by geor over 1 year ago

okurz wrote:

Could it be that the command actually takes so long that the test runs into a timeout and just after that the command shows up on the command line? Does the corresponding exit code marker actually show up in the serial log file?

It could be, but in my observations it is unlikely. I have noticed that when script_run fails, it fails even with a timeout on the order of minutes. In the ping example, the command returns within milliseconds, yet script_run still times out after several minutes.
I did not find anything in the serial log.

#3

Updated by okurz over 1 year ago

  • Related to action #122656: Ask SUSE-IT network admins to *not* block this traffic which we need for tests regarding s390x within SUSE network size:M added

#4

Updated by okurz over 1 year ago

  • Tags set to infra
  • Status changed from Feedback to Blocked
  • Parent task set to #122650

Then I assume this is part of #122650, blocked by #122656.

#5

Updated by openqa_review over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles12sp4_ltss_pscc_sdk_all_full
https://openqa.suse.de/tests/10219982#step/logs_from_installation_system/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

#6

Updated by openqa_review about 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles12sp4_ltss_pscc_sdk_all_full
https://openqa.suse.de/tests/10540163#step/logs_from_installation_system/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 56 days if nothing changes in this ticket.

#7

Updated by okurz about 1 year ago

  • Tags deleted (infra)
  • Category changed from Support to Regressions/Crashes
  • Status changed from Blocked to New
  • Assignee deleted (okurz)

#122656 has been resolved. The latest example https://openqa.suse.de/tests/10540163#step/logs_from_installation_system/5 shows what is described in the original issue. The string UWHKD- is expected but never found in the serial output, although https://openqa.suse.de/tests/10540163#step/logs_from_installation_system/4 clearly looks like a ping -c1 command that had already finished quickly. So, yes, it looks like the text string never arrived back to the worker.

Actions #8

Updated by okurz about 1 year ago

  • Target version changed from Ready to future

#9

Updated by openqa_review about 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sle12sp5_sles15sp4_sles15sp5_media_all_full_
https://openqa.suse.de/tests/10933017#step/logs_from_installation_system/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

#10

Updated by openqa_review 11 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sle12sp5_sles15sp4_sles15sp5_media_all_full_
https://openqa.suse.de/tests/11142598#step/logs_from_installation_system/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 56 days if nothing changes in this ticket.

#11

Updated by okurz 5 months ago

  • Status changed from New to Resolved
  • Assignee set to okurz
  • Target version changed from future to Ready

With NUE1 decommissioned, all active systems are in new security zones, and I assume that machines brought (back) into production will also end up in new security zones. No specific work on improving error reporting was done here, and I don't think we need to improve it further. We need to rely on SUSE-IT to monitor their firewall accordingly.
