action #122608
closed coordination #122650: [epic] Fix firewall block and improve error reporting when test fails in curl log upload
exit code of shell command not received by script_run
Description
Occasionally, script_run (and assert_script_run) do not receive the exit code from the shell command, even though the command has exited, resulting in a timeout.
This has been noticed on s390x-kvm here where the ping command returns but still script_run times out.
It is also obvious in this example where assert_script_run times out, despite the fact that the ls command has returned.
The issue is very sporadic and not easy to debug.
Updated by okurz almost 2 years ago
- Category set to Support
- Status changed from New to Feedback
- Assignee set to okurz
- Target version set to Ready
Could it be that the command actually takes so long that the test runs into a timeout and just after that the command shows up on the command line? Does the corresponding exit code marker actually show up in the serial log file?
Updated by geor almost 2 years ago
okurz wrote:
Could it be that the command actually takes so long that the test runs into a timeout and just after that the command shows up on the command line? Does the corresponding exit code marker actually show up in the serial log file?
It could be, but in my observations it is not likely. When script_run fails, it fails even with a timeout on the order of minutes. In the ping example, the command returns in milliseconds, but script_run still times out after some minutes.
I did not find anything in the serial log.
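To tell the two cases apart (the command finishing late versus the marker never arriving), the serial log can be compared against the timeout. A minimal sketch in Python, assuming the marker has the form `<TAG>-<exit code>-` and that serial lines are available with timestamps. The function name, the tag `ABCDE`, and the input format are hypothetical and not part of openQA:

```python
import re
from typing import Iterable, Tuple


def classify_marker(lines: Iterable[Tuple[float, str]], tag: str, timeout: float) -> str:
    """Classify a script_run-style wait from timestamped serial lines.

    lines: (seconds since the command was typed, serial line) pairs.
    Returns "ok" if the marker arrived within the timeout, "late" if it
    arrived only afterwards, and "lost" if it never arrived at all.
    """
    # Assumed marker format: TAG, a dash, the numeric exit code, a dash.
    pattern = re.compile(re.escape(tag) + r"-\d+-")
    for elapsed, text in lines:
        if pattern.search(text):
            return "ok" if elapsed <= timeout else "late"
    return "lost"


# Hypothetical log excerpt: ping output is present, the marker is not.
log = [(0.1, "PING ..."), (0.2, "1 packets transmitted")]
print(classify_marker(log, "ABCDE", 90))
```

A "late" result would support the slow-command theory, while "lost" matches the behaviour described in this ticket.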
Updated by okurz almost 2 years ago
- Related to action #122656: Ask SUSE-IT network admins to *not* block this traffic which we need for tests regarding s390x within SUSE network size:M added
Updated by openqa_review almost 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: offline_sles12sp4_ltss_pscc_sdk_all_full
https://openqa.suse.de/tests/10219982#step/logs_from_installation_system/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by openqa_review almost 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: offline_sles12sp4_ltss_pscc_sdk_all_full
https://openqa.suse.de/tests/10540163#step/logs_from_installation_system/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Expect the next reminder at the earliest in 56 days if nothing changes in this ticket.
Updated by okurz over 1 year ago
- Tags deleted (infra)
- Category changed from Support to Regressions/Crashes
- Status changed from Blocked to New
- Assignee deleted (okurz)
#122656 has been resolved. The latest example https://openqa.suse.de/tests/10540163#step/logs_from_installation_system/5 shows what is described by the original issue. The string UWHKD- is expected but never found in the serial output although https://openqa.suse.de/tests/10540163#step/logs_from_installation_system/4 clearly looks like a ping -c1 command that already finished quickly. So, yes, it looks like the text string never arrived back to the worker.
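For context, a rough sketch of why a missing marker turns into a timeout even though the command itself finished. It assumes that script_run appends an echo of a tag plus `$?`, redirected to the serial device, and then waits for that pattern on the serial output. The helper names, the `ttyS0` default, and the ping target are illustrative and not the os-autoinst implementation:

```python
import re
from typing import Optional


def wrap_command(cmd: str, tag: str, serialdev: str = "ttyS0") -> str:
    """Append an exit-code marker that is written to the serial device."""
    return f"{cmd}; echo {tag}-$?- > /dev/{serialdev}"


def exit_code_from_serial(serial_output: str, tag: str) -> Optional[int]:
    """Return the exit code carried by the marker, or None if it never arrived."""
    match = re.search(re.escape(tag) + r"-(\d+)-", serial_output)
    return int(match.group(1)) if match else None


# The command's own output goes to the console; only the marker goes to serial.
typed = wrap_command("ping -c1 example.com", "UWHKD")
print(typed)

# Marker arrived: the exit code can be read back.
print(exit_code_from_serial("UWHKD-0-\n", "UWHKD"))

# Marker dropped on the way to the worker: nothing to read, so the caller
# can only keep waiting until its timeout expires.
print(exit_code_from_serial("", "UWHKD"))
```

Under this assumption the screenshot can show a finished ping while the worker still has no exit code, which is consistent with the two steps linked above.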
Updated by openqa_review over 1 year ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: offline_sle12sp5_sles15sp4_sles15sp5_media_all_full_
https://openqa.suse.de/tests/10933017#step/logs_from_installation_system/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by openqa_review over 1 year ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: offline_sle12sp5_sles15sp4_sles15sp5_media_all_full_
https://openqa.suse.de/tests/11142598#step/logs_from_installation_system/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Expect the next reminder at the earliest in 56 days if nothing changes in this ticket.
Updated by okurz about 1 year ago
- Status changed from New to Resolved
- Assignee set to okurz
- Target version changed from future to Ready
With NUE1 decommissioned, all active systems are in new security zones, and I guess machines that are brought (back) into production will also end up in new security zones. No specific work to improve error reporting was done here, and I don't think we need to improve that further. We need to rely on SUSE-IT to monitor their firewall accordingly.