action #112112
open
[qac] script_retry does not retry if timeout is not able to kill the inner command in 3sec
0%
Description
I'm using script_retry to ensure a long-running command is retried if it fails:
script_retry($some_cmd, timeout => 600, retry => 5, delay => 60);
The inner command times out, but script_retry does not try to execute it again.
http://1c242.qa.suse.de/tests/326/file/autoinst-log.txt
[debug] post_fail_hook failed: command 'timeout 600 az group delete --resource-group openqa-trento-rg-326 --yes' timed out at /usr/lib/os-autoinst/testapi.pm line 1039.
testapi::script_run("timeout 600 az group delete --resource-group openqa-trento-rg"..., 603) called at sle/lib/utils.pm line 1609
utils::script_retry("az group delete --resource-group openqa-trento-rg-326 --yes", "timeout", 600, "retry", 5, "delay", 60) called at sle/lib/trento.pm line 99
trento::az_delete_group(test_trento_web=HASH(0x55d147f8bf38)) called at sle/tests/sles4sap/trento/test_trento_web.pm line 51
test_trento_web::post_fail_hook(test_trento_web=HASH(0x55d147f8bf38)) called at /usr/lib/os-autoinst/basetest.pm line 295
So the script_run timeout that wraps the command is itself exceeded. It seems the 3 sec margin is not enough for timeout to kill the inner command:
# timeout for script_run must be larger than for the 'timeout ...' command
$ret = script_run($exec, ($timeout + 3));
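A minimal sketch of how this can happen, assuming GNU coreutils timeout. Without -k, timeout sends SIGTERM once and then waits for the command to exit, so a command that ignores SIGTERM (or is slow to handle it) keeps timeout running past its limit. The sh -c command below is a hypothetical stand-in, not the az call from the log; whether az behaves this way has not been confirmed here.

```shell
# Stand-in command (hypothetical): ignores SIGTERM and runs for 4 seconds.
stubborn='trap "" TERM; sleep 4'

# Case 1: plain timeout. SIGTERM is sent after 1s but ignored,
# so timeout only returns once the command finishes by itself.
start=$(date +%s)
rc_plain=0
timeout 1 sh -c "$stubborn" || rc_plain=$?
elapsed_plain=$(( $(date +%s) - start ))

# Case 2: timeout -k 1. SIGKILL follows 1s after the ignored SIGTERM,
# so the command is gone roughly 2s after start.
start=$(date +%s)
rc_kill=0
timeout -k 1 1 sh -c "$stubborn" || rc_kill=$?
elapsed_kill=$(( $(date +%s) - start ))

echo "plain:   rc=$rc_plain elapsed=${elapsed_plain}s"
echo "with -k: rc=$rc_kill elapsed=${elapsed_kill}s"
```

In the plain case timeout overruns its own limit by the remaining runtime of the command, which is the situation where a fixed 3 sec margin in script_run is too small.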
Updated by okurz over 2 years ago
- Project changed from openQA Project (public) to openQA Tests (public)
- Subject changed from script_retry does not retry if timeout is not able to kill the inner command in 3sec to [qac] script_retry does not retry if timeout is not able to kill the inner command in 3sec
- Category set to Enhancement to existing tests
The subroutine script_retry is defined within os-autoinst-distri-opensuse in https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/lib/utils.pm#L1593, so I am reassigning this to "openQA Tests" as the appropriate ticket tracker.
I think it's quite to be expected that timeout … $cmd can take longer than 3 seconds to properly terminate commands. The choice of 3s is obviously a bit arbitrary. One could avoid that magic number and use a configurable parameter instead. However, as we are relying on the command "timeout" (from the package "coreutils"), we could also ditch script_retry altogether and just rely on shell-internal features together with the timeout command, e.g.:
retry=3; kill=5; timeout=600; cmd="sleep 5"; for i in $(seq 1 $retry); do timeout -k $kill $timeout $cmd && break || echo "Retry: $i/$retry" ; done
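The same idea can be written as a reusable function. This is only a sketch under assumptions: the name retry_with_timeout and its argument order are hypothetical and not part of os-autoinst-distri-opensuse, and it assumes a timeout command that supports -k (as in GNU coreutils).

```shell
# Hypothetical helper, not an existing openQA function.
# Usage: retry_with_timeout RETRIES TIMEOUT_S KILL_S DELAY_S CMD [ARGS...]
retry_with_timeout() {
    _retries=$1; _timeout=$2; _kill=$3; _delay=$4
    shift 4
    _i=1
    while [ "$_i" -le "$_retries" ]; do
        # -k sends SIGKILL if the command is still alive
        # $_kill seconds after the initial SIGTERM.
        if timeout -k "$_kill" "$_timeout" "$@"; then
            return 0
        fi
        echo "Retry: $_i/$_retries" >&2
        # Wait between attempts, but not after the last one.
        if [ "$_i" -lt "$_retries" ]; then
            sleep "$_delay"
        fi
        _i=$((_i + 1))
    done
    return 1
}
```

For example, retry_with_timeout 5 600 5 60 "$@" would mirror the timeout, retry and delay values from the description, with an upper bound of roughly 605 seconds per attempt.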
@ph03nix @pdostal you were the last two people touching this, feel welcome to take over.
Updated by slo-gin 9 months ago
This ticket was set to Normal priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.