action #112112

open

[qac] script_retry does not retry if timeout is not able to kill the inner command in 3sec

Added by mpagot almost 2 years ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assignee: -
Category: Enhancement to existing tests
Target version: -
Start date: 2022-06-07
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

I'm using script_retry to make sure a long-running command is retried if it fails:
script_retry($some_cmd, timeout => 600, retry => 5, delay => 60);

The inner command times out, but script_retry does not try to execute it again.

http://1c242.qa.suse.de/tests/326/file/autoinst-log.txt

[debug] post_fail_hook failed: command 'timeout 600 az group delete --resource-group openqa-trento-rg-326 --yes' timed out at /usr/lib/os-autoinst/testapi.pm line 1039.
    testapi::script_run("timeout 600 az group delete --resource-group openqa-trento-rg"..., 603) called at sle/lib/utils.pm line 1609
    utils::script_retry("az group delete --resource-group openqa-trento-rg-326 --yes", "timeout", 600, "retry", 5, "delay", 60) called at sle/lib/trento.pm line 99
    trento::az_delete_group(test_trento_web=HASH(0x55d147f8bf38)) called at sle/tests/sles4sap/trento/test_trento_web.pm line 51
    test_trento_web::post_fail_hook(test_trento_web=HASH(0x55d147f8bf38)) called at /usr/lib/os-autoinst/basetest.pm line 295

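So the "timeout" command wrapping the inner command did not return in time itself: script_run gave up first and died with "timed out" (see the trace above), so script_retry never received an exit code to act on and never retried. It looks like the 3 extra seconds are not enough for "timeout" to kill the inner command: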

        # timeout for script_run must be larger than for the 'timeout ...' command
        $ret = script_run($exec, ($timeout + 3));
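
The failure mode can be reproduced in isolation. GNU timeout only sends SIGTERM when the limit expires and then waits for the command to exit; without -k/--kill-after it never escalates to SIGKILL, so a command that is slow to react to SIGTERM (or ignores it) can outlive any fixed margin such as the +3 above. The sketch below uses a hypothetical stand-in command that ignores SIGTERM; it makes no claim about how az actually behaves. The last lines show the kind of budget the outer script_run timeout would need if the command were wrapped with -k; KILL_AFTER and MARGIN are illustrative assumptions, not existing script_retry options.

```shell
#!/bin/sh
# Hypothetical stand-in for a command that ignores SIGTERM. Demo only; it
# makes no claim about how "az group delete" reacts to signals.
stubborn='trap "" TERM; sleep 4'

# Without -k: SIGTERM arrives after 1 s and is ignored, so timeout keeps
# waiting until the command exits by itself (about 4 s) and then returns 124.
start=$(date +%s)
plain_status=0
timeout 1 sh -c "$stubborn" || plain_status=$?
plain_elapsed=$(( $(date +%s) - start ))

# With -k 1: SIGKILL follows 1 s after SIGTERM. It cannot be ignored, so the
# command is gone after about 2 s and the exit status is 128 + 9 = 137.
start=$(date +%s)
kill_status=0
timeout -k 1 1 sh -c "$stubborn" || kill_status=$?
kill_elapsed=$(( $(date +%s) - start ))

echo "plain timeout: status $plain_status after ${plain_elapsed}s"
echo "timeout -k 1:  status $kill_status after ${kill_elapsed}s"

# Assumed budget for the outer watchdog (script_run's own timeout) when the
# command is wrapped as "timeout -k KILL_AFTER TIMEOUT cmd".
# KILL_AFTER and MARGIN are illustrative values, not script_retry options.
TIMEOUT=600 KILL_AFTER=30 MARGIN=5
outer=$((TIMEOUT + KILL_AFTER + MARGIN))
echo "outer script_run timeout: ${outer}s"
```

Run as a plain sh script, it takes about 6 seconds. The plain run should report status 124 after roughly 4 s even though the limit was 1 s, while the -k run should report 137 after roughly 2 s.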

Actions #1

Updated by okurz almost 2 years ago

  • Project changed from openQA Project to openQA Tests
  • Subject changed from script_retry does not retry if timeout is not able to kill the inner command in 3sec to [qac] script_retry does not retry if timeout is not able to kill the inner command in 3sec
  • Category set to Enhancement to existing tests

The subroutine script_retry is defined in os-autoinst-distri-opensuse (https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/lib/utils.pm#L1593), so I am reassigning this ticket to "openQA Tests", the corresponding ticket tracker.

I think it is to be expected that "timeout … $cmd" can take longer than 3 seconds to terminate commands properly. The choice of 3 s is admittedly arbitrary. One could replace that magic number with a configurable parameter. However, since we already rely on the "timeout" command (from the "coreutils" package), we could also drop script_retry altogether and rely on shell built-ins together with "timeout", e.g.:

retry=3; kill=5; timeout=600; cmd="sleep 5"; for i in $(seq 1 "$retry"); do timeout -k "$kill" "$timeout" $cmd && break || echo "Retry: $i/$retry"; done
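
Taking the one-liner a bit further, the shell-only approach could also cover script_retry's delay option. A minimal sketch, assuming a POSIX sh and GNU timeout; retry_cmd is a hypothetical helper name, and the kill grace period is an assumed extra knob, not something script_retry offers today:

```shell
#!/bin/sh
# retry_cmd RETRY DELAY TIMEOUT KILL_AFTER CMD [ARGS...]
# Hypothetical helper: a shell-only take on script_retry's retry/delay/timeout
# options, plus an assumed kill grace period passed to timeout -k.
retry_cmd() {
    retry=$1 delay=$2 cmd_timeout=$3 kill_after=$4
    shift 4
    i=1
    while [ "$i" -le "$retry" ]; do
        # timeout returns 124 when the limit was hit, 137 when SIGKILL was
        # needed, otherwise the command's own exit status
        if timeout -k "$kill_after" "$cmd_timeout" "$@"; then
            return 0
        fi
        echo "Retry: $i/$retry" >&2
        # Wait between attempts, but not after the last one
        if [ "$i" -lt "$retry" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For the call from the description this would be something like retry_cmd 5 60 600 30 az group delete --resource-group openqa-trento-rg-326 --yes, where 30 s of kill grace is an arbitrary example value. Because the command is passed as separate arguments ("$@") rather than a single string, no extra word-splitting or quoting layer is involved. Note that POSIX sh has no "local", so the helper's variables are global.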

@ph03nix @pdostal, you were the last two people to touch this; feel free to take over.

Actions #2

Updated by slo-gin about 1 month ago

This ticket was set to Normal priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.
