action #180989
Updated by robert.richardson about 1 month ago
## Observation

Recently I have hit the same issue many times: it takes an extremely long time for a certain worker, for example grenache-1:14, to be released. I scheduled two test runs on the same worker to verify another issue, but had to wait a long time between the first job finishing and the second job starting. Worker grenache-1:14 took more than 6.5 hours to finish its work on the first job, although the job's modules took less than 4 hours. This means it took at least another 2.5 hours for grenache-1:14 to be released before taking a new job. See this [example job](https://openqa.suse.de/tests/17340219).

I know a worker may need to do some post-run work, such as uploading logs and cleaning up, before taking a new job, but 2.5 hours is still too long for this case, so something seems to be wrong. I have not noticed the same issue with other workers so far.

## Steps to reproduce

* Schedule two jobs on grenache-1:14
* Wait for the second job to be picked up after the first one finishes

## Impact

It might not have a big impact on large-scale test runs overall, but it does slow down my verification work.

## Problem

It looks like the worker lingers on some unnoticed work after it has finished a job.

## Suggestions

* Check the worker processes and logs on the worker machine (`ps -auxf` on grenache-1 or `sudo systemctl list-units 'openqa-worker*'`)
* Do maintenance or cleanup if necessary
* Find the relevant code in os-autoinst, e.g. grep for the error messages
* *DONE* See if the problem reproduces when setting MAX_JOB_TIME=1

## Workaround

n/a
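
As a note on the timing in the Observation, the release delay is the total time the worker was occupied minus the time spent in the test modules. A minimal sketch of that arithmetic follows; the function name `release_delay` is hypothetical and the inputs are the rounded figures from this ticket, not values read from openQA:

```python
from datetime import timedelta


def release_delay(total_occupied: timedelta, module_runtime: timedelta) -> timedelta:
    """Return how long the worker stayed busy beyond the test modules."""
    if module_runtime > total_occupied:
        raise ValueError("module runtime cannot exceed total occupied time")
    return total_occupied - module_runtime


# Figures from the Observation: more than 6.5 h occupied, less than 4 h of
# modules, so the result is a lower bound on the real delay.
delay = release_delay(timedelta(hours=6.5), timedelta(hours=4))
print(delay)  # 2:30:00
```

Comparing this figure across several jobs on grenache-1:14 and on other workers might help show whether the delay is specific to this worker.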