action #12178
Updated by okurz almost 8 years ago
## observation

worker does not finish job on user_cancel, e.g. see http://lord.arch/tests/270, one module reported as "passed", next one as "running" but the job itself is "user_cancelled". output from worker log:

```
POST http://localhost/api/v1/jobs/270/status
stopping livelog
## changing timer update_status
## removing timer update_status
## adding timer update_status 10
...
starting livelog
## changing timer update_status
## removing timer update_status
## adding timer update_status 0.5
checking backend state
...
waitpid 3521 returned 0
updating status
POST http://localhost/api/v1/jobs/270/status
...
checking backend state
...
waitpid 3521 returned 0
...
POST http://localhost/api/v1/jobs/270/status
received command: cancel
stop_job cancel
## removing timer update_status
## removing timer check_backend
## removing timer job_timeout
killing 3521
```

then hangs. strace on process reveals that one "isotovideo" subprocess is still in a loop, see attached strace dump.

## steps to reproduce

TBC, maybe call "user_cancel" very often

## problem

race condition on shutdown

## suggestion

* check shutdown procedure for correctness
* KILL subprocesses after TERM + timeout
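
A minimal sketch of the second suggestion, written in Python for illustration only and not taken from the worker code. `stop_process_group` and `grace` are hypothetical names, and the sketch assumes a POSIX system where the backend was started in its own session (`start_new_session=True`), so that TERM and KILL reach its subprocesses as well:

```python
import os
import signal
import subprocess


def stop_process_group(proc, grace=5.0):
    """Send TERM to the child's process group, wait, then KILL leftovers.

    Hypothetical helper: assumes `proc` was started with
    start_new_session=True, so proc.pid is also its process group id.
    """
    if proc.poll() is None:
        try:
            os.killpg(proc.pid, signal.SIGTERM)  # polite request first
        except ProcessLookupError:
            pass  # group vanished between poll() and killpg()

    try:
        proc.wait(timeout=grace)  # give the group leader time to exit
    except subprocess.TimeoutExpired:
        pass  # still alive, escalate below

    try:
        # KILL whatever is left in the group, including subprocesses
        # that outlived the group leader
        os.killpg(proc.pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # nothing left to kill

    return proc.wait()  # reap the child, no zombie stays behind


if __name__ == "__main__":
    backend = subprocess.Popen(["sleep", "60"], start_new_session=True)
    print("exit status:", stop_process_group(backend, grace=2.0))
```

The final KILL is sent unconditionally because a subprocess may still be looping after the group leader has already exited on TERM, which matches the hang seen in the strace dump.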