action #170209
Updated by gpuliti about 1 month ago
# Observation

I received an alert mail about an incomplete job: https://openqa.suse.de/tests/15996418

It fails with:

```
[2024-11-25T00:19:57.525877Z] [warn] [pid:103270] !!! : qemu-system-x86_64: -vnc :102,share=force-shared: Failed to find an availabale port: Address already in use
```

I asked in [Slack](https://suse.slack.com/archives/C02AJ1E568M/p1732530369296399) and @tinita had observed the same for "a few hours", apparently all on worker39. https://openqa.suse.de/admin/workers/2898 shows that this started about 12 hours ago with this job: https://openqa.suse.de/tests/15995445

The `qemu-system-x86_64` process 1963 has been running for 12 hours with `-vnc :102,share=force-shared`.

## Acceptance criteria

* **AC1:** Affected jobs are restarted automatically
* **AC2:** We have a better understanding of situations where this can happen (if at all)

# Suggestion

* Check one more time for bugs – also consider testing (!) – in the code for handling leftover QEMU processes
* Check one more time for bugs – also consider testing (!) – in terminating/killing the process group of isotovideo (in Mojo::…::ReadWriteProcess)
* Add/enable debug logging when starting/stopping isotovideo (maybe on ReadWriteProcess level)
* Consider starting/stopping isotovideo in a process group with low-level Perl code to replicate the error, investigate it, and potentially replace the problematic Mojo::ReadWriteProcess?
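
To make the last suggestion more concrete, here is a minimal sketch of the mechanism: start a command as leader of its own process group, then signal the whole group so that no descendant (such as a leftover QEMU) survives. It is written in Python only for illustration. The ticket proposes Perl, and this is neither os-autoinst nor ReadWriteProcess code. The function names `start_in_new_group`, `group_alive` and `stop_group` are hypothetical, and the grace period is an assumed value.

```python
import os
import signal
import subprocess
import time


def start_in_new_group(argv):
    """Start argv as leader of a new session, and therefore a new process group."""
    # start_new_session=True makes the child call setsid() before exec,
    # so the child's process group ID equals its PID.
    return subprocess.Popen(argv, start_new_session=True)


def group_alive(pgid):
    """Return True if at least one process in the group still exists."""
    try:
        os.killpg(pgid, 0)  # signal 0 only checks for existence
        return True
    except ProcessLookupError:
        return False


def _wait_group_gone(proc, pgid, timeout):
    """Poll until the group is empty or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        proc.poll()  # reap the group leader so it does not linger as a zombie
        if not group_alive(pgid):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(0.1)


def stop_group(proc, grace=5.0):
    """Send SIGTERM to the whole group, then SIGKILL if anything is left.

    Returns True if the group is empty afterwards.
    """
    pgid = proc.pid  # valid because the child was started as group leader
    for sig in (signal.SIGTERM, signal.SIGKILL):
        try:
            os.killpg(pgid, sig)
        except ProcessLookupError:
            return True  # nothing left to signal
        if _wait_group_gone(proc, pgid, grace):
            return True
    return False
```

A reproduction attempt could start a shell that spawns a long-running background child, call `stop_group`, and log whether it returns `False`, since that would mean a process holding the VNC port could outlive the job. Note that a descendant which moves itself into a different process group or session is not reached by `killpg`, which may be one of the situations AC2 asks about.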