action #133457
Updated by okurz over 1 year ago
## Observation

https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1714239

```
Name: /etc/systemd/system/auto-update.service - Function: file.managed - Result: Clean - Started: 21:29:26.689214 - Duration: 359.255 ms
Name: service.systemctl_reload - Function: module.run - Result: Clean - Started: 21:29:27.053802 - Duration: 0.018 ms
Name: auto-upgrade.service - Function: service.dead - Result: Clean - Started: 21:29:27.054218 - Duration: 61.444 ms
Name: auto-upgrade.timer - Function: service.dead - Result: Clean - Started: 21:29:27.116368 - Duration: 82.058 ms
Name: auto-update.timer - Function: service.running - Result: Clean - Started: 21:29:27.203488 - Duration: 255.774 ms

Summary for openqa.suse.de
--------------
Succeeded: 345 (changed=30)
Failed:      0
--------------
Total states run:     345
Total run time:   383.468 s
.++ echo -n .
++ true
++ sleep 1
.++ echo -n .
[...]
++ true
++ sleep 1
.++ echo -n .
++ true
++ sleep 1
ERROR: Job failed: execution took longer than 2h0m0s seconds
```

## Acceptance criteria

* **AC1:** Jobs commonly don't run into the 2h GitLab CI timeout
* **AC2:** We can identify the faulty salt minion (because very likely it is one of those being stuck)

## Suggestions

* look up older tickets about this problem and read what we did there
* check whether artifacts were actually uploaded or not
* check whether the machines can be reached over salt, e.g. with the diagnostics sketched below
* check the usual runtimes of a salt state apply
* try to reproduce the issue
* research upstream whether there is anything better we can do to avoid running into the seemingly hardcoded GitLab 2h timeout
* run the internal salt apply command with a timeout well below 2h, e.g. in https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/deploy.yml#L43 just prepend "timeout 1h …" (see the sketch after this list)
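To help with AC2 and the reachability/runtime checks above, a minimal sketch of commands that could be run on the salt master; the timeout values and the target minion name are assumptions, not what the deployment currently uses:

```
# list minions the master currently considers unreachable
sudo salt-run manage.down

# ping all minions with a short timeout; a stuck minion shows up as "Minion did not return"
sudo salt '*' test.ping --timeout=30

# get a feeling for the usual runtime of a state apply by timing a dry run on a single minion
time sudo salt --timeout=300 'openqa.suse.de' state.apply test=True
```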
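For the last suggestion, a sketch of what prepending a timeout could look like; the actual salt invocation at deploy.yml#L43 is not reproduced here, so the command below is only an assumed example of the pattern:

```
# abort the state apply well before the 2h GitLab CI limit; 1h is an assumed value.
# "timeout" exits with 124 if it had to kill the command, so the CI job fails
# visibly instead of hanging until the hard 2h limit.
timeout 1h salt --no-color --state-output=changes '*' state.apply
```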