action #80408
openQA Project - coordination #39719: [saga][epic] Detect "known failures" and mark jobs as such to make tests more stable, reviewing test results and tracking known issues easier
openQA Project - coordination #62420: [epic] Distinguish all types of incompletes
Revert longer timeout override for openQA services as we could not see fewer problems with corrupted worker cache
Description
Motivation
To find out whether the worker cache services corrupt the SQLite database when they are killed on systemd service termination, we temporarily enlarged the timeout of all relevant worker systemd services on o3 and osd in #80106. mkittler reported (confirm!) that this neither helped to get rid of the corrupted cache nor prevented the killing of services. Meanwhile the shutdown of systems can take much longer, as we still have #62441.
Acceptance criteria
- AC1: openQA worker hosts shut down in less than 2 minutes again
Suggestions
Revert all actions from #80106
Related issues
History
#1
Updated by okurz about 2 months ago
- Copied from action #80106: corrupted worker cache sqlite: Enlarge systemd service kill timeout temporarily added
#2
Updated by nicksinger about 2 months ago
- Assignee set to nicksinger
#3
Updated by nicksinger about 2 months ago
Removed the file from all workers on OSD and reloaded systemd. A quick peek with systemctl cat $service showed success. Next up: the o3 workers.
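The check mentioned above could be scripted roughly as follows. This is only a sketch, not what was actually run: it assumes the override from #80106 set `TimeoutStopSec=` in a drop-in file, and the helper names `has_timeout_override` and `check_services` are hypothetical. A unit may also set `TimeoutStopSec=` in its own unit file, so a match is a hint to look closer and not proof of a leftover override.

```shell
#!/bin/sh
# Hypothetical helper: reads unit file text on stdin and succeeds if it
# contains a TimeoutStopSec= line (ASSUMPTION: this is the setting the
# temporary override changed; the ticket does not name it).
has_timeout_override() {
    grep -q '^[[:space:]]*TimeoutStopSec='
}

# Hypothetical helper: report for every service passed as an argument
# whether "systemctl cat" still shows a TimeoutStopSec= setting.
# Needs a host running systemd.
check_services() {
    for service in "$@"; do
        if systemctl cat "$service" | has_timeout_override; then
            echo "$service: TimeoutStopSec still set"
        else
            echo "$service: no TimeoutStopSec setting found"
        fi
    done
}
```

After sourcing the snippet on a worker, something like `check_services openqa-worker-cacheservice` would print one line per service; the unit name here is only an example and the real list of affected services is the one from #80106.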
#4
Updated by nicksinger about 2 months ago
- Status changed from Workable to Resolved
Also deleted the file on all o3 workers.