action #80408

openQA Project - coordination #39719: [saga][epic] Detect "known failures" and mark jobs as such to make tests more stable, reviewing test results and tracking known issues easier

openQA Project - coordination #62420: [epic] Distinguish all types of incompletes

revert longer timeout override for openQA services as we did not see fewer problems with corrupted worker cache

Added by okurz about 2 months ago. Updated about 2 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2020-11-26
Due date:
% Done: 0%
Estimated time:
Description

Motivation

To find out whether the worker cache services corrupt the SQLite database because they are killed on systemd service termination, we temporarily enlarged the timeout of all relevant worker systemd services on o3 and OSD in #80106. As mkittler reported (confirm!), this neither got rid of the corrupted cache nor prevented the services from being killed, but the shutdown of systems can now take much longer because we still have #62441.

Acceptance criteria

  • AC1: openQA worker hosts shut down in less than 2 minutes again

Suggestions

Revert all actions from #80106


Related issues

Copied from openQA Infrastructure - action #80106: corrupted worker cache sqlite: Enlarge systemd service kill timeout temporarily (Resolved)

History

#1 Updated by okurz about 2 months ago

  • Copied from action #80106: corrupted worker cache sqlite: Enlarge systemd service kill timeout temporarily added

#2 Updated by nicksinger about 2 months ago

  • Assignee set to nicksinger

#3 Updated by nicksinger about 2 months ago

Removed the file from all workers on OSD and reloaded systemd. A quick peek with systemctl cat $service showed success. Next up: the o3 workers.
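Entry #3 verifies the revert by reading each unit with systemctl cat. The following is a minimal sketch of automating that check. It assumes the override from #80106 was a systemd drop-in file that set TimeoutStopSec= (the ticket only says "the file"), and it looks only at that option. The helper names drop_in_timeouts and check_unit, and the unit name used below, are hypothetical illustrations and not part of the procedure described here.

```python
"""Sketch: report drop-in files that still set TimeoutStopSec for a systemd unit.

Assumption (hypothetical): the #80106 override was a drop-in (a file inside a
"<unit>.d/" directory) setting TimeoutStopSec=. Helper names are illustrative.
"""
import subprocess
from pathlib import PurePosixPath
from typing import Dict, Optional


def drop_in_timeouts(systemctl_cat_output: str) -> Dict[str, str]:
    """Map each drop-in file shown by `systemctl cat` to its TimeoutStopSec value.

    `systemctl cat` prints every unit fragment preceded by a "# /path/to/file"
    comment line; a file counts as a drop-in if its directory name ends in ".d".
    """
    result: Dict[str, str] = {}
    current_file: Optional[str] = None
    for raw_line in systemctl_cat_output.splitlines():
        line = raw_line.strip()
        if line.startswith("# /"):
            # Header line naming the file whose contents follow.
            current_file = line[2:]
            continue
        is_drop_in = (
            current_file is not None
            and PurePosixPath(current_file).parent.name.endswith(".d")
        )
        if is_drop_in and line.startswith("TimeoutStopSec="):
            result[current_file] = line.split("=", 1)[1]
    return result


def check_unit(unit: str) -> Dict[str, str]:
    """Run `systemctl cat <unit>` and return any leftover TimeoutStopSec drop-ins.

    Raises subprocess.CalledProcessError if systemctl fails, e.g. for an unknown unit.
    """
    completed = subprocess.run(
        ["systemctl", "cat", unit], capture_output=True, text=True, check=True
    )
    return drop_in_timeouts(completed.stdout)
```

On a worker, an empty result from check_unit("openqa-worker-cacheservice") (assumed unit name) would match the "showed success" observation above. Because a fixed TimeoutStopSec= in the main unit file is not a drop-in, it is deliberately ignored.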

#4 Updated by nicksinger about 2 months ago

  • Status changed from Workable to Resolved

Also deleted on all o3 workers