action #80408

closed

openQA Project - coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues

openQA Project - coordination #62420: [epic] Distinguish all types of incompletes

Revert the longer timeout override for openQA services, as it did not reduce problems with corrupted worker cache

Added by okurz over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2020-11-26
Due date:
% Done: 0%
Estimated time:

Description

Motivation

To find out whether worker cache services corrupt the SQLite database when they are killed during systemd service termination, we temporarily enlarged the timeout of all relevant worker systemd services on o3 and osd in #80106. mkittler reported (confirm!) that this neither helped to get rid of the corrupted cache nor prevented the killing of services. Meanwhile, shutting down the systems can now take much longer, as we still have #62441.

Acceptance criteria

  • AC1: openQA worker hosts shut down in less than 2 minutes again

Suggestions

Revert all actions from #80106


Related issues 1 (0 open, 1 closed)

Copied from openQA Infrastructure - action #80106: corrupted worker cache sqlite: Enlarge systemd service kill timeout temporarily (Resolved, nicksinger)

Actions #1

Updated by okurz over 3 years ago

  • Copied from action #80106: corrupted worker cache sqlite: Enlarge systemd service kill timeout temporarily added
Actions #2

Updated by nicksinger over 3 years ago

  • Assignee set to nicksinger
Actions #3

Updated by nicksinger over 3 years ago

Removed the file from all workers on OSD and reloaded systemd. A quick peek with systemctl cat $service showed success. Next up: the o3 workers.
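For reference, the removal step could look roughly like the sketch below. This is only an illustration under assumptions: the ticket does not name the file, so the drop-in name override.conf and the helper remove_dropin are hypothetical, and the config root is passed in as a parameter so the helper can be tried on a scratch directory first.

```shell
#!/bin/sh
# Hypothetical drop-in file name; the ticket does not state the real one.
DROPIN_NAME="override.conf"

# Remove the timeout drop-in for one unit below a systemd config root.
#   $1: config root, e.g. /etc/systemd/system (assumed location)
#   $2: unit name, e.g. the value of $service
remove_dropin() {
    rm -f "$1/$2.d/$DROPIN_NAME"
    # Also remove the drop-in directory if it is now empty.
    rmdir "$1/$2.d" 2>/dev/null || true
}

# Possible usage on a worker host (not executed here):
#   remove_dropin /etc/systemd/system "$service"
#   systemctl daemon-reload
#   systemctl cat "$service"                      # drop-in should be gone
#   systemctl show -p TimeoutStopUSec "$service"  # effective stop timeout
```

The helper is safe to run twice: rm -f ignores a missing file, and a failing rmdir (missing or non-empty directory) is ignored.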

Actions #4

Updated by nicksinger over 3 years ago

  • Status changed from Workable to Resolved

Also deleted on all o3 workers
