action #135482

closed

coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

coordination #135122: [epic] OSD openQA refuses to assign jobs, >3k scheduled not being picked up, no alert

Move to systemd journal only on o3+osd (was: Missing openqa_websockets log file on OSD for websocket server) size:M

Added by kraih over 1 year ago. Updated about 1 year ago.

Status:
Rejected
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
2023-09-11
Due date:
% Done:

0%

Estimated time:

Description

Motivation

While investigating #135122 we noticed that there is currently no log file for the websocket server, even though one exists for each of the other openQA services on the OSD webui (/var/log/openqa, /var/log/openqa_gru, /var/log/openqa_scheduler). This is getting in the way because we can't just add new log messages to the websocket server to help with debugging: searching the journal for specific log messages is almost impossible since it is too slow.

Acceptance criteria

  • AC1: We use the systemd journal only for o3+osd or we know why we don't

Suggestions

  • Optional: Research in old tickets why we chose explicit log files over just trusting systemd journal
  • By default, openQA and all related tooling already just use the systemd service, so we shouldn't need to implement anything in upstream openQA itself; just change the config accordingly
  • Just disable log files for all openQA related services on o3 and see what happens
  • After positive result do the same for OSD
  • Ensure that all openQA related services still run as expected
  • Ensure that our system journal shows results from all relevant openQA services for a sufficient amount of time, at least 7 days or so
  • Look into how logwarn can access the journal: either just configure journald to write to a log file and point logwarn to that, or, if that's too much effort, create a dedicated ticket
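
To check the suggestions above (journal coverage for each service, and finding specific messages), a minimal sketch along these lines might help. It assumes the systemd unit names openqa-webui, openqa-gru, openqa-scheduler and openqa-websockets, which should be verified on the host, and that journalctl was built with --grep support. The helper names are hypothetical and not part of openQA.

```python
import json
import subprocess

# Assumed unit names, not confirmed by this ticket; verify on the host
# with: systemctl list-units 'openqa*'
OPENQA_UNITS = [
    "openqa-webui",
    "openqa-gru",
    "openqa-scheduler",
    "openqa-websockets",
]


def build_journalctl_cmd(unit, since="7 days ago", pattern=None):
    """Build a journalctl command line for one unit (hypothetical helper)."""
    cmd = ["journalctl", "--no-pager", "-o", "json", "-u", unit, "--since", since]
    if pattern:
        # --grep filters on the MESSAGE field; needs PCRE2 support in journalctl
        cmd += ["--grep", pattern]
    return cmd


def parse_journal_lines(lines):
    """Extract MESSAGE strings from journalctl JSON output, one object per line."""
    messages = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        message = entry.get("MESSAGE")
        # journalctl emits non-UTF-8 messages as byte arrays; skip those here
        if isinstance(message, str):
            messages.append(message)
    return messages


def unit_messages(unit, since="7 days ago", pattern=None):
    """Run journalctl for one unit and return its messages as a list of strings."""
    result = subprocess.run(
        build_journalctl_cmd(unit, since, pattern),
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_journal_lines(result.stdout.splitlines())
```

Looping over OPENQA_UNITS and calling unit_messages(unit) would show whether each service has entries within the last 7 days. Restricting the query by unit and time range is also the usual way to keep journal searches fast, though whether that is fast enough on OSD would need to be measured.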

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #137813: [alert] Failed systemd services - qamaster - logrotate fails on /var/log/messages with "/usr/bin/xz: (stdin): Read error: Input/output error" size:S (Resolved, jbaier_cz, 2023-10-07)
