action #135482
closedcoordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances
coordination #135122: [epic] OSD openQA refuses to assign jobs, >3k scheduled not being picked up, no alert
Move to systemd journal only on o3+osd (was: Missing openqa_websockets log file on OSD for websocket server) size:M
0%
Description
Motivation
While investigating #135122 we noticed that there is currently no log file for the websocket server, even though one exists for each of the other openQA services on the OSD webui (/var/log/openqa, /var/log/openqa_gru, /var/log/openqa_scheduler). This is currently getting in the way because we can't just add new log messages to the websocket server to help with debugging. Searching the journal for specific log messages is almost impossible since it is too slow.
Acceptance criteria
- AC1: We use the systemd journal only for o3+osd, or we know why we don't
Suggestions
- Optional: Research in old tickets why we chose explicit log files over just trusting systemd journal
- As by default openQA and all related tooling already just use the systemd service we shouldn't need to implement anything in upstream openQA itself, just change the config accordingly
- Just disable log files for all openQA related services on o3 and see what happens
- After positive result do the same for OSD
- Ensure that all openQA related services still run as expected
- Ensure that our system journal retains entries from all corresponding openQA services for a sufficient amount of time, at least 7 days or so
- Look into how logwarn can access the journal, either just configure journald to write to a logfile and point logwarn to that, or if it's too much effort create dedicated ticket
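The retention suggestion above (at least 7 days of journal history per service) could be checked with a small script. This is only a sketch: it assumes entries exported with `journalctl -o json`, where `__REALTIME_TIMESTAMP` holds microseconds since the epoch, and the function names are hypothetical, not part of openQA or logwarn.

```python
import json
import subprocess
from datetime import datetime, timezone


def oldest_entry_age_days(json_lines, now):
    """Return the age in days of the oldest entry in `journalctl -o json` output."""
    timestamps = [
        int(json.loads(line)["__REALTIME_TIMESTAMP"])
        for line in json_lines
        if line.strip()
    ]
    if not timestamps:
        return 0.0
    oldest = datetime.fromtimestamp(min(timestamps) / 1_000_000, tz=timezone.utc)
    return (now - oldest).total_seconds() / 86400


def read_first_entry(unit):
    """Read only the first (oldest) JSON entry for a unit, then stop journalctl."""
    proc = subprocess.Popen(
        ["journalctl", "--no-pager", "-u", unit, "-o", "json"],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        return proc.stdout.readline()
    finally:
        proc.terminate()
        proc.stdout.close()
        proc.wait()


if __name__ == "__main__":
    # Needs permission to read the system journal on the host being checked.
    line = read_first_entry("openqa-websockets")
    age = oldest_entry_age_days([line], datetime.now(timezone.utc))
    print(f"oldest openqa-websockets entry is {age:.1f} days old")
```

An age below 7 days for any service would suggest that journald's size or time limits rotate entries away too early for that unit.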
Updated by okurz over 1 year ago
- Category set to Feature requests
- Assignee set to okurz
There is journalctl -u openqa-websockets. That should be enough, shouldn't it?
Updated by tinita over 1 year ago
okurz wrote in #note-4:
There is journalctl -u openqa-websockets. That should be enough, shouldn't it?
I'm sure I had checked this but didn't see the log messages. Now I can see it, so everything ok. I saved the current journal which starts at Sep 4 so we have the historical data to compare the number of worker status updates to the occurrence of the problem.
Updated by kraih over 1 year ago
Actually, I wasn't aware that the journalctl -g ... grep option was fast enough for us to use with OSD, but even for openqa-webui it seems to work fine. So this ticket can probably be rejected.
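For reference, the journal search mentioned above can be wrapped in a small helper. This is a minimal sketch, assuming the standard `journalctl` options `-u` (unit), `-g` (grep), `--since` and `--no-pager`; the helper names and the search pattern in the usage note are made up for illustration.

```python
import subprocess


def journal_grep_command(unit, pattern, since=None):
    """Build the journalctl argument list for grepping one unit's messages."""
    cmd = ["journalctl", "--no-pager", "-u", unit, "-g", pattern]
    if since is not None:
        cmd += ["--since", since]
    return cmd


def journal_grep(unit, pattern, since=None):
    """Run the search and return the matching lines (empty list if none)."""
    # check=False: journalctl exits non-zero when the grep matches nothing.
    result = subprocess.run(
        journal_grep_command(unit, pattern, since),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.splitlines()
```

For example, `journal_grep("openqa-websockets", "status", since="2023-09-04")` would run `journalctl --no-pager -u openqa-websockets -g status --since 2023-09-04`. Limiting the time range with `--since` usually keeps such searches fast on large journals.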
Updated by okurz over 1 year ago
kraih wrote in #note-6:
Actually, I wasn't aware that the journalctl -g ... grep option was fast enough for us to use with OSD, but even for openqa-webui it seems to work fine. So this ticket can probably be rejected.
oh, nice! This is why I think it's a good idea to migrate more and more to systemd journal.
Updated by okurz over 1 year ago
- Assignee deleted (okurz)
So I assume you are ok to accept the journal solution for now. I guess we can re-consider moving webui and gru also to systemd journal. Or at least research why we use separate log files.
Updated by kraih over 1 year ago
okurz wrote in #note-8:
So I assume you are ok to accept the journal solution for now. I guess we can re-consider moving webui and gru also to systemd journal. Or at least research why we use separate log files.
Yes, works for me.
Updated by okurz over 1 year ago
- Subject changed from Missing openqa_websockets log file on OSD for websocket server to Move to systemd journal only (was: Missing openqa_websockets log file on OSD for websocket server)
- Target version changed from Ready to future
Updated by okurz about 1 year ago
- Target version changed from future to Tools - Next
Updated by okurz about 1 year ago
- Related to action #137813: [alert] Failed systemd services - qamaster - logrotate fails on /var/log/messages with "/usr/bin/xz: (stdin): Read error: Input/output error" size:S added
Updated by okurz about 1 year ago
- Target version changed from Tools - Next to Ready
Updated by okurz about 1 year ago
- Tags changed from reactive work to reactive work, infra
Updated by okurz about 1 year ago
- Subject changed from Move to systemd journal only (was: Missing openqa_websockets log file on OSD for websocket server) to Move to systemd journal only on o3+osd (was: Missing openqa_websockets log file on OSD for websocket server) size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz about 1 year ago
- Status changed from Workable to In Progress
- Assignee set to okurz
Updated by okurz about 1 year ago
- Status changed from In Progress to Rejected
I did a bit of research and found no established best practices or suggestions for how we could easily integrate logwarn. I thought maybe we could just configure journald to write to a file, but then we would also need to ensure log rotation for that file. We could forward to a syslog daemon, but then we would have double the data, and as we already have quite big log files I don't think we should duplicate all the log messages. With that I don't think it's worth doing and we should live with the mixture that we have. Please speak up if you think otherwise.
Updated by tinita about 1 year ago
We could try out https://opensource.com/article/20/7/systemd-journals-email as a replacement for logwarn.
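One way such a logwarn replacement might look, independent of the linked article, is a script that summarizes warning-level journal entries per unit. This is a rough sketch under two assumptions: the input is `journalctl -o json` output, and the services log with meaningful syslog priorities (messages written plainly to stderr typically end up at priority 6 and would not be caught). The function name and the unit names in the usage note are illustrative only.

```python
import json
from collections import Counter

WARNING = 4  # syslog priorities: 0 = emerg ... 4 = warning ... 7 = debug


def warnings_by_unit(json_lines, max_priority=WARNING):
    """Count journal entries at or above warning severity, grouped by unit."""
    counts = Counter()
    for line in json_lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        try:
            # journald exports PRIORITY as a string; treat a missing value as info.
            priority = int(entry.get("PRIORITY", 6))
        except (TypeError, ValueError):
            continue
        if priority <= max_priority:
            counts[entry.get("_SYSTEMD_UNIT", "unknown")] += 1
    return counts
```

The input could come from something like `journalctl -o json --since yesterday --no-pager`, and the resulting counts could then be mailed or turned into an alert. Whether this covers what logwarn currently reports would need to be checked against the existing logwarn configuration.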