action #135482
closed
coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances
coordination #135122: [epic] OSD openQA refuses to assign jobs, >3k scheduled not being picked up, no alert
Move to systemd journal only on o3+osd (was: Missing openqa_websockets log file on OSD for websocket server) size:M
Added by kraih about 1 year ago.
Updated 12 months ago.
Category:
Feature requests
Description
Motivation
While investigating #135122 we noticed that there is currently no log file for the websocket server, even though one exists for each of the other openQA services on the OSD webui (/var/log/openqa, /var/log/openqa_gru, /var/log/openqa_scheduler). This is currently getting in the way because we can't just add new log messages to the websocket server to help with debugging. Searching the journal for specific log messages is almost impossible since it is too slow.
Acceptance criteria
- AC1: We use system journal only for o3+osd or we know why we don't
Suggestions
- Optional: Research in old tickets why we chose explicit log files over just trusting systemd journal
- As by default openQA and all related tooling already just use the systemd service we shouldn't need to implement anything in upstream openQA itself, just change the config accordingly
- Just disable log files for all openQA related services on o3 and see what happens
- After positive result do the same for OSD
- Ensure that all openQA related services still run as expected
- Ensure that our system journal retains entries from all relevant openQA services for a sufficient amount of time, at least 7 days or so
- Look into how logwarn can access the journal: either just configure journald to write to a log file and point logwarn to that, or, if that is too much effort, create a dedicated ticket
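The 7-day retention suggestion could be checked with something like the following. This is only a sketch: the function names `has_week_of_history` and `check_journal_retention` are hypothetical, and the test is simply "does the journal still hold at least one entry for this unit that is older than 7 days". It does not detect gaps in between, and a freshly installed service would be reported as missing.

```shell
# Hypothetical helper: succeeds if the journal still holds at least one
# entry for the given unit that is older than 7 days.
has_week_of_history() {
    # -q suppresses informational lines, so empty output means "no such entry"
    [ -n "$(journalctl --no-pager -q -u "$1" --until "7 days ago" -n 1)" ]
}

# Hypothetical helper: report the retention state for every unit passed in.
# Returns non-zero if any unit lacks entries older than 7 days.
check_journal_retention() {
    status=0
    for unit in "$@"; do
        if has_week_of_history "$unit"; then
            echo "OK: $unit"
        else
            echo "MISSING: $unit has no entries older than 7 days"
            status=1
        fi
    done
    return $status
}
```

It would be called as `check_journal_retention openqa-webui openqa-websockets openqa-gru openqa-scheduler`. The first two unit names appear in this ticket; the last two are assumed from the log file names above and should be verified on the host.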
- Description updated (diff)
- Description updated (diff)
- Description updated (diff)
- Category set to Feature requests
- Assignee set to okurz
There is journalctl -u openqa-websockets. That should be enough, shouldn't it?
okurz wrote in #note-4:
There is journalctl -u openqa-websockets. That should be enough, shouldn't it?
I'm sure I had checked this but didn't see the log messages. Now I can see them, so everything is OK. I saved the current journal, which starts at Sep 4, so we have the historical data to compare the number of worker status updates to the occurrence of the problem.
Actually, I wasn't aware that the journalctl -g ... grep option was fast enough for us to use with OSD, but even for openqa-webui it seems to work fine. So this ticket can probably be rejected.
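For reference, a small wrapper around the two options mentioned in this thread (-u and -g) could look like this. It is only a sketch: the function name `journal_grep` and the default time window of 7 days are assumptions, not something already in use on OSD.

```shell
# Hypothetical helper: grep the journal of a single systemd unit.
# usage: journal_grep UNIT PATTERN [SINCE]
journal_grep() {
    unit=$1
    pattern=$2
    since=${3:-"7 days ago"}   # assumed default window, adjust as needed
    # -u limits the output to one unit, -g filters messages by regex,
    # --since bounds the time range so less of the journal has to be read
    journalctl --no-pager -u "$unit" --since "$since" -g "$pattern"
}
```

For example, `journal_grep openqa-websockets 'status update' '2 days ago'`, where the pattern is purely illustrative. Note that -g needs a journalctl built with regex support.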
kraih wrote in #note-6:
Actually, I wasn't aware that the journalctl -g ... grep option was fast enough for us to use with OSD, but even for openqa-webui it seems to work fine. So this ticket can probably be rejected.
Oh, nice! This is why I think it's a good idea to migrate more and more to the systemd journal.
So I assume you are ok to accept the journal solution for now. I guess we can re-consider moving webui and gru also to systemd journal. Or at least research why we use separate log files.
okurz wrote in #note-8:
So I assume you are ok to accept the journal solution for now. I guess we can re-consider moving webui and gru also to systemd journal. Or at least research why we use separate log files.
Yes, works for me.
- Subject changed from Missing openqa_websockets log file on OSD for websocket server to Move to systemd journal only (was: Missing openqa_websockets log file on OSD for websocket server)
- Target version changed from Ready to future
- Target version changed from future to Tools - Next
- Related to action #137813: [alert] Failed systemd services - qamaster - logrotate fails on /var/log/messages with "/usr/bin/xz: (stdin): Read error: Input/output error" size:S added
- Target version changed from Tools - Next to Ready
- Tags changed from reactive work to reactive work, infra
- Subject changed from Move to systemd journal only (was: Missing openqa_websockets log file on OSD for websocket server) to Move to systemd journal only on o3+osd (was: Missing openqa_websockets log file on OSD for websocket server) size:M
- Description updated (diff)
- Status changed from New to Workable
- Status changed from Workable to In Progress
- Assignee set to okurz
- Status changed from In Progress to Rejected
I did a bit of research and found no established best practices or suggestions for how we could easily integrate logwarn. I thought maybe we could just configure journald to write to a file, but then we would also need to ensure log rotation on that file. We could forward to a syslog daemon, but then we would have double the data, and as we already have quite big log files I don't think we should duplicate all the log messages. With that I don't think it's worth doing and we should live with the mixture that we have. Please speak up if you think otherwise.