action #137765
logwarn does not work on new o3 (anymore?) size:M
Description
Observation
I see 4 errors for today:
% grep error /var/log/openqa
[2023-10-11T09:19:50.194856Z] [error] [pid:32418] Publishing opensuse.openqa.job.done failed: Connect timeout (9 attempts left)
[2023-10-11T09:19:51.918852Z] [error] [pid:32401] Publishing opensuse.openqa.job.done failed: Connect timeout (9 attempts left)
[2023-10-11T14:34:47.825604Z] [error] [pid:30720] Needle file firefox_audio-command-not-found-20231010b.json not found within /var/lib/openqa/share/tests/opensuse/products/opensuse/needles.
[2023-10-11T14:34:47.825784Z] [error] [pid:30720] Needle file firefox_audio-command-not-found-20231010c.json not found within /var/lib/openqa/share/tests/opensuse/products/opensuse/needles.
But I think we never got an email.
I just ran this command manually, but it produced no output:
LINE_LIMIT=10000 /opt/openqa-logwarn/pretty_logwarn /var/log/openqa
Acceptance Criteria
- AC1: logreport emails are known to be sent out on unexpected issues
Suggestions
- Confirm when it stopped working - did it ever work on the "new" o3?
- https://mailman.suse.de/mlarch/SuSE/o3-admins/2023/o3-admins.2023.09/msg00042.html Thu, 14 Sep 2023 14:10:03 +0000 but likely it's from old-ariel
- Find out where logwarn saves its state
- Try out email sending from the corresponding account on o3
Updated by okurz about 1 year ago
- Tags set to infra, o3, email, logwarn
- Target version set to Ready
Updated by livdywan about 1 year ago
- Subject changed from logwarn does not work on new o3 to logwarn does not work on new o3 (anymore?) size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by jbaier_cz about 1 year ago
- Status changed from Workable to In Progress
Updated by jbaier_cz about 1 year ago
- Status changed from In Progress to Resolved
Looking at the code and the snippet in description:
[2023-10-11T09:19:50.194856Z] [error] [pid:32418] Publishing opensuse.openqa.job.done failed: Connect timeout (9 attempts left)
[2023-10-11T09:19:51.918852Z] [error] [pid:32401] Publishing opensuse.openqa.job.done failed: Connect timeout (9 attempts left)
disabled by rule '!\[error\].* Publishing .* failed: Connect timeout \([1-9][0-9]* attempts left\)'
#105903
[2023-10-11T14:34:47.825604Z] [error] [pid:30720] Needle file firefox_audio-command-not-found-20231010b.json not found within /var/lib/openqa/share/tests/opensuse/products/opensuse/needles.
[2023-10-11T14:34:47.825784Z] [error] [pid:30720] Needle file firefox_audio-command-not-found-20231010c.json not found within /var/lib/openqa/share/tests/opensuse/products/opensuse/needles.
disabled by rule '!\[error\].* Needle file .*.json not found within /.*\.'
#105915
To answer the suggestions:
- it never stopped working
- logwarn state is handled inside /var/lib/logwarn
- e-mails from logwarn are delivered (as can be seen in the mailing list)
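The two ignore rules quoted above can be replayed against the log lines from the description. This is only a minimal sketch: it assumes the rules are extended regular expressions in which a leading `!` marks lines to be ignored, and it uses `grep -E -v` as a stand-in for logwarn itself. The names `rule_publish`, `rule_needle`, `sample_log` and `unexpected_lines` are hypothetical, and the last sample line is made up for illustration.

```shell
#!/bin/sh
# Ignore rules copied from the comment above, without the leading '!'.
rule_publish='\[error\].* Publishing .* failed: Connect timeout \([1-9][0-9]* attempts left\)'
rule_needle='\[error\].* Needle file .*.json not found within /.*\.'

# Two lines from the description plus one hypothetical line that no rule covers.
sample_log() {
cat <<'EOF'
[2023-10-11T09:19:50.194856Z] [error] [pid:32418] Publishing opensuse.openqa.job.done failed: Connect timeout (9 attempts left)
[2023-10-11T14:34:47.825604Z] [error] [pid:30720] Needle file firefox_audio-command-not-found-20231010b.json not found within /var/lib/openqa/share/tests/opensuse/products/opensuse/needles.
[2023-10-11T15:00:00.000000Z] [error] [pid:1] hypothetical unexpected failure
EOF
}

# Print only the lines that none of the ignore rules match (-v inverts the match).
unexpected_lines() {
    grep -E -v -e "$rule_publish" -e "$rule_needle"
}

sample_log | unexpected_lines
```

Only the hypothetical line should be printed, which would be consistent with the observation that the four errors from the description never produced a report.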
Updated by jbaier_cz about 1 year ago
- Related to action #105903: o3 logreports - Publishing opensuse.openqa.job.restart failed: Connect timeout (9 attempts left) added
- Related to action #105915: o3 logreports - Needle file <filename>.json not found within /var/.../opensuse/needles added
- Related to action #105828: 4-7 logreport emails a day cause alert fatigue size:M added
Updated by tinita about 1 year ago
- Status changed from Resolved to New
- Assignee deleted (jbaier_cz)
Today I did this on o3:
/usr/share/openqa/script/openqa eval -V 'OpenQA::Log::log_error("tina testing logwarn")'
It appeared in /var/log/openqa:
[2023-11-16T14:14:58.779833Z] [error] [pid:14509] tina testing logwarn
But we never got an email.
See also #150908 where we weren't notified.
I tested the logwarn script explicitly with an example file and it reported the expected lines.
So am I missing something again or is it really not working now? The last logwarn email is from Thu, 09 Nov 2023 00:10:02 +0000
Updated by okurz about 1 year ago
You could check the local email queue or system journal
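As a rough sketch of that check, assuming the host delivers mail through Postfix and logs to the systemd journal: `mailq` lists queued or deferred messages, and the journal can be filtered for failed relay connections. The helper name `relay_timeouts` is hypothetical, and the sample line is a journal entry taken from this ticket.

```shell
#!/bin/sh
# Read journal text on stdin and print the relay host of every
# Postfix SMTP connection attempt that timed out.
relay_timeouts() {
    grep -E 'postfix/smtp\[[0-9]+]: connect to .*: Connection timed out' |
        sed -E 's/.*connect to ([^[]+)\[.*/\1/'
}

# On the host itself one could run (not executed here):
#   mailq                                    # show queued/deferred mail
#   journalctl --since today | relay_timeouts

# Self-contained demo with a single journal line.
printf '%s\n' 'Nov 16 16:02:12 new-ariel postfix/smtp[7886]: connect to relay.infra.opensuse.org[192.168.47.4]:25: Connection timed out' |
    relay_timeouts
```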
Updated by jbaier_cz about 1 year ago
Let me guess, you were expecting a mail around 14:10, so it could be this one, right?
145751929E 816 Thu Nov 16 13:10:03 o3-admins@opensuse.org
(connect to relay.infra.opensuse.org[192.168.47.4]:25: Connection timed out)
o3-admins@suse.de
Updated by tinita about 1 year ago
jbaier_cz wrote in #note-9:
Let me guess, you were expecting a mail around 14:10, so it could be this one, right?
145751929E 816 Thu Nov 16 13:10:03 o3-admins@opensuse.org (connect to relay.infra.opensuse.org[192.168.47.4]:25: Connection timed out) o3-admins@suse.de
Yes, where do you see that? I checked /var/mail/root and didn't see any error
Updated by tinita about 1 year ago
Ah, in the journal:
Nov 16 16:02:12 new-ariel postfix/smtp[7886]: connect to relay.infra.opensuse.org[192.168.47.4]:25: Connection timed out
Updated by jbaier_cz about 1 year ago
- Status changed from New to Resolved
This is a completely new issue that appeared after the o3 migration and is caused by the decommissioning of old-ariel. I created #150956