action #137765

closed

logwarn does not work on new o3 (anymore?) size:M

Added by tinita 5 months ago. Updated 3 months ago.

Status: Resolved
Priority: High
Assignee: -
Category: Regressions/Crashes
Target version:
Start date: 2023-10-11
Due date:
% Done: 0%
Estimated time:

Description

Observation

I see 4 errors for today:

% grep error /var/log/openqa
[2023-10-11T09:19:50.194856Z] [error] [pid:32418] Publishing opensuse.openqa.job.done failed: Connect timeout (9 attempts left)
[2023-10-11T09:19:51.918852Z] [error] [pid:32401] Publishing opensuse.openqa.job.done failed: Connect timeout (9 attempts left)
[2023-10-11T14:34:47.825604Z] [error] [pid:30720] Needle file firefox_audio-command-not-found-20231010b.json not found within /var/lib/openqa/share/tests/opensuse/products/opensuse/needles.
[2023-10-11T14:34:47.825784Z] [error] [pid:30720] Needle file firefox_audio-command-not-found-20231010c.json not found within /var/lib/openqa/share/tests/opensuse/products/opensuse/needles.

But I think we never got an email.

I just ran this command manually, but it produced no output:

LINE_LIMIT=10000 /opt/openqa-logwarn/pretty_logwarn /var/log/openqa

Acceptance Criteria

  • AC1: logreport emails are known to be sent out on unexpected issues

Suggestions


Related issues 3 (2 open, 1 closed)

Related to openQA Project - action #105903: o3 logreports - Publishing opensuse.openqa.job.restart failed: Connect timeout (9 attempts left) (New, 2022-02-03)

Related to openQA Project - action #105915: o3 logreports - Needle file <filename>.json not found within /var/.../opensuse/needles (New, 2022-02-03)

Related to openQA Infrastructure - action #105828: 4-7 logreport emails a day cause alert fatigue size:M (Resolved, tinita, 2022-02-03, 2022-02-17)

Actions #1

Updated by okurz 5 months ago

  • Tags set to infra, o3, email, logwarn
  • Target version set to Ready
Actions #2

Updated by livdywan 5 months ago

  • Subject changed from logwarn does not work on new o3 to logwarn does not work on new o3 (anymore?) size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by jbaier_cz 5 months ago

  • Assignee set to jbaier_cz
Actions #4

Updated by jbaier_cz 4 months ago

  • Status changed from Workable to In Progress
Actions #5

Updated by jbaier_cz 4 months ago

  • Status changed from In Progress to Resolved

Looking at the code and the snippet in description:

[2023-10-11T09:19:50.194856Z] [error] [pid:32418] Publishing opensuse.openqa.job.done failed: Connect timeout (9 attempts left)
[2023-10-11T09:19:51.918852Z] [error] [pid:32401] Publishing opensuse.openqa.job.done failed: Connect timeout (9 attempts left)

disabled by rule '!\[error\].* Publishing .* failed: Connect timeout \([1-9][0-9]* attempts left\)' #105903

[2023-10-11T14:34:47.825604Z] [error] [pid:30720] Needle file firefox_audio-command-not-found-20231010b.json not found within /var/lib/openqa/share/tests/opensuse/products/opensuse/needles.
[2023-10-11T14:34:47.825784Z] [error] [pid:30720] Needle file firefox_audio-command-not-found-20231010c.json not found within /var/lib/openqa/share/tests/opensuse/products/opensuse/needles.

disabled by rule '!\[error\].* Needle file .*.json not found within /.*\.' #105915
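
The effect of the two rules quoted above can be reproduced with a minimal sketch. It assumes that a rule starting with `!` excludes every line it matches, and that Python's `re` module treats these particular patterns the same way logwarn does. The names `SUPPRESS_RULES`, `is_suppressed`, `reportable` and the `UNMATCHED_LINE` sample are hypothetical and not part of logwarn or openqa-logwarn.

```python
import re

# Exclusion rules as quoted in this note, minus the leading '!' marker.
SUPPRESS_RULES = [
    r"\[error\].* Publishing .* failed: Connect timeout \([1-9][0-9]* attempts left\)",
    r"\[error\].* Needle file .*.json not found within /.*\.",
]

# The four error lines from the ticket description.
LOG_LINES = [
    "[2023-10-11T09:19:50.194856Z] [error] [pid:32418] Publishing opensuse.openqa.job.done failed: Connect timeout (9 attempts left)",
    "[2023-10-11T09:19:51.918852Z] [error] [pid:32401] Publishing opensuse.openqa.job.done failed: Connect timeout (9 attempts left)",
    "[2023-10-11T14:34:47.825604Z] [error] [pid:30720] Needle file firefox_audio-command-not-found-20231010b.json not found within /var/lib/openqa/share/tests/opensuse/products/opensuse/needles.",
    "[2023-10-11T14:34:47.825784Z] [error] [pid:30720] Needle file firefox_audio-command-not-found-20231010c.json not found within /var/lib/openqa/share/tests/opensuse/products/opensuse/needles.",
]

# Hypothetical line that no rule covers; it is not taken from a real log.
UNMATCHED_LINE = "[2023-10-11T15:00:00.000000Z] [error] [pid:1] some unexpected failure"


def is_suppressed(line, rules=SUPPRESS_RULES):
    """Return True if any exclusion rule matches somewhere in the line."""
    return any(re.search(rule, line) for rule in rules)


def reportable(lines):
    """Keep only the lines that no exclusion rule matches."""
    return [line for line in lines if not is_suppressed(line)]
```

Under these assumptions `reportable(LOG_LINES)` is empty, which is consistent with the manual run in the description producing no output, while a line like `UNMATCHED_LINE` would still be reported.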

To answer the suggestions:

  • it never stopped working
  • logwarn state is handled inside /var/lib/logwarn
  • e-mails from logwarn are delivered (as can be seen in the mailing list)
Actions #6

Updated by jbaier_cz 4 months ago

  • Related to action #105903: o3 logreports - Publishing opensuse.openqa.job.restart failed: Connect timeout (9 attempts left) added
  • Related to action #105915: o3 logreports - Needle file <filename>.json not found within /var/.../opensuse/needles added
  • Related to action #105828: 4-7 logreport emails a day cause alert fatigue size:M added
Actions #7

Updated by tinita 3 months ago

  • Status changed from Resolved to New
  • Assignee deleted (jbaier_cz)

Today I did this on o3:

/usr/share/openqa/script/openqa eval -V 'OpenQA::Log::log_error("tina testing logwarn")'

It appeared in /var/log/openqa:

[2023-11-16T14:14:58.779833Z] [error] [pid:14509] tina testing logwarn

But we never got an email.
See also #150908 where we weren't notified.
I tested the logwarn script explicitly with an example file and it reported the expected lines.
So am I missing something again or is it really not working now? The last logwarn email is from Thu, 09 Nov 2023 00:10:02 +0000

Actions #8

Updated by okurz 3 months ago

You could check the local email queue or the system journal.

Actions #9

Updated by jbaier_cz 3 months ago

Let me guess, you were expecting a mail around 14:10, so it could be this one, right?

145751929E      816 Thu Nov 16 13:10:03  o3-admins@opensuse.org
  (connect to relay.infra.opensuse.org[192.168.47.4]:25: Connection timed out)
                                         o3-admins@suse.de
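
The entry above looks like a Postfix mail queue listing (as printed by `mailq` or `postqueue -p`): queue ID, size, arrival time and sender on the first line, the deferral reason in parentheses, then the recipient. A minimal sketch for pulling those fields apart follows. It assumes exactly this layout, and `HEADER` and `parse_entry` are hypothetical helper names.

```python
import re

# Queue entry as quoted in this note.
QUEUE_ENTRY = """\
145751929E      816 Thu Nov 16 13:10:03  o3-admins@opensuse.org
  (connect to relay.infra.opensuse.org[192.168.47.4]:25: Connection timed out)
                                         o3-admins@suse.de
"""

# First line: queue ID (optionally flagged with * or !), size, arrival time, sender.
HEADER = re.compile(
    r"^(?P<queue_id>[0-9A-Za-z]+)[*!]?\s+(?P<size>\d+)\s+"
    r"(?P<arrival>\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d)\s+(?P<sender>\S+)$"
)


def parse_entry(text):
    """Split one queue entry into its fields; raise ValueError on an unexpected layout."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        raise ValueError("unrecognised queue entry header")
    entry = match.groupdict()
    entry["size"] = int(entry["size"])
    entry["reason"] = None
    entry["recipients"] = []
    for line in lines[1:]:
        if line.startswith("(") and line.endswith(")"):
            # A parenthesised line carries the reason the delivery was deferred.
            entry["reason"] = line[1:-1]
        else:
            entry["recipients"].append(line)
    return entry
```

For the entry above, the `reason` field comes out as the connect timeout to relay.infra.opensuse.org on port 25, so the mail was queued locally but never handed to the relay.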

Actions #10

Updated by tinita 3 months ago

jbaier_cz wrote in #note-9:

Let me guess, you were expecting a mail around 14:10, so it could be this one, right?

145751929E      816 Thu Nov 16 13:10:03  o3-admins@opensuse.org
  (connect to relay.infra.opensuse.org[192.168.47.4]:25: Connection timed out)
                                         o3-admins@suse.de

Yes, where do you see that? I checked /var/mail/root and didn't see any error

Actions #11

Updated by tinita 3 months ago

Ah, in the journal:

Nov 16 16:02:12 new-ariel postfix/smtp[7886]: connect to relay.infra.opensuse.org[192.168.47.4]:25: Connection timed out
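
A timeout like the one in this journal line can be reproduced independently of postfix with a plain TCP connection attempt. The sketch below is one way to do that, not an existing tool on o3. `can_connect` is a hypothetical helper, and the 5 second default timeout is an arbitrary choice.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try to open a TCP connection; return (True, None) or (False, error text)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers timeouts, refused connections and name resolution failures.
        return False, str(exc) or exc.__class__.__name__
```

Calling `can_connect("relay.infra.opensuse.org", 25)` from the affected host would show whether the relay is reachable at all, which separates a network or routing problem from a problem in logwarn or the local mail setup.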
Actions #12

Updated by jbaier_cz 3 months ago

  • Status changed from New to Resolved

This is a completely new issue that appeared after the o3 migration, caused by the decommissioning of old-ariel. I created #150956 for it.
