action #106880

closed

Job template name ... is already used in job group error logged on o3 size:M

Added by livdywan almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2022-02-16
Due date:
% Done: 0%
Estimated time:

Description

Observation

[2022-02-16T06:49:08.407202Z] [error] Job template name 'security_tpm2_swtpm' with opensuse-Tumbleweed-DVD-x86_64 and 64bit is already used in job group 'Development Tumbleweed'

Acceptance criteria

  • AC1: User errors are not logged (only reported to the user in the UI)
  • AC2: Internal error messages are not shown to the user (only generic messages)

Suggestions

  • Remove the error on the openQA side, since this looks like it should be surfaced in API responses and UX as a user error
  • Distinguish different error classes that are both logged and used in API error responses

Related issues 3 (0 open, 3 closed)

  • Related to openQA Infrastructure (public) - action #105828: 4-7 logreport emails a day cause alert fatigue size:M | Resolved | tinita | 2022-02-03 | 2022-02-17
  • Related to openQA Project (public) - action #105924: o3 logreports - Template was modified | Rejected | mkittler | 2022-02-03
  • Related to openQA Project (public) - action #106245: o3 logreports - Testsuite 'xyz' is invalid | Rejected | mkittler

Actions #1

Updated by okurz almost 3 years ago

  • Priority changed from Normal to High
  • Target version set to Ready

Adding to backlog with "High" to address urgency of alerting, i.e. exclude from alerting with openqa-logwarn

Actions #2

Updated by livdywan almost 3 years ago

okurz wrote:

Adding to backlog with "High" to address urgency of alerting, i.e. exclude from alerting with openqa-logwarn

Imho dropping the message from openQA is what we should do right away if we agree that it's the right solution, otherwise we're just doubling the work

Actions #3

Updated by tinita almost 3 years ago

cdywan wrote:

Imho dropping the message from openQA is what we should do right away if we agree that it's the right solution, otherwise we're just doubling the work

I think that might not be trivial though.
The message comes from a die in create_or_update_job_template, which is called inside a try block. The catch block collects all errors, logs them and returns them via the API.

We still should log unexpected errors (e.g. from the database), but errors like this should not be logged.

So we should differentiate between user errors and unexpected errors.
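
The distinction described above could be sketched roughly as follows. This is an illustration in Python, not openQA's actual Perl code: the UserError class, the apply_templates helper, the simplified create_or_update_job_template stand-in and the generic message text are all assumptions made for the example. The idea is that errors caused by the submitted input are returned to the caller without being logged, while anything unexpected is logged in full and replaced by a generic message in the response.

```python
import logging

log = logging.getLogger("job_templates")


class UserError(Exception):
    """Hypothetical error class for problems caused by the submitted input."""


def create_or_update_job_template(name, existing):
    # Simplified stand-in for the real function: reject names already in use.
    if name in existing:
        raise UserError(
            f"Job template name '{name}' is already used "
            f"in job group '{existing[name]}'"
        )
    existing[name] = "Development Other"


def apply_templates(names, existing):
    """Collect errors for the API response; log only the unexpected ones."""
    errors = []
    for name in names:
        try:
            create_or_update_job_template(name, existing)
        except UserError as e:
            # User error: report it to the caller, but do not log it.
            errors.append(str(e))
        except Exception:
            # Unexpected error: log the details, return a generic message.
            log.exception("Unexpected error while updating job templates")
            errors.append("Internal server error")
    return errors
```

With this split, a duplicate template name ends up only in the returned error list, whereas e.g. a database failure would still appear in the log.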

Actions #4

Updated by tinita almost 3 years ago

  • Related to action #105828: 4-7 logreport emails a day cause alert fatigue size:M added
Actions #5

Updated by livdywan almost 3 years ago

  • Subject changed from Job template name ... is already used in job group error logged on o3 to Job template name ... is already used in job group error logged on o3 size:M
  • Description updated (diff)
Actions #6

Updated by livdywan almost 3 years ago

  • Description updated (diff)
  • Status changed from New to Workable
Actions #7

Updated by tinita almost 3 years ago

  • Related to action #105909: o3 logreports - Ignoring invalid group {"name":"123"} when creating new job added
Actions #8

Updated by tinita almost 3 years ago

#105909 is likely the same issue

Actions #9

Updated by tinita almost 3 years ago

  • Related to action #105924: o3 logreports - Template was modified added
Actions #10

Updated by tinita almost 3 years ago

Also #105924

Actions #11

Updated by tinita almost 3 years ago

  • Related to action #106245: o3 logreports - Testsuite 'xyz' is invalid added
Actions #12

Updated by tinita almost 3 years ago

aaand #106245

Actions #13

Updated by mkittler almost 3 years ago

  • Assignee set to mkittler
Actions #14

Updated by mkittler almost 3 years ago

  • Status changed from Workable to In Progress

Draft: https://github.com/os-autoinst/openQA/pull/4520 (let's see what test it'll break)

Actions #15

Updated by openqa_review almost 3 years ago

  • Due date set to 2022-03-08

Setting due date based on mean cycle time of SUSE QE Tools

Actions #16

Updated by mkittler almost 3 years ago

  • Status changed from In Progress to Feedback
Actions #17

Updated by tinita almost 3 years ago

  • Related to deleted (action #105909: o3 logreports - Ignoring invalid group {"name":"123"} when creating new job)
Actions #18

Updated by okurz almost 3 years ago

  • Due date deleted (2022-03-08)
  • Status changed from Feedback to Resolved

https://github.com/os-autoinst/openQA/pull/4520 is merged as well as your https://github.com/os-autoinst/openqa-logwarn/pull/28.
As the logwarn change is deployed within minutes, I triggered a manual deployment on o3 (zypper dup) so that we will not run into this message overnight. I tested the changed functionality by trying to add to https://openqa.opensuse.org/admin/job_templates/74 ("Development Other") a job template that duplicates one already defined in https://openqa.opensuse.org/admin/job_templates/38 ("Development Tumbleweed"):

defaults:
  x86_64:
    machine: 64bit
    priority: 55
products:
  opensuse-Tumbleweed-KDE-Live-x86_64:
    distri: opensuse
    flavor: KDE-Live
    version: Tumbleweed
scenarios:
  x86_64:
    opensuse-Tumbleweed-KDE-Live-x86_64:
    - kde_live_upgrade_leap_15.2:
        machine: uefi

and I received no log message in /var/log/openqa at all, but a clear message in the UI telling us:

There was a problem applying the changes:
Job template name 'kde_live_upgrade_leap_15.2' with opensuse-Tumbleweed-KDE-Live-x86_64 and uefi is already used in job group 'Development Tumbleweed'

so I consider this story successfully completed as well \o/
