action #106880

Job template name ... is already used in job group error logged on o3 size:M

Added by cdywan 6 months ago. Updated 6 months ago.

Status: Resolved
Priority: High
Assignee: mkittler
Target version: Ready
Start date: 2022-02-16
Due date:
% Done: 0%
Estimated time:
Description

Observation

[2022-02-16T06:49:08.407202Z] [error] Job template name 'security_tpm2_swtpm' with opensuse-Tumbleweed-DVD-x86_64 and 64bit is already used in job group 'Development Tumbleweed'

Acceptance criteria

  • AC1: User errors are not logged (only reported to the user in the UI)
  • AC2: Internal error messages are not shown to the user (just generic messages)

Suggestions

  • Remove the error log message on the openQA side, since this looks like a user error that should be surfaced in API responses and the UI instead
  • Distinguish between different classes of errors, which are currently all both logged and used in API error responses

Related issues

Related to openQA Infrastructure - action #105828: 4-7 logreport emails a day cause alert fatigue size:M (Resolved, 2022-02-03 to 2022-02-17)

Related to openQA Project - action #105924: o3 logreports - Template was modified (Rejected, 2022-02-03)

Related to openQA Project - action #106245: o3 logreports - Testsuite 'xyz' is invalid (Rejected)

History

#1 Updated by okurz 6 months ago

  • Priority changed from Normal to High
  • Target version set to Ready

Adding to backlog with "High" to address urgency of alerting, i.e. exclude from alerting with openqa-logwarn
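The ticket does not show how openqa-logwarn's exclusions are actually written. As a rough, hypothetical Python sketch of the idea only, the following filters log lines in the format of the observed message against an ignore list, so that this particular user error no longer counts as alert-worthy while other errors still do. IGNORE_PATTERNS, ALERT_LEVELS and lines_to_alert are made-up names, not part of openqa-logwarn.

```python
import re

# Hypothetical ignore list; openqa-logwarn's real pattern format may differ.
IGNORE_PATTERNS = [
    re.compile(
        r"\[error\] Job template name '.*' with .* and .* "
        r"is already used in job group '.*'"
    ),
]

# Log levels that would normally trigger an alert (assumption).
ALERT_LEVELS = re.compile(r"\[(warn|error|fatal)\]")


def lines_to_alert(log_lines):
    """Return the warning/error lines that no ignore pattern matches."""
    return [
        line
        for line in log_lines
        if ALERT_LEVELS.search(line)
        and not any(pattern.search(line) for pattern in IGNORE_PATTERNS)
    ]
```

Ignoring at the alerting layer is only a stopgap: the message is still written to the log, which is why the discussion below prefers not logging user errors in openQA at all.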

#2 Updated by cdywan 6 months ago

okurz wrote:

Adding to backlog with "High" to address urgency of alerting, i.e. exclude from alerting with openqa-logwarn

Imho dropping the message from openQA is what we should do right away if we agree that it's the right solution, otherwise we're just doubling the work

#3 Updated by tinita 6 months ago

cdywan wrote:

Imho dropping the message from openQA is what we should do right away if we agree that it's the right solution, otherwise we're just doubling the work

I think that might not be trivial, though.
The message comes from a die in create_or_update_job_template, which is called inside a try block; the catch block collects all errors, logs them and returns them via the API.

We should still log unexpected errors (e.g. from the database), but errors like this one should not be logged.

So we should differentiate between user errors and unexpected errors.
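The differentiation described above can be sketched roughly as follows. This is a Python illustration, not openQA's actual code, and UserError, GENERIC_MESSAGE and apply_all are hypothetical names: errors raised deliberately for invalid input are returned to the API caller without being logged (AC1), while anything unexpected is logged with its traceback and reported to the user only as a generic message (AC2).

```python
import logging

log = logging.getLogger(__name__)


class UserError(Exception):
    """Hypothetical marker class for errors caused by invalid user input."""


# Hypothetical wording; AC2 only asks for "generic messages".
GENERIC_MESSAGE = "Internal error while applying the changes"


def apply_all(items, update):
    """Call update(item) for each item and collect errors for the API response.

    User errors are returned verbatim and not logged (AC1); any other
    exception is logged with its traceback and replaced by a generic
    message so internal details do not reach the user (AC2).
    """
    errors = []
    for item in items:
        try:
            update(item)
        except UserError as e:
            errors.append(str(e))
        except Exception:
            log.exception("Unexpected error while updating %r", item)
            errors.append(GENERIC_MESSAGE)
    return errors
```

Catching the specific user-error class first leaves the fallback branch for everything else, so a new kind of failure is logged by default rather than silently reported as a user error.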

#4 Updated by tinita 6 months ago

  • Related to action #105828: 4-7 logreport emails a day cause alert fatigue size:M added

#5 Updated by cdywan 6 months ago

  • Subject changed from Job template name ... is already used in job group error logged on o3 to Job template name ... is already used in job group error logged on o3 size:M
  • Description updated

#6 Updated by cdywan 6 months ago

  • Description updated
  • Status changed from New to Workable

#7 Updated by tinita 6 months ago

  • Related to action #105909: o3 logreports - Ignoring invalid group {"name":"123"} when creating new job added

#8 Updated by tinita 6 months ago

#105909 is likely the same issue

#9 Updated by tinita 6 months ago

  • Related to action #105924: o3 logreports - Template was modified added

#10 Updated by tinita 6 months ago

Also #105924

#11 Updated by tinita 6 months ago

  • Related to action #106245: o3 logreports - Testsuite 'xyz' is invalid added

#12 Updated by tinita 6 months ago

aaand #106245

#13 Updated by mkittler 6 months ago

  • Assignee set to mkittler

#14 Updated by mkittler 6 months ago

  • Status changed from Workable to In Progress

Draft: https://github.com/os-autoinst/openQA/pull/4520 (let's see which tests it'll break)

#15 Updated by openqa_review 6 months ago

  • Due date set to 2022-03-08

Setting due date based on mean cycle time of SUSE QE Tools

#16 Updated by mkittler 6 months ago

  • Status changed from In Progress to Feedback

#17 Updated by tinita 6 months ago

  • Related to deleted (action #105909: o3 logreports - Ignoring invalid group {"name":"123"} when creating new job)

#18 Updated by okurz 6 months ago

  • Due date deleted (2022-03-08)
  • Status changed from Feedback to Resolved

https://github.com/os-autoinst/openQA/pull/4520 is merged, as is your https://github.com/os-autoinst/openqa-logwarn/pull/28.
As the logwarn change is deployed within minutes, I triggered a manual deployment on o3 (zypper dup) so that we will not run into this message overnight. I tested the changed functionality by trying to add a job template definition to https://openqa.opensuse.org/admin/job_templates/74 ("Development Other") that is already defined in https://openqa.opensuse.org/admin/job_templates/38 ("Development Tumbleweed"):

defaults:
  x86_64:
    machine: 64bit
    priority: 55
products:
  opensuse-Tumbleweed-KDE-Live-x86_64:
    distri: opensuse
    flavor: KDE-Live
    version: Tumbleweed
scenarios:
  x86_64:
    opensuse-Tumbleweed-KDE-Live-x86_64:
    - kde_live_upgrade_leap_15.2:
        machine: uefi

and I received no log message at all in /var/log/openqa, only a helpful message in the UI telling us:

There was a problem applying the changes:
Job template name 'kde_live_upgrade_leap_15.2' with opensuse-Tumbleweed-KDE-Live-x86_64 and uefi is already used in job group 'Development Tumbleweed'

so I consider this story successfully completed as well \o/
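The test above exercises openQA's duplicate check, whose implementation the ticket does not show. Judging only from the error text, a job template appears to be identified by its name, product and machine, with the machine falling back to the per-architecture default (64bit here, overridden to uefi in the scenario). The following Python sketch rests on that assumption; expand_scenarios and find_conflicts are hypothetical helpers, and the YAML from the test is mirrored as a plain dict to avoid a YAML dependency.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical uniqueness key (name, product, machine), inferred from the
# wording of the error message rather than from openQA's schema.
Key = Tuple[str, str, str]


def expand_scenarios(doc: dict) -> Iterator[Key]:
    """Yield one key per scenario entry, applying the per-arch default machine."""
    for arch, products in doc["scenarios"].items():
        default_machine = doc.get("defaults", {}).get(arch, {}).get("machine")
        for product, entries in products.items():
            for entry in entries:
                # An entry is either a plain name or a {name: settings} mapping.
                if isinstance(entry, str):
                    name, settings = entry, {}
                else:
                    (name, settings), = entry.items()
                    settings = settings or {}
                yield name, product, settings.get("machine", default_machine)


def find_conflicts(group: str, doc: dict, existing: Dict[Key, str]) -> List[str]:
    """Return a user-facing message for every key already owned by another group."""
    messages = []
    for name, product, machine in expand_scenarios(doc):
        owner = existing.get((name, product, machine))
        if owner is not None and owner != group:
            messages.append(
                f"Job template name '{name}' with {product} and {machine} "
                f"is already used in job group '{owner}'"
            )
    return messages


# The YAML from the test above, as a Python dict.
doc = {
    "defaults": {"x86_64": {"machine": "64bit", "priority": 55}},
    "products": {
        "opensuse-Tumbleweed-KDE-Live-x86_64": {
            "distri": "opensuse", "flavor": "KDE-Live", "version": "Tumbleweed",
        },
    },
    "scenarios": {
        "x86_64": {
            "opensuse-Tumbleweed-KDE-Live-x86_64": [
                {"kde_live_upgrade_leap_15.2": {"machine": "uefi"}},
            ],
        },
    },
}
existing = {
    ("kde_live_upgrade_leap_15.2", "opensuse-Tumbleweed-KDE-Live-x86_64", "uefi"):
        "Development Tumbleweed",
}
print(find_conflicts("Development Other", doc, existing))
```

Returning the messages instead of logging them mirrors the outcome okurz observed: the conflict reaches the user in the UI, while /var/log/openqa stays quiet.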
