action #122308: Handle invalid openQA job references in qem-dashboard size:M - QA (public) - openSUSE Project Management Tool

#1

Updated by okurz almost 2 years ago

Parent task set to #80194

#2

Updated by okurz almost 2 years ago

Copied to action #122311: Use live openQA test results instead of inconsistent qem-dashboard database in qem-bot approver added

#3

Updated by okurz almost 2 years ago

Apparently the "job_id" in the case of 57292 is an id in the table "update_openqa_settings", so we can reference back to the openQA jobs with:

dashboard_db=# select job_id from openqa_jobs where update_settings=57292;
  job_id  
----------
 10217371
 10217368
 10217365
…
 10217373

So it looks like what we understand as "job_id" is something that can be either an openQA job id or just a reference to a settings table that in turn references openQA jobs. A weird design choice.
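
A minimal sketch of this indirection, using an in-memory SQLite database instead of the real PostgreSQL dashboard_db. The schema is a hypothetical simplification: only the table name update_openqa_settings and the columns openqa_jobs.job_id and openqa_jobs.update_settings come from the query above. Everything else is assumed, and three of the job ids listed above serve as sample data.

```python
import sqlite3

# Hypothetical, heavily simplified schema; the real dashboard tables have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE update_openqa_settings (id INTEGER PRIMARY KEY);
    CREATE TABLE openqa_jobs (
        id INTEGER PRIMARY KEY,      -- dashboard-internal row id
        job_id INTEGER NOT NULL,     -- the actual openQA job id
        update_settings INTEGER REFERENCES update_openqa_settings(id)
    );
    """
)
conn.execute("INSERT INTO update_openqa_settings (id) VALUES (57292)")
conn.executemany(
    "INSERT INTO openqa_jobs (job_id, update_settings) VALUES (?, ?)",
    [(10217371, 57292), (10217368, 57292), (10217365, 57292)],
)


def openqa_jobs_for_update_settings(settings_id: int) -> list[int]:
    """Resolve an update_openqa_settings id to the openQA job ids it references."""
    rows = conn.execute(
        "SELECT job_id FROM openqa_jobs WHERE update_settings = ? ORDER BY job_id",
        (settings_id,),
    )
    return [job_id for (job_id,) in rows]


print(openqa_jobs_for_update_settings(57292))  # [10217365, 10217368, 10217371]
```

One settings id fans out to several openQA jobs, which is why logging the settings id as if it were a job id is misleading.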

#4

Updated by mkittler almost 2 years ago

Our starting point was the log message "Found failed …", so I checked where it occurs in the bot's code base. It looks like this kind of ID is either an incident_openqa_settings ID or an update_openqa_settings ID, but not an openQA job ID. However, that makes me quite confused about my understanding of the code base. In particular, it means the comment-lookup feature I once introduced cannot actually work because it isn't using an openQA job ID (if that's correct, the is_job_marked_acceptable_for_incident function is basically broken). The log message should also be improved to state what kind of ID is logged there, because "job" is highly ambiguous. The code should also have a comment where JobAggr is defined, explaining what job_id is.
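
The suggested comment on JobAggr could look roughly like this sketch. The field names, types and the meaning of the boolean flags are assumptions for illustration, not the actual qem-bot definition; the point is only that the definition states unambiguously which kind of id it holds.

```python
from typing import NamedTuple


class JobAggr(NamedTuple):
    """Dashboard entry linking an incident to openQA results (hypothetical layout).

    None of these fields is an openQA job id.
    """

    # id of an incident_openqa_settings or update_openqa_settings row in the dashboard
    settings_id: int
    # assumed: True for update (aggregate) settings, False for incident settings
    aggregate: bool
    # assumed: whether the incident is also covered by aggregate runs
    with_aggregate: bool


entry = JobAggr(settings_id=57292, aggregate=True, with_aggregate=False)
print(f"Found failed job setting {entry.settings_id}")  # Found failed job setting 57292
```

Naming the field settings_id instead of something generic like id also makes the log message harder to get wrong.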

#5

Updated by okurz almost 2 years ago

Related to action #107923: qem-bot: Ignore not-ok openQA jobs for specific incident based on openQA job comment size:M added

#6

Updated by okurz almost 2 years ago

Subject changed from Handle non-existant openQA job references in qem-dashboard to Handle invalid openQA job references in qem-dashboard size:M
Description updated (diff)
Priority changed from High to Normal

#7

Updated by okurz almost 2 years ago

Status changed from New to Workable

#8

Updated by jbaier_cz almost 2 years ago

Assignee set to jbaier_cz

#9

Updated by jbaier_cz almost 2 years ago

Status changed from Workable to In Progress

Indeed, the naming is confusing: we are overusing the term "job". All log lines that refer to a "job" and output its id are in fact printing the id of the dashboard entity JobAggr, which is just a helper object for the N:M mapping between maintenance incidents and openQA jobs. In short, the logged number is openqa_jobs.id, whereas what we want is openqa_jobs.job_id. The current qem-bot code does not fetch the latter at all (it is practically not needed; the only important thing is the result of that job). This can also be seen by enhancing the tests: https://github.com/openSUSE/qem-bot/commit/817a92224c9ac934c40a3307b46996252b2549b5
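
To make the distinction between the mapping row's own id and the openQA job id concrete, here is a sketch with hypothetical rows. Only the column names id and job_id come from the comment above; the incident and status keys are simplifications, and all values are made up. Because the table is an N:M mapping, the same openQA job can appear for several incidents.

```python
# Hypothetical rows of an incident <-> openQA job mapping table (values made up).
rows = [
    {"id": 1, "incident": 1001, "job_id": 5001, "status": "failed"},
    {"id": 2, "incident": 1002, "job_id": 5001, "status": "failed"},  # same job, second incident
    {"id": 3, "incident": 1002, "job_id": 5002, "status": "passed"},
]


def failed_ids(rows: list[dict], incident: int, key: str) -> list[int]:
    """Return the given id column of all failed rows for one incident."""
    return sorted(
        r[key] for r in rows if r["incident"] == incident and r["status"] == "failed"
    )


print(failed_ids(rows, 1002, "id"))      # [2]: mapping row id, meaningless on openQA
print(failed_ids(rows, 1002, "job_id"))  # [5001]: the id to look up on openQA
```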

As we actually need the openQA job id for #107923, I will proceed with modifying the current code to retain this information.

#10

Updated by openqa_review almost 2 years ago

Due date set to 2023-01-07

Setting due date based on mean cycle time of SUSE QE Tools

#11

Updated by livdywan almost 2 years ago

Unfortunately, the test coverage doesn't seem to reflect what we need in production, and it's now failing:

Traceback (most recent call last):
  File "./qem-bot/bot-ng.py", line 7, in <module>
    main()
  File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/main.py", line 43, in main
    sys.exit(cfg.func(cfg))
  File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/args.py", line 49, in do_approve
    return approve()
  File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/approver.py", line 69, in __call__
    incidents_to_approve = [inc for inc in increqs if self._approvable(inc)]
  File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/approver.py", line 69, in <listcomp>
    incidents_to_approve = [inc for inc in increqs if self._approvable(inc)]
  File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/approver.py", line 86, in _approvable
    i_jobs = get_incident_settings(inc.inc, self.token, self.all_incidents)
  File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/loader/qem.py", line 95, in get_incident_settings
    return [JobAggr(i["id"], i["job_id"], False, i["withAggregate"]) for i in settings]
  File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/loader/qem.py", line 95, in <listcomp>
    return [JobAggr(i["id"], i["job_id"], False, i["withAggregate"]) for i in settings]
KeyError: 'job_id'

#12

Updated by jbaier_cz almost 2 years ago

Stopping the pipeline or temporarily reverting the PR would probably be a good idea in this case; it seems that in real data there are some entries without job_id. I suspect that it is not the coverage that is wrong here; rather, our test data might be too idealized.
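
The failing line in the traceback builds JobAggr objects with i["job_id"], which raises KeyError as soon as one API entry lacks that key. A sketch of a more tolerant variant, together with deliberately non-ideal test data, might look like this. The JobAggr field names and the choice to keep incomplete entries with job_id=None are assumptions; this is not necessarily how qem-bot ended up solving it.

```python
from typing import Any, NamedTuple, Optional


class JobAggr(NamedTuple):
    # Hypothetical field names mirroring the four positional arguments in the traceback
    settings_id: int
    job_id: Optional[int]  # openQA job id, or None when the API entry lacks "job_id"
    aggregate: bool
    with_aggregate: bool


def parse_incident_settings(settings: list[dict[str, Any]]) -> list[JobAggr]:
    """Build JobAggr entries; entries without "job_id" get job_id=None instead of crashing."""
    return [JobAggr(i["id"], i.get("job_id"), False, i["withAggregate"]) for i in settings]


# Deliberately non-ideal test data: the second entry has no "job_id" (values made up).
settings = [
    {"id": 1, "job_id": 5001, "withAggregate": True},
    {"id": 2, "withAggregate": False},
]

try:
    # The strict variant from the traceback fails on the incomplete entry.
    _ = [JobAggr(i["id"], i["job_id"], False, i["withAggregate"]) for i in settings]
except KeyError as err:
    print("strict parsing fails:", err)  # strict parsing fails: 'job_id'

print(parse_incident_settings(settings))
```

With this approach, downstream code has to cope with job_id being None, for example by logging the settings id instead.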

#13

Updated by okurz almost 2 years ago

Proposing a revert https://github.com/openSUSE/qem-bot/pull/108 for now.

EDIT: Merged https://github.com/openSUSE/qem-bot/pull/108

#14

Updated by jbaier_cz almost 2 years ago

Apparently i (the object returned by the dashboard API) does not have all attributes from the database. I will need to look at the dashboard and maybe enhance the API (or maybe I just need to call another endpoint).

#15

Updated by jbaier_cz almost 2 years ago

Status changed from In Progress to Feedback

We have a bunch of new PRs; once all of them are merged, the new version should list openQA job ids correctly. Where that is not possible, the log entry should explicitly state the "job setting" id (which refers to an incident/update setting entity in the dashboard).
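
A sketch of a log message that follows this rule: use the openQA job id when it is known, otherwise name the id explicitly as a job setting id. The function name and the fallback wording are hypothetical; only the first message format is taken from the pipeline log quoted in this ticket, and the settings id values are made up.

```python
import logging
from typing import Optional

log = logging.getLogger("bot")


def describe_failure(job_id: Optional[int], settings_id: int, incident: int) -> str:
    """Name the kind of id explicitly so the log line cannot be misread."""
    if job_id is not None:
        return f"Found failed, not-ignored job {job_id} for incident {incident}"
    # Fall back to the dashboard's incident/update setting id, clearly labeled as such.
    return f"Found failed, not-ignored job setting {settings_id} for incident {incident}"


logging.basicConfig(level=logging.INFO)
log.info(describe_failure(10271442, 1, 26100))
log.info(describe_failure(None, 57292, 26100))
```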

#16

Updated by livdywan almost 2 years ago

Please always try to mention the PRs here for clarity. That makes it easier to double-check that they are all being reviewed in a timely manner:

#17

Updated by livdywan almost 2 years ago

All PRs have been merged. The pipeline from 30 minutes ago shows:

2023-01-04 08:33:34 INFO Found failed, not-ignored job 10271442 for incident 26100

It's not linked, but https://openqa.suse.de/tests/10271442 seems to be a valid job.

#18

Updated by okurz almost 2 years ago

Due date deleted (2023-01-07)
Status changed from Feedback to Resolved

We cross-checked again during the weekly SUSE QE Tools unblock on 2023-01-04. We also looked at the next message:

Found failed, not-ignored job 10271608 for incident 27311

Checking https://openqa.suse.de/tests/10271608, we find a valid, unhandled openQA test failure. Also, when following http://dashboard.qam.suse.de/incident/27311 or looking at http://dashboard.qam.suse.de/blocked, we find exactly one failure blocking the approval, which is the very same openQA job. So all good.
