action #107923: qem-bot: Ignore not-ok openQA jobs for specific incident based on openQA job comment size:M - QA (public) - openSUSE Project Management Tool

Actions

Copy link

action #107923

closed

coordination #91646: [saga][epic] SUSE Maintenance QA workflows with fully automated testing, approval and release

qem-bot: Ignore not-ok openQA jobs for specific incident based on openQA job comment size:M

Added by okurz about 3 years ago. Updated over 1 year ago.

Status:

Resolved

Priority:

High

Assignee:

jbaier_cz

Target version:

openQA Project (public) - Ready

Start date:

Due date:

% Done:

Estimated time:

Description

Motivation¶

See the proposal in the parent epic #95479, e.g. about the specific format of an openQA label that is readable by qem-bot

Acceptance criteria¶

AC1: A not-ok openQA job with a comment following format https://progress.opensuse.org/issues/95479#Suggestions is not blocking approval of incident updates
AC2: A not-ok openQA job with such comment is still blocking approval of all other, not specified incident updates
AC3: A not-ok openQA without such comment is still blocking all related incidents

Suggestions¶

DONE: Add a testing framework to github.com/openSUSE/qem-bot/, e.g. based on github.com/os-autoinst/openqa_review -> #109641
DONE: Add a simple automatic test exercising one of the existing happy path workflows of qem-bot -> #110167
DONE: Add automatic tests for the above acceptance criteria
DONE: As a first quick-and-dirty, and messy approach read out openQA comments directly within the approval step of qem-bot (only for the failed jobs which should not take too long)
DONE: Parse the mentioned special label string and for the parsed incident remove the according not-ok openQA job from the list of blocking results
Optional: Add openQA comment parsing over the openQA API together with consistent data in qem-dashboard, i.e.
- As qem-dashboard is the "database for qem-bot" read out the according data from openQA that is pushed to qem-dashboard
- make qem-dashboard store the related data
- and then qem-bot should read it out from there

Out of scope¶

Visualize such specially handled failed openQA jobs in dashboard

Related issues 9 (1 open — 8 closed)

Related to openQA Project (public) - openqa-force-result #109857: Secure auto-review+force_result size:M auto_review:"Failed to download gobbledeegoop":force_result:softfailed

Resolved

livdywan

Actions

Related to QA (public) - action #111078: Simple automatic test exercising one of the existing happy path workflows of qem-bot size:M

Resolved

osukup

2022-05-13

Actions

Related to openQA Project (public) - action #119467: "Internal server error" on opening any job group front page at OSD

Resolved

livdywan

Actions

Related to QA (public) - action #114415: [timeboxed:10h][spike solution] qem-bot comments on IBS size:S

Resolved

tinita

2022-07-20

Actions

Related to openQA Infrastructure (public) - action #120939: [alert] Pipeline for scheduling incidents runs into timeout size:M

Resolved

livdywan

2022-11-24

2022-12-13

Actions

Related to QA (public) - action #119161: Approval step of qem-bot says incident has failed job in incidents but it looks empty on the dashboard size:M

Resolved

okurz

2022-10-21

Actions

Related to QA (public) - action #122308: Handle invalid openQA job references in qem-dashboard size:M

Resolved

jbaier_cz

2022-12-21

Actions

Related to openQA Project (public) - action #135782: auto-review+force-result ticket does not seem to have an effect when issue tracker changed after the initial comment when carry-over is effective

New

2023-09-14

Actions

Related to openQA Project (public) - action #136244: Ensure arbitrary comments can be taken over to new jobs size:M

Resolved

dheidler

2023-09-21

2023-11-17

Actions

Copy link

Updated by okurz almost 3 years ago

Related to openqa-force-result #109857: Secure auto-review+force_result size:M auto_review:"Failed to download gobbledeegoop":force_result:softfailed added

Actions

Copy link

Updated by okurz almost 3 years ago

Target version changed from future to Ready

Actions

Copy link

Updated by okurz almost 3 years ago

Description updated (diff)

Actions

Copy link

Updated by okurz almost 3 years ago

Related to action #111078: Simple automatic test exercising one of the existing happy path workflows of qem-bot size:M added

Actions

Copy link

Updated by okurz almost 3 years ago

Status changed from New to Blocked
Assignee set to okurz

#111078

Actions

Copy link

Updated by okurz almost 3 years ago

Status changed from Blocked to New
Assignee deleted (~~okurz~~)

Actions

Copy link

Updated by okurz almost 3 years ago

Priority changed from Low to Normal

Actions

Copy link

Updated by okurz almost 3 years ago

Subject changed from qem-bot: Ignore not-ok openQA jobs for specific incident based on openQA job comment to qem-bot: Ignore not-ok openQA jobs for specific incident based on openQA job comment size:M
Description updated (diff)
Status changed from New to Workable

Actions

Copy link

Updated by mkittler over 2 years ago

Status changed from Workable to In Progress
Assignee set to mkittler

PR: https://github.com/openSUSE/qem-bot/pull/73

Actions

Copy link

#10

Updated by openqa_review over 2 years ago

Due date set to 2022-10-28

Setting due date based on mean cycle time of SUSE QE Tools

Actions

Copy link

#11

Updated by jbaier_cz over 2 years ago

Likely a regression: https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1189679

Actions

Copy link

#12

Updated by mkittler over 2 years ago

PR to fix pipeline failures: https://github.com/openSUSE/qem-bot/pull/75

Actions

Copy link

#13

Updated by jbaier_cz over 2 years ago

It seems, that the pipeline may be working, although we should still get rid of the stack traces. To actually see, if we already use the patched version, I propose a little update for the pipeline: https://gitlab.suse.de/qa-maintenance/bot-ng/-/merge_requests/59

Actions

Copy link

#14

Updated by mkittler over 2 years ago

Let's see whether https://github.com/openSUSE/qem-bot/pull/76 fixes the pipeline (e.g. https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines/506277).

Actions

Copy link

#15

Updated by jbaier_cz over 2 years ago

The code still does a lot of retries on 404 from non-existent comments resulting in a very long run duration and job termination after 1h timeout.

Actions

Copy link

#16

Updated by mkittler over 2 years ago

Maybe it is the simplest to just disable the retry for now: https://github.com/openSUSE/qem-bot/pull/79
(Not sure how to make it only retry on connection errors and 500 responses which would likely be the most reasonable choice and I assumed would be the default retry behavior.)

Actions

Copy link

#17

Updated by mkittler over 2 years ago

Status changed from In Progress to Feedback

All PRs have been merged. The pipeline works again and doesn't take that long to execute (e.g. https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1194317).

So putting a comment like https://progress.opensuse.org/issues/95479#Suggestions in an openQA job should now work.

Actions

Copy link

#18

Updated by okurz over 2 years ago

Nice. Also thank you for the announcement in #eng-testing. I suggest to await a real-life usage. You could query the openQA comments for matching comments if people used it.

Before resolving please make sure to put any not-done "optional" tasks in according new ticket(s).

Actions

Copy link

#19

Updated by tinita over 2 years ago

mkittler wrote:

Maybe it is the simplest to just disable the retry for now: https://github.com/openSUSE/qem-bot/pull/79
(Not sure how to make it only retry on connection errors and 500 responses which would likely be the most reasonable choice and I assumed would be the default retry behavior.)

This should probably be done in the openqa_client.
There's even a related issue: https://github.com/os-autoinst/openQA-python-client/issues/16
I had a quick look at the code, but currently it is using requests, and if I understand it correctly, it would have to be rewritten with urllib3 to use the retry feature.

Actions

Copy link

#20

Updated by mkittler over 2 years ago

Description updated (diff)

Actions

Copy link

#21

Updated by mkittler over 2 years ago

There are no such comments yet:

openqa=# select job_id, text from comments where text like '%review%acceptable%';
 job_id | text 
--------+------
(0 Zeilen)

Likely it is also possible to tweak the retry while keep using requests. (I cannot imagine that request makes this impossible. Likely they only prefer urllib3 because it has already a built-in feature for exactly what we want.)

Actions

Copy link

#22

Updated by kraih over 2 years ago

Still lots of error messages in the pipeline output https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1196209:

ERROR: ('GET', 'https://openqa.suse.de/api/v1/jobs/1944211/comments', 404)
Traceback (most recent call last):
  File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/openqa.py", line 73, in get_job_comments
    "GET", "jobs/%s/comments" % job_id, retries=0
  File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 184, in openqa_request
    return self.do_request(req, retries=retries, wait=wait)
  File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 164, in do_request
    raise err
  File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 144, in do_request
    request.method, resp.url, resp.status_code
openqa_client.exceptions.RequestError: ('GET', 'https://openqa.suse.de/api/v1/jobs/1944211/comments', 404)

Actions

Copy link

#23

Updated by mkittler over 2 years ago

PR: https://github.com/openSUSE/qem-bot/pull/80

Actions

Copy link

#24

Updated by okurz over 2 years ago

https://github.com/openSUSE/qem-bot/pull/81

Actions

Copy link

#25

Updated by okurz over 2 years ago

merged

Actions

Copy link

#26

Updated by mkittler over 2 years ago

okurz wrote:

Optional: Add openQA comment parsing over the openQA API together with consistent data in qem-dashboard, i.e.

As qem-dashboard is the "database for qem-bot" read out the according data from openQA that is pushed to qem-dashboard

make qem-dashboard store the related data

and then qem-bot should read it out from there

That's likely not very useful. We'd need to query all comments when syncing the dashboard so we'd likely end up with more request on the openQA side. Besides, judging the by logs of recent approval pipeline runs I suppose openQA will be able to handle the traffic (as it doesn't look too much, we also only query comments for failed jobs).

One could do a further optimization: As soon as we see a failing job blocking an incident, we'd avoid checking any further failing job for that incident (e.g. an early break in the loop introduced here: https://github.com/openSUSE/qem-bot/pull/73/files).

Actions

Copy link

#27

Updated by mkittler over 2 years ago

PR for the optimization mentioned in the previous comment: https://github.com/openSUSE/qem-bot/pull/82

Note that the feature isn't used yet (select job_id, user_id, text from comments where text like '%review%acceptable%' and user_id != 126; returns no results).

Actions

Copy link

#28

Updated by mkittler over 2 years ago

PR to make this at least a little bit more discoverable: https://github.com/os-autoinst/openQA/pull/4867

Actions

Copy link

#29

Updated by okurz over 2 years ago

Related to action #119467: "Internal server error" on opening any job group front page at OSD added

Actions

Copy link

#30

Updated by okurz over 2 years ago

As we learned from #119467 we should have

No obvious related regression on OSD
Test coverage for the branding button code exists for both the job comments as well as job group comments

so I suggest in the next try to introduce such test.

Actions

Copy link

#31

Updated by okurz over 2 years ago

Due date changed from 2022-10-28 to 2022-11-11

We hit the mentioned regression and want to improve tests when reintroducing. Giving more time for this which should be an exception

Actions

Copy link

#32

Updated by mkittler over 2 years ago

Fixed version of the original PR with test coverage for the SUSE branding template code: https://github.com/os-autoinst/openQA/pull/4881

Actions

Copy link

#33

Updated by mkittler over 2 years ago

Looks like the feature is still not used. The PR for adding the button back has been merged.

Actions

Copy link

#34

Updated by okurz over 2 years ago

Due date deleted (~~2022-11-11~~)
Status changed from Feedback to Resolved

I triggered an extraordinary deployment on OSD and verified that the comment template showed up on https://openqa.suse.de/tests/latest#comments . Documentation and informing people can be done in related tickets, e.g. #111066

Actions

Copy link

#35

Updated by tinita over 2 years ago

Status changed from Resolved to Feedback

We should be able to use retry now because 404 is now excluded: https://github.com/os-autoinst/openQA-python-client/pull/34

Actions

Copy link

#36

Updated by mkittler over 2 years ago

Ok, although I need to check first whether we already have v4.2.1 of that Python module in the CI environment.

Actions

Copy link

#37

Updated by okurz over 2 years ago

In https://suse.slack.com/archives/C02CBB35W5B/p1668779284852569?thread_ts=1668611066.502389&cid=C02CBB35W5B Veronika Svecova tried to use the special marking comment but it seems to have no effect. Do you know why incident 26757 can not be auto-approved? The only linked failed incident test according to http://dashboard.qam.suse.de/incident/26757 is https://openqa.suse.de/tests/9987265 with a comment "@review:acceptable_for:incident_26757:kgraft-patch-SLE12-SP4_Update_23:cleared_with_Marcus" but https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1246560#L171 states "INFO: Inc 26757 has failed job in incidents"

openqa-cli api --osd job_settings/jobs key=*_TEST_ISSUES list_value=26757 returns

{"jobs":[9987265,9977121]}

which seems quite broken as there are definitely more jobs than two. At least it includes the one failed job that was marked to be ignorable. The other job is ok though.

Adding some logging qem-bot states that https://openqa.suse.de/tests/1953773 is blocking approval. This job does not even exist in OSD (anymore?) and judging by the number it would years old. I assume with that even force-result won't help. Sounds like #119161

Actions

Copy link

#38

Updated by kraih over 2 years ago

okurz wrote:

openqa-cli api --osd job_settings/jobs key=*_TEST_ISSUES list_value=26757 returns
{"jobs":[9987265,9977121]}
which seems quite broken as there are definitely more jobs than two. At least it includes the one failed job that was marked to be ignorable. The other job is ok though.

That would be the job_id limit. We only list jobs that are max(job_id) - 20000 or higher for performance. (trigram index would fix that and show 257 results) #117655

Actions

Copy link

#39

Updated by kraih over 2 years ago

okurz wrote:

Adding some logging qem-bot states that https://openqa.suse.de/tests/1953773 is blocking approval. This job does not even exist in OSD (anymore?) and judging by the number it would years old. I assume with that even force-result won't help. Sounds like #119161

Here's what the dashboard knows about the incident:

dashboard_db=# select job_id, status from openqa_jobs where incident_settings = 1953780 order by job_id desc;
 job_id  | status
---------+--------
 9962984 | passed
 9962255 | passed
 9962253 | passed
 9936895 | passed
 9936894 | passed
 9936893 | passed
 9936892 | passed
 9936891 | passed
 9936890 | passed
 9936889 | passed
 9936888 | passed
 9936884 | passed
 9936883 | passed
 9936882 | passed
 9936881 | passed
 9936880 | passed
 9936879 | passed
 9936878 | passed
 9936877 | passed
 9936876 | passed
 9936875 | passed
 9936874 | passed
 9936873 | passed
 9936872 | passed
 9936871 | passed
 9936870 | passed
 9936869 | passed
 9936868 | passed
 9936867 | passed
 9936866 | passed
 9936865 | passed
 9936864 | passed
 9936863 | passed
 9936862 | passed
 9936861 | passed
 9936860 | passed
 9936859 | passed
 9936858 | passed
 9936857 | passed
 9936856 | passed
 9936855 | passed
 9936854 | passed
 9936853 | passed
 9936852 | passed
 9936851 | passed
 9936850 | passed
 9936849 | passed
 9936848 | passed
 9936847 | passed
 9936846 | passed
 9936845 | passed
 9936844 | passed
 9936843 | passed
 9936842 | passed
 9936841 | passed
 9936840 | passed
(56 rows)

Curious that the newest 4 jobs in OSD for the incident are unknown to the dashboard and the dashboard doesn't know about 9937076 etc.:

openqa=# select * from job_settings where key like '%TEST_ISSUES' and value like '%26757%' order by job_id desc;
    id     |       key        | value | job_id  |      t_created      |      t_updated
-----------+------------------+-------+---------+---------------------+---------------------
 455803578 | LIVE_TEST_ISSUES | 26757 | 9987265 | 2022-11-18 10:59:48 | 2022-11-18 10:59:48
 455315068 | LIVE_TEST_ISSUES | 26757 | 9977121 | 2022-11-16 19:56:54 | 2022-11-16 19:56:54
 454525516 | LIVE_TEST_ISSUES | 26757 | 9962987 | 2022-11-15 12:55:06 | 2022-11-15 12:55:06
 454525454 | LIVE_TEST_ISSUES | 26757 | 9962985 | 2022-11-15 12:54:26 | 2022-11-15 12:54:26
 454525399 | LIVE_TEST_ISSUES | 26757 | 9962984 | 2022-11-15 12:54:25 | 2022-11-15 12:54:25
 454494414 | LIVE_TEST_ISSUES | 26757 | 9962255 | 2022-11-15 10:52:09 | 2022-11-15 10:52:09
 454494309 | LIVE_TEST_ISSUES | 26757 | 9962253 | 2022-11-15 10:51:38 | 2022-11-15 10:51:38
 453330250 | LIVE_TEST_ISSUES | 26757 | 9937076 | 2022-11-11 15:16:23 | 2022-11-11 15:16:23
 453330203 | LIVE_TEST_ISSUES | 26757 | 9937075 | 2022-11-11 15:16:23 | 2022-11-11 15:16:23
 453330187 | LIVE_TEST_ISSUES | 26757 | 9937074 | 2022-11-11 15:16:23 | 2022-11-11 15:16:23
 453330149 | LIVE_TEST_ISSUES | 26757 | 9937073 | 2022-11-11 15:16:22 | 2022-11-11 15:16:22
 453330101 | LIVE_TEST_ISSUES | 26757 | 9937072 | 2022-11-11 15:16:22 | 2022-11-11 15:16:22
 453330077 | LIVE_TEST_ISSUES | 26757 | 9937071 | 2022-11-11 15:16:22 | 2022-11-11 15:16:22
...
 453323887 | LIVE_TEST_ISSUES | 26757 | 9936895 | 2022-11-11 15:15:09 | 2022-11-11 15:15:09
 453323853 | LIVE_TEST_ISSUES | 26757 | 9936894 | 2022-11-11 15:15:09 | 2022-11-11 15:15:09
...

Note that these are not aggregate jobs, so it's unrelated to previous issues with missing data.

Actions

Copy link

#40

Updated by mkittler over 2 years ago

If the bot/dashboard don't know/consider the job where the comment is created on than this feature can obviously not work.

It would be great if the failed jobs would be logged (instead of getting just "INFO: Inc 26757 has failed job in incidents"). But before improving the logging it would be best to wait until the currently pending mega-PR (https://github.com/openSUSE/qem-bot/pull/84) has been merged. And then I can also improve the error handling as mentioned in #107923#note-35.

Actions

Copy link

#41

Updated by okurz over 2 years ago

Related to action #114415: [timeboxed:10h][spike solution] qem-bot comments on IBS size:S added

Actions

Copy link

#42

Updated by jbaier_cz over 2 years ago

Related to action #120939: [alert] Pipeline for scheduling incidents runs into timeout size:M added

Actions

Copy link

#43

Updated by livdywan over 2 years ago

mkittler wrote:

If the bot/dashboard don't know/consider the job where the comment is created on than this feature can obviously not work.

It would be great if the failed jobs would be logged (instead of getting just "INFO: Inc 26757 has failed job in incidents"). But before improving the logging it would be best to wait until the currently pending mega-PR (https://github.com/openSUSE/qem-bot/pull/84) has been merged. And then I can also improve the error handling as mentioned in #107923#note-35.

See also https://github.com/openSUSE/qem-bot/pull/83 which was merged as part of it

Actions

Copy link

#44

Updated by jbaier_cz over 2 years ago

Related to action #119161: Approval step of qem-bot says incident has failed job in incidents but it looks empty on the dashboard size:M added

Actions

Copy link

#45

Updated by okurz over 2 years ago

Due date set to 2022-12-16
Status changed from Feedback to In Progress
Assignee changed from mkittler to okurz

The mentioned PR https://github.com/openSUSE/qem-bot/pull/84 has been reverted in the meantime. I see that we are not progressing here so I am taking over trying to bring in my original changes one by one. This should help with debugging from logs.

Actions

Copy link

#46

Updated by okurz over 2 years ago

EDIT: All merged. New PR, one step at a time:

Actions

Copy link

#47

Updated by okurz over 2 years ago

Due date changed from 2022-12-16 to 2022-12-23
Status changed from In Progress to Feedback

Actions

Copy link

#48

Updated by mkittler over 2 years ago

What would actually still be left is the improvement mentioned in #107923#note-35.

Actions

Copy link

#49

Updated by okurz over 2 years ago

Due date changed from 2022-12-23 to 2023-01-20

Opened a new PR https://github.com/openSUSE/qem-bot/pull/103

I would like to keep this open until more people return after Christmas absence.

Actions

Copy link

#50

Updated by okurz over 2 years ago

Related to action #122308: Handle invalid openQA job references in qem-dashboard size:M added

Actions

Copy link

#52

Updated by okurz over 2 years ago

Status changed from Feedback to Workable

#122308 was resolved. So next step can be for me to work with tinita on https://github.com/openSUSE/qem-bot/pull/103 and then to crosscheck again if there are any more useful refactorings pending.

Actions

Copy link

#53

Updated by livdywan over 2 years ago

Due date deleted (~~2023-01-20~~)

I think the Due Date should've been reset here but wasn't.

Actions

Copy link

#54

Updated by okurz about 2 years ago

Assignee deleted (~~okurz~~)

This needs to be picked up by someone else. I failed to accomodate it

Actions

Copy link

#55

Updated by okurz about 2 years ago

Priority changed from Normal to High

This is the only thing we need to complete to finish off #91646 as well -> "High" prio now

Actions

Copy link

#56

Updated by jbaier_cz about 2 years ago

I am just wondering what are we missing here... the main functionality (AC1 and AC2) should be covered by the initial pull request https://github.com/openSUSE/qem-bot/pull/73 and then refined in some follow-ups, at the end we clarified the difference between dashboard job id and openqa job id and fixed that feature in https://github.com/openSUSE/qem-bot/pull/109.
All ACs are covered by a corresponding test.

Actions

Copy link

#57

Updated by okurz about 2 years ago

Two things left to do:

Refactor according to #107923#note-35
Verify ACs in production, not just from unit tests, i.e. reference an example from logs where non-tools-team users could effectively ignore an aggregate test for one incident

Actions

Copy link

#58

Updated by jbaier_cz about 2 years ago

Status changed from Workable to In Progress
Assignee set to jbaier_cz

Actions

Copy link

#59

Updated by jbaier_cz about 2 years ago

Tags deleted (~~mob~~)

Revert the retry disabling commit and see if the 404 retries are really gone: https://github.com/openSUSE/qem-bot/pull/113

Actions

Copy link

#60

Updated by openqa_review about 2 years ago

Due date set to 2023-03-18

Setting due date based on mean cycle time of SUSE QE Tools

Actions

Copy link

#61

Updated by jbaier_cz about 2 years ago

Status changed from In Progress to Feedback

I created a small PR https://github.com/openSUSE/qem-bot/pull/114 to slightly improve the output of the Approver. Now it should also output a link to the failed / ignored openQA job

Actions

Copy link

#62

Updated by mkittler about 2 years ago

The most recent usage of the feature is https://openqa.suse.de/tests/10478348#comments (found via select job_id, user_id, text from comments where text like '%review%acceptable%' and user_id != 126;). Not sure whether that's good enough to verify that the feature works.

Actions

Copy link

#63

Updated by jbaier_cz about 2 years ago

Status changed from Feedback to Resolved

We have a new confirmation today in https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1444928

2023-03-09 11:04:15 INFO     Ignoring failed job https://openqa.suse.de/t10643070 for incident 27492 due to openQA comment
2023-03-09 11:04:15 INFO     Ignoring failed job https://openqa.suse.de/t10643071 for incident 27492 due to openQA comment
2023-03-09 11:04:15 INFO     Ignoring failed job https://openqa.suse.de/t10643068 for incident 27492 due to openQA comment
2023-03-09 11:04:15 INFO     Ignoring failed job https://openqa.suse.de/t10643069 for incident 27492 due to openQA comment
2023-03-09 11:04:15 INFO     Found failed, not-ignored job https://openqa.suse.de/t10646456 for incident 27492
2023-03-09 11:04:15 INFO     SUSE:Maintenance:27492:290872 has at least one failed job in aggregate tests

Actions

Copy link

#64

Updated by MDoucha over 1 year ago

@review:acceptable_for:... comments are not carried over between jobs. This makes the feature no better than force_result.

Example: https://openqa.suse.de/tests/11750829#comments
Comment missing here: https://openqa.suse.de/tests/11788157#comments

Actions

Copy link

#65

Updated by okurz over 1 year ago

MDoucha wrote:

@review:acceptable_for:... comments are not carried over between jobs. This makes the feature no better than force_result.

Example: https://openqa.suse.de/tests/11750829#comments
Comment missing here: https://openqa.suse.de/tests/11788157#comments

Yes, that's as designed so that no failure stays unnoticed without a bug reference to remind in

Actions

Copy link

#66

Updated by MDoucha over 1 year ago

okurz wrote:

Yes, that's as designed so that no failure stays unnoticed without a bug reference to remind in

@review:acceptable_for does not change the job result. It stays as failed. The purpose of this feature is to whitelist all identical failures for a single incident. It needs to be carried over.

Actions

Copy link

#67

Updated by okurz over 1 year ago

MDoucha wrote:

okurz wrote:

Yes, that's as designed so that no failure stays unnoticed without a bug reference to remind in

@review:acceptable_for does not change the job result. It stays as failed. The purpose of this feature is to whitelist all identical failures for a single incident. It needs to be carried over.

Well, labels are not carried over but you can combine bugrefs and labels in a single openQA comment. So just put the label plus a bugref and the carry over should happen.

Actions

Copy link

#68

Updated by MDoucha over 1 year ago

okurz wrote:

Well, labels are not carried over but you can combine bugrefs and labels in a single openQA comment. So just put the label plus a bugref and the carry over should happen.

Do you realize how dangerous it is that such a "workaround" is even possible? Somebody could do that by accident on a force_result label.

Also, no. If restarting a job means that we'll have to redo all the "acceptable_for" labels, the feature is not finished.

Actions

Copy link

#69

Updated by okurz over 1 year ago

MDoucha wrote:

Do you realize how dangerous it is that such a "workaround" is even possible? Somebody could do that by accident on a force_result label.

Also, no. If restarting a job means that we'll have to redo all the "acceptable_for" labels, the feature is not finished.

I could create a ticket and try to write down what you might want to have but as is apparent that failed apparently so please collaborate with vpelcak to write down what you actually need and then we will use that as a base for discussion how to improve and eventually implement such feature after we agreed upon the exact requirements. Simply "enabling carry over" would conflict with other requirements.

Actions

Copy link

#70

Updated by MDoucha over 1 year ago

okurz wrote:

Simply "enabling carry over" would conflict with other requirements.

Could you be more specific? I don't see any conflict with the 3 acceptance criteria in this ticket and if I'm missing something, it's important information for creating the new ticket.

Actions

Copy link

#71

Updated by okurz over 1 year ago

MDoucha wrote:

okurz wrote:

Well, labels are not carried over but you can combine bugrefs and labels in a single openQA comment. So just put the label plus a bugref and the carry over should happen.

Do you realize how dangerous it is that such a "workaround" is even possible? Somebody could do that by accident on a force_result label.

Well, yes, that accident can happen but I don't see that as something that we can change while still keeping the possibility to allow this. We don't see it as a workaround, please see further below for my references to documentation.

Also, no. If restarting a job means that we'll have to redo all the "acceptable_for" labels, the feature is not finished.

Well, that's why you have a choice

MDoucha wrote:

okurz wrote:

Simply "enabling carry over" would conflict with other requirements.

Could you be more specific? I don't see any conflict with the 3 acceptance criteria in this ticket and if I'm missing something, it's important information for creating the new ticket.

ok, let me try. You mentioned acceptance criteria of the current ticket, let me start with those.

okurz wrote:

Acceptance criteria¶

AC1: A not-ok openQA job with a comment following format https://progress.opensuse.org/issues/95479#Suggestions is not blocking approval of incident updates

this is fulfilled regardless if such acceptable_for-comment is carried over or not so we shouldn't be concerned about that

AC2: A not-ok openQA job with such comment is still blocking approval of all other, not specified incident updates

Right now this is fulfilled because only the incident in the acceptable_for-comment would be concerned maybe even independant of carry-over although there might be cases which I can't think of right now where a acceptable_for-comment is carried over until it would erroneously match another, unrelated test failure for maybe the same incident or a new patch for different OS version for the same incident number. So to be on the safe side here the more conservative approach of "no carry-over" helps.

AC3: A not-ok openQA without such comment is still blocking all related incidents

that should also be ok regardless if a special comment is carried over or not as we expect that a special comment is only having an effect on each job where it is linked to.

#95479-38 and #95479-42 from dzedro explicitly asked for no carry over which happened in that case because a comment included a combined label+bugref.

Next to the acceptance criteria mentioned in this ticket at least one other requirement comes from the design of openQA "labels and bugrefs" as documented and defined in http://open.qa/docs/#_bug_references_and_labels which says "A comment containing a bug reference will also be carried over to reduce manual review work." and for labels it says "A label containing a bug reference will still be treated as a label, not a bugref. The bugref will still be rendered as a link. That means no bug icon is shown and the comment does not become subject to carry over." and further on "force_result-labels are evaluated when when a comment is carried over. However, the carry over will only happen when the comment also contains a bug reference."

So this brings me back to my earlier suggestion that in the case that you describe likely the best is to combine within one comment an acceptable_for-label together with a bugref, e.g.

@review:acceptable_for:incident_30146:bsc#1212271

bsc#1212271

If you do not want to duplicate the bug link then what will likely fulfill the regex check in https://github.com/openSUSE/qem-bot/pull/73/files#diff-d44cab80bf43c62865043529f6e9883cc12ab37d49f5fb5af9e1bdc7c012dcaaR87 as well

@review:acceptable_for:incident_30146:.

bsc#1212271

Actions

Copy link

#72

Updated by MDoucha over 1 year ago

okurz wrote:

AC2: A not-ok openQA job with such comment is still blocking approval of all other, not specified incident updates

Right now this is fulfilled because only the incident in the acceptable_for-comment would be concerned maybe even independant of carry-over although there might be cases which I can't think of right now where a acceptable_for-comment is carried over until it would erroneously match another, unrelated test failure for maybe the same incident or a new patch for different OS version for the same incident number. So to be on the safe side here the more conservative approach of "no carry-over" helps.

OK, that seems plausible at least for aggregate tests where all SLE versions are in the same job group. In that case, we need a solution built into the QE dashboard.

Actions

Copy link

#73

Updated by okurz over 1 year ago

Due date deleted (~~2023-03-18~~)

MDoucha wrote in #note-72:

OK, that seems plausible at least for aggregate tests where all SLE versions are in the same job group. In that case, we need a solution built into the QE dashboard.

Feel welcome to add your suggestion to #65271 . Right now I don't see that SUSE QE Tools team can provide any significant contribution in this direction.

Actions

Copy link

#74

Updated by okurz over 1 year ago

Related to action #135782: auto-review+force-result ticket does not seem to have an effect when issue tracker changed after the initial comment when carry-over is effective added

Actions

Copy link

#75

Updated by livdywan over 1 year ago

Related to action #136244: Ensure arbitrary comments can be taken over to new jobs size:M added

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

QA (public)

Tags

Custom queries

action #107923

qem-bot: Ignore not-ok openQA jobs for specific incident based on openQA job comment size:M

Motivation¶

Acceptance criteria¶

Suggestions¶

Out of scope¶

Updated by okurz almost 3 years ago

Updated by okurz almost 3 years ago

Updated by okurz almost 3 years ago

Updated by okurz almost 3 years ago

Updated by okurz almost 3 years ago

Updated by okurz almost 3 years ago

Updated by okurz almost 3 years ago

Updated by okurz almost 3 years ago

Updated by mkittler over 2 years ago

Updated by openqa_review over 2 years ago

Updated by jbaier_cz over 2 years ago

Updated by mkittler over 2 years ago

Updated by jbaier_cz over 2 years ago

Updated by mkittler over 2 years ago

Updated by jbaier_cz over 2 years ago

Updated by mkittler over 2 years ago

Updated by mkittler over 2 years ago

Updated by okurz over 2 years ago

Updated by tinita over 2 years ago

Updated by mkittler over 2 years ago

Updated by mkittler over 2 years ago

Updated by kraih over 2 years ago

Updated by mkittler over 2 years ago

Updated by okurz over 2 years ago

Updated by okurz over 2 years ago

Updated by mkittler over 2 years ago

Updated by mkittler over 2 years ago

Updated by mkittler over 2 years ago

Updated by okurz over 2 years ago

Updated by okurz over 2 years ago

Updated by okurz over 2 years ago

Updated by mkittler over 2 years ago

Updated by mkittler over 2 years ago

Updated by okurz over 2 years ago

Updated by tinita over 2 years ago

Updated by mkittler over 2 years ago

Updated by okurz over 2 years ago

Updated by kraih over 2 years ago

Updated by kraih over 2 years ago

Updated by mkittler over 2 years ago

Updated by okurz over 2 years ago

Updated by jbaier_cz over 2 years ago

Updated by livdywan over 2 years ago

Updated by jbaier_cz over 2 years ago

Updated by okurz over 2 years ago

Updated by okurz over 2 years ago

Updated by okurz over 2 years ago

Updated by mkittler over 2 years ago

Updated by okurz over 2 years ago

Updated by okurz over 2 years ago

Updated by okurz over 2 years ago

Updated by livdywan over 2 years ago

Updated by okurz about 2 years ago

Updated by okurz about 2 years ago

Updated by jbaier_cz about 2 years ago

Updated by okurz about 2 years ago

Updated by jbaier_cz about 2 years ago

Updated by jbaier_cz about 2 years ago

Updated by openqa_review about 2 years ago

Updated by jbaier_cz about 2 years ago

Updated by mkittler about 2 years ago

Updated by jbaier_cz about 2 years ago

Updated by MDoucha over 1 year ago

Updated by okurz over 1 year ago

Updated by MDoucha over 1 year ago

Updated by okurz over 1 year ago

Updated by MDoucha over 1 year ago

Updated by okurz over 1 year ago