action #107923

closed

coordination #91646: [saga][epic] SUSE Maintenance QA workflows with fully automated testing, approval and release

qem-bot: Ignore not-ok openQA jobs for specific incident based on openQA job comment size:M

Added by okurz about 2 years ago. Updated 8 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

See the proposal in the parent epic #95479, e.g. about the specific format of an openQA label that is readable by qem-bot

Acceptance criteria

  • AC1: A not-ok openQA job with a comment following the format of https://progress.opensuse.org/issues/95479#Suggestions is not blocking approval of incident updates
  • AC2: A not-ok openQA job with such a comment is still blocking approval of all other, not specified incident updates
  • AC3: A not-ok openQA job without such a comment is still blocking all related incidents

Suggestions

  • DONE: Add a testing framework to github.com/openSUSE/qem-bot/, e.g. based on github.com/os-autoinst/openqa_review -> #109641
  • DONE: Add a simple automatic test exercising one of the existing happy path workflows of qem-bot -> #110167
  • DONE: Add automatic tests for the above acceptance criteria
  • DONE: As a first quick-and-dirty and messy approach, read out openQA comments directly within the approval step of qem-bot (only for the failed jobs, which should not take too long)
  • DONE: Parse the mentioned special label string and for the parsed incident remove the according not-ok openQA job from the list of blocking results
  • Optional: Add openQA comment parsing over the openQA API together with consistent data in qem-dashboard, i.e.
    • As qem-dashboard is the "database for qem-bot" read out the according data from openQA that is pushed to qem-dashboard
    • make qem-dashboard store the related data
    • and then qem-bot should read it out from there
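The label parsing and filtering described in the suggestions above can be sketched roughly as follows. This is a minimal illustration only, not the actual qem-bot code; the function names and the job data structure are assumptions, and the regex is modeled on the comment format proposed in the parent epic #95479:

```python
import re

# Assumed pattern, modeled on the proposed format, e.g.
# "@review:acceptable_for:incident_26757:kgraft-patch-SLE12-SP4_Update_23:cleared_with_Marcus"
ACCEPTABLE_FOR = re.compile(r"@review:acceptable_for:incident_(\d+)")

def acceptable_incidents(comments):
    """Collect incident numbers whose failures this job's comments declare acceptable."""
    return {int(m) for text in comments for m in ACCEPTABLE_FOR.findall(text)}

def blocking_jobs(failed_jobs, incident_id):
    """Keep only the failed jobs that still block the given incident (AC1/AC2)."""
    return [job for job in failed_jobs
            if incident_id not in acceptable_incidents(job["comments"])]
```

With this, a failed job carrying a matching comment no longer blocks the named incident but still blocks every other incident, and a failed job without such a comment blocks all related incidents (AC3).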

Out of scope

Visualize such specially handled failed openQA jobs in dashboard


Related issues 9 (1 open, 8 closed)

  • Related to openQA Project - openqa-force-result #109857: Secure auto-review+force_result size:M auto_review:"Failed to download gobbledeegoop":force_result:softfailed (Resolved, livdywan)
  • Related to QA - action #111078: Simple automatic test exercising one of the existing happy path workflows of qem-bot size:M (Resolved, osukup, 2022-05-13)
  • Related to openQA Project - action #119467: "Internal server error" on opening any job group front page at OSD (Resolved, livdywan)
  • Related to QA - action #114415: [timeboxed:10h][spike solution] qem-bot comments on IBS size:S (Resolved, tinita, 2022-07-20)
  • Related to openQA Infrastructure - action #120939: [alert] Pipeline for scheduling incidents runs into timeout size:M (Resolved, livdywan, 2022-11-24 to 2022-12-13)
  • Related to QA - action #119161: Approval step of qem-bot says incident has failed job in incidents but it looks empty on the dashboard size:M (Resolved, okurz, 2022-10-21)
  • Related to QA - action #122308: Handle invalid openQA job references in qem-dashboard size:M (Resolved, jbaier_cz, 2022-12-21)
  • Related to openQA Project - action #135782: auto-review+force-result ticket does not seem to have an effect when issue tracker changed after the initial comment when carry-over is effective (New, 2023-09-14)
  • Related to openQA Project - action #136244: Ensure arbitrary comments can be taken over to new jobs size:M (Resolved, dheidler, 2023-09-21 to 2023-11-17)
Actions #1

Updated by okurz almost 2 years ago

  • Related to openqa-force-result #109857: Secure auto-review+force_result size:M auto_review:"Failed to download gobbledeegoop":force_result:softfailed added
Actions #2

Updated by okurz almost 2 years ago

  • Target version changed from future to Ready
Actions #3

Updated by okurz almost 2 years ago

  • Description updated (diff)
Actions #4

Updated by okurz almost 2 years ago

  • Related to action #111078: Simple automatic test exercising one of the existing happy path workflows of qem-bot size:M added
Actions #5

Updated by okurz almost 2 years ago

  • Status changed from New to Blocked
  • Assignee set to okurz
Actions #6

Updated by okurz almost 2 years ago

  • Status changed from Blocked to New
  • Assignee deleted (okurz)
Actions #7

Updated by okurz almost 2 years ago

  • Priority changed from Low to Normal
Actions #8

Updated by okurz almost 2 years ago

  • Subject changed from qem-bot: Ignore not-ok openQA jobs for specific incident based on openQA job comment to qem-bot: Ignore not-ok openQA jobs for specific incident based on openQA job comment size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #9

Updated by mkittler over 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler
Actions #10

Updated by openqa_review over 1 year ago

  • Due date set to 2022-10-28

Setting due date based on mean cycle time of SUSE QE Tools

Actions #12

Updated by mkittler over 1 year ago

PR to fix pipeline failures: https://github.com/openSUSE/qem-bot/pull/75

Actions #13

Updated by jbaier_cz over 1 year ago

It seems that the pipeline may be working, although we should still get rid of the stack traces. To actually see if we already use the patched version, I propose a little update for the pipeline: https://gitlab.suse.de/qa-maintenance/bot-ng/-/merge_requests/59

Actions #15

Updated by jbaier_cz over 1 year ago

The code still does a lot of retries on 404 responses from non-existent comments, resulting in a very long run duration and job termination after the 1h timeout.

Actions #16

Updated by mkittler over 1 year ago

Maybe it is simplest to just disable the retry for now: https://github.com/openSUSE/qem-bot/pull/79
(Not sure how to make it retry only on connection errors and 500 responses, which would likely be the most reasonable choice and which I assumed would be the default retry behavior.)

Actions #17

Updated by mkittler over 1 year ago

  • Status changed from In Progress to Feedback

All PRs have been merged. The pipeline works again and doesn't take that long to execute (e.g. https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1194317).

So putting a comment like https://progress.opensuse.org/issues/95479#Suggestions in an openQA job should now work.

Actions #18

Updated by okurz over 1 year ago

Nice. Also thank you for the announcement in #eng-testing. I suggest awaiting real-life usage. You could query openQA for matching comments to see if people have used it.

Before resolving please make sure to put any not-done "optional" tasks in according new ticket(s).

Actions #19

Updated by tinita over 1 year ago

mkittler wrote:

Maybe it is simplest to just disable the retry for now: https://github.com/openSUSE/qem-bot/pull/79
(Not sure how to make it retry only on connection errors and 500 responses, which would likely be the most reasonable choice and which I assumed would be the default retry behavior.)

This should probably be done in the openqa_client.
There's even a related issue: https://github.com/os-autoinst/openQA-python-client/issues/16
I had a quick look at the code, but currently it is using requests, and if I understand it correctly, it would have to be rewritten with urllib3 to use the retry feature.
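The retry policy under discussion (retry on connection errors and 5xx responses, but never on a 404) can be sketched independently of any particular HTTP library. The helper below is illustrative only and is not part of openqa_client or qem-bot; the name and signature are assumptions:

```python
RETRIABLE_STATUS = {500, 502, 503, 504}  # retry only server-side errors

def request_with_retry(do_request, retries=3):
    """Call do_request(), retrying only on ConnectionError or a 5xx status.

    do_request returns an HTTP status code or raises ConnectionError.
    A 404 (or any other non-5xx status) is returned immediately without
    retrying, which avoids the long pipeline runs caused by retrying
    requests for comments of non-existent jobs.
    """
    for attempt in range(retries + 1):
        try:
            status = do_request()
        except ConnectionError:
            if attempt == retries:
                raise  # out of attempts: propagate the connection error
            continue
        if status not in RETRIABLE_STATUS or attempt == retries:
            return status  # includes 404: returned on the first attempt
```

The same effect can be had with urllib3's Retry class (via its status_forcelist parameter), which is presumably why the linked openqa_client issue leans toward urllib3.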

Actions #20

Updated by mkittler over 1 year ago

  • Description updated (diff)
Actions #21

Updated by mkittler over 1 year ago

There are no such comments yet:

openqa=# select job_id, text from comments where text like '%review%acceptable%';
 job_id | text 
--------+------
(0 rows)

Likely it is also possible to tweak the retry while keeping requests. (I cannot imagine that requests makes this impossible. Likely they only prefer urllib3 because it already has a built-in feature for exactly what we want.)

Actions #22

Updated by kraih over 1 year ago

Still lots of error messages in the pipeline output https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1196209:

ERROR: ('GET', 'https://openqa.suse.de/api/v1/jobs/1944211/comments', 404)
Traceback (most recent call last):
  File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/openqa.py", line 73, in get_job_comments
    "GET", "jobs/%s/comments" % job_id, retries=0
  File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 184, in openqa_request
    return self.do_request(req, retries=retries, wait=wait)
  File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 164, in do_request
    raise err
  File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 144, in do_request
    request.method, resp.url, resp.status_code
openqa_client.exceptions.RequestError: ('GET', 'https://openqa.suse.de/api/v1/jobs/1944211/comments', 404)
Actions #25

Updated by okurz over 1 year ago

merged

Actions #26

Updated by mkittler over 1 year ago

okurz wrote:

  • Optional: Add openQA comment parsing over the openQA API together with consistent data in qem-dashboard, i.e.
    • As qem-dashboard is the "database for qem-bot" read out the according data from openQA that is pushed to qem-dashboard
    • make qem-dashboard store the related data
    • and then qem-bot should read it out from there

That's likely not very useful. We'd need to query all comments when syncing the dashboard, so we'd likely end up with more requests on the openQA side. Besides, judging by the logs of recent approval pipeline runs I suppose openQA will be able to handle the traffic (it doesn't look like too much, and we also only query comments for failed jobs).

One could do a further optimization: As soon as we see a failing job blocking an incident, we'd avoid checking any further failing job for that incident (e.g. an early break in the loop introduced here: https://github.com/openSUSE/qem-bot/pull/73/files).
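The early-break optimization mentioned above can be sketched like this; the function and parameter names are assumptions for illustration, not the code from the linked PR:

```python
def incident_blocked(failed_jobs, incident_id, job_is_acceptable):
    """Return True as soon as one failed job blocks the incident.

    Early break: once a blocking job is found, the remaining failed jobs
    for this incident are not inspected, so no further openQA comment
    queries are made for them.
    """
    for job in failed_jobs:
        if not job_is_acceptable(job, incident_id):
            return True  # this incident cannot be approved; stop checking
    return False
```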

Actions #27

Updated by mkittler over 1 year ago

PR for the optimization mentioned in the previous comment: https://github.com/openSUSE/qem-bot/pull/82

Note that the feature isn't used yet (select job_id, user_id, text from comments where text like '%review%acceptable%' and user_id != 126; returns no results).

Actions #28

Updated by mkittler over 1 year ago

PR to make this at least a little bit more discoverable: https://github.com/os-autoinst/openQA/pull/4867

Actions #29

Updated by okurz over 1 year ago

  • Related to action #119467: "Internal server error" on opening any job group front page at OSD added
Actions #30

Updated by okurz over 1 year ago

As we learned from #119467 we should have

  • No obvious related regression on OSD
  • Test coverage for the branding button code, for both job comments and job group comments

so I suggest introducing such tests in the next try.

Actions #31

Updated by okurz over 1 year ago

  • Due date changed from 2022-10-28 to 2022-11-11

We hit the mentioned regression and want to improve tests when reintroducing the change. Giving more time for this, which should be an exception.

Actions #32

Updated by mkittler over 1 year ago

Fixed version of the original PR with test coverage for the SUSE branding template code: https://github.com/os-autoinst/openQA/pull/4881

Actions #33

Updated by mkittler over 1 year ago

Looks like the feature is still not used. The PR for adding the button back has been merged.

Actions #34

Updated by okurz over 1 year ago

  • Due date deleted (2022-11-11)
  • Status changed from Feedback to Resolved

I triggered an extraordinary deployment on OSD and verified that the comment template showed up on https://openqa.suse.de/tests/latest#comments . Documentation and informing people can be done in related tickets, e.g. #111066

Actions #35

Updated by tinita over 1 year ago

  • Status changed from Resolved to Feedback

We should be able to use retry now because 404 is now excluded: https://github.com/os-autoinst/openQA-python-client/pull/34

Actions #36

Updated by mkittler over 1 year ago

Ok, although I need to check first whether we already have v4.2.1 of that Python module in the CI environment.

Actions #37

Updated by okurz over 1 year ago

In https://suse.slack.com/archives/C02CBB35W5B/p1668779284852569?thread_ts=1668611066.502389&cid=C02CBB35W5B Veronika Svecova tried to use the special marking comment but it seems to have no effect. Do you know why incident 26757 can not be auto-approved? The only linked failed incident test according to http://dashboard.qam.suse.de/incident/26757 is https://openqa.suse.de/tests/9987265 with a comment "@review:acceptable_for:incident_26757:kgraft-patch-SLE12-SP4_Update_23:cleared_with_Marcus" but https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1246560#L171 states "INFO: Inc 26757 has failed job in incidents"

openqa-cli api --osd job_settings/jobs key=*_TEST_ISSUES list_value=26757 returns

{"jobs":[9987265,9977121]}

which seems quite broken, as there are definitely more than two such jobs. At least the list includes the one failed job that was marked as ignorable. The other job is ok though.

With some added logging, qem-bot states that https://openqa.suse.de/tests/1953773 is blocking approval. This job does not even exist on OSD (anymore?) and judging by the number it would be years old. I assume that even force-result won't help with that. Sounds like #119161

Actions #38

Updated by kraih over 1 year ago

okurz wrote:

openqa-cli api --osd job_settings/jobs key=*_TEST_ISSUES list_value=26757 returns

{"jobs":[9987265,9977121]}

which seems quite broken, as there are definitely more than two such jobs. At least the list includes the one failed job that was marked as ignorable. The other job is ok though.

That would be the job_id limit. For performance we only list jobs whose id is max(job_id) - 20000 or higher. (A trigram index would fix that and show 257 results.) #117655

Actions #39

Updated by kraih over 1 year ago

okurz wrote:

Adding some logging qem-bot states that https://openqa.suse.de/tests/1953773 is blocking approval. This job does not even exist in OSD (anymore?) and judging by the number it would years old. I assume with that even force-result won't help. Sounds like #119161

Here's what the dashboard knows about the incident:

dashboard_db=# select job_id, status from openqa_jobs where incident_settings = 1953780 order by job_id desc;
 job_id  | status
---------+--------
 9962984 | passed
 9962255 | passed
 9962253 | passed
 9936895 | passed
 9936894 | passed
 9936893 | passed
 9936892 | passed
 9936891 | passed
 9936890 | passed
 9936889 | passed
 9936888 | passed
 9936884 | passed
 9936883 | passed
 9936882 | passed
 9936881 | passed
 9936880 | passed
 9936879 | passed
 9936878 | passed
 9936877 | passed
 9936876 | passed
 9936875 | passed
 9936874 | passed
 9936873 | passed
 9936872 | passed
 9936871 | passed
 9936870 | passed
 9936869 | passed
 9936868 | passed
 9936867 | passed
 9936866 | passed
 9936865 | passed
 9936864 | passed
 9936863 | passed
 9936862 | passed
 9936861 | passed
 9936860 | passed
 9936859 | passed
 9936858 | passed
 9936857 | passed
 9936856 | passed
 9936855 | passed
 9936854 | passed
 9936853 | passed
 9936852 | passed
 9936851 | passed
 9936850 | passed
 9936849 | passed
 9936848 | passed
 9936847 | passed
 9936846 | passed
 9936845 | passed
 9936844 | passed
 9936843 | passed
 9936842 | passed
 9936841 | passed
 9936840 | passed
(56 rows)

Curious that the newest 4 jobs in OSD for the incident are unknown to the dashboard, and that the dashboard doesn't know about 9937076 etc. either:

openqa=# select * from job_settings where key like '%TEST_ISSUES' and value like '%26757%' order by job_id desc;
    id     |       key        | value | job_id  |      t_created      |      t_updated
-----------+------------------+-------+---------+---------------------+---------------------
 455803578 | LIVE_TEST_ISSUES | 26757 | 9987265 | 2022-11-18 10:59:48 | 2022-11-18 10:59:48
 455315068 | LIVE_TEST_ISSUES | 26757 | 9977121 | 2022-11-16 19:56:54 | 2022-11-16 19:56:54
 454525516 | LIVE_TEST_ISSUES | 26757 | 9962987 | 2022-11-15 12:55:06 | 2022-11-15 12:55:06
 454525454 | LIVE_TEST_ISSUES | 26757 | 9962985 | 2022-11-15 12:54:26 | 2022-11-15 12:54:26
 454525399 | LIVE_TEST_ISSUES | 26757 | 9962984 | 2022-11-15 12:54:25 | 2022-11-15 12:54:25
 454494414 | LIVE_TEST_ISSUES | 26757 | 9962255 | 2022-11-15 10:52:09 | 2022-11-15 10:52:09
 454494309 | LIVE_TEST_ISSUES | 26757 | 9962253 | 2022-11-15 10:51:38 | 2022-11-15 10:51:38
 453330250 | LIVE_TEST_ISSUES | 26757 | 9937076 | 2022-11-11 15:16:23 | 2022-11-11 15:16:23
 453330203 | LIVE_TEST_ISSUES | 26757 | 9937075 | 2022-11-11 15:16:23 | 2022-11-11 15:16:23
 453330187 | LIVE_TEST_ISSUES | 26757 | 9937074 | 2022-11-11 15:16:23 | 2022-11-11 15:16:23
 453330149 | LIVE_TEST_ISSUES | 26757 | 9937073 | 2022-11-11 15:16:22 | 2022-11-11 15:16:22
 453330101 | LIVE_TEST_ISSUES | 26757 | 9937072 | 2022-11-11 15:16:22 | 2022-11-11 15:16:22
 453330077 | LIVE_TEST_ISSUES | 26757 | 9937071 | 2022-11-11 15:16:22 | 2022-11-11 15:16:22
...
 453323887 | LIVE_TEST_ISSUES | 26757 | 9936895 | 2022-11-11 15:15:09 | 2022-11-11 15:15:09
 453323853 | LIVE_TEST_ISSUES | 26757 | 9936894 | 2022-11-11 15:15:09 | 2022-11-11 15:15:09
...

Note that these are not aggregate jobs, so it's unrelated to previous issues with missing data.

Actions #40

Updated by mkittler over 1 year ago

If the bot/dashboard doesn't know about or consider the job the comment is created on, then this feature obviously cannot work.

It would be great if the failed jobs were logged (instead of just getting "INFO: Inc 26757 has failed job in incidents"). But before improving the logging it would be best to wait until the currently pending mega-PR (https://github.com/openSUSE/qem-bot/pull/84) has been merged. Then I can also improve the error handling as mentioned in #107923#note-35.

Actions #41

Updated by okurz over 1 year ago

  • Related to action #114415: [timeboxed:10h][spike solution] qem-bot comments on IBS size:S added
Actions #42

Updated by jbaier_cz over 1 year ago

  • Related to action #120939: [alert] Pipeline for scheduling incidents runs into timeout size:M added
Actions #43

Updated by livdywan over 1 year ago

mkittler wrote:

If the bot/dashboard doesn't know about or consider the job the comment is created on, then this feature obviously cannot work.

It would be great if the failed jobs were logged (instead of just getting "INFO: Inc 26757 has failed job in incidents"). But before improving the logging it would be best to wait until the currently pending mega-PR (https://github.com/openSUSE/qem-bot/pull/84) has been merged. Then I can also improve the error handling as mentioned in #107923#note-35.

See also https://github.com/openSUSE/qem-bot/pull/83 which was merged as part of it

Actions #44

Updated by jbaier_cz over 1 year ago

  • Related to action #119161: Approval step of qem-bot says incident has failed job in incidents but it looks empty on the dashboard size:M added
Actions #45

Updated by okurz over 1 year ago

  • Due date set to 2022-12-16
  • Status changed from Feedback to In Progress
  • Assignee changed from mkittler to okurz

The mentioned PR https://github.com/openSUSE/qem-bot/pull/84 has been reverted in the meantime. I see that we are not progressing here, so I am taking over, trying to bring in my original changes one by one. This should help with debugging from logs.

Actions #47

Updated by okurz over 1 year ago

  • Due date changed from 2022-12-16 to 2022-12-23
  • Status changed from In Progress to Feedback
Actions #48

Updated by mkittler over 1 year ago

What would actually still be left is the improvement mentioned in #107923#note-35.

Actions #49

Updated by okurz over 1 year ago

  • Due date changed from 2022-12-23 to 2023-01-20

Opened a new PR https://github.com/openSUSE/qem-bot/pull/103

I would like to keep this open until more people return after Christmas absence.

Actions #50

Updated by okurz over 1 year ago

  • Related to action #122308: Handle invalid openQA job references in qem-dashboard size:M added
Actions #52

Updated by okurz over 1 year ago

  • Status changed from Feedback to Workable

#122308 was resolved. So the next step can be for me to work with tinita on https://github.com/openSUSE/qem-bot/pull/103 and then to crosscheck again whether there are any more useful refactorings pending.

Actions #53

Updated by livdywan about 1 year ago

  • Due date deleted (2023-01-20)

I think the Due Date should've been reset here but wasn't.

Actions #54

Updated by okurz about 1 year ago

  • Assignee deleted (okurz)

This needs to be picked up by someone else. I failed to accommodate it.

Actions #55

Updated by okurz about 1 year ago

  • Priority changed from Normal to High

This is the only thing we need to complete to finish off #91646 as well -> "High" prio now

Actions #56

Updated by jbaier_cz about 1 year ago

I am just wondering what we are missing here... the main functionality (AC1 and AC2) should be covered by the initial pull request https://github.com/openSUSE/qem-bot/pull/73 and was refined in some follow-ups; in the end we clarified the difference between the dashboard job id and the openQA job id and fixed that in https://github.com/openSUSE/qem-bot/pull/109.
All ACs are covered by corresponding tests.

Actions #57

Updated by okurz about 1 year ago

Two things left to do:

  • Refactor according to #107923#note-35
  • Verify ACs in production, not just from unit tests, i.e. reference an example from logs where non-tools-team users could effectively ignore an aggregate test for one incident
Actions #58

Updated by jbaier_cz about 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to jbaier_cz
Actions #59

Updated by jbaier_cz about 1 year ago

  • Tags deleted (mob)

Revert the retry disabling commit and see if the 404 retries are really gone: https://github.com/openSUSE/qem-bot/pull/113

Actions #60

Updated by openqa_review about 1 year ago

  • Due date set to 2023-03-18

Setting due date based on mean cycle time of SUSE QE Tools

Actions #61

Updated by jbaier_cz about 1 year ago

  • Status changed from In Progress to Feedback

I created a small PR https://github.com/openSUSE/qem-bot/pull/114 to slightly improve the output of the Approver. Now it should also output a link to the failed / ignored openQA job

Actions #62

Updated by mkittler about 1 year ago

The most recent usage of the feature is https://openqa.suse.de/tests/10478348#comments (found via select job_id, user_id, text from comments where text like '%review%acceptable%' and user_id != 126;). Not sure whether that's good enough to verify that the feature works.

Actions #63

Updated by jbaier_cz about 1 year ago

  • Status changed from Feedback to Resolved

We have a new confirmation today in https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1444928

2023-03-09 11:04:15 INFO     Ignoring failed job https://openqa.suse.de/t10643070 for incident 27492 due to openQA comment
2023-03-09 11:04:15 INFO     Ignoring failed job https://openqa.suse.de/t10643071 for incident 27492 due to openQA comment
2023-03-09 11:04:15 INFO     Ignoring failed job https://openqa.suse.de/t10643068 for incident 27492 due to openQA comment
2023-03-09 11:04:15 INFO     Ignoring failed job https://openqa.suse.de/t10643069 for incident 27492 due to openQA comment
2023-03-09 11:04:15 INFO     Found failed, not-ignored job https://openqa.suse.de/t10646456 for incident 27492
2023-03-09 11:04:15 INFO     SUSE:Maintenance:27492:290872 has at least one failed job in aggregate tests
Actions #64

Updated by MDoucha 8 months ago

@review:acceptable_for:... comments are not carried over between jobs. This makes the feature no better than force_result.

Example: https://openqa.suse.de/tests/11750829#comments
Comment missing here: https://openqa.suse.de/tests/11788157#comments

Actions #65

Updated by okurz 8 months ago

MDoucha wrote:

@review:acceptable_for:... comments are not carried over between jobs. This makes the feature no better than force_result.

Example: https://openqa.suse.de/tests/11750829#comments
Comment missing here: https://openqa.suse.de/tests/11788157#comments

Yes, that's as designed, so that no failure stays unnoticed without a bug reference to remind us.

Actions #66

Updated by MDoucha 8 months ago

okurz wrote:

Yes, that's as designed, so that no failure stays unnoticed without a bug reference to remind us.

@review:acceptable_for does not change the job result. It stays as failed. The purpose of this feature is to whitelist all identical failures for a single incident. It needs to be carried over.

Actions #67

Updated by okurz 8 months ago

MDoucha wrote:

okurz wrote:

Yes, that's as designed, so that no failure stays unnoticed without a bug reference to remind us.

@review:acceptable_for does not change the job result. It stays as failed. The purpose of this feature is to whitelist all identical failures for a single incident. It needs to be carried over.

Well, labels are not carried over but you can combine bugrefs and labels in a single openQA comment. So just put the label plus a bugref and the carry over should happen.

Actions #68

Updated by MDoucha 8 months ago

okurz wrote:

Well, labels are not carried over but you can combine bugrefs and labels in a single openQA comment. So just put the label plus a bugref and the carry over should happen.

Do you realize how dangerous it is that such a "workaround" is even possible? Somebody could do that by accident on a force_result label.

Also, no. If restarting a job means that we'll have to redo all the "acceptable_for" labels, the feature is not finished.

Actions #69

Updated by okurz 8 months ago

MDoucha wrote:

Do you realize how dangerous it is that such a "workaround" is even possible? Somebody could do that by accident on a force_result label.

Also, no. If restarting a job means that we'll have to redo all the "acceptable_for" labels, the feature is not finished.

I could create a ticket and try to write down what you might want to have, but apparently that failed, so please collaborate with vpelcak to write down what you actually need. We will then use that as a base for discussing how to improve and eventually implement such a feature, after we have agreed upon the exact requirements. Simply "enabling carry over" would conflict with other requirements.

Actions #70

Updated by MDoucha 8 months ago

okurz wrote:

Simply "enabling carry over" would conflict with other requirements.

Could you be more specific? I don't see any conflict with the 3 acceptance criteria in this ticket and if I'm missing something, it's important information for creating the new ticket.

Actions #71

Updated by okurz 8 months ago

MDoucha wrote:

okurz wrote:

Well, labels are not carried over but you can combine bugrefs and labels in a single openQA comment. So just put the label plus a bugref and the carry over should happen.

Do you realize how dangerous it is that such a "workaround" is even possible? Somebody could do that by accident on a force_result label.

Well, yes, that accident can happen, but I don't see that as something we can change while still keeping the possibility to allow this. We don't see it as a workaround; please see my references to the documentation further below.

Also, no. If restarting a job means that we'll have to redo all the "acceptable_for" labels, the feature is not finished.

Well, that's why you have a choice

MDoucha wrote:

okurz wrote:

Simply "enabling carry over" would conflict with other requirements.

Could you be more specific? I don't see any conflict with the 3 acceptance criteria in this ticket and if I'm missing something, it's important information for creating the new ticket.

ok, let me try. You mentioned acceptance criteria of the current ticket, let me start with those.

okurz wrote:

Acceptance criteria

  • AC1: A not-ok openQA job with a comment following format https://progress.opensuse.org/issues/95479#Suggestions is not blocking approval of incident updates

This is fulfilled regardless of whether such an acceptable_for comment is carried over or not, so we shouldn't be concerned about that.

  • AC2: A not-ok openQA job with such comment is still blocking approval of all other, not specified incident updates

Right now this is fulfilled because only the incident named in the acceptable_for comment would be concerned, maybe even independent of carry-over. Although there might be cases, which I can't think of right now, where an acceptable_for comment is carried over until it erroneously matches another, unrelated test failure for maybe the same incident, or a new patch for a different OS version with the same incident number. So to be on the safe side, the more conservative approach of "no carry-over" helps.

  • AC3: A not-ok openQA without such comment is still blocking all related incidents

That should also be ok regardless of whether a special comment is carried over or not, as we expect a special comment to only have an effect on the job it is attached to.

#95479-38 and #95479-42 from dzedro explicitly asked for no carry-over, which happened in that case because a comment included a combined label+bugref.

Next to the acceptance criteria mentioned in this ticket, at least one other requirement comes from the design of openQA "labels and bugrefs" as documented and defined in http://open.qa/docs/#_bug_references_and_labels, which says "A comment containing a bug reference will also be carried over to reduce manual review work." For labels it says "A label containing a bug reference will still be treated as a label, not a bugref. The bugref will still be rendered as a link. That means no bug icon is shown and the comment does not become subject to carry over." and further on "force_result-labels are evaluated when a comment is carried over. However, the carry over will only happen when the comment also contains a bug reference."

So this brings me back to my earlier suggestion that in the case that you describe likely the best is to combine within one comment an acceptable_for-label together with a bugref, e.g.

@review:acceptable_for:incident_30146:bsc#1212271

bsc#1212271

If you do not want to duplicate the bug link then what will likely fulfill the regex check in https://github.com/openSUSE/qem-bot/pull/73/files#diff-d44cab80bf43c62865043529f6e9883cc12ab37d49f5fb5af9e1bdc7c012dcaaR87 as well

@review:acceptable_for:incident_30146:.

bsc#1212271
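The two independent effects of such a comment (making a failure acceptable for one incident vs. being carried over because it contains a bugref) can be sketched as follows. The regexes here are assumptions modeled on the formats quoted above, not the exact patterns used by qem-bot or openQA:

```python
import re

# Assumed patterns, modeled on the documented comment formats.
LABEL_RE = re.compile(r"@review:acceptable_for:incident_(\d+):\S+")
BUGREF_RE = re.compile(r"\b(?:bsc|poo|bnc)#\d+\b")

def comment_effects(text):
    """Sketch the two independent effects a single comment can have."""
    label = LABEL_RE.search(text)
    return {
        # qem-bot: the failure becomes acceptable for this one incident
        "acceptable_for": int(label.group(1)) if label else None,
        # openQA: a comment is carried over to follow-up jobs only if it
        # contains a bug reference (per the carry-over rules quoted above)
        "carried_over": bool(BUGREF_RE.search(text)),
    }
```

A label-only comment affects approval but is not carried over; combining the label with a bugref in one comment yields both effects.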
Actions #72

Updated by MDoucha 8 months ago

okurz wrote:

  • AC2: A not-ok openQA job with such comment is still blocking approval of all other, not specified incident updates

Right now this is fulfilled because only the incident named in the acceptable_for comment would be concerned, maybe even independent of carry-over. Although there might be cases, which I can't think of right now, where an acceptable_for comment is carried over until it erroneously matches another, unrelated test failure for maybe the same incident, or a new patch for a different OS version with the same incident number. So to be on the safe side, the more conservative approach of "no carry-over" helps.

OK, that seems plausible at least for aggregate tests where all SLE versions are in the same job group. In that case, we need a solution built into the QE dashboard.

Actions #73

Updated by okurz 8 months ago

  • Due date deleted (2023-03-18)

MDoucha wrote in #note-72:

OK, that seems plausible at least for aggregate tests where all SLE versions are in the same job group. In that case, we need a solution built into the QE dashboard.

Feel welcome to add your suggestion to #65271 . Right now I don't see that SUSE QE Tools team can provide any significant contribution in this direction.

Actions #74

Updated by okurz 7 months ago

  • Related to action #135782: auto-review+force-result ticket does not seem to have an effect when issue tracker changed after the initial comment when carry-over is effective added
Actions #75

Updated by livdywan 6 months ago

  • Related to action #136244: Ensure arbitrary comments can be taken over to new jobs size:M added