action #92125

closed

Move "MR" on submission tests into a separate job group

Added by okurz almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2021-05-04
Due date: 2021-07-30
% Done: 0%
Estimated time:

Description

Motivation

Discussed during the meeting about "Shift Left": Currently MR "on submission" tests are scheduled as part of already existing incident tests. The focus for "on submission" tests is on selecting super-stable tests. For this we need to be able to select individual job scenarios and also exclude job modules, e.g. using EXCLUDE_MODULES. As a first example, the scenario "mau-sles-robot-fw" was mentioned.

Acceptance criteria

  • AC1: MR on submission tests are scheduled within separate job groups with their own schedule

Suggestions


Related issues 1 (0 open, 1 closed)

Related to openQA Project - action #95075: Find jobs matching search parameters over /api/v1/jobs (especially documentation) size:S (Status: Resolved, Assignee: mkittler, Start date: 2021-07-05, Due date: 2021-07-23)

Actions #1

Updated by okurz almost 3 years ago

  • Project changed from 46 to QA
Actions #2

Updated by okurz almost 3 years ago

Ondřej Súkup can you help me with this ticket here? Did you implement the triggering of MR tests within the openQA maintenance bot?

Actions #3

Updated by osukup almost 3 years ago

Yes, I implemented this in openqabot. I thought the focus was to run incident tests before the MR is accepted, to catch incident issues as early as possible.

Actions #4

Updated by osukup almost 3 years ago

It is possible to create separate groups for MR incidents, but this mainly increases the maintenance overhead for test groups and bot data (roughly the same configs duplicated for openQA and the bot), with only small added value in separated builds and better accountability for the openQA resources used.

Actions #5

Updated by okurz almost 3 years ago

osukup wrote:

It is possible to create separate groups for MR incidents, but this mainly increases the maintenance overhead for test groups and bot data (roughly the same configs duplicated for openQA and the bot), with only small added value in separated builds and better accountability for the openQA resources used.

Exactly. I had the same concern. The idea here is only a very stable and fast subset of scenarios is used for "on submission" tests and that this configuration is maintained by "Maintenance", not by "QE". Could you please provide a bit more information:

  • how to change the bot to trigger a different test schedule for MR than for incident tests?
  • how to configure the bot to trigger according tests? Which "openQA medium" does it trigger for MR tests?

For further reference, the confluence page for "Shift Left" and particular "on submission" testing is available on https://confluence.suse.com/pages/viewpage.action?pageId=723878219

Actions #6

Updated by osukup almost 3 years ago

okurz wrote:

osukup wrote:

It is possible to create separate groups for MR incidents, but this mainly increases the maintenance overhead for test groups and bot data (roughly the same configs duplicated for openQA and the bot), with only small added value in separated builds and better accountability for the openQA resources used.

Exactly. I had the same concern. The idea here is only a very stable and fast subset of scenarios is used for "on submission" tests and that this configuration is maintained by "Maintenance", not by "QE". Could you please provide a bit more information:

  • how to change the bot to trigger a different test schedule for MR than for incident tests?

A new config for the bot, roughly the same as for classic incident jobs but with new specialized flavors. It also needs new FLAVORs configured in openQA, new job groups and of course someone who maintains this and keeps it in sync with the others.

  • how to configure the bot to trigger according tests? Which "openQA medium" does it trigger for MR tests?

There is a problem: we have less information about MR jobs than about standard incidents, so we can't trigger jobs based on the included binaries.

For further reference, the confluence page for "Shift Left" and particular "on submission" testing is available on https://confluence.suse.com/pages/viewpage.action?pageId=723878219

Actions #7

Updated by okurz almost 3 years ago

https://gitlab.suse.de/qa-maintenance/openQABot/-/merge_requests/56 is the original MR that brought in changes for MRs.

"flavor" is important for https://gitlab.suse.de/qa-maintenance/openQABot/-/merge_requests/56/diffs#5e51e5be70701a2e1c4ddcb96edd12c9dd8589c5_174_196 , right? I assume this is configured on qam2.suse.de in /etc/openqa/bot.yml . The file is deployed by ansible from the "qam-metadata-openqabot" package, living in https://gitlab.suse.de/qa-maintenance/metadata/-/blob/master/bot/bot.yml . From git to IBS to host over ansible.

https://gitlab.suse.de/qa-maintenance/openQABot/-/blob/master/systemd/openqabot-mr.timer shows how maintenance requests are scheduled every hour

Actions #8

Updated by okurz almost 3 years ago

  • Subject changed from feasibility to move "MR" on submission tests into a separate job group to Mmove "MR" on submission tests into a separate job group
  • Description updated (diff)
  • Status changed from New to Workable
Actions #9

Updated by okurz almost 3 years ago

  • Subject changed from Mmove "MR" on submission tests into a separate job group to Move "MR" on submission tests into a separate job group
Actions #11

Updated by osukup almost 3 years ago

From my point of view, this change only adds an unnecessary burden. With very small benefits.

Cons:

1) It will practically duplicate QEM Incidents jobs
2) It will be unclear who will maintain this and keep in sync with Incidents Jobs
3) Any new change or fix in QEM Incidents jobs will need to be duplicated in MR groups
4) Data for schedule jobs will be the same, the only thing which will be changed -> FLAVOR
5) Same for OSD, a big bunch of new mediums with difference only in FLAVOR

Benefit:

1) separated Maintenance and QEM jobs (currently the only difference is in BUILD)
2) --> better accountability

But we can separate the jobs simply in the view if we add the possibility to filter jobs by patterns/regexps on variables or on the BUILD value in openQA's test overview.
And roughly the same goes for accounting: we can distinguish resource usage based on BUILD.


BUILD in incident jobs is constructed with a simple schema:

  • standard job - :INC_NR:PKG_NAME
  • maintenance request - MR:REQ_NR:PKG_NAME

In addition, L3 runs a PoC with openQA (currently kernel only),
and for the kernel we also have KOTD jobs, which have BUILD=KERNEL_VERSION.
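
To illustrate how jobs could be told apart by BUILD alone, here is a minimal Python sketch of the schema described above. The function name `classify_build` and the regular expression are illustrative assumptions, not code from openQABot or openQA; the sketch also assumes that INC_NR and REQ_NR are purely numeric.

```python
import re

# Assumed pattern for the BUILD schema described in this comment:
#   standard incident job:  :INC_NR:PKG_NAME
#   maintenance request:    MR:REQ_NR:PKG_NAME
BUILD_RE = re.compile(r"^(?P<prefix>MR)?:(?P<number>\d+):(?P<package>.+)$")


def classify_build(build):
    """Return (kind, number, package) for an incident-style BUILD, else None."""
    match = BUILD_RE.match(build)
    if match is None:
        # Anything else, e.g. KOTD jobs where BUILD is a kernel version
        return None
    kind = "maintenance_request" if match.group("prefix") else "incident"
    return kind, int(match.group("number")), match.group("package")
```

A filter in a view or an accounting script could then group jobs by the first element of the returned tuple and treat `None` as "other" (for example KOTD).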

Actions #12

Updated by okurz almost 3 years ago

  • Status changed from Workable to Blocked
  • Assignee set to okurz

Thank you for your nice evaluation. I agree with your assessment which is why also in #91082#note-5 I have suggested to go ahead with the current test structure as is. Let's wait for feedback on that.

Setting blocked on #91082

Actions #13

Updated by okurz almost 3 years ago

  • Status changed from Blocked to Workable
  • Assignee deleted (okurz)

#91082#note-6 explains again that the preference is to have that separate job group with its own schedule.

Actions #14

Updated by okurz almost 3 years ago

The topic was brought up again by hrommel1+cyberiad and it was clarified that this request is also more important than "multiple package version in incident" tickets.

Out of scope: Git versioning for the separate job group schedule so I expect a single job group with a single job template of "sles4sap_robot_fw".

Actions #15

Updated by okurz almost 3 years ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz

I created https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/merge_requests/144 with osukup to create the schedule. osukup will do according bot code changes.

Actions #16

Updated by openqa_review almost 3 years ago

  • Due date set to 2021-07-07

Setting due date based on mean cycle time of SUSE QE Tools

Actions #18

Updated by okurz almost 3 years ago

  • Assignee changed from okurz to osukup

@osukup what's necessary for https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/merge_requests/144 to be merged? and how are you doing regarding getting the bot aligned for on submission tests?

Actions #19

Updated by osukup almost 3 years ago

okurz wrote:

@osukup what's necessary for https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/merge_requests/144 to be merged? and how are you doing regarding getting the bot aligned for on submission tests?

I merged it, although I'm not directly involved in qam-openqa-yml.

Bot is now scheduling 15-SP3 Incidents-MR for 'On Submission'

Actions #20

Updated by okurz almost 3 years ago

can you please reference according bot code changes?

Actions #21

Updated by okurz almost 3 years ago

Also there are tests like https://openqa.suse.de/tests/6328338 still triggered in the other job groups while we should only trigger within the "on submission" job group.

Actions #22

Updated by okurz almost 3 years ago

I did https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/merge_requests/151 to fix the dependencies for the test suites. Tests are in place, separate job group, separate schedule, maintained in git repo. I wanted to verify that the two existing scenarios are fine but that is currently blocked by the network problems within SUSE R&D. The problem is within https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/jobs/477393#L28 because gitlab CI runners can't access gitlab.suse.de. That's a problem that had been reported to EngInfra already.

EDIT: I could successfully retrigger, so https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/jobs/477873 has now succeeded.
Now we can wait for https://openqa.suse.de/tests/6355977 as one example to show whether the test works for an MR.

EDIT: The above was cancelled (reason unknown) but https://openqa.suse.de/tests/6355975 has passed.

Actions #23

Updated by osukup almost 3 years ago

solution with '' instead of version doesn't really work --> back to trees and copy/paste a big bunch of products templates

*) -> bot schedules for this test group 15-SP2 and 15-SP3 ... but only last scheduled is run

Actions #24

Updated by okurz almost 3 years ago

What do you mean with "version" here? And what trees?

EDIT: ok, I understood the part about the version now. But I don't see a need to do any copy-pasting here. Either we don't need to schedule more than one version or we extend openQA to provide what we need from it

Also, still, can you please reference according bot code changes that you did? Simply a link to a merge request or commits

Also there are tests like https://openqa.suse.de/tests/overview?distri=sle&version=15-SP2&build=MR%3A244015%3Aclamav&groupid=306 still triggered in the incident job groups while we should only trigger within the "on submission" job group. Can you please comment on that?
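
For checking which job group a given MR build ended up in, the test overview link above can be rebuilt from its query parameters. This is only a sketch: the helper name `overview_url` is hypothetical, and the parameter names (`distri`, `version`, `build`, `groupid`) are taken from the URL quoted in this comment.

```python
from urllib.parse import urlencode


def overview_url(host, distri, version, build, groupid):
    """Build an openQA test overview URL for one build in one job group."""
    # urlencode percent-encodes the colons in BUILD values such as "MR:244015:clamav"
    query = urlencode(
        {"distri": distri, "version": version, "build": build, "groupid": groupid}
    )
    return f"https://{host}/tests/overview?{query}"


print(overview_url("openqa.suse.de", "sle", "15-SP2", "MR:244015:clamav", 306))
```

Changing only `groupid` makes it easy to compare the incident job group with the "on submission" job group for the same build.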

Actions #25

Updated by okurz almost 3 years ago

@osukup why have you done https://gitlab.suse.de/qa-maintenance/metadata/-/merge_requests/491 ? I have explained to you explicitly that I don't see the need to schedule any more tests for MR for now as long as we don't have the feedback from maintenance that this is what they need. And please give others a chance for merge request review. I suggest you revert the MR before introducing this big blob of hard to maintain text duplication.

Actions #26

Updated by okurz almost 3 years ago

  • Status changed from In Progress to Feedback
  • Assignee changed from osukup to okurz

The problem has been realized by jmichel in https://chat.suse.de/channel/qa-sap-ha?msg=AAWzYCkioikz5rLgQ as well stating that the test schedule should be reduced again. I couldn't discuss the problem with you over other channels hence I now created a revert with https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/merge_requests/156

Let me handle the next steps and crosscheck with requesters what to do next.

Actions #27

Updated by okurz almost 3 years ago

Someone also created multiple job groups https://openqa.suse.de/group_overview/390 https://openqa.suse.de/group_overview/389 https://openqa.suse.de/group_overview/388 https://openqa.suse.de/group_overview/387 . I don't think we should have multiple job groups, and in particular not individual ones per version.

Actions #28

Updated by okurz almost 3 years ago

  • Related to action #95075: Find jobs matching search parameters over /api/v1/jobs (especially documentation) size:S added
Actions #29

Updated by okurz almost 3 years ago

  • Assignee changed from okurz to osukup

next coordination meeting conducted, see notes in https://confluence.suse.com/pages/viewpage.action?pageId=723878219&focusedCommentId=778469603#comment-778469603

  • Delete version specific job groups
  • Check why there are no tests scheduled since 5 days in "On Submission" test
  • Delete schedule for MR tests in single incident groups
  • Adapt openQABot bot to feed back results from "on submission" job group, not MR tests within single incident group

@osukup which parts can you take over?

Actions #30

Updated by okurz almost 3 years ago

  • Due date changed from 2021-07-07 to 2021-07-30
  • Priority changed from High to Normal

As it shows we rely heavily on the previous knowledge of individuals hence we effectively can not treat this with high prio. Due to unforeseen absence bumping the due-date to a much longer time in the future for grace-time.

Actions #31

Updated by ilausuch over 2 years ago

@osukup, do you need help on this ticket? Maybe someone in the team could help you.

Actions #32

Updated by okurz over 2 years ago

I just found out about https://gitlab.suse.de/qa-maintenance/openQABot/-/blob/master/systemd/openqabot-mrsep.timer which I was not aware of before. If this is the service that triggers "On Submission" tests in service specific job groups then we do not need them at all and this service should be disabled.

Actions #33

Updated by osukup over 2 years ago

  • Delete version specific job groups: done
  • Check why there are no tests scheduled since 5 days in "On Submission" test: because a flavor defined with the same name but the correct version has higher priority than one defined with *
  • Delete schedule for MR tests in single incident groups: done
  • Adapt openQABot bot to feed back results from "on submission" job group, not MR tests within single incident: done
Actions #34

Updated by osukup over 2 years ago

And from the first results: as expected, only the last scheduled version is started, and of course the SLE12SP* jobs will fail.

Actions #35

Updated by livdywan over 2 years ago

osukup wrote:

And from the first results: as expected, only the last scheduled version is started, and of course the SLE12SP* jobs will fail.

Are you going to work on the 4 items above? In that case please update the due date and make this "in progress". And if possible some outlook of the steps required, so others can help out and validate if things work as expected

Actions #36

Updated by osukup over 2 years ago

cdywan wrote:

osukup wrote:

And from the first results: as expected, only the last scheduled version is started, and of course the SLE12SP* jobs will fail.

Are you going to work on the 4 items above? In that case please update the due date and make this "in progress". And if possible some outlook of the steps required, so others can help out and validate if things work as expected

The 4 items above are all marked done, but I don't think the current state is what is desired or a valid solution (on the brighter side, the load on openQA will be significantly lower).

Actions #37

Updated by okurz over 2 years ago

osukup wrote:

  • Check why there are no tests scheduled since 5 days in "On Submission" test

because a flavor defined with the same name but the correct version has higher priority than one defined with *

so deleting the version specific product definitions in the schedule again should fix it I assume. This can be an improvement point for the future to simplify scheduling from openQA side without needing clunky workarounds from test schedule maintainers
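
The priority rule described above can be illustrated with a small Python sketch. This is not openQA's actual implementation; the function `pick_product`, the dictionary keys and the flavor name are hypothetical and only model the behaviour reported in this ticket: a product definition with a concrete version wins over one with the version `*`.

```python
def pick_product(products, flavor, version):
    """Pick the product definition for flavor/version; an exact version beats '*'."""
    candidates = [
        p for p in products
        if p["flavor"] == flavor and p["version"] in (version, "*")
    ]
    # Exact version matches sort before wildcard matches (False < True)
    candidates.sort(key=lambda p: p["version"] == "*")
    return candidates[0] if candidates else None


# Hypothetical product definitions: one wildcard, one version specific
products = [
    {"flavor": "Example-Flavor", "version": "*", "group": "On Submission"},
    {"flavor": "Example-Flavor", "version": "15-SP3", "group": "version specific"},
]

# With both definitions present, the version specific one is selected
print(pick_product(products, "Example-Flavor", "15-SP3")["group"])

# After deleting the version specific definition, the wildcard one is used
print(pick_product(products[:1], "Example-Flavor", "15-SP3")["group"])
```

Under this model, removing the version specific definitions is what lets the `*` definition, and with it the "On Submission" job group, receive the jobs again.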

EDIT: I provided an update about the current status in https://chat.suse.de/group/initialmr?msg=kRSgNNFwD3FdPmfjz

Triggering fixed. Now again only tests within the job group "On Submission" are triggered and no other tests in any other job group: https://openqa.suse.de/parent_group_overview/36#grouped_by_build . https://openqa.suse.de/tests/overview?distri=sle&version=15-SP3&build=MR:247582:aspell&groupid=385 is an example of 2 passed jobs. https://openqa.suse.de/tests/6639169#step/accept_license/2 is a new failure, reason unknown

Actions #38

Updated by livdywan over 2 years ago

  • Due date changed from 2021-07-30 to 2021-08-06

EDIT: I provided an update about the current status in https://chat.suse.de/group/initialmr?msg=kRSgNNFwD3FdPmfjz

I don't know what's mentioned there so I will read this as: status is still being discussed and at the least takes til the end of the week.

Actions #39

Updated by okurz over 2 years ago

  • Due date changed from 2021-08-06 to 2021-07-30

cdywan wrote:

EDIT: I provided an update about the current status in https://chat.suse.de/group/initialmr?msg=kRSgNNFwD3FdPmfjz

I don't know what's mentioned there so I will read this as: status is still being discussed and at the least takes til the end of the week.

No. I only provided what is mentioned in #92125#note-37 already. Further improvements how to handle a similar situation in the future have already been discussed outside this ticket. If osukup does not see further tasks necessary to cover AC1 from #92125#Acceptance-criteria we can resolve.

Actions #40

Updated by osukup over 2 years ago

Yes, AC1 seems fulfilled: separate job_group and schedule.

Actions #41

Updated by osukup over 2 years ago

  • Status changed from Feedback to Resolved
Actions #42

Updated by hrommel1 over 2 years ago

  • Status changed from Resolved to In Progress

Reopened and put into progress because we still have job groups bound to a specific product version:

https://openqa.suse.de/group_overview/390
https://openqa.suse.de/group_overview/389
https://openqa.suse.de/group_overview/388
https://openqa.suse.de/group_overview/387

AFAIR the agreement was to remove those groups and have everything in group

https://openqa.suse.de/group_overview/385

Actions #43

Updated by osukup over 2 years ago

  • Status changed from In Progress to Resolved

hrommel1 wrote:

Reopened and put into progress because we still have job groups bound to a specific product version:

https://openqa.suse.de/group_overview/390
https://openqa.suse.de/group_overview/389
https://openqa.suse.de/group_overview/388
https://openqa.suse.de/group_overview/387

AFAIR the agreement was to remove those groups and have everything in group

https://openqa.suse.de/group_overview/385

Everything is in this group (385); all other groups are leftovers from the past.
