action #15844

coordination #13812: [epic][dashboard] openQA Dashboard ideas

[tools]finish at least one build per day

Added by okurz almost 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2017-01-10
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

User story

As a release manager of a product with many builds per day, I want to have at least one build fully finished and tested so that late jobs are not skipped and I get a full picture of product quality.

acceptance criteria

  • AC1:
    • Given a job group with only builds older than 1d (e.g. 2d)
    • When two new builds are triggered consecutively in short time
    • Then the first build is not obsoleted by the later one
  • AC2: (regression test)
    • Given a job group with a recent build (e.g. from the same day)
    • When many new builds are triggered consecutively in short time
    • Then the first build is obsoleted by the later one; jobs of older builds are obsoleted after a certain limit of tests are scheduled

tasks

I can already think of two ways how to do it:

  1. From the sync and trigger side, on a new build: If there are no completed builds, i.e. with no skipped jobs, within the last 24h, call sync with _NOOBSOLETEBUILD. That's it. That should do no harm if there are builds within the last 24h that are completely finished anyway.
  2. From an external script: detect the build that is the most recent one after at least one day, mark it as important by build tagging, and remove that comment again later on.

okurz prefers 1.
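
A minimal sketch of the decision in approach 1, assuming the trigger script can obtain a list of recent builds for the job group. The function name, the `finished_at` and `skipped_jobs` keys, and the default window are hypothetical; only the `_NOOBSOLETEBUILD` parameter name comes from this ticket.

```python
from datetime import datetime, timedelta


def should_skip_obsoletion(builds, now, window=timedelta(hours=24)):
    """Return True if no build was fully completed within `window`.

    `builds` is a hypothetical list of dicts with the keys
    'finished_at' (datetime, or None while the build is still running)
    and 'skipped_jobs' (number of jobs that were skipped/obsoleted).
    """
    cutoff = now - window
    for build in builds:
        finished = build["finished_at"]
        # A build counts as "completed" only if it finished inside the
        # window and none of its jobs were skipped.
        if finished is not None and finished >= cutoff and build["skipped_jobs"] == 0:
            return False
    return True


# Illustrative data: the only finished build is two days old.
now = datetime(2017, 1, 12, 12, 0)
builds = [{"finished_at": datetime(2017, 1, 10, 12, 0), "skipped_jobs": 0}]
extra_params = {"_NOOBSOLETEBUILD": 1} if should_skip_obsoletion(builds, now) else {}
```

The window is a parameter because the "24h" value is still open for discussion; how `extra_params` is passed on to the sync call is left out on purpose.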

optional: Follow the approach mentioned in #9760#note-8 which is:

  • first check in the current implementation whether changing the priority of currently scheduled jobs affects the test suite or just the job
  • do not obsolete old builds; instead, on a new ISO: if jobs for the old build are in state scheduled, lower their priority by 10 (priority-10)
  • if the priority of all scheduled jobs within one build equals 0, obsolete the build
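
The optional proposal can be sketched as follows. This is an illustration only: the function name, the job dictionaries and the `step`/`floor` parameters are assumptions, and the arithmetic follows the wording above literally (subtract 10, obsolete at 0). Note #12 below reports a deprioritized job going from 50 to 60, so a real implementation may need to move the value in the other direction.

```python
def deprioritize_or_obsolete(jobs, step=10, floor=0):
    """Apply one round of the proposal to the jobs of one old build.

    `jobs` is a hypothetical list of dicts with 'id', 'state' and 'priority'.
    Returns (action, jobs) where action is 'noop', 'deprioritized' or 'obsolete'.
    """
    scheduled = [j for j in jobs if j["state"] == "scheduled"]
    if not scheduled:
        # Nothing is waiting, so there is nothing to deprioritize or obsolete.
        return "noop", jobs
    if all(j["priority"] <= floor for j in scheduled):
        # Every scheduled job already reached the floor: give up on the build.
        return "obsolete", jobs
    updated = [
        {**j, "priority": max(floor, j["priority"] - step)}
        if j["state"] == "scheduled"
        else dict(j)  # running or finished jobs are left untouched
        for j in jobs
    ]
    return "deprioritized", updated
```

Calling this once per newly triggered ISO means an old build survives a few new builds before it is obsoleted, instead of being dropped immediately.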

further details

The current manual approach is that the QA reviewer of the day decides this and can mark one build as important.

The time of "24h" is open for discussion, it might be a different time, e.g. 12h, or a number of builds, or a combination of both.

That might be a feature request on openQA itself or maybe on the supporting workflows and the scripts we use for providing media to test.

Also see #9760#note-7 for notes about use of _NOOBSOLETEBUILD

History

#1 Updated by RBrownSUSE almost 5 years ago

From the sync and trigger side, on a new build: If there are no builds within the last 24h, call sync with _NOOBSOLETEBUILD. That's it. That should do no harm if there are builds within the last 24h that are completely finished anyway.

I think it should be "If there are no 'completed' builds within 24 hours (i.e. no builds without skipped jobs)"

This will also then resolve the 'lots of half tested builds' problem by ensuring we do at least one full test of at least one build every 24 hours

#2 Updated by okurz almost 5 years ago

  • Description updated (diff)

yes, right, included.

  • Added: optional implementation proposal about decreasing priority of older scheduled jobs and not obsoleting
  • Added: comment that "24h" is arbitrary and could be some other time or a number of builds, etc.

#3 Updated by okurz almost 5 years ago

  • Target version set to Milestone 5

#4 Updated by coolo almost 5 years ago

where are we with this? Is one build finishing?

#5 Updated by okurz almost 5 years ago

#13560 was about only syncing complete architectures. In the same change I also prepared for no obsoletion, but rbrown and ast prefer it to stay as before for now: https://gitlab.suse.de/openqa/scripts/merge_requests/59#note_38622
It is just a simple switch, but it is off for now. The ticket is in state "new"; it's not started. Try to convince them, implement it yourself or wait :-)

#6 Updated by okurz almost 5 years ago

  • Target version changed from Milestone 5 to Milestone 6

#7 Updated by okurz over 4 years ago

  • Status changed from New to In Progress
  • Assignee set to okurz

gsd#openqa/scripts#69 targets the approach "1." including "optional".

Already deployed temporarily on osd; closely monitoring the logs.

#8 Updated by okurz over 4 years ago

Updated the MR as the original approach could not work. It would call "deprioritize-or-cancel" every time "openqa-iso-sync-sles" is called and there is no build in progress or a dirty repository is present. I changed the approach to call the script from within rsync.pl just before it triggers an ISO.

#9 Updated by okurz over 4 years ago

So build 0271 came in. The solution works in principle, but the deprioritizing of all jobs running on a product is done per arch and medium, so there is too much deprioritization in one step, and jobs are then also cancelled, even running ones.

testing the state after initial medium got triggered:

In [11]: jobs = requests.get('https://openqa.suse.de/api/v1/jobs?state=running&state=scheduled&latest=1&flavor=Server-DVD').json()
In [12]: {j['id']: j['settings']['BUILD'] for j in jobs['jobs']}
Out[12]: 
{806799: '0267',
 806801: '0267',
…
 806954: '0267',
 807143: '0271',
…

so suggestions to improve:

  • match more specifically on the exact jobs of one medium/flavor/arch to deprioritize and cancel
  • do not cancel running jobs at all, only scheduled
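
Both suggestions amount to a narrower job selection. A minimal sketch, assuming job dictionaries shaped like the `jobs['jobs']` entries queried above, with a `state` field and `FLAVOR`, `ARCH` and `BUILD` entries under `settings`; the function name and the sample IDs are made up for illustration.

```python
def jobs_to_deprioritize(jobs, flavor, arch, new_build):
    """Return IDs of scheduled jobs of older builds for exactly one flavor/arch."""
    return [
        j["id"]
        for j in jobs
        if j["state"] == "scheduled"  # never touch running jobs
        and j["settings"]["FLAVOR"] == flavor
        and j["settings"]["ARCH"] == arch
        and j["settings"]["BUILD"] != new_build  # keep jobs of the new build
    ]


# Illustrative data only; the IDs are not real job IDs.
sample = [
    {"id": 1, "state": "scheduled", "settings": {"FLAVOR": "Server-DVD", "ARCH": "x86_64", "BUILD": "0267"}},
    {"id": 2, "state": "running", "settings": {"FLAVOR": "Server-DVD", "ARCH": "x86_64", "BUILD": "0267"}},
    {"id": 3, "state": "scheduled", "settings": {"FLAVOR": "Server-DVD", "ARCH": "aarch64", "BUILD": "0267"}},
    {"id": 4, "state": "scheduled", "settings": {"FLAVOR": "Server-DVD", "ARCH": "x86_64", "BUILD": "0271"}},
]
selected = jobs_to_deprioritize(sample, "Server-DVD", "x86_64", "0271")
```

With the sample data only job 1 is selected: job 2 is running, job 3 belongs to another arch and job 4 is part of the new build.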

#10 Updated by RBrownSUSE over 4 years ago

  • Subject changed from finish at least one build per day to [tools]finish at least one build per day

#11 Updated by okurz over 4 years ago

https://github.com/os-autoinst/openQA/pull/1255 merged

https://gitlab.suse.de/openqa/scripts/merge_requests/69 merged

osd is already deployed with that new version.

So far three builds have been triggered for SLE since that change and jobs of old builds are still running. So far the old version of openqa/scripts has been used with "obsolete=1". Comparing the currently scheduled virtualization jobs, I can see no deprioritization, e.g. sle-12-SP3-Server-DVD-x86_64-Build0278-gi-guest_sles11sp4-on-host_sles12sp3-kvm@64bit-ipmi has prio 50, as does sle-12-SP3-Server-DVD-x86_64-Build0274-gi-guest_sles11sp4-on-host_sles12sp3-kvm@64bit-ipmi, as expected.

Also debug output is available now, e.g.:

[Thu Mar 16 15:17:43 2017] [7918:debug] Triggering new iso with build '0126', obsolete: 1, deprioritize: 0        
[Thu Mar 16 16:17:24 2017] [16888:debug] Triggering new iso with build '0127', obsolete: 1, deprioritize: 0       

Checked out master of openqa/scripts on osd and re-enabled cron-job so now waiting for new build which is triggered by the new version of trigger scripts.

#12 Updated by okurz over 4 years ago

  • Description updated (diff)
  • Status changed from In Progress to Resolved

now new build 0279 was triggered. https://openqa.suse.de/tests/815422 is an example of an "old" job of build 0277 which is still scheduled but deprioritized. Priority value 60 instead of previously 50. The corresponding job from 0279 is scheduled but with priority 50 so should be executed first.
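
The expected ordering can be illustrated with a small sketch. It assumes that scheduled jobs with a lower priority value are picked first, as the observation above implies, and that ties are broken by job ID, which is an assumption of this example. The ID used for the 0279 job is hypothetical.

```python
def execution_order(jobs):
    """Sort scheduled jobs by priority value, lowest first, then by job ID."""
    return sorted(jobs, key=lambda j: (j["priority"], j["id"]))


queue = execution_order([
    {"id": 815422, "build": "0277", "priority": 60},  # deprioritized old job
    {"id": 900000, "build": "0279", "priority": 50},  # hypothetical ID, new build
])
first_build = queue[0]["build"]
```

Here `first_build` ends up as the new build, while the old job stays in the queue and still runs once workers are free.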

That should ensure that the user story is fulfilled, including the (updated) acceptance criteria.
