action #101520

closed

[bot-ng] Stop very frequent scheduling of single incident jobs size:M

Added by dzedro over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Feature requests
Target version:
Start date: 2021-10-27
Due date:
% Done: 0%
Estimated time:

Description

Observation

I don't know how often single jobs are rescheduled, but this insanity has to stop.
Tests are blindly rotating because they don't even finish or because something failed.
Below is one of many updates as an example, :21228:samba: 3501 single jobs, of which 2037 are obsoleted and just 30 failed.

select id, test, version, result from jobs where build = ':21228:samba' and result = 'failed' order by id desc;
   id    |                  test                   | version | result 
---------+-----------------------------------------+---------+--------
 7513541 | qam_ha_hawk_client                      | 15-SP2  | failed
 7507613 | qam-regression-installation-SLED        | 15-SP2  | failed
 7507543 | cryptlvm                                | 15.3    | failed
 7507483 | qam_ha_hawk_client                      | 15-SP2  | failed
 7503579 | qam_ha_hawk_client                      | 15-SP2  | failed
 7496686 | qam_ha_qdevice_node2                    | 15-SP2  | failed
 7491914 | cryptlvm                                | 15.3    | failed
 7489156 | qam-incidentinstall                     | 15-SP2  | failed
 7488544 | qam-incidentinstall-ha                  | 15-SP2  | failed
 7482383 | mau-extratests1                         | 15-SP2  | failed
…
 7141789 | qam_ha_hawk_client                      | 15-SP2  | failed
(30 rows)
select id, test, version, result from jobs where build = ':21228:samba' order by id desc;
   id    |                      test                      | version |      result      
---------+------------------------------------------------+---------+------------------
 7545733 | qam-incidentinstall                            | 15-SP2  | none
 7545699 | mau-extratests-docker                          | 15-SP2  | none
 7545698 | mau-extratests1                                | 15-SP2  | none
… (yes, we get it, the list is very long :) )
 7329994 | qam_ha_rolling_update_support_server           | 15-SP2  | obsoleted
(2037 rows)
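
A per-result count makes the ratio visible without paging through thousands of rows. The following is only a sketch: it loads a handful of the rows quoted above into an in-memory SQLite table (a hypothetical stand-in that only has the columns used in the queries above, not the real openQA schema) and runs a `group by` query. The same `select` is assumed to be usable against the real `jobs` table.

```python
import sqlite3

# Hypothetical minimal stand-in for the openQA "jobs" table; only the
# columns that appear in the queries above are modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table jobs (id integer primary key, test text, version text,"
    " result text, build text)"
)

# A few rows copied from the output above, not the full data set.
sample = [
    (7545733, "qam-incidentinstall", "15-SP2", "none"),
    (7545699, "mau-extratests-docker", "15-SP2", "none"),
    (7545698, "mau-extratests1", "15-SP2", "none"),
    (7513541, "qam_ha_hawk_client", "15-SP2", "failed"),
    (7507613, "qam-regression-installation-SLED", "15-SP2", "failed"),
    (7329994, "qam_ha_rolling_update_support_server", "15-SP2", "obsoleted"),
]
conn.executemany(
    "insert into jobs values (?, ?, ?, ?, ':21228:samba')", sample
)

# One row per result instead of one row per job.
counts = dict(
    conn.execute(
        "select result, count(*) from jobs where build = ? group by result",
        (":21228:samba",),
    ).fetchall()
)
print(counts)
```

The counts printed here only reflect the six sample rows, not the totals mentioned in the observation.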

Problem

The component that triggers these tests is gitlab.suse.de/qa-maintenance/bot-ng, and AFAIK (okurz) there were no recent changes in this component that would explain a change in scheduling behaviour.

Expected result

Fewer tests are triggered (but how many fewer?)

Suggestions

  • Ask coolo what he means by "the repo is moving"
  • Look into the gitlab CI pipelines https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines , e.g. the latest log (at time of writing) https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines/239575 , to understand what is going on and whether anything unusual is happening
  • Potentially reduce the frequency with which the pipeline is triggered, as a temporary measure, to prevent "too many" jobs from being triggered
  • Find out if this happens only for some incidents or for all
  • Try to find out, at least roughly, what the tool is expected to do
  • Optionally, as a guess, apply a timeout (or cooldown time) per incident so that it is not scheduled too often, regardless of what triggered the change
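
The cooldown idea from the last suggestion could look roughly like the sketch below. This is not bot-ng code: the class name `IncidentCooldown`, the method `should_schedule` and the one-hour window are all hypothetical, and the ticket does not say what a sensible window would be.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cooldown window; the ticket does not specify a value.
COOLDOWN = timedelta(hours=1)


class IncidentCooldown:
    """Remember when each incident was last scheduled (illustrative only)."""

    def __init__(self, cooldown=COOLDOWN):
        self.cooldown = cooldown
        self._last_scheduled = {}  # incident id -> time of last scheduling

    def should_schedule(self, incident_id, now=None):
        """Return True and record the time if the incident is out of cooldown."""
        now = now or datetime.now(timezone.utc)
        last = self._last_scheduled.get(incident_id)
        if last is not None and now - last < self.cooldown:
            # Scheduled too recently: skip, regardless of what changed.
            return False
        self._last_scheduled[incident_id] = now
        return True


# Usage with fixed timestamps so the behaviour is reproducible.
gate = IncidentCooldown()
t0 = datetime(2021, 10, 27, 12, 0, tzinfo=timezone.utc)
first = gate.should_schedule(21228, now=t0)
second = gate.should_schedule(21228, now=t0 + timedelta(minutes=10))
third = gate.should_schedule(21228, now=t0 + timedelta(hours=2))
print(first, second, third)
```

Because the bot is triggered from CI pipelines, the timestamps would have to be stored somewhere that survives between runs. The in-memory dict here is only for illustration.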

Related issues 1 (0 open, 1 closed)

Related to openQA Project - action #102347: bot-ng: repohash calculation (Resolved, jbaier_cz, 2021-11-12)
