action #164613

closed

Job with RETRY should not be cloned if it was obsoleted

Added by AdamWill 2 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
2024-07-29
Due date:
% Done:
0%

Estimated time:

Description

Check this little job cluster:

What happened there is that 2755002 ran, but while it was running, a new POST was sent that created 2755034. This is because the Fedora packager edited the update and fixed the bug that was causing failures. However, when 2755034 was created and 2755002 was obsoleted, 2755002 also got cloned, because it had RETRY set and had not yet been retried. This resulted in the creation of 2755068, which is considered the most 'current' test run.

Because it was a clone of 2755002, it had the same settings as 2755002, without the new packages that were added when the maintainer edited the update (compare the value of ADVISORY_NVRS_1 in each of the three tests; in 2755002 and 2755068 it has just one package, in 2755034 it has three). This meant it failed. 2755034 passed.

So until I retriggered the tests again just now, the web UI and anything else that looks for the most current result considered the test failed, because they took 2755068 (with the settings reflecting the old state of the update) instead of 2755034 (with the settings reflecting the new state of the update).
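As a rough illustration of the effect described above, here is a minimal Python sketch. It assumes that "most current" simply means the job with the highest ID, which the ticket does not state explicitly; the `Job` class, the `most_current` helper, and the `nvr_count` field (standing in for the number of packages in ADVISORY_NVRS_1) are hypothetical names and not part of openQA.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    result: str
    nvr_count: int  # hypothetical: number of packages listed in ADVISORY_NVRS_1


def most_current(jobs):
    """Pick the job with the highest ID (assumed stand-in for 'most current')."""
    return max(jobs, key=lambda job: job.job_id)


# The three jobs from the description above.
jobs = [
    Job(2755002, "obsoleted", 1),  # original job, obsoleted by the new POST
    Job(2755034, "passed", 3),     # job created by the new POST
    Job(2755068, "failed", 1),     # clone of 2755002 with the old settings
]

current = most_current(jobs)
print(current.job_id, current.result)
```

Under that assumption the stale clone 2755068 wins over 2755034, even though only 2755034 reflects the edited update.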

I would argue that if a new POST happens, that probably indicates that we want the test created by that POST to be the 'current' test, not a clone of a previously-posted test with RETRY set which was running when the POST happened. So I think we should not clone a job with RETRY set if it ended because it was obsoleted by a new POST.
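The proposed rule could be sketched roughly like this in Python. This is not openQA's actual code (openQA is written in Perl); the function name, the `retries_left` parameter, and the set of results that never trigger a retry are assumptions made for illustration only.

```python
# Hypothetical set of results that should never trigger a RETRY clone.
# "obsoleted" is the case this ticket asks to add.
NON_RETRY_RESULTS = {"passed", "softfailed", "obsoleted"}


def should_clone_for_retry(result: str, retries_left: int) -> bool:
    """Decide whether a finished job with RETRY set should be cloned."""
    if retries_left <= 0:
        # RETRY budget is used up, so never clone.
        return False
    # Clone only if the job ended with a result that warrants a retry.
    return result not in NON_RETRY_RESULTS


print(should_clone_for_retry("failed", 1))
print(should_clone_for_retry("obsoleted", 1))
```

With this check, a job like 2755002 that ends as obsoleted would not be cloned, so the job created by the new POST stays the most recent one.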

Actions #2

Updated by okurz 2 months ago

  • Category set to Feature requests
  • Status changed from New to Feedback
  • Assignee set to okurz
  • Target version set to Ready

Actions #3

Updated by okurz 2 months ago

https://github.com/os-autoinst/openQA/pull/5797 merged. Follow-up that consistently skips the retry for all aborted results: https://github.com/os-autoinst/openQA/pull/5806

Actions #4

Updated by okurz 2 months ago

  • Due date set to 2024-08-14

Actions #5

Updated by okurz 2 months ago

  • Due date deleted (2024-08-14)
  • Status changed from Feedback to Resolved

https://github.com/os-autoinst/openQA/pull/5806 merged and also deployed on OSD meanwhile: https://mailman.suse.de/mlarch/SuSE/openqa/2024/openqa.2024.08/msg00000.html

No obvious issues seen so far. It would likely take a long time for any related problems to show up anyway, so I am resolving this.
