action #156277

open

[qe-core] Some jobs are retriggered by geekotest (likely obs-sync plugin) without clear answer why

Added by szarate 2 months ago. Updated 2 months ago.

Status: New
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2023-10-26
Due date:
% Done: 0%
Estimated time:

Description

Observation

I found many scheduled test suites waiting for ipmi workers, even though the number of running tests (which also use ipmi workers) is lower than the number of ipmi workers. This is easy to see on the openQA build page, for example https://openqa.suse.de/tests/overview?distri=sle&version=15-SP6&build=28.1&groupid=263.

Steps to reproduce

  • Will update if steps to reproduce become known

Impact

Workers are assigned to tests that are supposed to have stopped, so resources are wasted.

Problem

After investigating this for a while, I found that some workers are executing tests that cannot be seen on the openQA build page above, for example:

All test suites were already cancelled.

But after navigating to a specific worker page, I found that the worker was executing a test suite that is hidden from the build page, for example:

Because the running job is not the top entry, it cannot be seen by just looking at the openQA job group or build page.

Take worker sapworker3:2 and test suite sle-15-SP6-Online-x86_64-Build28.1-uefi-gi-guest_sles12sp5-on-host_developing-kvm@64bit-ipmi-large-mem as an example. The scheduled test run https://openqa.suse.de/tests/12668997 had already been cancelled and was expected to stop. However, the test suite was then triggered and run again as another test run, https://openqa.suse.de/tests/12664633, without being displayed on the openQA job group or build page. Although the job ID 12664633 is lower than 12668997, 12668997 had been cancelled hours earlier. I confirmed that sapworker3:2 was assigned to test 12664633 after the worker became idle, and at that point test 12668997 had already been cancelled for hours. Test 12664633 is only visible on the sapworker3:2 worker page, for example:

So I have to click the "working" button to the right of sapworker3:2 to discover it.

I did not manually trigger or schedule any relevant test for this issue.
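
The situation described above can be illustrated with a minimal sketch. It assumes, without confirmation from the openQA code, that the build page shows only the highest-ID job per scenario. The record layout (`id`, `scenario`, `state`, `worker`) and the helper `hidden_running_jobs` are hypothetical and are not the openQA schema or API; only the two job IDs and the worker name come from this ticket.

```python
# Hypothetical job records; field names are illustrative, not the openQA schema.
SCENARIO = "gi-guest_sles12sp5-on-host_developing-kvm@64bit-ipmi-large-mem"
jobs = [
    {"id": 12668997, "scenario": SCENARIO, "state": "cancelled", "worker": None},
    {"id": 12664633, "scenario": SCENARIO, "state": "running", "worker": "sapworker3:2"},
]


def hidden_running_jobs(jobs):
    """Return running jobs that are not the highest-ID job of their scenario.

    Assumption: an overview page displays only the highest-ID job per
    scenario, so any other running job is invisible there.
    """
    latest = {}
    for job in jobs:
        scenario = job["scenario"]
        if scenario not in latest or job["id"] > latest[scenario]:
            latest[scenario] = job["id"]
    return [
        job
        for job in jobs
        if job["state"] == "running" and job["id"] != latest[job["scenario"]]
    ]


for job in hidden_running_jobs(jobs):
    print(f"job {job['id']} is running on {job['worker']} but hidden from the overview")
```

Under that assumption, the sketch flags 12664633 as running yet hidden behind the cancelled 12668997.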

Suggestions

  • Show running test run always on top if feasible
  • Better schedule and trigger logic
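
The first suggestion could look roughly like the following sketch, which orders jobs so that running ones come first and the rest follow by descending job ID. The record layout and the function name `running_first` are hypothetical; this is not how openQA currently sorts its pages.

```python
# Hypothetical job records; field names are illustrative, not the openQA schema.
jobs = [
    {"id": 12668997, "state": "cancelled"},
    {"id": 12664633, "state": "running"},
]


def running_first(jobs):
    """Sort jobs so running ones come first, then by descending job ID."""
    # False sorts before True, so running jobs (state != "running" is False) lead.
    return sorted(jobs, key=lambda job: (job["state"] != "running", -job["id"]))


for job in running_first(jobs):
    print(job["id"], job["state"])
```

With this ordering, the running job 12664633 would be listed ahead of the cancelled job 12668997 despite its lower ID.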

Workaround

n/a


Files

Selection_284.png (49.9 KB), waynechen55, 2023-10-26 07:41
Selection_283.png (110 KB), waynechen55, 2023-10-26 07:43
Selection_285.png (121 KB), waynechen55, 2023-10-26 07:49
Selection_284.png (49.9 KB), waynechen55, 2023-10-26 08:06
Selection_283.png (110 KB), waynechen55, 2023-10-26 08:06
Selection_285.png (121 KB), waynechen55, 2023-10-26 08:06
Selection_287.png (78.2 KB), waynechen55, 2023-11-03 00:06
Selection_288.png (99.2 KB), waynechen55, 2023-11-03 00:06
6th_rounds.png (39.8 KB), Julie_CAO, 2023-11-03 02:21
5th_rounds.png (45 KB), Julie_CAO, 2023-11-03 02:21
4th_rounds.png (44 KB), Julie_CAO, 2023-11-03 02:21
3rd_rounds.png (45.4 KB), Julie_CAO, 2023-11-03 02:21

Related issues 1 (1 open, 0 closed)

Copied from openQA Project - action #138593: Restart of scheduled products is prone to retriggers by humans (New, 2023-10-26)

Actions #1

Updated by szarate 2 months ago

  • Copied from action #138593: Restart of scheduled products is prone to retriggers by humans added
Actions #2

Updated by szarate 2 months ago

  • Subject changed from Some jobs are retriggered by geekotest (likely obs-sync plugin) without clear answer why to [qe-core] Some jobs are retriggered by geekotest (likely obs-sync plugin) without clear answer why
  • Category set to Regressions/Crashes
  • Assignee set to szarate

Oli, I'm leaving it on qe-core, no need to kick it out of openQA project.

Actions #3

Updated by okurz 2 months ago

That's fine. The ticket belongs here. The planning decision we make applies to all tickets without a "target version" (so not this one): whether we should do it "now" -> "Ready" or "not now" -> "future".

Actions #4

Updated by Julie_CAO 2 months ago

The entire build 59.2 was retriggered again. Is that expected?

about 11 hours ago  coolo   scheduled   SLE 15-SP6  Online  x86_64  59.2    SLE-15-SP6-Online-x86_64-Build59.2-Media1.iso
Actions #5

Updated by szarate 2 months ago

Julie_CAO wrote in #note-4:

The entire build 59.2 was retriggered again, is it expected?

about 11 hours ago    coolo   scheduled   SLE 15-SP6  Online  x86_64  59.2    SLE-15-SP6-Online-x86_64-Build59.2-Media1.iso

I'm pretty sure we have a bot somewhere; coolo no longer works at the company.

I would suggest deleting the user and seeing who screams. I left a note on Slack.
