action #180020

open

Aggregates on 12-SP5 x86_64 don't show all tests size:S

Added by dzedro 15 days ago. Updated 3 days ago.

Status:
Feedback
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2025-04-04
Due date:
2025-04-24 (Due in 5 days)
% Done:

0%

Estimated time:

Description

Observation

When you compare Server-DVD-Updates and Server-DVD-Updates-LTSS-ES on the aggregates page, some results are missing, I think only on Server-DVD-Updates.
For example, the test mru-install-minimal-with-addons is not visible on the aggregates page, but I could find it through dependencies.
When I delete &groupid=414 from the aggregates page URL, the test mru-install-minimal-with-addons is visible; I guess all tests are visible then.
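
To reproduce the comparison, the group filter can be stripped from the overview URL programmatically. The following is only a sketch: the helper name `drop_query_param` is hypothetical, and the example URL (path and the parameters other than `groupid`) is an assumption about what the aggregates page link looks like, not copied from the ticket.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_query_param(url, name):
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Assumed example URL; only "groupid=414" is taken from the observation above.
url = "https://openqa.suse.de/tests/overview?distri=sle&version=12-SP5&build=20250403-1&groupid=414"
print(drop_query_param(url, "groupid"))
```

Opening both the original and the stripped URL side by side shows which jobs are hidden by the group filter.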

Acceptance criteria

  • AC1: We know what makes the "invisible" test special.
  • AC2: The way tests are scheduled has been changed or a limitation of the openQA web UI has been resolved so these tests are displayed.

Suggestions

  • Maybe related to #177048#change-911905, but there was also an issue only on Server-DVD-Updates 12-SP5.
  • The worst part is that a failure can occur, e.g. https://openqa.suse.de/tests/17262820, and the dashboard does not show it either.
  • Most likely the problem is that the job is not part of any job group. So openQA is correct not to show this job when a job group parameter is specified. This leaves the question of why the job was scheduled outside of a job group.

Files

Actions #1

Updated by dzedro 15 days ago

  • Description updated (diff)
Actions #2

Updated by okurz 15 days ago

  • Tags set to reactive work
  • Target version set to Ready
Actions #3

Updated by okurz 11 days ago

  • Priority changed from Normal to High
Actions #4

Updated by livdywan 10 days ago

  • Subject changed from Aggregates on 12-SP5 x86_64 don't show all tests to Aggregates on 12-SP5 x86_64 don't show all tests size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #5

Updated by ybonatakis 10 days ago

  • Status changed from Workable to In Progress
  • Assignee set to ybonatakis
Actions #6

Updated by openqa_review 9 days ago

  • Due date set to 2025-04-24

Setting due date based on mean cycle time of SUSE QE Tools

Actions #7

Updated by ybonatakis 9 days ago · Edited

AC1: We know what makes the "invisible" test special.

A very simple explanation, from a first-level investigation, is that the group_id is NULL.

Longer analysis:
The overview function in lib/OpenQA/WebAPI/Controller/Test.pm runs the following SQL:

SELECT me.id, me.result_dir, me.archived, me.state, me.priority, me.result,
me.reason, me.clone_id, me.blocked_by_id, me.TEST, me.DISTRI, me.VERSION,
me.FLAVOR, me.ARCH, me.BUILD, me.MACHINE, me.group_id, me.assigned_worker_id,
me.t_started, me.t_finished, me.logs_present, me.passed_module_count,
me.failed_module_count, me.softfailed_module_count, me.skipped_module_count,
me.externally_skipped_module_count, me.scheduled_product_id, me.result_size,
me.t_created, me.t_updated FROM jobs me WHERE ( ( me.clone_id IS NULL AND
me.group_id IN ( ? ) AND me.DISTRI = ? AND me.VERSION = ? AND me.BUILD IN ( ?
) ) ) ORDER BY me.id DESC: '414', 'sle', '12-SP5', '20250403-1'

and it doesn't look like it has any problem (there are no recent changes in the code either).

But if you remove the me.BUILD you get

17258527    17258527-sle-12-SP5-Server-DVD-Updates-x86_64-Build20250403-1-mru-install-minimal-with-addons@64bit false   done    50  passed              mru-install-minimal-with-addons sle 12-SP5  Server-DVD-Updates  x86_64  20250403-1  64bit       3435    2025-04-03 21:57:55.000 2025-04-03 22:17:23.000 false   11  0   0   0   0   2746417 2642748 2025-04-03 20:23:07.000 2025-04-09 00:46:03.000
17258493    17258493-sle-12-SP5-Server-DVD-Updates-x86_64-Build20250403-1-mru-install-minimal-with-addons@uefi_sle12    false   done    50  passed              mru-install-minimal-with-addons sle 12-SP5  Server-DVD-Updates  x86_64  20250403-1  uefi_sle12      2788    2025-04-03 21:55:11.000 2025-04-03 22:21:00.000 false   11  0   0   0   0   2746417 3115776 2025-04-03 20:23:07.000 2025-04-09 00:46:01.000
17257856    17257856-sle-12-SP5-Server-DVD-Updates-LTSS-ES-x86_64-Build20250403-1-mru-install-minimal-with-addons@64bit false   done    50  passed              mru-install-minimal-with-addons sle 12-SP5  Server-DVD-Updates-LTSS-ES  x86_64  20250403-1  64bit   414 3437    2025-04-03 21:49:06.000 2025-04-03 22:09:15.000 true    11  0   0   0   0   2746409 12249957    2025-04-03 20:23:04.000 2025-04-03 22:09:15.000
17257844    17257844-sle-12-SP5-Server-DVD-Updates-s390x-Build20250403-1-mru-install-minimal-with-addons@s390x-kvm  false   done    35  passed              mru-install-minimal-with-addons sle 12-SP5  Server-DVD-Updates  s390x   20250403-1  s390x-kvm   414 2653    2025-04-03 20:23:16.000 2025-04-03 20:45:36.000 true    12  0   0   0   0   2746410 12479078    2025-04-03 20:23:04.000 2025-04-03 20:45:36.000
17257837    17257837-sle-12-SP5-Server-DVD-Updates-LTSS-ES-x86_64-Build20250403-1-mru-install-minimal-with-addons@uefi_sle12    false   done    50  passed              mru-install-minimal-with-addons sle 12-SP5  Server-DVD-Updates-LTSS-ES  x86_64  20250403-1  uefi_sle12  414 2605    2025-04-03 21:49:08.000 2025-04-03 22:09:31.000 true    11  0   0   0   0   2746409 12643135    2025-04-03 20:23:04.000 2025-04-03 22:09:31.000
17257794    17257794-sle-12-SP5-Server-DVD-Updates-aarch64-Build20250403-1-mru-install-minimal-with-addons@aarch64-virtio   false   done    50  passed              mru-install-minimal-with-addons sle 12-SP5  Server-DVD-Updates  aarch64 20250403-1  aarch64-virtio  414 3059    2025-04-04 01:10:08.000 2025-04-04 01:34:35.000 true    11  0   0   0   0   2746420 13926102    2025-04-03 20:23:04.000 2025-04-04 01:34:35.000

The first two, with ids 17258527 and 17258493, do not have a group_id.
Neither of them has logs either. I mention this only as an observation (is it possible that cleanup removes group_ids?)
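
Why a NULL group_id hides a job from the group-filtered overview can be shown with a small self-contained sketch. It uses an in-memory SQLite database rather than the real PostgreSQL instance, and the `jobs` table here is a heavily reduced, hypothetical stand-in with only the columns needed for the point; the three job ids are taken from the result above.

```python
import sqlite3

# Hypothetical, reduced stand-in for the openQA "jobs" table (assumption, not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, test TEXT, build TEXT, group_id INTEGER)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [
        (17258527, "mru-install-minimal-with-addons", "20250403-1", None),  # no group
        (17258493, "mru-install-minimal-with-addons", "20250403-1", None),  # no group
        (17257856, "mru-install-minimal-with-addons", "20250403-1", 414),
    ],
)


def job_ids(where, params=()):
    """Return the ids of jobs matching the given WHERE clause, newest first."""
    rows = conn.execute(f"SELECT id FROM jobs WHERE {where} ORDER BY id DESC", params)
    return [row[0] for row in rows]


# "NULL IN (414)" evaluates to NULL, which is not true, so groupless jobs are dropped.
with_group_filter = job_ids("build = ? AND group_id IN (?)", ("20250403-1", 414))
without_group_filter = job_ids("build = ?", ("20250403-1",))

print(with_group_filter)
print(without_group_filter)
```

With the group filter only the job in group 414 is returned; without it all three jobs appear, which matches what happens when groupid is removed from the page URL.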

AC2: The way tests are scheduled has been changed or a limitation of the openQA web UI has been resolved so these tests are displayed.

I don't know what exactly the right thing to do for AC2 is.
Change the group_id in db?
Dig deeper?

Also thinking to decrease priority. wdyt?

Actions #8

Updated by tinita 9 days ago

  • Priority changed from High to Normal
Actions #9

Updated by tinita 9 days ago

It is not a problem anymore right now, but we should find out why the job group was missing

Actions #10

Updated by ybonatakis 9 days ago

  • Status changed from In Progress to Feedback
  • Assignee changed from ybonatakis to dzedro

@dzedro assigned back to you. The conclusion is that the missing jobs do not have a group_id and that causes the problem in the dashboard.
The question now is why, and I hope you could shed some light on it.
If you think there is something QA tools has to fix, could you provide steps to reproduce the issue?

Actions #11

Updated by dzedro 5 days ago

  • Assignee changed from dzedro to ybonatakis

What do you mean I could shed some light on the issue ?
Of course QA tools has to fix it, who does change openQA backend ? QA tools ?!
I have no idea why some jobs don't have group_id, it's reproduced every day on 12-SP5 Server-DVD-Updates!
Did this happen out of blue sky ? QA Core didn't touch this code. If QA tools has no idea then it's miracle.
Let's call pope for help.

Actions #12

Updated by tinita 5 days ago

  • Priority changed from Normal to High

Raising priority again. I don't remember why we concluded that it was fixed. @ybonatakis can you investigate what went wrong in scheduling?
The __CI_JOB_URL in the scheduled product links to the pipeline where the product was scheduled.

Actions #13

Updated by tinita 5 days ago

FYI, the first time this job had no group was https://openqa.suse.de/tests/17198161#next_previous on March 28.

Actions #14

Updated by mkittler 4 days ago

This change from March 28 looks suspicious: https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/commit/833f6a32d881c9777f499eb2c4e38b1c265bf6f1
I suppose the real change would be a change within one of the scheduling tables on OSD (and it is just reflected in that commit), or is it the other way around?

Actions #15

Updated by mkittler 4 days ago

I found a mistake in a previous change I made to the scheduling. I'll come up with a fix that will remove at least one place in the code that would allow for missing groups. (And I haven't found another place yet.)

Actions #16

Updated by tinita 4 days ago

  • Status changed from Feedback to In Progress
  • Assignee changed from ybonatakis to mkittler

We three just sat together and looked at the database and the code, and Marius found something. Reassigning

Actions #17

Updated by mkittler 4 days ago

This change should fix one concrete regression that can lead to groupless jobs: https://github.com/os-autoinst/openQA/pull/6391

I haven't added any unit tests yet; maybe I can simply adjust a test that is now failing.

If you like you can assign this issue to me. I would then conclude this issue after the change is merged. I've gone through the code for a while and I don't think there are any other problems that would lead to jobs wrongly scheduled without a group.

Actions #18

Updated by dzedro 4 days ago

Thank you very much, so in the end it was related. 😉

Actions #19

Updated by mkittler 3 days ago

  • Status changed from In Progress to Feedback

The PR has been merged and deployed as of 16.04.25 07:21.


This query can be used to check for problematic jobs (jobs which should have a group as they were scheduled via a scheduled product but have none):

select id, t_created, test from jobs where group_id is null and scheduled_product_id is not null and (select count(id) from scheduled_products where scheduled_products.id = scheduled_product_id and settings ->> '_GROUP_ID' = '0') = 0 order by id desc;

So the following query can be used to check whether there are still problematic jobs after my change:

select id, t_created, test from jobs where t_created > '2025-04-16 08:00:00' and group_id is null and scheduled_product_id is not null and (select count(id) from scheduled_products where scheduled_products.id = scheduled_product_id and settings ->> '_GROUP_ID' = '0') = 0 order by id desc;
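
The condition both queries encode can be sketched in plain Python to make the logic easier to follow. This is only an illustration: the `Job` class, the `problematic_jobs` helper and the sample ids are hypothetical, and the settings are modelled as simple dicts instead of the JSON column used on OSD.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    id: int
    group_id: Optional[int]
    scheduled_product_id: Optional[int]


def problematic_jobs(jobs, product_settings):
    """Return ids of groupless jobs that came from a scheduled product, newest first.

    product_settings maps a scheduled product id to its settings dict.
    Products with _GROUP_ID set to '0' are excluded, as in the queries above.
    """
    found = []
    for job in jobs:
        if job.group_id is not None or job.scheduled_product_id is None:
            continue  # has a group, or was not scheduled via a scheduled product
        settings = product_settings.get(job.scheduled_product_id, {})
        if settings.get("_GROUP_ID") == "0":
            continue  # the queries do not count these as problematic
        found.append(job.id)
    return sorted(found, reverse=True)


# Made-up sample data for illustration only.
jobs = [
    Job(id=1, group_id=414, scheduled_product_id=10),   # fine: has a group
    Job(id=2, group_id=None, scheduled_product_id=10),  # problematic
    Job(id=3, group_id=None, scheduled_product_id=11),  # excluded: _GROUP_ID is '0'
    Job(id=4, group_id=None, scheduled_product_id=None),  # not from a scheduled product
]
product_settings = {10: {"DISTRI": "sle"}, 11: {"_GROUP_ID": "0"}}
print(problematic_jobs(jobs, product_settings))
```

The second query only adds a lower bound on t_created, so that jobs created before the fix was deployed are ignored.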

I'll do this again next week to see whether my change was effective. I guess one has to watch out for false positives, though. (The query might return false positives for jobs scheduled via scenario definitions YAML.)
