action #180020
Aggregates on 12-SP5 x86_64 don't show all tests size:S
Status: open
Description
Observation
When you compare Server-DVD-Updates & Server-DVD-Updates-LTSS-ES on the aggregates page, some results are missing, I think only on Server-DVD-Updates.
For example, the test mru-install-minimal-with-addons is not visible on the aggregates page, but I could find it through the dependencies.
When I delete &groupid=414 from the aggregates page URL, the test mru-install-minimal-with-addons becomes visible; I guess all tests are visible then.
Acceptance criteria
- AC1: We know what makes the "invisible" test special.
- AC2: The way tests are scheduled has been changed or a limitation of the openQA web UI has been resolved so these tests are displayed.
Suggestions
- Maybe related to #177048#change-911905; there, too, the issue occurred only on Server-DVD-Updates 12-SP5.
- The worst thing about this is that such a job can fail, e.g. https://openqa.suse.de/tests/17262820, and the dashboard does not see/show it either.
- Most likely the problem is that the job is not part of any job group. So openQA is correct not to show this job when a job group parameter is required. This leaves the question why the job was scheduled outside of a job group.
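The reasoning in the last point comes down to SQL NULL semantics. For a row whose group_id is NULL, a condition like group_id IN (414) evaluates to NULL rather than true, so any group filter drops that row. The sketch below is a minimal example using Python's built-in sqlite3 module. The two-row jobs table is a simplified assumption, not the real openQA schema; the job IDs are borrowed from the query output quoted in this ticket.

```python
import sqlite3

# Simplified, hypothetical stand-in for the openQA "jobs" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, test TEXT, flavor TEXT, group_id INTEGER)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [
        (17258527, "mru-install-minimal-with-addons", "Server-DVD-Updates", None),
        (17257856, "mru-install-minimal-with-addons", "Server-DVD-Updates-LTSS-ES", 414),
    ],
)

def job_ids(sql, params=()):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql, params)]

# With a group filter (like &groupid=414): "NULL IN (414)" is not true,
# so the groupless job is silently dropped.
with_group = job_ids("SELECT id FROM jobs WHERE group_id IN (?) ORDER BY id DESC", (414,))

# Without a group filter, both jobs are returned.
without_group = job_ids("SELECT id FROM jobs ORDER BY id DESC")

print(with_group)     # [17257856]
print(without_group)  # [17258527, 17257856]
```

No value passed as a group ID can match the groupless job. Selecting it explicitly requires a group_id IS NULL condition.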
Updated by ybonatakis 10 days ago
- Status changed from Workable to In Progress
- Assignee set to ybonatakis
Updated by openqa_review 9 days ago
- Due date set to 2025-04-24
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ybonatakis 9 days ago · Edited
AC1: We know what makes the "invisible" test special.
A very simple explanation, from a first-level investigation, is that the group_id is NULL.
Longer analysis:
The overview function in lib/OpenQA/WebAPI/Controller/Test.pm runs the following SQL:
SELECT me.id, me.result_dir, me.archived, me.state, me.priority, me.result,
       me.reason, me.clone_id, me.blocked_by_id, me.TEST, me.DISTRI, me.VERSION,
       me.FLAVOR, me.ARCH, me.BUILD, me.MACHINE, me.group_id, me.assigned_worker_id,
       me.t_started, me.t_finished, me.logs_present, me.passed_module_count,
       me.failed_module_count, me.softfailed_module_count, me.skipped_module_count,
       me.externally_skipped_module_count, me.scheduled_product_id, me.result_size,
       me.t_created, me.t_updated
FROM jobs me
WHERE ( ( me.clone_id IS NULL
      AND me.group_id IN ( ? )
      AND me.DISTRI = ?
      AND me.VERSION = ?
      AND me.BUILD IN ( ? ) ) )
ORDER BY me.id DESC
Bind values: '414', 'sle', '12-SP5', '20250403-1'
The query itself doesn't seem to have any problem (and there have been no recent changes in that code either).
But if you remove the me.BUILD condition, you get:
17258527 17258527-sle-12-SP5-Server-DVD-Updates-x86_64-Build20250403-1-mru-install-minimal-with-addons@64bit false done 50 passed mru-install-minimal-with-addons sle 12-SP5 Server-DVD-Updates x86_64 20250403-1 64bit 3435 2025-04-03 21:57:55.000 2025-04-03 22:17:23.000 false 11 0 0 0 0 2746417 2642748 2025-04-03 20:23:07.000 2025-04-09 00:46:03.000
17258493 17258493-sle-12-SP5-Server-DVD-Updates-x86_64-Build20250403-1-mru-install-minimal-with-addons@uefi_sle12 false done 50 passed mru-install-minimal-with-addons sle 12-SP5 Server-DVD-Updates x86_64 20250403-1 uefi_sle12 2788 2025-04-03 21:55:11.000 2025-04-03 22:21:00.000 false 11 0 0 0 0 2746417 3115776 2025-04-03 20:23:07.000 2025-04-09 00:46:01.000
17257856 17257856-sle-12-SP5-Server-DVD-Updates-LTSS-ES-x86_64-Build20250403-1-mru-install-minimal-with-addons@64bit false done 50 passed mru-install-minimal-with-addons sle 12-SP5 Server-DVD-Updates-LTSS-ES x86_64 20250403-1 64bit 414 3437 2025-04-03 21:49:06.000 2025-04-03 22:09:15.000 true 11 0 0 0 0 2746409 12249957 2025-04-03 20:23:04.000 2025-04-03 22:09:15.000
17257844 17257844-sle-12-SP5-Server-DVD-Updates-s390x-Build20250403-1-mru-install-minimal-with-addons@s390x-kvm false done 35 passed mru-install-minimal-with-addons sle 12-SP5 Server-DVD-Updates s390x 20250403-1 s390x-kvm 414 2653 2025-04-03 20:23:16.000 2025-04-03 20:45:36.000 true 12 0 0 0 0 2746410 12479078 2025-04-03 20:23:04.000 2025-04-03 20:45:36.000
17257837 17257837-sle-12-SP5-Server-DVD-Updates-LTSS-ES-x86_64-Build20250403-1-mru-install-minimal-with-addons@uefi_sle12 false done 50 passed mru-install-minimal-with-addons sle 12-SP5 Server-DVD-Updates-LTSS-ES x86_64 20250403-1 uefi_sle12 414 2605 2025-04-03 21:49:08.000 2025-04-03 22:09:31.000 true 11 0 0 0 0 2746409 12643135 2025-04-03 20:23:04.000 2025-04-03 22:09:31.000
17257794 17257794-sle-12-SP5-Server-DVD-Updates-aarch64-Build20250403-1-mru-install-minimal-with-addons@aarch64-virtio false done 50 passed mru-install-minimal-with-addons sle 12-SP5 Server-DVD-Updates aarch64 20250403-1 aarch64-virtio 414 3059 2025-04-04 01:10:08.000 2025-04-04 01:34:35.000 true 11 0 0 0 0 2746420 13926102 2025-04-03 20:23:04.000 2025-04-04 01:34:35.000
The first two, with IDs 17258527 and 17258493, have no group_id.
Both also have no logs (logs_present is false). I mention this only as an observation: is it possible that a cleanup removed the group_ids?
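To list the groupless jobs of one build directly, keep the overview query's DISTRI/VERSION/BUILD conditions and replace the group condition with group_id IS NULL. The sketch below does this on a simplified SQLite table rather than the real PostgreSQL schema, filled with a subset of the rows above.

```python
import sqlite3

# Simplified, hypothetical copy of the relevant "jobs" columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, clone_id INTEGER, DISTRI TEXT,"
    " VERSION TEXT, BUILD TEXT, FLAVOR TEXT, MACHINE TEXT, group_id INTEGER)"
)
rows = [
    (17258527, None, "sle", "12-SP5", "20250403-1", "Server-DVD-Updates", "64bit", None),
    (17258493, None, "sle", "12-SP5", "20250403-1", "Server-DVD-Updates", "uefi_sle12", None),
    (17257856, None, "sle", "12-SP5", "20250403-1", "Server-DVD-Updates-LTSS-ES", "64bit", 414),
    (17257844, None, "sle", "12-SP5", "20250403-1", "Server-DVD-Updates", "s390x-kvm", 414),
]
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Same filters as the overview query, except that "group_id IN (?)" is
# replaced by "group_id IS NULL" (the only way to match NULL in SQL).
groupless = conn.execute(
    "SELECT id, FLAVOR, MACHINE FROM jobs"
    " WHERE clone_id IS NULL AND group_id IS NULL"
    " AND DISTRI = ? AND VERSION = ? AND BUILD IN (?)"
    " ORDER BY id DESC",
    ("sle", "12-SP5", "20250403-1"),
).fetchall()

for job_id, flavor, machine in groupless:
    print(job_id, flavor, machine)
```

On this toy data, only the two Server-DVD-Updates x86_64 jobs without a group are listed.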
AC2: The way tests are scheduled has been changed or a limitation of the openQA web UI has been resolved so these tests are displayed.
I don't know exactly what the right thing to do for AC2 is.
Change the group_id in the DB?
Dig deeper?
I'm also thinking about decreasing the priority. WDYT?
Updated by ybonatakis 9 days ago
- Status changed from In Progress to Feedback
- Assignee changed from ybonatakis to dzedro
@dzedro, assigned back to you. The conclusion is that the missing jobs do not have a group_id, and that causes the problem in the dashboard.
The question now is why, and I hope you can shed some light on it.
If you think there is something QA Tools has to fix, could you provide steps to reproduce the issue?
Updated by dzedro 5 days ago
- Assignee changed from dzedro to ybonatakis
What do you mean, I could shed some light on the issue?
Of course QA Tools has to fix it. Who changes the openQA backend? QA Tools?!
I have no idea why some jobs don't have a group_id; it reproduces every day on 12-SP5 Server-DVD-Updates!
Did this happen out of the blue? QA Core didn't touch this code. If QA Tools has no idea, then it's a miracle.
Let's call the pope for help.
Updated by tinita 5 days ago
- Priority changed from Normal to High
Raising priority again. I don't remember why we concluded that it was fixed. @ybonatakis can you investigate what went wrong in scheduling?
The __CI_JOB_URL in the scheduled product links to the pipeline where the product was scheduled.
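To find that pipeline for a given job, follow the job's scheduled_product_id to its scheduled product and read __CI_JOB_URL from the product's settings. The sketch below assumes the settings are stored as JSON, as the _GROUP_ID query later in this ticket suggests, and uses a toy SQLite copy of the two tables. The product ID 2746417 is the one of job 17258527 in the output quoted above. The settings content and the URL are hypothetical placeholders.

```python
import json
import sqlite3

# Toy copies of the two tables (simplified, not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scheduled_products (id INTEGER PRIMARY KEY, settings TEXT)")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, scheduled_product_id INTEGER)")

# Hypothetical settings; a real product has many more keys and a real pipeline URL.
conn.execute(
    "INSERT INTO scheduled_products VALUES (?, ?)",
    (2746417, json.dumps({
        "FLAVOR": "Server-DVD-Updates",
        "__CI_JOB_URL": "https://gitlab.example.com/pipeline/job/1",
    })),
)
conn.execute("INSERT INTO jobs VALUES (?, ?)", (17258527, 2746417))

def ci_job_url(job_id):
    """Return __CI_JOB_URL of the product that scheduled job_id, or None."""
    row = conn.execute(
        "SELECT sp.settings FROM jobs j"
        " JOIN scheduled_products sp ON sp.id = j.scheduled_product_id"
        " WHERE j.id = ?",
        (job_id,),
    ).fetchone()
    if row is None:
        return None  # unknown job, or not scheduled via a product
    return json.loads(row[0]).get("__CI_JOB_URL")

print(ci_job_url(17258527))
```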
Updated by tinita 5 days ago
FYI, the first time this job had no group was on March 28: https://openqa.suse.de/tests/17198161#next_previous
Updated by mkittler 4 days ago
This change from March 28 looks suspicious: https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/commit/833f6a32d881c9777f499eb2c4e38b1c265bf6f1
I suppose the real change would be a change within one of the scheduling tables on OSD (and it is just reflected in that commit), or is it the other way around?
Updated by mkittler 4 days ago
This change should fix one concrete regression that can lead to groupless jobs: https://github.com/os-autoinst/openQA/pull/6391
I haven't added any unit tests yet; maybe I can simply adjust a test that is now failing.
If you like, you can assign this issue to me. I would then conclude this issue after the change is merged. I've gone through the code for a while, and I don't think there are any other problems that would lead to jobs wrongly scheduled without a group.
Updated by mkittler 3 days ago
- Status changed from In Progress to Feedback
The PR has been merged and deployed as of 16.04.25 07:21.
This query can be used to check for problematic jobs (jobs which should have a group as they were scheduled via a scheduled product but have none):
select id, t_created, test from jobs
where group_id is null
  and scheduled_product_id is not null
  and (select count(id) from scheduled_products
       where scheduled_products.id = scheduled_product_id
         and settings ->> '_GROUP_ID' = '0') = 0
order by id desc;
So the following query can be used to check whether there are still problematic jobs after my change:
select id, t_created, test from jobs
where t_created > '2025-04-16 08:00:00'
  and group_id is null
  and scheduled_product_id is not null
  and (select count(id) from scheduled_products
       where scheduled_products.id = scheduled_product_id
         and settings ->> '_GROUP_ID' = '0') = 0
order by id desc;
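The same check can be tried offline against a small, simplified SQLite copy of the two tables. Older SQLite versions lack PostgreSQL's ->> operator, so this sketch registers a small Python function through sqlite3's Connection.create_function to stand in for settings ->> '_GROUP_ID'. The function name group_id_setting is made up for this sketch, and the job rows are invented to exercise each condition of the WHERE clause.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scheduled_products (id INTEGER PRIMARY KEY, settings TEXT)")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, t_created TEXT, test TEXT,"
    " group_id INTEGER, scheduled_product_id INTEGER)"
)

def group_id_setting(settings):
    """Hypothetical stand-in for "settings ->> '_GROUP_ID'": the value as text, or NULL."""
    value = json.loads(settings).get("_GROUP_ID")
    return None if value is None else str(value)

conn.create_function("group_id_setting", 1, group_id_setting)

conn.executemany("INSERT INTO scheduled_products VALUES (?, ?)", [
    (1, json.dumps({"FLAVOR": "Server-DVD-Updates"})),  # no _GROUP_ID set
    (2, json.dumps({"_GROUP_ID": "0"})),                # explicitly groupless
])
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?, ?)", [
    (101, "2025-04-15 10:00:00", "old-groupless", None, 1),        # before the cutoff
    (102, "2025-04-17 10:00:00", "new-groupless", None, 1),        # problematic
    (103, "2025-04-17 11:00:00", "explicit-no-group", None, 2),    # _GROUP_ID = 0
    (104, "2025-04-17 12:00:00", "grouped", 414, 1),               # has a group
    (105, "2025-04-17 13:00:00", "not-from-product", None, None),  # no scheduled product
])

PROBLEMATIC_SINCE = """
SELECT id, t_created, test FROM jobs
WHERE t_created > ?
  AND group_id IS NULL
  AND scheduled_product_id IS NOT NULL
  AND (SELECT count(id) FROM scheduled_products
       WHERE scheduled_products.id = scheduled_product_id
         AND group_id_setting(settings) = '0') = 0
ORDER BY id DESC
"""

problematic = conn.execute(PROBLEMATIC_SINCE, ("2025-04-16 08:00:00",)).fetchall()
print(problematic)
```

Only job 102 is reported. Job 101 predates the cutoff, 103 belongs to a product with _GROUP_ID = 0, 104 has a group and 105 was not scheduled via a product. Comparing t_created as text works here only because the timestamps share one ISO-like format.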
I'll run this again next week to see whether my change was effective. I guess one has to watch out for false positives, though. (The query might return false positives for jobs scheduled via scenario definitions YAML.)
Maybe this will return false-positives for jobs that