action #65142: Make scheduling errors more accessible - openQA Project - openSUSE Project Management Tool


action #65142

closed

Make scheduling errors more accessible

Added by mkittler about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Normal
Assignee: mkittler
Category: Feature requests
Target version:
Start date: 2020-04-01
Due date:
% Done:
Estimated time:

Description

Problem

For a user who is not familiar with the scheduling details of openQA, or with how to look up the "scheduled products" table, it is hard to trace scheduling problems, e.g. to find out why dependencies were not created as expected. Even for someone who knows these details, it is very inconvenient to check scheduling problems for a particular job because there is no link from the test details page to the corresponding scheduled product. The scheduled products table is also cumbersome to work with, as it only shows a limited number of entries and has limited search capabilities.

Suggestions

1. Add a link to the scheduled product on the job details page.
2. Show a warning about scheduling errors directly on the job details page, provided that does not slow down the page's loading time too much.
3. Improve the scheduled products table.
   1. At least allow showing a specific scheduled product for 1. (e.g. add a dedicated "details page" that shows a single scheduled product).
   2. With 1. not so important anymore but still worth considering: Use server-side rendering for the scheduled products table to show more than only a limited number of scheduled products at a time.

Notes

The errors which would be interesting are stored as JSON in the scheduled products table, e.g.:

"failed_job_info": [
        {
            "error_messages": [
                "START_AFTER_TEST=create_hdd_gnome@64bit not found - check for dependency typos and dependency cycles"
            ],
            "job_id": 4063938
        },
        {
            "error_messages": [
                "allmodules+allpatterns+registration@svirt-hyperv has no child, check its machine placed or dependency setting typos"
            ],
            "job_id": [
                4063960
            ]
        },
        {
            "error_messages": [
                "allmodules+allpatterns+registration@svirt-hyperv-uefi has no child, check its machine placed or dependency setting typos"
            ],
            "job_id": [
                4063986
            ]
        },
        {
            "error_messages": [
                "allmodules+allpatterns+registration@svirt-xen-hvm has no child, check its machine placed or dependency setting typos"
            ],
            "job_id": [
                4063961
            ]
        },
        {
            "error_messages": [
                "allmodules+allpatterns+registration@svirt-xen-pv has no child, check its machine placed or dependency setting typos"
            ],
            "job_id": [
                4063967
            ]
        }
    ]
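To illustrate how such data could be turned into a per-job warning, here is a minimal Python sketch. It assumes the "failed_job_info" list sits inside an enclosing JSON object (the excerpt above shows only the key, not its parent). The function name index_errors_by_job is hypothetical and not part of openQA. Note that the excerpt shows "job_id" both as a bare number and as a list, so the sketch normalizes both forms.

```python
import json
from collections import defaultdict


def index_errors_by_job(results):
    """Map each job ID to the scheduling error messages recorded for it.

    `results` is assumed to be the decoded JSON object that contains
    the "failed_job_info" list (hypothetical helper, not openQA code).
    """
    errors = defaultdict(list)
    for entry in results.get("failed_job_info", []):
        job_ids = entry.get("job_id", [])
        # "job_id" appears both as a scalar and as a list in the excerpt.
        if not isinstance(job_ids, list):
            job_ids = [job_ids]
        for job_id in job_ids:
            errors[job_id].extend(entry.get("error_messages", []))
    return dict(errors)


# Two entries taken from the excerpt, wrapped in an enclosing object.
sample = json.loads("""
{
  "failed_job_info": [
    {
      "error_messages": [
        "START_AFTER_TEST=create_hdd_gnome@64bit not found - check for dependency typos and dependency cycles"
      ],
      "job_id": 4063938
    },
    {
      "error_messages": [
        "allmodules+allpatterns+registration@svirt-hyperv has no child, check its machine placed or dependency setting typos"
      ],
      "job_id": [4063960]
    }
  ]
}
""")

by_job = index_errors_by_job(sample)
for job_id, messages in sorted(by_job.items()):
    for message in messages:
        print(f"job {job_id}: {message}")
```

A lookup such as by_job.get(some_job_id, []) would then yield the messages a job details page could show for that job, or an empty list if nothing was recorded.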

Related issues 1 (1 open — 0 closed)


Updated by okurz about 4 years ago

Priority changed from Normal to Low

It's nothing new that the UX of cluster scheduling still has room for improvement :) However, the suggestions sound sensible, so it's good to keep this in "Workable".


Updated by okurz about 4 years ago

Is duplicate of action #51716: No scheduling error generated for faulty PARALLEL_WITH config added


Updated by okurz about 4 years ago

Status changed from Workable to Rejected
Assignee set to okurz

merged content into #51716


Updated by mkittler about 4 years ago

Status changed from Rejected to Workable
Assignee deleted (~~okurz~~)
Priority changed from Low to Normal
Target version changed from future to Ready

@okurz I don't like how you've merged the tickets. The steps to reproduce in the other ticket are way too specific in my opinion and this is not a MM specific problem. This is about any error message which might be generated when scheduling a product. Besides, you've copied almost everything else from the description of this ticket to the other ticket. I could "fix" the other ticket but actually I would end up with just having it like this ticket again.

Additionally, if I read the other ticket correctly, it is actually about something different: in a certain case no error message is generated when scheduling a product, although one should have been. So the other ticket is about a missing error message. This ticket is about displaying generated error messages. Maybe one should revert your changes in the other ticket so its actual point is not lost.

From my point of view it is also workable, and the importance is not low: it is starting to annoy me that people ask me questions about broken features and then it turns out that not even the dependencies have been created correctly. It usually also takes me quite some effort to investigate these problems because I have to resort to manual SQL queries, as the web UI is often too limiting. So I would actually like to pick up this ticket as one of my next tasks. Even a partial implementation of the suggestions would already help.


Updated by mkittler about 4 years ago

Is duplicate of deleted (action #51716: No scheduling error generated for faulty PARALLEL_WITH config)


Updated by mkittler about 4 years ago

Related to action #51716: No scheduling error generated for faulty PARALLEL_WITH config added


Updated by mkittler about 4 years ago

Status changed from Workable to New
Assignee set to mkittler
Target version changed from Ready to Current Sprint


Updated by mkittler about 4 years ago

Status changed from New to In Progress

PR for all points mentioned in the description except 2.: https://github.com/os-autoinst/openQA/pull/3061


Updated by mkittler about 4 years ago

Status changed from In Progress to Resolved
Target version deleted (~~Current Sprint~~)

It seems to work on o3. I don't think it is worth implementing suggestion 2. at this point. There are scheduling errors¹ that we have so far successfully ignored, so it might not make sense to show them on each and every test details page. Besides, I'm not sure how (or whether) the JSON data can be queried efficiently with PostgreSQL (and DBIx likely won't help much here).

¹mainly:

    "failed_job_info": [
        {
            "error_messages": [
                "START_AFTER_TEST=RAID0@64bit not found - check for dependency typos and dependency cycles"
            ],
            "job_id": 1264161
        }
    ]
