action #65142

closed

Make scheduling errors more accessible

Added by mkittler about 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version: -
Start date: 2020-04-01
Due date:
% Done: 0%
Estimated time:

Description

Problem

For a user who is not familiar with the scheduling details of openQA and with how to look up the "scheduled products" table, it is hard to trace scheduling problems, e.g. to find out why dependencies are not created as expected. Even for someone who knows these details, it is very inconvenient to check scheduling problems for a particular job because there is no link from the test details page to the corresponding scheduled product. The scheduled products table is also cumbersome to work with, as it only shows a limited number of entries and has limited search capabilities.

Suggestions

  1. Add a link to the scheduled product on the job details page.
  2. Show a warning about scheduling errors directly on the job details page, provided this does not slow down page loading too much.
  3. Improve the scheduled products table.
    1. At a minimum, allow showing a specific scheduled product, as needed for 1. (e.g. add a dedicated "details page" showing a single scheduled product).
    2. Less important once 1. is done, but still worth considering: use server-side rendering for the scheduled products table so that it is no longer limited to showing a fixed number of scheduled products.
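One way to read suggestion 3.2 is server-side paging: the server sends only the requested slice plus the total counts, so every scheduled product stays reachable. The following is a minimal Python sketch under that assumption. `ScheduledProduct` and `server_side_page` are hypothetical names, and the response keys (`draw`, `recordsTotal`, `recordsFiltered`, `data`) follow the DataTables server-side processing protocol, assuming the table is rendered with DataTables.

```python
from dataclasses import asdict, dataclass


@dataclass
class ScheduledProduct:
    # Hypothetical, simplified row; the real table has more columns.
    id: int
    distri: str
    version: str
    status: str


def server_side_page(products, draw, start, length, search=""):
    """Return one page of rows in the shape of a DataTables server-side response.

    Only the requested slice goes to the browser, so every scheduled product
    can be reached by paging instead of being capped at a fixed row count.
    This sketch filters an in-memory list; a real implementation would push
    the search and the paging (LIMIT/OFFSET) into the SQL query instead.
    """
    needle = search.lower()
    matching = [
        p for p in products
        if not needle or needle in f"{p.distri} {p.version} {p.status}".lower()
    ]
    page = matching[start:start + length]
    return {
        "draw": draw,                      # echoed back so the client can match responses
        "recordsTotal": len(products),     # rows before filtering
        "recordsFiltered": len(matching),  # rows after the search filter
        "data": [asdict(p) for p in page], # only the current page
    }


# Usage: 250 hypothetical products, third page of 50 rows.
products = [
    ScheduledProduct(i, "opensuse", "Tumbleweed", "scheduled" if i % 2 else "cancelled")
    for i in range(1, 251)
]
response = server_side_page(products, draw=3, start=100, length=50)
print(response["recordsTotal"], response["recordsFiltered"], response["data"][0]["id"])
```

Keeping the total and filtered counts separate lets the UI show "x of y entries" while the search is active, without ever transferring the full table.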

Notes

The relevant errors are stored as JSON in the scheduled products table, e.g.:

    "failed_job_info": [
        {
            "error_messages": [
                "START_AFTER_TEST=create_hdd_gnome@64bit not found - check for dependency typos and dependency cycles"
            ],
            "job_id": 4063938
        },
        {
            "error_messages": [
                "allmodules+allpatterns+registration@svirt-hyperv has no child, check its machine placed or dependency setting typos"
            ],
            "job_id": [
                4063960
            ]
        },
        {
            "error_messages": [
                "allmodules+allpatterns+registration@svirt-hyperv-uefi has no child, check its machine placed or dependency setting typos"
            ],
            "job_id": [
                4063986
            ]
        },
        {
            "error_messages": [
                "allmodules+allpatterns+registration@svirt-xen-hvm has no child, check its machine placed or dependency setting typos"
            ],
            "job_id": [
                4063961
            ]
        },
        {
            "error_messages": [
                "allmodules+allpatterns+registration@svirt-xen-pv has no child, check its machine placed or dependency setting typos"
            ],
            "job_id": [
                4063967
            ]
        }
    ]
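In the excerpt above, "job_id" appears both as a plain integer and as a single-element array, so any code that surfaces these errors per job has to accept both shapes. The following is a minimal Python sketch, assuming the stored value is an object containing a "failed_job_info" key as shown; the helper name `errors_by_job` is hypothetical.

```python
import json
from collections import defaultdict


def errors_by_job(failed_job_info):
    """Map each job ID to the scheduling error messages recorded for it.

    "job_id" is sometimes a plain integer and sometimes a list of integers,
    so both shapes are normalized to a list before grouping.
    """
    result = defaultdict(list)
    for entry in failed_job_info:
        job_ids = entry.get("job_id", [])
        if isinstance(job_ids, int):
            job_ids = [job_ids]
        for job_id in job_ids:
            result[job_id].extend(entry.get("error_messages", []))
    return dict(result)


# Two entries taken from the excerpt above, one of each job_id shape,
# wrapped in an enclosing object so the text is valid JSON on its own.
sample = json.loads("""
{
    "failed_job_info": [
        {
            "error_messages": [
                "START_AFTER_TEST=create_hdd_gnome@64bit not found - check for dependency typos and dependency cycles"
            ],
            "job_id": 4063938
        },
        {
            "error_messages": [
                "allmodules+allpatterns+registration@svirt-hyperv has no child, check its machine placed or dependency setting typos"
            ],
            "job_id": [4063960]
        }
    ]
}
""")

for job_id, messages in errors_by_job(sample["failed_job_info"]).items():
    print(job_id, messages)
```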

Related issues: 1 (1 open, 0 closed)

Related to openQA Project - action #51716: No scheduling error generated for faulty PARALLEL_WITH config (Workable, 2019-05-21)

#1

Updated by okurz about 4 years ago

  • Priority changed from Normal to Low

It's no new story that the UX of cluster scheduling still has room for improvement :) However, the suggestions sound sensible, so it's good to keep this in "Workable".

#2

Updated by okurz about 4 years ago

  • Is duplicate of action #51716: No scheduling error generated for faulty PARALLEL_WITH config added

#3

Updated by okurz about 4 years ago

  • Status changed from Workable to Rejected
  • Assignee set to okurz

merged content into #51716

#4

Updated by mkittler almost 4 years ago

  • Status changed from Rejected to Workable
  • Assignee deleted (okurz)
  • Priority changed from Low to Normal
  • Target version changed from future to Ready

@okurz I don't like how you've merged the tickets. In my opinion, the steps to reproduce in the other ticket are way too specific, and this is not an MM-specific problem: this ticket is about any error message that might be generated when scheduling a product. Besides, you've copied almost everything else from the description of this ticket to the other ticket. I could "fix" the other ticket, but I would just end up with it looking like this ticket again.

Additionally, if I read the other ticket correctly, it is actually about something different: in a certain case, no error message is generated when scheduling a product although one should have been. So the other ticket is about a missing error message, whereas this ticket is about displaying error messages that have been generated. Maybe your changes to the other ticket should be reverted so that its actual point is not lost.

From my point of view, it is also workable, and its priority is not low: it is starting to annoy me that people ask me about broken features and then it turns out that not even the dependencies have been created correctly. Investigating these problems is also usually quite some effort for me because I have to resort to manual SQL queries, as the web UI is often too limiting. So I would actually like to pick up this ticket as one of my next tasks. Even a partial implementation of the suggestions would already help.

#5

Updated by mkittler almost 4 years ago

  • Is duplicate of deleted (action #51716: No scheduling error generated for faulty PARALLEL_WITH config)

#6

Updated by mkittler almost 4 years ago

  • Related to action #51716: No scheduling error generated for faulty PARALLEL_WITH config added

#7

Updated by mkittler almost 4 years ago

  • Status changed from Workable to New
  • Assignee set to mkittler
  • Target version changed from Ready to Current Sprint

#8

Updated by mkittler almost 4 years ago

  • Status changed from New to In Progress

PR for all points mentioned in the description except 2.: https://github.com/os-autoinst/openQA/pull/3061

#9

Updated by mkittler almost 4 years ago

  • Status changed from In Progress to Resolved
  • Target version deleted (Current Sprint)

It seems to work on o3. I don't think it is worth implementing suggestion 2 at this point. There are scheduling errors¹ that we have so far successfully ignored, so it might not make sense to show them on each and every test details page. Besides, I'm not sure how, or whether, the JSON data can be queried efficiently with PostgreSQL (and DBIx likely won't help much here).

¹mainly:

    "failed_job_info": [
        {
            "error_messages": [
                "START_AFTER_TEST=RAID0@64bit not found - check for dependency typos and dependency cycles"
            ],
            "job_id": 1264161
        }
    ]
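If suggestion 2 were picked up later, the concern about tolerated errors could be addressed by filtering them out before deciding whether to show a warning at all. The following is a minimal Python sketch of that idea, not what was implemented: `KNOWN_IGNORED_PREFIXES`, `actionable_errors` and `job_warning` are hypothetical names, and it assumes the per-job error messages were already extracted from "failed_job_info" ahead of time, so rendering a job page needs only a dictionary lookup.

```python
# Hypothetical allowlist of scheduling errors that are currently tolerated;
# the RAID0 entry is the recurring case quoted above.
KNOWN_IGNORED_PREFIXES = (
    "START_AFTER_TEST=RAID0@64bit not found",
)


def actionable_errors(messages, ignored_prefixes=KNOWN_IGNORED_PREFIXES):
    """Drop messages that start with a known, tolerated prefix."""
    return [m for m in messages if not m.startswith(ignored_prefixes)]


def job_warning(job_id, errors_by_job):
    """Return a warning for one job, or None if nothing actionable remains.

    errors_by_job is assumed to map job IDs to lists of error messages that
    were extracted from failed_job_info ahead of time, so rendering a job's
    page only needs a dictionary lookup rather than a query over JSON data.
    """
    remaining = actionable_errors(errors_by_job.get(job_id, []))
    if not remaining:
        return None
    return f"Scheduling errors for job {job_id}: " + "; ".join(remaining)


# Sample data reusing messages quoted in this ticket.
errors = {
    1264161: ["START_AFTER_TEST=RAID0@64bit not found - check for dependency typos and dependency cycles"],
    4063960: ["allmodules+allpatterns+registration@svirt-hyperv has no child, check its machine placed or dependency setting typos"],
}
print(job_warning(1264161, errors))
print(job_warning(4063960, errors))
```

Passing a tuple to `str.startswith` keeps the check a single call per message; a regex list would be the natural next step if tolerated errors need looser matching.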