action #65142
Make scheduling errors more accessible (closed)
Description
Problem
As a user who is not familiar with the scheduling details of openQA and does not know how to look up the "scheduled products" table, it is hard to trace scheduling problems, e.g. to find out why dependencies are not created as expected. Even when knowing these details, it is very inconvenient to check scheduling problems for a particular job because there is no link from the test details page to the corresponding scheduled product. The scheduled products table is also cumbersome to work with, as it only shows a limited number of entries and has limited search capabilities.
Suggestions
1. Add a link to the scheduled product on the job details page.
2. Show a warning about scheduling errors directly on the job details page, if that does not slow down loading the page too much.
3. Improve the scheduled products table:
   - At least allow showing a specific scheduled product, as needed for 1. (e.g. add a dedicated details page for a single scheduled product).
   - Less important once 1. is done, but still worth considering: use server-side rendering for the scheduled products table so it can show more than a limited number of scheduled products at a time.
Notes
The relevant errors are stored as JSON in the scheduled products table, e.g.:
"failed_job_info": [
  {
    "error_messages": [
      "START_AFTER_TEST=create_hdd_gnome@64bit not found - check for dependency typos and dependency cycles"
    ],
    "job_id": 4063938
  },
  {
    "error_messages": [
      "allmodules+allpatterns+registration@svirt-hyperv has no child, check its machine placed or dependency setting typos"
    ],
    "job_id": [
      4063960
    ]
  },
  {
    "error_messages": [
      "allmodules+allpatterns+registration@svirt-hyperv-uefi has no child, check its machine placed or dependency setting typos"
    ],
    "job_id": [
      4063986
    ]
  },
  {
    "error_messages": [
      "allmodules+allpatterns+registration@svirt-xen-hvm has no child, check its machine placed or dependency setting typos"
    ],
    "job_id": [
      4063961
    ]
  },
  {
    "error_messages": [
      "allmodules+allpatterns+registration@svirt-xen-pv has no child, check its machine placed or dependency setting typos"
    ],
    "job_id": [
      4063967
    ]
  }
]
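To make the structure of these entries concrete, here is a minimal Python sketch that groups the error messages by job ID, assuming the data has the shape shown above. Note that `job_id` is sometimes a plain integer and sometimes a list of integers, so the sketch normalizes both. The function name `errors_by_job` is hypothetical and not part of openQA.

```python
import json
from typing import Any, Dict, List


def errors_by_job(failed_job_info: List[Dict[str, Any]]) -> Dict[int, List[str]]:
    """Map each job ID to the scheduling error messages reported for it.

    Hypothetical helper: "job_id" is sometimes a plain integer and sometimes
    a list of integers, so both shapes are normalized to a list here.
    """
    result: Dict[int, List[str]] = {}
    for entry in failed_job_info:
        job_ids = entry.get("job_id", [])
        if isinstance(job_ids, int):
            job_ids = [job_ids]
        for job_id in job_ids:
            # setdefault creates a fresh list per job, so jobs never share lists
            result.setdefault(job_id, []).extend(entry.get("error_messages", []))
    return result


# Abbreviated excerpt of the data above, wrapped in braces to make it valid JSON.
sample = json.loads("""
{
  "failed_job_info": [
    {"error_messages": ["START_AFTER_TEST=create_hdd_gnome@64bit not found - check for dependency typos and dependency cycles"],
     "job_id": 4063938},
    {"error_messages": ["allmodules+allpatterns+registration@svirt-hyperv has no child, check its machine placed or dependency setting typos"],
     "job_id": [4063960]}
  ]
}
""")

for job_id, messages in sorted(errors_by_job(sample["failed_job_info"]).items()):
    print(job_id, messages[0])
```

A per-job mapping like this could serve as the basis for looking up the errors of a single job, e.g. when rendering its details page; whether that lookup can be done efficiently inside the database is a separate question.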
Updated by okurz over 4 years ago
- Priority changed from Normal to Low
No new story that the UX of cluster scheduling still has room for improvement :) However, the suggestions sound sensible, so it's good to keep this in "Workable".
Updated by okurz over 4 years ago
- Is duplicate of action #51716: No scheduling error generated for faulty PARALLEL_WITH config added
Updated by okurz over 4 years ago
- Status changed from Workable to Rejected
- Assignee set to okurz
merged content into #51716
Updated by mkittler over 4 years ago
- Status changed from Rejected to Workable
- Assignee deleted (okurz)
- Priority changed from Low to Normal
- Target version changed from future to Ready
@okurz I don't like how you've merged the tickets. The steps to reproduce in the other ticket are way too specific in my opinion, and this is not an MM-specific problem. This is about any error message which might be generated when scheduling a product. Besides, you've copied almost everything else from the description of this ticket to the other ticket. I could "fix" the other ticket, but I would just end up with something like this ticket again.
Additionally, if I read the other ticket correctly, it is actually about something different: in a certain case, no error message is generated when scheduling a product although one should have been. So the other ticket is about a missing error message, while this ticket is about displaying error messages that have been generated. Maybe your changes to the other ticket should be reverted so that its actual point is not lost.
From my point of view it is also workable, and the priority is not low: it is starting to annoy me that people ask me questions about broken features and then it turns out that not even the dependencies have been created correctly. Investigating these problems is usually also quite some effort for me because I have to resort to manual SQL queries, as the web UI is often too limited. So I would actually like to pick up this ticket as one of my next tasks. Even a partial implementation of the suggestions would already help.
Updated by mkittler over 4 years ago
- Is duplicate of deleted (action #51716: No scheduling error generated for faulty PARALLEL_WITH config)
Updated by mkittler over 4 years ago
- Related to action #51716: No scheduling error generated for faulty PARALLEL_WITH config added
Updated by mkittler over 4 years ago
- Status changed from Workable to New
- Assignee set to mkittler
- Target version changed from Ready to Current Sprint
Updated by mkittler over 4 years ago
- Status changed from New to In Progress
PR for all points mentioned in the description except 2.: https://github.com/os-autoinst/openQA/pull/3061
Updated by mkittler over 4 years ago
- Status changed from In Progress to Resolved
- Target version deleted (Current Sprint)
It seems to work on o3. I don't think it is worth implementing suggestion 2. at this point. There are scheduling errors¹ that we have so far successfully ignored, so it might not make sense to show them on each and every test details page. Besides, I'm not sure how/whether the JSON data can be queried efficiently with PostgreSQL (and DBIx likely won't help much here).
¹ Mainly:
"failed_job_info": [
  {
    "error_messages": [
      "START_AFTER_TEST=RAID0@64bit not found - check for dependency typos and dependency cycles"
    ],
    "job_id": 1264161
  }
]
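The errors quoted in this ticket follow a few recognizable message formats. If suggestion 2. were ever revisited, one option would be to classify messages first, so that routinely ignored kinds (like the `START_AFTER_TEST=<test> not found` case in the footnote) could be summarized or suppressed instead of producing a warning on every page. The following Python sketch assumes only the two message formats quoted in this ticket; the category names, patterns and the `classify` function are illustrative and not taken from openQA.

```python
import re
from collections import Counter
from typing import Optional, Tuple

# Hypothetical categories derived from the message formats quoted in this ticket.
# openQA may emit other scheduling errors; those end up in the "other" category.
PATTERNS = {
    "missing_start_after_test": re.compile(r"^START_AFTER_TEST=(?P<test>\S+) not found\b"),
    "no_child": re.compile(r"^(?P<test>\S+) has no child\b"),
}


def classify(message: str) -> Tuple[str, Optional[str]]:
    """Return (category, affected test) for a scheduling error message."""
    for category, pattern in PATTERNS.items():
        match = pattern.match(message)
        if match:
            return category, match.group("test")
    return "other", None


messages = [
    "START_AFTER_TEST=RAID0@64bit not found - check for dependency typos and dependency cycles",
    "allmodules+allpatterns+registration@svirt-xen-pv has no child, check its machine placed or dependency setting typos",
]

# Count how often each kind of error occurs, e.g. for a compact summary.
print(Counter(category for category, _ in map(classify, messages)))
```

Matching on message text is fragile if the wording changes; storing a machine-readable error type alongside the message would be more robust, but that is beyond what this ticket describes.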