action #97733
Bot fails on "Failed to query latest publiccloud tools image using {settings['PUBLICCLOUD_TOOLS_IMAGE_QUERY']}" and no aggregates are scheduled
Description
Here is an example of a failing run.
The PUBLICCLOUD_TOOLS_IMAGE_QUERY variable is set to https://openqa.suse.de/group_overview/276.json.
This variable is passed as a parameter to the get_latest_tools_image function, which then returns publiccloud_tools_0020.qcow2.
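For readers unfamiliar with this lookup, here is a minimal sketch of how a function like get_latest_tools_image could turn the group overview JSON into an image name. The JSON layout (a build_results list with build and failed fields, newest build first) and the helper name pick_tools_image are assumptions made for illustration; they are not taken from the bot's code.

```python
import json
from urllib.request import urlopen


def pick_tools_image(overview):
    """Return the image name for the first build without failures, else None.

    Assumes `overview` looks like
    {"build_results": [{"build": "0020", "failed": 0}, ...]}
    with the newest build first. This layout is an assumption.
    """
    for build in overview.get("build_results", []):
        # a missing "failed" field is treated as a failure, to stay on the safe side
        if build.get("failed", 1) == 0:
            return "publiccloud_tools_{}.qcow2".format(build["build"])
    return None


def get_latest_tools_image(query):
    """Fetch the group overview JSON from `query` and pick the image name."""
    with urlopen(query, timeout=30) as response:
        return pick_tools_image(json.load(response))
```

Keeping the selection logic separate from the HTTP request means it can be checked against a hand-written dictionary, without access to the openQA instance.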
Updated by okurz over 3 years ago
- Assignee set to okurz
- Target version set to Ready
@pdostal https://progress.opensuse.org/issues/97733 sounds really like a "QE Container & Public Cloud" internal issue. I don't know how SUSE QE Tools is supposed to help here?
Updated by jbaier_cz over 3 years ago
Initial investigation
- last good run: https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/545076
- no known changes in bot code and/or metadata
- a KeyError during error logging caused the bot to die; that was fixed by introducing a quick patch
The subsequent run successfully scheduled at least the rest of the jobs. The public cloud error remains, now with a more specific error message:
ERROR: Failed to query latest publiccloud tools image using https://openqa.suse.de/group_overview/276.json
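As an illustration of how an error message like this can be built without the logging step itself raising an exception, here is a minimal sketch. The function names and the logger name are hypothetical, and the thread does not say whether the actual patch looks like this.

```python
import logging

log = logging.getLogger("bot-sketch")  # hypothetical logger name


def describe_query_failure(settings):
    """Build the error message without risking a KeyError of its own."""
    # .get() avoids raising a second exception while reporting the first one
    query = settings.get("PUBLICCLOUD_TOOLS_IMAGE_QUERY", "<unset>")
    # the f prefix is required; without it the braces are logged literally
    return f"Failed to query latest publiccloud tools image using {query}"


def report_query_failure(settings):
    """Log the failure at ERROR level and let the caller carry on."""
    log.error(describe_query_failure(settings))
```

With this shape, a missing setting produces a readable message instead of stopping the run, so the remaining jobs can still be scheduled.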
Updated by jbaier_cz over 3 years ago
okurz wrote:
@pdostal https://progress.opensuse.org/issues/97733 sounds really like a "QE Container & Public Cloud" internal issue. I don't know how SUSE QE Tools is supposed to help here?
It is a bug in the bot actually, just found the issue. The bug was introduced by https://gitlab.suse.de/qa-maintenance/bot-ng/-/commit/317bf0bbc011a6b1ce3a07de06d93ea0f430fa37
Updated by jbaier_cz over 3 years ago
- Status changed from New to Resolved
- Assignee changed from okurz to jbaier_cz
Updated by jbaier_cz over 3 years ago
- Status changed from Resolved to Feedback
Updated by jbaier_cz over 3 years ago
- Status changed from Feedback to Resolved
I see several improvements here we can probably evaluate:
- The CI pipeline is still not ideal. As there are a lot of runs, we are hitting https://progress.opensuse.org/issues/96827 quite often, which unfortunately hides some of the problems.
- For public cloud, there are at least three different prefixes for variables: PUBLICCLOUD_, PUBLIC_CLOUD, PC_; that should be unified.
- It would be nice to have at least a basic test suite to better distinguish between metadata error and code error.
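To make the prefix point concrete, here is a small sketch that groups variable names by the three prefixes listed above. The function name group_by_prefix is hypothetical, and the variable PC_EXAMPLE in the usage note is made up for illustration.

```python
PREFIXES = ("PUBLICCLOUD_", "PUBLIC_CLOUD", "PC_")


def group_by_prefix(names):
    """Map each known prefix to the variable names that start with it."""
    groups = {prefix: [] for prefix in PREFIXES}
    for name in names:
        for prefix in PREFIXES:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break
    # drop prefixes that matched nothing, so the result only shows what is in use
    return {prefix: found for prefix, found in groups.items() if found}
```

Calling group_by_prefix(["PUBLICCLOUD_TOOLS_IMAGE_QUERY", "PC_EXAMPLE", "ARCH"]) returns two groups. More than one key in the result means the settings mix prefixes.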
Updated by pdostal over 3 years ago
- Assignee deleted (jbaier_cz)
- Target version deleted (Ready)
jbaier_cz wrote:
- For public cloud, there are at least three different prefixes for variables: PUBLICCLOUD_, PUBLIC_CLOUD, PC_; that should be unified.
I created #97742 for this.
Updated by pdostal over 3 years ago
The bug was in the public-cloud-specific part of the bot, but it affected all aggregates.
Thank you @jbaier_cz for the fix!
Updated by pdostal over 3 years ago
- Assignee set to jbaier_cz
- Target version set to Ready