action #97733
Bot fails on Failed to query latest publiccloud tools image using {settings['PUBLICCLOUD_TOOLS_IMAGE_QUERY']} and no aggregates are scheduled
0%
Description
Here is an example of a failing run.
The PUBLICCLOUD_TOOLS_IMAGE_QUERY variable is set to https://openqa.suse.de/group_overview/276.json.
This variable is passed as a parameter to the get_latest_tools_image function, which then returns publiccloud_tools_0020.qcow2.
History
#1
Updated by pdostal almost 2 years ago
Here is a possible hotfix.
#2
Updated by pdostal almost 2 years ago
- Priority changed from Normal to Urgent
#3
Updated by okurz almost 2 years ago
- Assignee set to okurz
- Target version set to Ready
pdostal https://progress.opensuse.org/issues/97733 sounds really like a "QE Container & Public Cloud" internal issue. I don't know how SUSE QE Tools is supposed to help here?
#4
Updated by jbaier_cz almost 2 years ago
Initial investigation
- last good run: https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/545076
- no known changes in bot code and/or metadata
- a KeyError during error logging caused the bot to die; this was fixed with a quick patch
A subsequent run successfully scheduled at least the rest of the jobs. The public cloud error remains, now with a more specific error message:
ERROR: Failed to query latest publiccloud tools image using https://openqa.suse.de/group_overview/276.json
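A KeyError like the one mentioned above can arise when the error message itself looks up a setting that is missing. The following is a minimal sketch of that failure mode and one defensive alternative, assuming the settings are held in a plain dict. The function names are hypothetical and are not taken from the bot's code.

```python
def fragile_message(settings: dict) -> str:
    # Indexing with [] raises KeyError if the setting is absent,
    # so the error-reporting path itself can crash.
    return (
        "Failed to query latest publiccloud tools image using "
        f"{settings['PUBLICCLOUD_TOOLS_IMAGE_QUERY']}"
    )


def robust_message(settings: dict) -> str:
    # .get() with a placeholder keeps the error path from raising.
    query = settings.get("PUBLICCLOUD_TOOLS_IMAGE_QUERY", "<unset>")
    return f"Failed to query latest publiccloud tools image using {query}"
```

With an empty dict, `fragile_message({})` raises `KeyError`, while `robust_message({})` still produces a usable message.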
#5
Updated by jbaier_cz almost 2 years ago
okurz wrote:
pdostal https://progress.opensuse.org/issues/97733 sounds really like a "QE Container & Public Cloud" internal issue. I don't know how SUSE QE Tools is supposed to help here?
It is actually a bug in the bot; I just found the issue. The bug was introduced by https://gitlab.suse.de/qa-maintenance/bot-ng/-/commit/317bf0bbc011a6b1ce3a07de06d93ea0f430fa37
#6
Updated by jbaier_cz almost 2 years ago
- Status changed from New to Resolved
- Assignee changed from okurz to jbaier_cz
#7
Updated by jbaier_cz almost 2 years ago
- Status changed from Resolved to Feedback
#8
Updated by jbaier_cz almost 2 years ago
- Status changed from Feedback to Resolved
I see several improvements here we can probably evaluate:
- The CI pipeline is still not ideal. As there are a lot of runs, we hit https://progress.opensuse.org/issues/96827 quite often, which unfortunately hides some of the problems.
- For public cloud, there are at least three different prefixes for variables: PUBLICCLOUD_, PUBLIC_CLOUD, PC_; that should be unified.
- It would be nice to have at least a basic test suite to better distinguish between metadata error and code error.
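As an illustration of the prefix unification suggested above, here is a rough sketch that rewrites setting names to a single prefix. The choice of `PUBLIC_CLOUD_` as the canonical prefix, the trailing underscore on it, and the function names are all assumptions made for this example; nothing here reflects a decision taken in the ticket.

```python
# Prefixes named in this ticket; the trailing underscore on
# PUBLIC_CLOUD_ is an assumption for matching purposes.
KNOWN_PREFIXES = ("PUBLICCLOUD_", "PUBLIC_CLOUD_", "PC_")
CANONICAL_PREFIX = "PUBLIC_CLOUD_"  # hypothetical choice


def normalize_key(key: str) -> str:
    """Rewrite a known public cloud prefix to the canonical one."""
    for prefix in KNOWN_PREFIXES:
        if key.startswith(prefix):
            return CANONICAL_PREFIX + key[len(prefix):]
    return key  # unrelated settings are left untouched


def normalize_settings(settings: dict) -> dict:
    """Return a copy of settings with unified public cloud prefixes."""
    result = {}
    for key, value in settings.items():
        new_key = normalize_key(key)
        if new_key in result and result[new_key] != value:
            # Two differently prefixed names map to the same setting
            # but disagree on the value.
            raise ValueError(f"conflicting values for {new_key}")
        result[new_key] = value
    return result
```

Raising on conflicting values is a deliberate choice in this sketch: silently picking one value could hide exactly the kind of metadata error the list above wants to surface.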
#9
Updated by pdostal almost 2 years ago
- Assignee deleted (jbaier_cz)
- Target version deleted (Ready)
jbaier_cz wrote:
- For public cloud, there are at least three different prefixes for variables: PUBLICCLOUD_, PUBLIC_CLOUD, PC_; that should be unified.
I created #97742 for this.
#10
Updated by pdostal almost 2 years ago
The bug was in the public-cloud-specific part of the bot, but it affected all aggregates.
Thank you jbaier_cz for the fix!
#11
Updated by pdostal almost 2 years ago
- Assignee set to jbaier_cz
- Target version set to Ready