action #119767
closed
Failed pipeline for "openqa-worker" in salt-states-openqa size:M
Description
Observation
https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1217506
Retrieving: os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm [not found]
Abort, retry, ignore? [a/r/i/...? shows all options] (a): a
[ERROR ] stderr: File './x86_64/os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Tumbleweed/'
Problem occurred during or after installation or removal of packages:
Installation has been aborted as directed.
Please see the above error message for a hint.
[ERROR ] retcode: 8
[ERROR ] An error was encountered while installing package(s): Zypper command failure: File './x86_64/os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Tumbleweed/'
Problem occurred during or after installation or removal of packages:
Installation has been aborted as directed.
…
ID: worker.packages
Function: pkg.installed
Result: False
Comment: Attempt 1: Returned a result of "False", with the following comment: "An error was encountered while installing package(s): Zypper command failure: File './x86_64/os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Tumbleweed/'
Problem occurred during or after installation or removal of packages:
Installation has been aborted as directed.
Please see the above error message for a hint."
Attempt 2: Returned a result of "False", with the following comment: "An error was encountered while installing package(s): Zypper command failure: File './x86_64/os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Tumbleweed/'
Problem occurred during or after installation or removal of packages:
Installation has been aborted as directed.
Please see the above error message for a hint."
Attempt 3: Returned a result of "False", with the following comment: "An error was encountered while installing package(s): Zypper command failure: File './x86_64/os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Tumbleweed/'
Problem occurred during or after installation or removal of packages:
Installation has been aborted as directed.
Please see the above error message for a hint."
Attempt 4: Returned a result of "False", with the following comment: "An error was encountered while installing package(s): Zypper command failure: File './x86_64/os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Tumbleweed/'
Problem occurred during or after installation or removal of packages:
Installation has been aborted as directed.
Please see the above error message for a hint."
An error was encountered while installing package(s): Zypper command failure: File './x86_64/os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Tumbleweed/'
Problem occurred during or after installation or removal of packages:
Installation has been aborted as directed.
Please see the above error message for a hint.
Retried the pipeline for now: https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1222315
Acceptance criteria
- AC1: Pipeline passes again
- AC2: It is known why the pipeline failed
Suggestions
- Read the git history of what changes we applied in the past to the package installations
- We have already instructed salt to call zypper multiple times as a retry, but it looks like the repository data is not refreshed between calls, so we need to ensure that the refresh is also repeated. In https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/server.sls#L8 we set "refresh: False" to save time, but that does not help us here. We should check how much longer it takes if we switch back to refreshing.
- Maybe we can find something that only applies the refresh where it's actually necessary, e.g. split the repo statements for devel:openQA and do explicit refresh there, but not in other cases
- Make sure to comment explicitly why certain things are done, e.g. why we would need a refresh
- Conduct a simple benchmark to find out the impact of no-refresh vs. refresh vs. salt-default on the GitLab CI pipeline and on applying the state
- According to the salt docs, the salt default (not specifying refresh) should ensure that a refresh is only done once, but okurz doubts this works, so it needs to be verified, e.g. by checking the salt output at debug log level
Updated by mkittler about 2 years ago
- Subject changed from Failed pipeline for "openqa-worker" in salt-states-opensuse to Failed pipeline for "openqa-worker" in salt-states-opensuse size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by dheidler about 2 years ago
- Status changed from Workable to In Progress
- Assignee set to dheidler
This looks like a repo issue or an issue regarding local copy of repo metadata being out of date.
PR as suggested: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/765
Updated by dheidler about 2 years ago
- Status changed from In Progress to Feedback
Updated by livdywan about 2 years ago
dheidler wrote:
This looks like a repo issue or an issue regarding local copy of repo metadata being out of date.
PR as suggested: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/765
This is still under review. Might be worth discussing with others since I feel like Dominik was expecting a more trivial fix.
Updated by okurz about 2 years ago
- Subject changed from Failed pipeline for "openqa-worker" in salt-states-opensuse size:M to Failed pipeline for "openqa-worker" in salt-states-opensuse
- Due date deleted (2022-11-18)
- Status changed from Feedback to New
- Assignee deleted (dheidler)
cdywan wrote:
dheidler wrote:
This looks like a repo issue or an issue regarding local copy of repo metadata being out of date.
PR as suggested: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/765
This is still under review. Might be worth discussing with others since I feel like Dominik was expecting a more trivial fix.
Then we need to rediscuss although I think the original ticket description already covers it:
We already have instructed salt to call zypper multiple times for retry. But it looks like the repository data is not refreshed between each call. So we need to ensure that also the refreshing is done multiple times. In https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/server.sls#L8 we say "refresh: False" to save time but here it does not help us. So we should check if we change back to refresh how long it takes in comparison.
Meaning: it's not as simple as just putting "refresh: True" there. Also, it wouldn't be "size:M" if it were just that, right?
Updated by okurz about 2 years ago
- Subject changed from Failed pipeline for "openqa-worker" in salt-states-opensuse to Failed pipeline for "openqa-worker" in salt-states-opensuse size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by mkittler about 2 years ago
According to the documentation https://docs.saltproject.io/en/latest/ref/states/all/salt.states.pkg.html using refresh: True will slow us down, as we have multiple pkg states and a refresh would then be done for each of them. I can nevertheless create an MR to see how bad it will be. Keeping Salt's default might not be helpful: at least the documentation doesn't state that a refresh would be done when a retry happens. Neither the mentioned documentation nor https://docs.saltproject.io/en/latest/ref/states/requisites.html#retrying-states describes the interaction between refresh and retry. I'm also not sure how we would test the behavior ourselves. We would somehow need to provoke the error and trace whether a refresh is done.
Updated by mkittler about 2 years ago
MR: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/776
CI runtimes on master (with refresh: False):
- test-storage: 00:02:46
- test-monitor: 00:03:38
- test-worker: 00:12:30
- test-webui: 00:05:28
CI runtimes with refresh: True:
- test-storage: 00:03:37
- test-monitor: 00:05:14
- test-worker: 00:12:30
- test-webui: 00:08:10
So it generally takes roughly one to three minutes longer (between 51 s and 2 min 42 s per job). Strangely, test-worker had the same runtime. Not sure whether that's acceptable.
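To put numbers on the comparison, here is a small Python sketch that computes the per-job difference from the runtimes listed in this comment. The helper names (parse_hms, slowdown_seconds) are made up for this illustration and are not part of salt-states-openqa.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse a GitLab-style HH:MM:SS duration into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Runtimes copied from the two lists in this comment.
refresh_false = {
    "test-storage": "00:02:46",
    "test-monitor": "00:03:38",
    "test-worker": "00:12:30",
    "test-webui": "00:05:28",
}
refresh_true = {
    "test-storage": "00:03:37",
    "test-monitor": "00:05:14",
    "test-worker": "00:12:30",
    "test-webui": "00:08:10",
}


def slowdown_seconds(before: dict, after: dict) -> dict:
    """Return the extra runtime in seconds per job when going from before to after."""
    return {
        job: int((parse_hms(after[job]) - parse_hms(before[job])).total_seconds())
        for job in before
    }


deltas = slowdown_seconds(refresh_false, refresh_true)
for job, seconds in deltas.items():
    print(f"{job}: +{seconds}s")
```

With the values above this gives +51s for test-storage, +96s for test-monitor, +0s for test-worker and +162s for test-webui. These are single runs, so the numbers are indicative rather than a proper benchmark.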
Updated by mkittler about 2 years ago
- Assignee deleted (mkittler)
I currently have enough tickets assigned. Maybe I'll pick this one up later. It would also make sense to discuss the outcome of my test (mentioned in the previous comment).
Updated by dheidler about 2 years ago
- Status changed from Workable to Feedback
- Assignee set to dheidler
I personally would consider everything below 15 minutes acceptable, especially as it saves us time reacting to issues.
So I would go for merging this.
Any objections?
Updated by okurz about 2 years ago
Well, as mkittler tested, the runtime does increase, but not for the worker. However, the additional time is incurred not only during CI runs but any time someone or a service applies a salt high state, which I consider significant. As we need to do some retrying anyway, I would favor a more efficient solution that tries the fastest way first and only refreshes in retries as necessary.
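The "fastest way first, refresh only on retry" idea can be sketched outside of Salt as plain Python. This is only an illustration of the control flow, assuming zypper's global --non-interactive and --no-refresh options and its refresh subcommand. The function name install_with_lazy_refresh and the injectable runner are hypothetical, and the sketch says nothing about whether Salt's pkg states can express the same thing.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(list(cmd), check=False).returncode


def install_with_lazy_refresh(
    packages: Sequence[str], attempts: int = 3, runner: Runner = run
) -> bool:
    """Install packages, refreshing repository metadata only before retries.

    The first attempt uses the cached metadata (fast path). Every later
    attempt is preceded by an explicit refresh, so stale metadata that
    points at a package file no longer on the mirror gets replaced.
    """
    for attempt in range(attempts):
        if attempt > 0:
            # Slow path: only reached after a failed install.
            runner(["zypper", "--non-interactive", "refresh"])
        install = ["zypper", "--non-interactive", "--no-refresh", "install", *packages]
        if runner(install) == 0:
            return True
    return False
```

A call such as install_with_lazy_refresh(["os-autoinst"]) would then cost a refresh only when the first install fails, which is the case seen in the log in the description.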
Updated by dheidler about 2 years ago
Hm - we could set retry to true maybe with some env var that is only set when the pipeline is applied from gitlab. WDYT?
I don't know any way how (or even if) your idea could be achieved using salt.
Updated by okurz about 2 years ago
dheidler wrote:
Hm - we could set retry to true maybe with some env var that is only set when the pipeline is applied from gitlab. WDYT?
I don't know any way how (or even if) your idea could be achieved using salt.
This gave me an idea: if we effectively only want to "retry" when running in CI jobs, then let's do exactly that, not with "refresh" but with a simple CI-level retry:
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/778
Updated by mkittler about 2 years ago
- Subject changed from Failed pipeline for "openqa-worker" in salt-states-opensuse size:M to Failed pipeline for "openqa-worker" in salt-states-openqa size:M
Updated by dheidler about 2 years ago
- Status changed from Feedback to Resolved
Let's see if it happens again:
https://gitlab.suse.de/openqa/salt-states-openqa/-/pipelines/545047