action #119767
Failed pipeline for "openqa-worker" in salt-states-openqa size:M
Description
Observation
https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1217506
Retrieving: os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm [not found]
Abort, retry, ignore? [a/r/i/...? shows all options] (a): a
[ERROR ] stderr: File './x86_64/os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Tumbleweed/'
Problem occurred during or after installation or removal of packages:
Installation has been aborted as directed.
Please see the above error message for a hint.
[ERROR ] retcode: 8
[ERROR ] An error was encountered while installing package(s): Zypper command failure: File './x86_64/os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Tumbleweed/'
Problem occurred during or after installation or removal of packages:
Installation has been aborted as directed.
…
ID: worker.packages
Function: pkg.installed
Result: False
Comment: Attempt 1: Returned a result of "False", with the following comment: "An error was encountered while installing package(s): Zypper command failure: File './x86_64/os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Tumbleweed/' Problem occurred during or after installation or removal of packages: Installation has been aborted as directed. Please see the above error message for a hint."
Attempt 2: Returned a result of "False", with the following comment: "An error was encountered while installing package(s): Zypper command failure: File './x86_64/os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Tumbleweed/' Problem occurred during or after installation or removal of packages: Installation has been aborted as directed. Please see the above error message for a hint."
Attempt 3: Returned a result of "False", with the following comment: "An error was encountered while installing package(s): Zypper command failure: File './x86_64/os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Tumbleweed/' Problem occurred during or after installation or removal of packages: Installation has been aborted as directed. Please see the above error message for a hint."
Attempt 4: Returned a result of "False", with the following comment: "An error was encountered while installing package(s): Zypper command failure: File './x86_64/os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Tumbleweed/' Problem occurred during or after installation or removal of packages: Installation has been aborted as directed. Please see the above error message for a hint."
An error was encountered while installing package(s): Zypper command failure: File './x86_64/os-autoinst-4.6.1666985981.c33e9ef-1421.1.x86_64.rpm' not found on medium 'http://download.opensuse.org/repositories/devel:/openQA/openSUSE_Tumbleweed/' Problem occurred during or after installation or removal of packages: Installation has been aborted as directed. Please see the above error message for a hint.
Retried the pipeline for now: https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1222315
Acceptance criteria
- AC1: Pipeline passes again
- AC2: It is known why the pipeline failed
Suggestions
- Read the git history of what changes we applied in the past to the package installations
- We have already instructed salt to call zypper multiple times as a retry, but it looks like the repository data is not refreshed between calls, so we need to ensure that the refresh is also repeated. In https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/server.sls#L8 we say "refresh: False" to save time, but here it does not help us. So we should check how much longer it takes if we change back to refreshing.
- Maybe we can find something that only applies the refresh where it's actually necessary, e.g. split the repo statements for devel:openQA and do explicit refresh there, but not in other cases
- Make sure to comment explicitly why certain things are done, e.g. why we would need a refresh
- Conduct a simple benchmark to find out the impact of no-refresh vs. refresh vs. salt-default, both in the GitLab CI pipeline and when applying the state
- According to the salt docs, the salt-default (not specifying refresh at all) should ensure that the refresh is only done once, but okurz doubts this works, so it needs to be verified, e.g. by checking the salt output at debug log level
History
#4
Updated by dheidler 3 months ago
- Status changed from Workable to In Progress
- Assignee set to dheidler
This looks like a repo issue or an issue regarding local copy of repo metadata being out of date.
PR as suggested: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/765
#7
Updated by cdywan 3 months ago
dheidler wrote:
This looks like a repo issue or an issue regarding local copy of repo metadata being out of date.
PR as suggested: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/765
This is still under review. Might be worth discussing with others since I feel like Dominik was expecting a more trivial fix.
#8
Updated by okurz 3 months ago
- Subject changed from Failed pipeline for "openqa-worker" in salt-states-opensuse size:M to Failed pipeline for "openqa-worker" in salt-states-opensuse
- Due date deleted (2022-11-18)
- Status changed from Feedback to New
- Assignee deleted (dheidler)
cdywan wrote:
dheidler wrote:
This looks like a repo issue or an issue regarding local copy of repo metadata being out of date.
PR as suggested: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/765
This is still under review. Might be worth discussing with others since I feel like Dominik was expecting a more trivial fix.
Then we need to rediscuss although I think the original ticket description already covers it:
We have already instructed salt to call zypper multiple times as a retry, but it looks like the repository data is not refreshed between calls, so we need to ensure that the refresh is also repeated. In https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/server.sls#L8 we say "refresh: False" to save time, but here it does not help us. So we should check how much longer it takes if we change back to refreshing.
meaning: It's not as simple as just putting "refresh: True" there. Also, it wouldn't be "size:M" if it were just that, right?
#11
Updated by mkittler 2 months ago
According to the documentation https://docs.saltproject.io/en/latest/ref/states/all/salt.states.pkg.html using refresh: True will slow us down, as we have multiple pkg states and a refresh would then be done for each of them. I can nevertheless create an MR to see how bad it'll be. Keeping Salt's default might not be helpful: at least the documentation doesn't state that a refresh would be done when a retry happens. Neither the mentioned documentation nor https://docs.saltproject.io/en/latest/ref/states/requisites.html#retrying-states describes the interaction between refresh and retry. I'm also not sure how we would test the behavior ourselves. We'd somehow need to provoke the error and trace whether a refresh is done.
#12
Updated by mkittler 2 months ago
MR: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/776
CI runtimes on master (with refresh: False):
- test-storage: 00:02:46
- test-monitor: 00:03:38
- test-worker: 00:12:30
- test-webui: 00:05:28
CI runtimes with refresh: True:
- test-storage: 00:03:37
- test-monitor: 00:05:14
- test-worker: 00:12:30
- test-webui: 00:08:10
So three of the four jobs take roughly one to three minutes longer. Strangely, test-worker had the same runtime. Not sure whether that's acceptable.
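To make the comparison concrete, the per-job difference can be worked out from the two runtime lists above. This is only a small arithmetic sketch; the helper names `to_seconds` and `deltas` are illustrative and not part of salt-states-openqa.

```python
# Job runtimes copied from the two lists above, as (refresh: False, refresh: True).
RUNTIMES = {
    "test-storage": ("00:02:46", "00:03:37"),
    "test-monitor": ("00:03:38", "00:05:14"),
    "test-worker": ("00:12:30", "00:12:30"),
    "test-webui": ("00:05:28", "00:08:10"),
}


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def deltas(runtimes: dict) -> dict:
    """Return the extra seconds each job took with refresh enabled."""
    return {
        job: to_seconds(with_refresh) - to_seconds(without_refresh)
        for job, (without_refresh, with_refresh) in runtimes.items()
    }


if __name__ == "__main__":
    for job, extra in deltas(RUNTIMES).items():
        print(f"{job}: +{extra}s")
```

These are single CI runs, so the numbers say little about run-to-run variance.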
#15
Updated by okurz 2 months ago
Well, as mkittler tested, the runtime does increase, but not for the worker. However, the additional time is incurred not only during CI runs but any time someone or a service applies a salt high state, which I consider significant. As we need to do some retrying anyway, I would favor finding a more efficient solution that tries the fastest way first and only refreshes on retries as necessary.
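The "fastest way first, refresh only on retries" idea can be sketched outside of salt as a plain wrapper around zypper. This is a hypothetical illustration, not how salt's `retry` or `refresh` options are implemented: the function names and the injectable `runner` parameter are assumptions made here so the logic can be exercised without calling zypper. Whether `zypper refresh` actually re-downloads the metadata in the failing scenario would still need to be verified.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command line and returns its exit code (hypothetical helper type).
Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def install_fast_then_refresh(
    packages: Sequence[str],
    attempts: int = 3,
    runner: Runner = run_command,
) -> bool:
    """Try to install without refreshing first; refresh only before retries."""
    for attempt in range(1, attempts + 1):
        if attempt > 1:
            # Only pay the refresh cost once the fast path has already failed.
            runner(["zypper", "--non-interactive", "refresh"])
        exit_code = runner(
            ["zypper", "--non-interactive", "--no-refresh", "install", *packages]
        )
        if exit_code == 0:
            return True
    return False
```

With this shape, a successful first attempt costs no refresh at all, and the repository metadata is only fetched again after a failure such as the "not found on medium" error in the observation.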
#17
Updated by okurz 2 months ago
dheidler wrote:
Hm - we could set retry to true maybe with some env var that is only set when the pipeline is applied from gitlab. WDYT?
I don't know of any way (or even whether) your idea could be achieved using salt.
This brought me to an idea: if we effectively only want to "retry" when running in CI jobs, then let's do that, not with a "refresh" but simply with a CI-level retry:
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/778
#19
Updated by dheidler about 2 months ago
- Status changed from Feedback to Resolved
Let's see if it happens again:
https://gitlab.suse.de/openqa/salt-states-openqa/-/pipelines/545047