action #123064
[closed] bot-ng - pipelines in GitLab fail to pull qam-ci-leap:latest
Description
See the following recent failures:
- https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1337812
- https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1337811
- https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1337810
WARNING: Failed to pull image with policy "always": Error response from daemon: Get "https://registry.suse.de/v2/": net/http: TLS handshake timeout (manager.go:237:10s)
ERROR: Job failed: failed to pull image "registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest" with specified policies [always]: Error response from daemon: Get "https://registry.suse.de/v2/": net/http: TLS handshake timeout (manager.go:237:10s)
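For triaging many failed jobs at once, the final ERROR line can be parsed to pull out the image, the registry endpoint and the failure reason. The following is only a minimal sketch based on the log format quoted above; the names `PULL_FAILURE` and `parse_pull_failure` are hypothetical and not part of bot-ng or gitlab-runner.

```python
import re

# Matches the final "ERROR: Job failed: failed to pull image ..." line as quoted
# in this ticket. The pattern is an assumption derived from that single format.
PULL_FAILURE = re.compile(
    r'failed to pull image "(?P<image>[^"]+)".*?'
    r'Get "(?P<endpoint>[^"]+)": (?P<reason>.+?)(?: \(|$)'
)


def parse_pull_failure(line):
    """Return image, endpoint and reason for a pull failure line, else None."""
    match = PULL_FAILURE.search(line)
    if match is None:
        return None
    return match.groupdict()
```

Lines that do not have this shape, including the preceding WARNING line, yield `None`, so the function can be applied to a whole job log line by line.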
Acceptance criteria
- AC1: bot-ng synchronize is executed successfully
Suggestions
- This occurred outside the maintenance window - we can't assume it'll go away?
- Inform INFRA about the issue and make sure it doesn't happen anymore
Updated by livdywan over 1 year ago
- Subject changed from bot-ng - synchronize pipeline in GitLab fails to pull qam-ci-leap:latest to bot-ng - pipelines in GitLab fail to pull qam-ci-leap:latest
- Description updated
Updated by jbaier_cz over 1 year ago
Get "https://registry.suse.de/v2/": net/http: TLS handshake timeout
-> Problem with connecting to the IBS (image registry), the "make sure it doesn't happen anymore" part is IMHO impossible.
Updated by livdywan over 1 year ago
jbaier_cz wrote:
Get "https://registry.suse.de/v2/": net/http: TLS handshake timeout
-> Problem with connecting to the IBS (image registry), the "make sure it doesn't happen anymore" part is IMHO impossible.
Let me re-phrase it then. As long as this happens the pipeline will fail. The pipeline can only succeed if the image can be pulled.
Updated by jbaier_cz over 1 year ago
cdywan wrote:
jbaier_cz wrote:
Get "https://registry.suse.de/v2/": net/http: TLS handshake timeout
-> Problem with connecting to the IBS (image registry), the "make sure it doesn't happen anymore" part is IMHO impossible.
Let me re-phrase it then. As long as this happens the pipeline will fail. The pipeline can only succeed if the image can be pulled.
Sure, I am just pointing out that the problem lies between the GitLab runner and IBS. As both components are outside our zone of control, our options are a little limited.
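To narrow down on which side the problem sits, the registry's `/v2/` endpoint can be probed from a host of interest with the same 10 second limit seen in the runner log. This is a rough sketch only: `probe_registry` and its return values are made-up names, and the injectable `opener` parameter exists purely so the logic can be exercised without network access.

```python
import socket
import urllib.error
import urllib.request


def probe_registry(base_url, timeout=10.0, opener=urllib.request.urlopen):
    """Classify reachability of <base_url>/v2/ as 'reachable', 'timeout' or 'unreachable'."""
    url = base_url.rstrip("/") + "/v2/"
    try:
        with opener(url, timeout=timeout):
            return "reachable"
    except urllib.error.HTTPError:
        # Any HTTP status (e.g. 401) means TCP and TLS both succeeded.
        return "reachable"
    except urllib.error.URLError as exc:
        # Connection-level failures arrive wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return "unreachable"
    except (socket.timeout, TimeoutError):
        # Timeouts raised outside the connect phase are not wrapped.
        return "timeout"


# Example call (performs a real request):
# print(probe_registry("https://registry.suse.de"))
```

A "timeout" result from the runner host combined with "reachable" from elsewhere would point at the path between runner and registry rather than at the registry itself.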
Updated by okurz over 1 year ago
- Status changed from New to Blocked
- Assignee set to okurz
Updated by okurz over 1 year ago
- Status changed from Blocked to Resolved
SD ticket is closed. Problem was fixed.
Updated by livdywan over 1 year ago
- Status changed from Resolved to Feedback
It seems like the issue is back, and as mentioned in the SD ticket there wasn't a fix at the time but rather we stopped seeing it:
Running with gitlab-runner 15.8.1 (f86890c6)
on gitlab-worker4:sle15.3 sHAdmiLV, system ID: s_d2d8982b55c6
Preparing the "docker" executor
00:14
Using Docker executor with image registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest ...
Pulling docker image registry.opensuse.org/home/darix/apps/containers/gitlab-runner-helper:x86_64-latest ...
Using docker image sha256:649d9ede15244b72762d76cea5750534c8187fe53657e86435e28f6bbc99cfa8 for registry.opensuse.org/home/darix/apps/containers/gitlab-runner-helper:x86_64-latest with digest registry.opensuse.org/home/darix/apps/containers/gitlab-runner-helper@sha256:eef6070f2ed7e2bb744fd8a107cd2f8922550f2b73e871db7c35ec830f113d92 ...
WARNING: Container based cache volumes creation is disabled. Will not create volume for "/cache"
Using docker image sha256:649d9ede15244b72762d76cea5750534c8187fe53657e86435e28f6bbc99cfa8 for registry.opensuse.org/home/darix/apps/containers/gitlab-runner-helper:x86_64-latest with digest registry.opensuse.org/home/darix/apps/containers/gitlab-runner-helper@sha256:eef6070f2ed7e2bb744fd8a107cd2f8922550f2b73e871db7c35ec830f113d92 ...
Pulling docker image registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest ...
WARNING: Failed to pull image with policy "always": Error response from daemon: Get "https://registry.suse.de/v2/": net/http: TLS handshake timeout (manager.go:237:10s)
ERROR: Job failed: failed to pull image "registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest" with specified policies [always]: Error response from daemon: Get "https://registry.suse.de/v2/": net/http: TLS handshake timeout (manager.go:237:10s)
See for example this failed schedule incidents pipeline in bot-ng and several others that seem to fail in the same way, if I checked correctly.
Updated by livdywan over 1 year ago
- Status changed from Feedback to Blocked
- Assignee changed from okurz to livdywan
I filed SD-112285, and I'm marking this as blocking on that.
Updated by livdywan over 1 year ago
- Status changed from Blocked to Feedback
cdywan wrote:
I filed SD-112285, and I'm marking this as blocking on that.
From talking to Jiří it seems like this was/is actually a problem with the registry rather than GitLab. I'm not sure where to report this.
For now it appears the pipelines are once more fine and we don't know what changed in the meantime.
Updated by livdywan over 1 year ago
- Status changed from Feedback to In Progress
And we're back to multiple failing pipelines... so I guess I'll find out who to report this to now.
Updated by openqa_review over 1 year ago
- Due date set to 2023-03-05
Setting due date based on mean cycle time of SUSE QE Tools
Updated by livdywan over 1 year ago
The most recent occurrence was openQABot on Sun, 19 Feb 2023 01:25:16 +0000 with a slightly different error:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock
Updated by livdywan over 1 year ago
- Status changed from In Progress to Feedback
Marcus Rueckert tracked this down to [mpm_worker:error] [pid 1725:tid 139840871804800] AH00288: scoreboard is full, not at MaxRequestWorkers
and increased the number of slots, so ideally pipelines should run fine again. Will monitor this further.
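For monitoring whether the scoreboard error comes back, the web server's error log could be scanned for its AH code. The sketch below assumes log lines contain the fragment quoted above; `APACHE_ERROR` and `count_error_codes` are hypothetical names, and where the log lives or how it is read is left open.

```python
import re
from collections import Counter

# Matches "[module:level] [pid N:tid M] AHnnnnn: message" anywhere in a line.
APACHE_ERROR = re.compile(
    r"\[(?P<module>\w+):(?P<level>\w+)\] "
    r"\[pid (?P<pid>\d+):tid (?P<tid>\d+)\] "
    r"(?P<code>AH\d{5}): (?P<message>.*)"
)


def count_error_codes(lines):
    """Count how often each AH error code appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = APACHE_ERROR.search(line)
        if match:
            counts[match.group("code")] += 1
    return counts
```

A non-zero count for `AH00288` after the slot increase would suggest the limit is being reached again.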
Updated by mkittler over 1 year ago
- Status changed from Feedback to Resolved
We've seen no further alerts.
Updated by jbaier_cz over 1 year ago
- Related to action #126872: bot-ng pipeline(s) fail(s) to pull openSUSE container images added
Updated by livdywan about 1 year ago
- Copied to action #133454: bot-ng - pipelines in GitLab fail to pull qam-ci-leap:latest size:M added