action #123064

closed

bot-ng - pipelines in GitLab fail to pull qam-ci-leap:latest

Added by livdywan about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2021-09-02
Due date: 2023-03-05
% Done: 0%
Estimated time:

Description

The following recent failures were observed:

WARNING: Failed to pull image with policy "always": Error response from daemon: Get "https://registry.suse.de/v2/": net/http: TLS handshake timeout (manager.go:237:10s)
ERROR: Job failed: failed to pull image "registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest" with specified policies [always]: Error response from daemon: Get "https://registry.suse.de/v2/": net/http: TLS handshake timeout (manager.go:237:10s)

Acceptance criteria

  • AC1: bot-ng synchronize is executed successfully

Suggestions

  • This occurred outside the maintenance window, so we can't assume it'll go away
  • Inform INFRA about the issue and make sure it doesn't happen anymore

Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #126872: bot-ng pipeline(s) fail(s) to pull openSUSE container images (Rejected)
Copied to QA - action #133454: bot-ng - pipelines in GitLab fail to pull qam-ci-leap:latest size:M (Resolved, livdywan, 2023-09-08)

Actions #1

Updated by livdywan about 1 year ago

  • Subject changed from bot-ng - synchronize pipeline in GitLab fails to pull qam-ci-leap:latest to bot-ng - pipelines in GitLab fail to pull qam-ci-leap:latest
  • Description updated (diff)
Actions #2

Updated by jbaier_cz about 1 year ago

Get "https://registry.suse.de/v2/": net/http: TLS handshake timeout -> Problem with connecting to the IBS (image registry), the "make sure it doesn't happen anymore" part is IMHO impossible.

Actions #3

Updated by livdywan about 1 year ago

jbaier_cz wrote:

Get "https://registry.suse.de/v2/": net/http: TLS handshake timeout -> Problem with connecting to the IBS (image registry), the "make sure it doesn't happen anymore" part is IMHO impossible.

Let me re-phrase it then. As long as this happens the pipeline will fail. The pipeline can only succeed if the image can be pulled.

Actions #4

Updated by jbaier_cz about 1 year ago

cdywan wrote:

jbaier_cz wrote:

Get "https://registry.suse.de/v2/": net/http: TLS handshake timeout -> Problem with connecting to the IBS (image registry), the "make sure it doesn't happen anymore" part is IMHO impossible.

Let me re-phrase it then. As long as this happens the pipeline will fail. The pipeline can only succeed if the image can be pulled.

Sure, I am just pointing out, that the problem lies between GitLab runner and IBS. As both components are outside our zone of control, our options are a little limited.
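
To narrow down which side is slow, the handshake can be tried from the runner host outside of any pipeline. This is a minimal sketch, not something used in this ticket: the function name `probe_tls_handshake`, the retry count and the pause are assumptions, and the 10 second timeout is only chosen to mirror the `10s` shown in the error message.

```python
import socket
import ssl
import time


def probe_tls_handshake(host, port=443, timeout=10.0, attempts=3, pause=1.0,
                        connect=socket.create_connection, sleep=time.sleep):
    """Try a TLS handshake up to `attempts` times and return all results.

    Each result is a tuple (ok, detail): on success, detail is the time in
    seconds for TCP connect plus TLS handshake; on failure, it is the error
    text. `connect` and `sleep` are injectable so the function can be
    exercised without network access (hypothetical test hooks).
    """
    context = ssl.create_default_context()
    results = []
    for attempt in range(attempts):
        started = time.monotonic()
        try:
            # The socket timeout also applies to the handshake in wrap_socket.
            with connect((host, port), timeout=timeout) as raw:
                with context.wrap_socket(raw, server_hostname=host):
                    results.append((True, time.monotonic() - started))
        except (OSError, ssl.SSLError) as err:
            results.append((False, str(err)))
        if attempt < attempts - 1:
            sleep(pause)
    return results
```

Running `probe_tls_handshake("registry.suse.de")` on the affected runner and on another machine would show whether the timeout is specific to the runner's network path or also visible elsewhere.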

Actions #5

Updated by okurz about 1 year ago

  • Status changed from New to Blocked
  • Assignee set to okurz
Actions #7

Updated by okurz about 1 year ago

  • Status changed from Blocked to Resolved

SD ticket is closed. Problem was fixed.

Actions #8

Updated by livdywan about 1 year ago

  • Status changed from Resolved to Feedback

It seems the issue is back. As mentioned in the SD ticket, there was no actual fix at the time; we simply stopped seeing it:

Running with gitlab-runner 15.8.1 (f86890c6)
  on gitlab-worker4:sle15.3 sHAdmiLV, system ID: s_d2d8982b55c6
Preparing the "docker" executor
Using Docker executor with image registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest ...
Pulling docker image registry.opensuse.org/home/darix/apps/containers/gitlab-runner-helper:x86_64-latest ...
Using docker image sha256:649d9ede15244b72762d76cea5750534c8187fe53657e86435e28f6bbc99cfa8 for registry.opensuse.org/home/darix/apps/containers/gitlab-runner-helper:x86_64-latest with digest registry.opensuse.org/home/darix/apps/containers/gitlab-runner-helper@sha256:eef6070f2ed7e2bb744fd8a107cd2f8922550f2b73e871db7c35ec830f113d92 ...
WARNING: Container based cache volumes creation is disabled. Will not create volume for "/cache"
Using docker image sha256:649d9ede15244b72762d76cea5750534c8187fe53657e86435e28f6bbc99cfa8 for registry.opensuse.org/home/darix/apps/containers/gitlab-runner-helper:x86_64-latest with digest registry.opensuse.org/home/darix/apps/containers/gitlab-runner-helper@sha256:eef6070f2ed7e2bb744fd8a107cd2f8922550f2b73e871db7c35ec830f113d92 ...
Pulling docker image registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest ...
WARNING: Failed to pull image with policy "always": Error response from daemon: Get "https://registry.suse.de/v2/": net/http: TLS handshake timeout (manager.go:237:10s)
ERROR: Job failed: failed to pull image "registry.suse.de/qa/maintenance/containers/qam-ci-leap:latest" with specified policies [always]: Error response from daemon: Get "https://registry.suse.de/v2/": net/http: TLS handshake timeout (manager.go:237:10s)

See for example this failed schedule incidents pipeline in bot-ng and several others that seem to fail in the same way, if I checked correctly.

Actions #9

Updated by livdywan about 1 year ago

  • Status changed from Feedback to Blocked
  • Assignee changed from okurz to livdywan

I filed SD-112285, and I'm marking this as blocking on that.

Actions #10

Updated by livdywan about 1 year ago

  • Status changed from Blocked to Feedback

cdywan wrote:

I filed SD-112285, and I'm marking this as blocking on that.

From talking to Jiří it seems like this was/is actually a problem with the registry rather than GitLab. I'm not sure where to report this.

For now it appears the pipelines are once more fine and we don't know what changed in the meantime.

Actions #11

Updated by livdywan about 1 year ago

  • Status changed from Feedback to In Progress

And we're back to multiple failing pipelines... so I guess I'll find out who to report this to now.

Actions #12

Updated by openqa_review about 1 year ago

  • Due date set to 2023-03-05

Setting due date based on mean cycle time of SUSE QE Tools

Actions #13

Updated by livdywan about 1 year ago

The most recent occurrence was openQABot on Sun, 19 Feb 2023 01:25:16 +0000, with a slightly different error:

Cannot connect to the Docker daemon at unix:///var/run/docker.sock

Asking for help in #team-buildops.
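
Since two different error messages have now shown up, it may help to sort failed job logs by failure type before reporting them. The following is a rough sketch under assumptions: the labels and the helper `classify_job_log` are hypothetical, and the patterns are taken only from the messages quoted in this ticket.

```python
import re

# Order matters: the more specific registry timeout is checked before the
# generic "failed to pull image" message, which appears in the same log.
FAILURE_PATTERNS = [
    ("registry_tls_timeout", re.compile(r"net/http: TLS handshake timeout")),
    ("docker_daemon_unreachable", re.compile(r"Cannot connect to the Docker daemon")),
    ("image_pull_failed", re.compile(r"failed to pull image", re.IGNORECASE)),
]


def classify_job_log(log_text):
    """Return the label of the first pattern found in the log, or "unknown"."""
    for label, pattern in FAILURE_PATTERNS:
        if pattern.search(log_text):
            return label
    return "unknown"
```

Grouping failures this way would make it easier to tell whether a given pipeline hit the registry problem or the Docker daemon problem on the runner.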

Actions #14

Updated by livdywan about 1 year ago

  • Status changed from In Progress to Feedback

Marcus Rueckert tracked this down to the Apache httpd error "[mpm_worker:error] [pid 1725:tid 139840871804800] AH00288: scoreboard is full, not at MaxRequestWorkers" and increased the number of slots, so ideally pipelines should run fine again. Will monitor this further.
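
For monitoring, the same message can be counted in the web server's error log. This is a minimal sketch, assuming the log lines contain the message in the form quoted above; the helper name `count_scoreboard_full` is hypothetical and the log location is not known from this ticket.

```python
import re

# Matches the mpm_worker message quoted in this ticket and captures the pid.
SCOREBOARD_FULL = re.compile(
    r"\[mpm_worker:error\] \[pid (\d+):tid \d+\] AH00288: scoreboard is full"
)


def count_scoreboard_full(lines):
    """Return a dict mapping pid to the number of AH00288 messages seen."""
    counts = {}
    for line in lines:
        match = SCOREBOARD_FULL.search(line)
        if match:
            pid = int(match.group(1))
            counts[pid] = counts.get(pid, 0) + 1
    return counts
```

Passing an open log file to `count_scoreboard_full` iterates over it line by line; an empty result after the slot increase would suggest the change is holding.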

Actions #15

Updated by mkittler about 1 year ago

  • Status changed from Feedback to Resolved

We've seen no further alerts.

Actions #16

Updated by jbaier_cz almost 1 year ago

  • Related to action #126872: bot-ng pipeline(s) fail(s) to pull openSUSE container images added
Actions #17

Updated by livdywan 8 months ago

  • Copied to action #133454: bot-ng - pipelines in GitLab fail to pull qam-ci-leap:latest size:M added