action #98415
Deploy container private registry for openQA tests
Status: closed
0%
Description
Currently, we have an unsecured registry deployed in a VM in the Amazon cloud.
The setup is very basic and just follows the steps in this Confluence page.
As an openQA test developer, I would like to have a more reliable and secure private container registry to be used by the container tests run in:
- openqa.suse.de
- openqa.opensuse.org
The registry should contain (at least) these images:
- hello-world
- alpine:latest
- centos:latest
- fedora:latest
- debian:latest
One possible implementation is the Harbor registry, an open-source registry with some interesting features (e.g. role-based access control, vulnerability scanning and replication).
Another option is to use the managed container registry services of the public cloud providers (e.g. Amazon ECR, Google Container Registry).
The proposed implementation should include some cost planning and reasoning (e.g. if we choose a managed registry service, we must justify it, because it might be more expensive than running Harbor on a cloud VM).
Additional implementation ideas are welcome.
Updated by jlausuch over 3 years ago
- Subject changed from Improve private registry for openQA tests to Deploy container private registry for openQA tests
Updated by jlausuch over 2 years ago
- Status changed from Workable to In Progress
- Assignee set to jlausuch
- Priority changed from Low to Normal
Updated by jlausuch over 2 years ago
I set up a registry using Amazon ECR instead of creating a new VM in the new Amazon account.
I have documented how to do it here: https://confluence.suse.com/pages/viewpage.action?pageId=1016692995
This is the PR to switch to that new registry: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15034
We need to remove "library" from the pull command.
Updated by jlausuch over 2 years ago
While doing verification runs, I bumped into an issue:
# docker image pull public.ecr.aws/r6r9k8r3/hello-world; echo isnlI-$?-
Using default tag: latest
latest: Pulling from r6r9k8r3/hello-world
toomanyrequests: Rate exceeded
Example: https://openqa.suse.de/tests/8929837#step/docker/172
This only happens with docker, never with podman. I am not sure why; I asked about it on Stack Overflow.
There is some information about this here:
https://docs.aws.amazon.com/AmazonECR/latest/public/public-service-quotas.html
Apparently, there is a hard limit of 1 for the "Rate of unauthenticated image pulls" quota (I am not sure what that 1 means).
I am trying to work around this; apparently, retrying works.
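Since retrying seems to help, a minimal sketch of a retry wrapper with exponential backoff could look like this. The function name retry and the attempt/delay values are hypothetical choices, not what the openQA tests actually use.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to ATTEMPTS times, waiting DELAY
# seconds after the first failure and doubling the wait after each further one.
# Returns 0 on the first success, otherwise the exit status of the last attempt.
retry() {
  local attempts=$1 delay=$2 n=1 rc
  shift 2
  while true; do
    "$@" && return 0
    rc=$?
    if (( n >= attempts )); then
      echo "giving up after $n attempts: $*" >&2
      return "$rc"
    fi
    echo "attempt $n failed (exit $rc), retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    n=$(( n + 1 ))
  done
}
```

Example: retry 5 2 docker image pull public.ecr.aws/r6r9k8r3/hello-world waits 2, 4, 8 and 16 seconds between the five attempts. Growing the delay gives a per-second rate limit time to recover instead of hammering it again immediately.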
Updated by jlausuch over 2 years ago
Meanwhile, I realized there is an AWS public registry called AWS Public Gallery:
https://gallery.ecr.aws/
It contains images from trusted sources (official accounts that also push to docker.io), as well as images from public registries created by individuals, like the one I set up in our AWS account.
The good thing is that we can use the trusted sources.
For example, for Alpine there is
https://gallery.ecr.aws/docker/library/alpine
The same applies to other images (Ubuntu, CentOS, Debian, etc.).
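Switching the tests between registries mostly means rewriting image references. A minimal sketch, assuming official images appear as public.ecr.aws/docker/library/NAME in the Public Gallery and as public.ecr.aws/r6r9k8r3/NAME (without "library") in our own ECR registry; the function name ecr_ref is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a Docker Hub reference such as "alpine:latest",
# "library/alpine:latest" or "docker.io/library/alpine:latest" to an
# Amazon ECR Public reference under the given namespace.
#   ecr_ref docker/library IMAGE  -> AWS Public Gallery (official images)
#   ecr_ref r6r9k8r3 IMAGE        -> our own ECR public registry
ecr_ref() {
  local namespace=$1 image=$2
  image=${image#docker.io/}   # drop an explicit Docker Hub host
  image=${image#library/}     # official images live under "library/" on Docker Hub
  printf 'public.ecr.aws/%s/%s\n' "$namespace" "$image"
}
```

For example, ecr_ref docker/library alpine:latest prints public.ecr.aws/docker/library/alpine:latest.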
The "toomanyrequests: Rate exceeded" problem when using docker still exists, but I am trying some code changes to avoid it in our tests.
Updated by jlausuch over 2 years ago
New PR using Public Gallery:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15040
It should be merged together with:
https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/831
https://github.com/os-autoinst/opensuse-jobgroups/pull/156
Updated by jlausuch over 2 years ago
- Status changed from In Progress to Workable
- Priority changed from Normal to Low
After some investigation, using our own ECR registry or the public gallery has the same limitations. I ran sets of 500 tests to see how often this "rate exceeded" error happens, and it is not that sporadic: roughly 20% of the test cases failed because of this limitation.
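To put the 20% figure in perspective, here is a back-of-the-envelope estimate. It treats each test case's pull step as a single attempt that fails independently with probability p = 0.2, which is an optimistic assumption: rate-limit failures tend to cluster in time, so real retries likely do worse than this.

```latex
\begin{aligned}
\mathbb{E}[\text{failed runs}] &= 500 \times 0.2 = 100 \\
P(\text{all } k \text{ attempts fail}) &= p^{k} = 0.2^{3} = 0.008 \quad (k = 3) \\
\mathbb{E}[\text{failed runs with 3 attempts}] &= 500 \times 0.008 = 4
\end{aligned}
```

Even under independence, retries would only shrink the failures, not remove them.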
So, in the end, I created another VM in the new AWS account following https://confluence.suse.com/display/qasle/Building+a+private+registry
Then I updated the IP of that new registry in all the jobs and test suites in OSD and O3:
https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/831
https://github.com/os-autoinst/opensuse-jobgroups/pull/156
So the situation of this ticket has not changed: we still have an insecure registry in a VM, with all the drawbacks of that approach. The ticket remains open for a better solution.
Updated by rbranco almost 2 years ago
What about using the GitHub Container Registry?
Updated by jlausuch almost 2 years ago
rbranco wrote:
What about using the GitHub Container Registry?
Could you please investigate this further? It sounds very interesting.
Basically, we need a place to store some images for testing that don't belong to us (alpine, ubuntu, fedora, etc.). Those images will be pulled by the tests. So, if the registry is public, we don't care, as we are not exposing any internal image.
Updated by rbranco almost 2 years ago
- Status changed from Workable to In Progress
Updated by rbranco almost 2 years ago
jlausuch wrote:
So, what I did in the end is to create another VM in the new AWS account following https://confluence.suse.com/display/qasle/Building+a+private+registry
I modified the document to add -e REGISTRY_STORAGE_DELETE_ENABLED=true -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io to the docker run command. The first variable enables the scheduler to garbage-collect old content (see https://docs.docker.com/registry/recipes/mirror/); the second turns the registry into a pull-through cache of Docker Hub.
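The resulting command could look roughly like the sketch below. Only the two -e variables come from the document. The container name, the host data directory, the registry:2 image tag and the port mapping (5000, the registry's default) are assumptions. The DOCKER override is a hypothetical convenience for printing the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: start a Docker Hub pull-through cache with the
# distribution registry image. Set DOCKER=echo to only print the command.
run_registry_mirror() {
  local data_dir=${1:-/var/lib/registry}   # assumed host directory for cached layers
  ${DOCKER:-docker} run -d --restart=always --name registry \
    -p 5000:5000 \
    -v "$data_dir:/var/lib/registry" \
    -e REGISTRY_STORAGE_DELETE_ENABLED=true \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    registry:2
}
```

Environment variables of the form REGISTRY_<SECTION>_<KEY> override the matching keys of the registry's config.yml, so no custom configuration file is needed for these two settings.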
I found one problem with the pull-through cache approach described in the document: the server is easily DoS-able by a malicious actor who fills up the disk by pulling irrelevant images. This can be fixed with Nginx as a reverse proxy and an IP whitelist, though.
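A minimal sketch of that mitigation is shown below. It assumes the registry is bound to 127.0.0.1:5000 and nginx fronts it on port 80. The helper name, the listen port and any networks passed to it are hypothetical; in practice they would be the openQA worker networks.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an nginx server block that lets only the given
# client networks (CIDRs) reach a registry listening on 127.0.0.1:5000.
gen_registry_nginx_conf() {
  local cidr
  (( $# > 0 )) || { echo "usage: gen_registry_nginx_conf CIDR..." >&2; return 1; }
  echo 'server {'
  echo '    listen 80;'
  echo '    # Image layers can be large; do not cap the request body size.'
  echo '    client_max_body_size 0;'
  echo '    location /v2/ {'
  for cidr in "$@"; do
    printf '        allow %s;\n' "$cidr"
  done
  echo '        deny all;'
  echo '        proxy_pass http://127.0.0.1:5000;'
  echo '        proxy_set_header Host $http_host;'
  echo '    }'
  echo '}'
}
```

nginx evaluates allow/deny rules in order and stops at the first match, so the trailing "deny all;" rejects every client outside the listed networks.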
I even managed to render it unable to cache anything by just pulling busybox and then running my Registry listing tool: the cache proxied the requests for listing the busybox repository, downloading the manifests for all tags, eventually getting the TOOMANYREQUESTS error: "You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit".
For this reason I believe that a vanilla Registry server with "manually" pushed images is better than a pull through cache.
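Here is a minimal sketch of seeding such a vanilla registry with the images listed in the description. It assumes they are all official Docker Hub images (under library/). The helper name mirror_images and the registry address used below are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy official images from Docker Hub into a plain
# registry by pulling, re-tagging and pushing them.
# Set DOCKER=echo to print the commands instead of running them.
mirror_images() {
  local registry=$1 image src dst
  shift
  for image in "$@"; do
    src=docker.io/library/$image
    dst=$registry/$image
    ${DOCKER:-docker} pull "$src" || return
    ${DOCKER:-docker} tag "$src" "$dst" || return
    ${DOCKER:-docker} push "$dst" || return
  done
}
```

Usage: mirror_images registry.example:5000 hello-world:latest alpine:latest centos:latest fedora:latest debian:latest. For a plain-HTTP registry, each Docker daemon involved has to list it under insecure-registries in /etc/docker/daemon.json. Also, docker pull/push only transfer the host's platform, so a tool such as skopeo copy --all would be needed to keep multi-arch images intact.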
So the situation of this ticket has not changed: we still have an insecure registry in a VM, with all the drawbacks of that approach. The ticket remains open for a better solution.
If we're serving base images I don't see the issue with having an "insecure" (it's really just plain HTTP) Registry as the images have a hash. Ubuntu repositories are plain HTTP with a hash signed by GPG key.
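The argument rests on content addressing: every layer and manifest is identified by the SHA-256 digest of its bytes, so a client that knows the expected digest can detect tampering even over plain HTTP. Below is a minimal illustration of that check with a hypothetical helper name; docker performs the equivalent verification itself when pulling. This only helps if the expected digest comes from a trusted place, e.g. pinned in the test code, because a tag-to-digest lookup over plain HTTP is not itself protected.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a downloaded blob matches a digest of the
# form "sha256:<hex>", the way registry clients verify pulled content.
verify_blob() {
  local file=$1 expected=$2 actual
  # Read via stdin so unusual file names cannot alter sha256sum's output format.
  actual="sha256:$(sha256sum < "$file" | cut -d' ' -f1)"
  if [[ $actual == "$expected" ]]; then
    return 0
  fi
  echo "digest mismatch for $file: got $actual, expected $expected" >&2
  return 1
}
```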
Updated by jlausuch almost 2 years ago
rbranco wrote:
If we're serving base images I don't see the issue with having an "insecure" (it's really just plain HTTP) Registry as the images have a hash. Ubuntu repositories are plain HTTP with a hash signed by GPG key.
OK, insecure should be fine for this case.
What about asking for a VM in our infrastructure with access from OSD and O3?
Updated by rbranco almost 2 years ago
Ticket opened by @jlausuch for a VM: https://sd.suse.com/servicedesk/customer/portal/1/SD-113323
The garbage collector has a bug when dealing with multi-arch manifests, so we should do manual cleanup or use GitLab:
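For the manual-cleanup option, here is a minimal sketch using the registry HTTP API. It assumes deletion is enabled (REGISTRY_STORAGE_DELETE_ENABLED=true) and the registry is reachable over plain HTTP; the function names are hypothetical. It resolves a tag to its manifest digest via the Docker-Content-Digest header of a HEAD request and then deletes the manifest by digest. Disk space is only reclaimed once the garbage collector runs afterwards (for the registry:2 image: registry garbage-collect /etc/docker/registry/config.yml inside the container). Whether the per-platform manifests of a multi-arch image are collected is exactly where the bug above matters, so check the result.

```bash
#!/usr/bin/env bash
# Media types accepted when resolving a tag, so that the digest of a
# multi-arch index (not of a single-platform manifest) is returned.
MANIFEST_ACCEPT='application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.index.v1+json, application/vnd.docker.distribution.manifest.v2+json, application/vnd.oci.image.manifest.v1+json'

# Print the Docker-Content-Digest header value from HTTP headers on stdin.
digest_from_headers() {
  tr -d '\r' | awk 'tolower($1) == "docker-content-digest:" { print $2 }'
}

# Hypothetical helper: delete the manifest that REPO:TAG points to.
delete_tag() {
  local registry=$1 repo=$2 tag=$3 digest
  digest=$(curl -fsSI -H "Accept: $MANIFEST_ACCEPT" \
    "http://$registry/v2/$repo/manifests/$tag" | digest_from_headers)
  if [[ -z $digest ]]; then
    echo "could not resolve $repo:$tag" >&2
    return 1
  fi
  curl -fsS -X DELETE "http://$registry/v2/$repo/manifests/$digest"
}
```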
Updated by rbranco almost 2 years ago
Updated by jlausuch almost 2 years ago
- Related to action #126947: Create Ansible configuration for registry.qe.suse.de added
Updated by jlausuch almost 2 years ago
Should we resolve this one and continue in #126947?
Updated by jlausuch over 1 year ago
- Status changed from In Progress to Blocked
Need to fix infra issues. Some workers cannot access this registry yet.
Updated by jlausuch over 1 year ago
- Related to action #130222: Update host and refresh container images we have in our registry added
Updated by jlausuch over 1 year ago
- Related to action #130237: [investigation] Possibility to host some container images in registry.opensuse.org added
Updated by rbranco over 1 year ago
An approach to (ab)use the GitHub Container Registry: https://github.com/ricardobranco777/images/pull/1
Updated by ph03nix over 1 year ago
Solution suggestion:
- Use k2.qe.suse.de as the container registry for OSD and OOO. This machine would reside within the SUSE RD network and would not be reachable from the outside.
- For PublicCloud test runs, keep using our registry server on AWS (3.71.98.16).
For the first point we would need to ask the infra team to open port 5000.
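Once the port is open, reachability from a worker could be checked with a small sketch like the one below. The helper name and the 5-second timeout are assumptions. It relies on the registry API base endpoint GET /v2/, which answers 200 on a registry without authentication.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if HOST:PORT answers the registry API
# base endpoint /v2/ with a non-error HTTP status within 5 seconds.
registry_reachable() {
  local host=$1 port=${2:-5000}
  curl -fsS --max-time 5 -o /dev/null "http://$host:$port/v2/"
}
```

Usage from a worker: registry_reachable k2.qe.suse.de && echo "registry reachable".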
Updated by jlausuch over 1 year ago
ph03nix wrote:
Solution suggestion:
- Use k2.qe.suse.de as the container registry for OSD and OOO. This machine would reside within the SUSE RD network and would not be reachable from the outside.
- For PublicCloud test runs, keep using our registry server on AWS (3.71.98.16).
For the first point we would need to ask the infra team to open port 5000.
I am inclined to disagree. I don't see the point in maintaining two registry instances; it's even worse than the current situation, as it duplicates the work. I would just keep the AWS machine for everything.
Let Ricardo investigate the opensuse registry solution. If we see it's too much burden for us, then let's keep AWS machine only and improve it.
Updated by ph03nix over 1 year ago
jlausuch wrote:
Let Ricardo investigate the opensuse registry solution. If we see it's too much burden for us, then let's keep AWS machine only and improve it.
Ricardo, can you give us a status update on this one? I renewed the AWS machine, so we can just use that one, if necessary or convenient.
Updated by rbranco over 1 year ago
ph03nix wrote:
Let Ricardo investigate the opensuse registry solution. If we see it's too much burden for us, then let's keep AWS machine only and improve it.
Ricardo, can you give us a status update on this one? I renewed the AWS machine, so we can just use that one, if necessary or convenient.
We can't use registry.opensuse.org, and the issues we have with setting up our own registry with access from OSD and O3 can't be solved by IT alone.
So we have to stick with that VM.