action #110524

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #81060: [epic] openQA web UI in kubernetes

[timeboxed:20h][spike] openQA proof-of-concept within kubernetes size:M

Added by okurz 2 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
2022-05-02
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Acceptance criteria

  • AC1: Some draft (branch, gist, whatever) exists with results in the ticket

Suggestions


Related issues

Related to openQA Project - action #110725: Unexpected behavior for cache service under k3s when the CACHE_MIN_FREE_PERCENTAGE is set size:M (Workable, 2022-05-06)

History

#1 Updated by okurz 2 months ago

  • Status changed from New to Workable

#2 Updated by cdywan 2 months ago

openqa-staging-2.qa.suse.de could be used for that. We discussed briefly after the daily that it is in need of an upgrade and could also be set up anew for k8s, since we don't currently need two machines with the same setup.

#3 Updated by jbaier_cz 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to jbaier_cz

#4 Updated by openqa_review about 2 months ago

  • Due date set to 2022-05-20

Setting due date based on mean cycle time of SUSE QE Tools

#5 Updated by jbaier_cz about 2 months ago

After an initial investigation, it turned out that converting from docker-compose to kubernetes/helm is fairly straightforward. With kompose convert or kompose convert --chart, one can create kubernetes yaml files (or the corresponding helm chart templates), which can then be deployed to kubernetes. Of course, the resulting files are only as good as the docker-compose file itself. Ours targets development on localhost rather than a production-grade setup, so some tweaking of the file is still needed. Nevertheless, it gives a nice starting point. The necessary tools can be installed with zypper in kompose helm kubernetes1.23-client.

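# Use the worker image from registry.opensuse.org
sed -i -e 's#image: openqa_worker#image: registry.opensuse.org/devel/openqa/containers15.3/openqa_worker:latest#' docker-compose.yaml
export OPENQA_WORKER_REPLICAS=1
# Convert docker-compose.yaml into a helm chart written to the directory "worker"
kompose convert --with-kompose-annotation=false --volumes emptyDir -c -o worker
# Install the generated chart under a generated release name
helm install --generate-name worker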

That will install the worker into the cluster. Of course, it will not start, as the configuration is missing and there is no data. I covered the configuration by creating a ConfigMap. The data (factory and tests folders) can be provided in two different ways:

  1. Creating a PersistentVolumeClaim with the proper content, either by having a separate container (similar to what the openqa_data container provides) which obtains the data somehow, by having an NFS mount, or by some other option. This way, the worker should also be able to install missing dependencies if the install_deps.* script is found. I will need to invest some time to explore this option further.

  2. By enabling worker-cache, the worker can download assets/tests/needles itself. This needs rsync to be installed in the worker image (it is missing in the current Dockerfile), and two additional binaries need to be started (with proper permissions) before the main /run_openqa_worker.sh: openqa-worker-cacheservice-minion and openqa-workercache-daemon. In this case, there is also no install_deps.sh, so the test dependencies will not be installed automatically. I can probably make this work by introducing another key in the ConfigMap and mounting that key directly into tests/install_deps.sh
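To illustrate the second option, here is a minimal sketch of how a cache-enabled workers.ini could be generated before it is put into a ConfigMap. The function name render_workers_ini, the example host and the cache directory path are assumptions for illustration, not the content of the ConfigMap actually used in this ticket. The keys HOST, CACHEDIRECTORY and TESTPOOLSERVER are regular openQA worker settings; CACHE_MIN_FREE_PERCENTAGE is left out on purpose because of the issue tracked in #110725.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal workers.ini that enables the worker cache.
# $1 is the URL of the openQA web UI the worker registers against (assumed to
# include a scheme, e.g. http://openqa.example.com).
render_workers_ini() (
    host="$1"
    # Strip the scheme so the host can be reused in the rsync URL.
    rsync_host="${host#*://}"
    cat <<EOF
[global]
HOST = ${host}
CACHEDIRECTORY = /var/lib/openqa/cache

[${host}]
TESTPOOLSERVER = rsync://${rsync_host}/tests
EOF
)
```

The output could then be written to a file and loaded into the cluster, for example with render_workers_ini http://openqa.example.com > workers.ini followed by kubectl create configmap worker-config --from-file=workers.ini, where worker-config is again only a placeholder name.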

After those manual modifications, I was able to successfully install a worker inside a k3s cluster, register it against my personal openQA instance and clone a job: http://polaris.suse.cz/tests/2418. The next steps are to create the necessary yaml files to automate the changes I made, quickly investigate some of the pitfalls (like the non-working CACHE_MIN_FREE_PERCENTAGE) and wrap this up in the form of a very basic helm chart.

#6 Updated by jbaier_cz about 2 months ago

  • Related to action #110725: Unexpected behavior for cache service under k3s when the CACHE_MIN_FREE_PERCENTAGE is set size:M added

#7 Updated by jbaier_cz about 2 months ago

The item "Figure out if is necessary to publish the helm chart and where" can be answered by the how-to page https://helm.sh/docs/howto/chart_releaser_action/, which suggests a separate GitHub repository for the charts and an accompanying GitHub Pages site serving as the Helm repository.
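For reference, a rough manual approximation of what hosting a Helm repository on a static site involves: package the chart and regenerate the repository index. This is only a sketch and not what the chart releaser action does internally; the function name publish_chart, the HELM override variable and the example URL are assumptions made for illustration. It relies only on helm package and helm repo index.

```shell
#!/bin/sh
# Hypothetical helper: package a chart and (re)generate index.yaml for a
# statically hosted Helm repository.
# $1: chart directory (e.g. the "worker" directory produced by kompose)
# $2: public URL under which the packaged charts will be served (placeholder)
# $3: output directory for the .tgz and index.yaml (defaults to ".")
# Set HELM=echo to print the helm invocations instead of running them.
publish_chart() (
    chart_dir="$1"
    repo_url="$2"
    out_dir="${3:-.}"
    helm_bin="${HELM:-helm}"
    "$helm_bin" package "$chart_dir" --destination "$out_dir" &&
        "$helm_bin" repo index "$out_dir" --url "$repo_url"
)
```

A dry run such as HELM=echo publish_chart worker https://example.invalid/charts dist shows the two helm commands without needing helm installed; the contents of the output directory would then be published on the GitHub Pages branch.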

#8 Updated by jbaier_cz about 2 months ago

  • Status changed from In Progress to Feedback

I drafted a pull request https://github.com/os-autoinst/openQA/pull/4650 with a very simple Helm chart. It is far from perfect and offers very few customization options. In the default setting, it will create one container with the worker and without any tests/needles (although it will have a volume prepared for the data, similar to the container setup for docker/podman). A test run against such a worker will of course fail: http://polaris.suse.cz/tests/2455#details

The alternative is to configure an asset cache. In that case, there will be several containers inside the worker pod with a shared cache volume, so the assets/tests/needles can be downloaded via rsync. So far, there is no way to install test dependencies automatically (this will probably be added later). An example of a cloned test running on such an asset-cache-enabled worker can be found at http://polaris.suse.cz/tests/2451.

#9 Updated by jbaier_cz about 2 months ago

More has been added to the PR. We can now create a fully functioning openQA web UI and worker: http://odin.qam.suse.cz/tests/3. So far, the setup can only be done using the k3s ingress (so without nginx); the only small change needed is in https://github.com/os-autoinst/openQA/pull/4653.

The chart still misses some features; for example, there is no simple way to get data inside the containers. The customization of the charts is also kept at a minimal level (at least the basic variables provided by a helm sample template should be added in the near future). Also, there are no tests yet.

#10 Updated by jbaier_cz about 2 months ago

  • Status changed from Feedback to Resolved

Draft PR with all necessary info exists.

#11 Updated by jbaier_cz about 2 months ago

  • Due date deleted (2022-05-20)
