action #43715

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #43706: [epic] Generate "download&use" docker image of openQA for SUSE QA

Update upstream dockerfiles to provide an easy to use docker image of workers

Added by SLindoMansilla over 2 years ago. Updated 4 months ago.

Status: Resolved
Priority: Low
Assignee:
Category: Feature requests
Target version:
Start date: 2018-11-13
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Acceptance criteria

  • AC1: DONE Ensure there is a working Docker file that builds an image
  • AC2: DONE Documented steps to run the worker from the image
  • AC3: DONE The worker can connect to a webui and run tests

Suggestions


Related issues

Related to openQA Project - action #69355: [spike] redundant/load-balancing webui deployments of openQA (Resolved, 2020-07-25)

Related to openQA Project - action #73450: POC: Create openQA worker container image (feature) (Resolved, 2020-10-16)

Blocked by openQA Project - action #43712: Update upstream dockerfiles to provide an easy to use docker image of openQA-webui (Resolved, 2018-11-13)

History

#1 Updated by okurz over 2 years ago

  • Target version set to Milestone 22

#2 Updated by okurz over 2 years ago

  • Description updated (diff)

#3 Updated by szarate over 2 years ago

  • Related to action #43718: Docker image for webui and workers are versioned and uploaded to obs registry added

#4 Updated by szarate over 2 years ago

  • Related to deleted (action #43718: Docker image for webui and workers are versioned and uploaded to obs registry)

#5 Updated by okurz over 2 years ago

  • Target version changed from Milestone 22 to Milestone 24

#6 Updated by okurz over 2 years ago

  • Subject changed from [functional][u] Update upstream dockerfiles to provide an easy to use docker image of workers to Update upstream dockerfiles to provide an easy to use docker image of workers
  • Category set to Feature requests
  • Priority changed from Normal to Low
  • Target version deleted (Milestone 24)

#7 Updated by ilausuch about 1 year ago

  • Assignee set to ilausuch

#8 Updated by okurz about 1 year ago

  • Target version set to Ready

#9 Updated by cdywan 11 months ago

  • Blocked by action #43712: Update upstream dockerfiles to provide an easy to use docker image of openQA-webui added

#10 Updated by cdywan 11 months ago

Let's consider this Blocked in the sense that the steps required are the same with a focus on the worker vs. the web UI.

#11 Updated by cdywan 10 months ago

  • Status changed from Workable to Blocked

#12 Updated by ilausuch 10 months ago

Before working on this ticket I would like to complete https://progress.opensuse.org/issues/69355 because it is related, and this ticket may depend on how that one is resolved.

#13 Updated by cdywan 10 months ago

  • Related to action #69355: [spike] redundant/load-balancing webui deployments of openQA added

#14 Updated by ilausuch 10 months ago

  • Status changed from Blocked to In Progress

#15 Updated by ilausuch 10 months ago

One thing we should consider is the --link parameter in the docker run command for the workers. The Docker documentation has a warning about it: https://docs.docker.com/network/links/

#16 Updated by ilausuch 10 months ago

Two initial problems to fix during the build

  • Package 'qemu-uefi-aarch64' not found
  • /root/qemu/kvm-mknod.sh: line 6: gunzip: command not found

#18 Updated by ilausuch 10 months ago

New problems to solve

  • gzip: /proc/config.gz: No such file or directory
  • mknod: /dev/kvm: File exists
  • Unable to make /dev/kvm node; software emulation will be used (This can happen if the container is run without -privileged)

I think the last one is caused by configuration.

#19 Updated by okurz 10 months ago

  • Related to action #73450: POC: Create openQA worker container image (feature) added

#20 Updated by cdywan 9 months ago

  • Description updated (diff)

#21 Updated by okurz 9 months ago

  • Due date set to 2020-10-28

As I think you are actually working on this we should aim for not exceeding our usual cycle times for tickets, hence setting the due date to what I consider feasible and useful. Please make sure to provide an update soon and feel free to unassign again if you are not actually (anymore?) working on this

#22 Updated by ilausuch 9 months ago

Doing some tests on the worker I realized that the worker cannot start because:

[info] [pid:44] Project dir for host http://webui_haproxy_1 is /var/lib/openqa/share
[info] [pid:44] Registering with openQA http://webui_haproxy_1
[info] [pid:44] Establishing ws connection via ws://webui_haproxy_1/api/v1/ws/1
[warn] [pid:44] Unable to upgrade to ws connection via http://webui_haproxy_1/api/v1/ws/1 - trying again in 10 seconds

Facts:

  • The worker can connect to the web UI API and authenticate the user
  • The worker cannot connect to the websockets

To solve that, Christian and I have been working on replacing haproxy with nginx, which allows reverse proxying the websockets. This seems to work:

server {
  listen       80;
  listen       9526;
  server_name  localhost;

  location ~ /api/v1/ws/(.*) {
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_set_header X-NginX-Proxy true;

    rewrite ^//api/v1/ws/(.*)$ http://webui_websockets_1:9527/ws/$1;
    proxy_pass http://webui_websockets_1:9527;
    proxy_redirect off;
  }

  location / {
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_set_header X-NginX-Proxy true;

    proxy_pass http://webui_haproxy_1:9526;
    proxy_redirect off;
  }
}

Facts:

  • nginx passes the request on to the websockets service
  • nginx always returns Not Found in plain text (tested from the worker and also in Christian's environment)
  • Christian checked that in O3 the websockets service also returns Not Found for the same path /api/v1/ws/1. However, in O3 the worker starts and in this test it does not. What is the reason for that?

Ideas:

  • Check with a fake websockets service that nginx is sending the correct path
  • Check what the worker expects from the websockets service

Closely related to https://progress.opensuse.org/issues/69355. Both tickets are interdependent.

#23 Updated by ilausuch 9 months ago

  • Status changed from In Progress to Feedback

#24 Updated by mkittler 9 months ago

I see that your NGINX config differs from what we have in our repository: https://github.com/os-autoinst/openQA/blob/master/etc/nginx/vhosts.d/openqa.conf

Maybe the lack of proxy_set_header Upgrade leads to the 404 response.

As mentioned in the chat yesterday: It is also possible to specify the web UI port directly within the worker config. The worker will then use this port +2 for the web sockets route making it unnecessary to proxy the web socket connection. Having NGINX proxying the web socket connection would be the nicer solution of course.
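
A minimal sketch of a proxied web socket location with the upgrade headers set, assuming the upstream name and port from the config in #22. The mapping of the path to /ws/ mirrors the intent of the rewrite line there and is an assumption; this is not a copy of the file in the repository:

```nginx
# Sketch only: upstream host and port are taken from the config in #22,
# the /ws/ target path is an assumption.
location /api/v1/ws/ {
    # With a URI part on proxy_pass, nginx replaces the matched prefix,
    # so /api/v1/ws/1 is forwarded as /ws/1.
    proxy_pass http://webui_websockets_1:9527/ws/;

    # WebSocket upgrades need HTTP/1.1 and explicit hop-by-hop headers,
    # because nginx does not forward Upgrade/Connection by default.
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";

    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```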

#25 Updated by ilausuch 9 months ago

It works. I am finishing https://progress.opensuse.org/issues/69355 first.

#26 Updated by ilausuch 9 months ago

  • Status changed from Feedback to In Progress

#27 Updated by ilausuch 9 months ago

Thanks Marius, this worked, and we now have a new PR for the web UI that solves all these problems. I am now preparing this PR to use API keys and hosts for use with https://progress.opensuse.org/issues/69355

#28 Updated by ilausuch 9 months ago

Additionally, following the subject of this task ("provide an easy to use..."), I created a script to launch a pool of workers:
https://github.com/os-autoinst/openQA/pull/3495

#29 Updated by cdywan 9 months ago

  • Description updated (diff)

#30 Updated by ilausuch 9 months ago

  • Description updated (diff)

The acceptance criteria AC1, AC2 and AC3 are covered by https://github.com/os-autoinst/openQA/pull/3475 and https://github.com/os-autoinst/openQA/pull/3495
My next step is to build the docker image

#32 Updated by cdywan 9 months ago

  • Due date changed from 2020-10-28 to 2020-11-13

For the record, these questions were still being discussed today:

  • We might need different containers for specific architectures with their own tags
  • What base image to use
  • Fetching repos via http://download.opensuse.org

Setting up working builds in a home project was far from straightforward even with OBS expertise on hand, so I recommend this gets documented in a blog post or wiki page afterwards, although it's not required to finish the ticket.

#33 Updated by pdostal 9 months ago

As far as I know, no tags are required for different architectures. There can be multiple builds of the same Dockerfile with the same tag for different architectures.

#35 Updated by cdywan 9 months ago

  • Description updated (diff)
  • Status changed from In Progress to Feedback

Publishing images on OBS is raising a lot of questions on top of a usable image that have nothing to do with containerizing the worker in general, and we actually have #43718 so I'm removing it from the AC here.

#37 Updated by okurz 9 months ago

Please keep in mind, even if this is not explicitly mentioned: We probably all agree that without automatic tests we would not call any contributions properly long-term supportable. These tests could be very simple, e.g. something like podman run --rm -it .... openqa-worker --help or something. But if you do not plan to add tests as part of this ticket, which is of course ok, then please create a follow-up so that we don't forget that.

#39 Updated by cdywan 9 months ago

okurz wrote:

Please keep in mind, even if this is not explicitly mentioned: We probably all agree that without automatic tests we would not call any contributions properly long-term supportable. These tests could be very simple, e.g. something like podman run --rm -it .... openqa-worker --help or something. But if you do not plan to add tests as part of this ticket, which is of course ok, then please create a follow-up so that we don't forget that.

Ack. The specific AC however had build and publish in OBS in it (which #43718 basically is), and we don't even know that we can run tests in OBS after building the image. Although adding two lines in our existing setup might be more straightforward.

#40 Updated by ilausuch 8 months ago

  • Status changed from Feedback to Resolved

The PR is merged and there is a build in OBS (https://build.opensuse.org/package/show/home:ilausuch:branches:devel:openQA/openQA_container_image_worker_x86) that builds the container image

#41 Updated by okurz 8 months ago

Unfortunately it seems you overlooked #43715#note-37. I created #43706 for that now.

#42 Updated by okurz 4 months ago

  • Due date deleted (2020-11-13)
