coordination #80142

[saga][epic] Scale out openQA: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

Added by okurz about 2 months ago. Updated 8 days ago.

Status: Blocked
Priority: High
Assignee:
Category: Feature requests
Target version:
Start date: 2018-09-26
Due date: 2021-01-15
% Done: 54%
Estimated time: (Total: 0.00 h)
Difficulty:

Description

Motivation

Single instances of the webui can cause longer downtimes and make upgrades of OS more risky, e.g. when we do not have management access to VMs that might fail to reboot. Load balancing can help as well, as can switch-over deployments that make testing, staging, etc. easier. Container-based deployment is becoming an industry standard, which we should fully support and prominently feature, both for simple single-instance setups run by individuals and for multi-node setups in clusters.

Acceptance criteria

  • AC1: an openQA infrastructure deployed on kubernetes is part of our continuous testing setup
  • AC2: documentation exists on how to set up redundant load-balancing infrastructures
  • AC3: The support for openQA on container management frameworks is prominently presented
  • AC4: documentation exists for simple single-instance setups, e.g. "get your openQA tests to run in less than 5 minutes"

Suggestions

Based on the spike conducted in #69355 we can streamline the support, add documentation, introduce proper testing, consider running that setup as part of our DevOps structure, etc.

The state of the art is k8s, so we should aim for that. A "docker compose" file may be a good intermediate step, followed by k8s with a helm chart, and potentially also some setup based on GitLab; see
https://docs.gitlab.com/ee/ci/environments/incremental_rollouts.html#blue-green-deployment


Subtasks

action #41600: fallback mechanism for apache, e.g. on osd (New)

coordination #43706: [epic] Generate "download&use" docker image of openQA for SUSE QA (Blocked, okurz)

action #43712: Update upstream dockerfiles to provide an easy to use docker image of openQA-webui (Resolved, ilausuch)

action #43715: Update upstream dockerfiles to provide an easy to use docker image of workers (Resolved, ilausuch)

action #43718: Docker image for webui and workers are versioned and uploaded to obs registry (Resolved, ilausuch)

action #80516: Docker image for webui and workers on docker hub reflect current state (Workable)

action #80518: provide container images for aarch64 (New)

action #80520: Automatic tests for our openQA containers - webUI only (In Progress, ilausuch)

action #80534: publication+demo for updated openQA containers (Blocked, ilausuch)

action #80682: Automatic tests for our openQA containers - worker only (Feedback, ilausuch)

action #80684: Automatic tests for our openQA containers - worker+webui connection (Workable)

action #81118: automatic container tests for os-autoinst (Resolved, okurz)

action #55262: Install Pgpool-II or PgBouncer before PostgreSQL for openQA instances, e.g. to be used on OSD (New)

action #69355: [spike] redundant/load-balancing webui deployments of openQA (Resolved, ilausuch)

openQA Infrastructure - action #70978: automatic reboots on o3 to activate new kernel versions (Resolved, okurz)

openQA Infrastructure - action #71098: openqaworker3 down but no alert was raised (Resolved, okurz)

openQA Infrastructure - action #73174: [osd][alert] Job age (scheduled) (median) alert (Resolved, okurz)

action #73447: POC: Create openQA Web Application container image (feature) (Resolved, ilausuch)

action #73450: POC: Create openQA worker container image (feature) (Resolved, ilausuch)

openQA Infrastructure - action #76786: Configure static hostnames with salt for all salt nodes (Resolved, okurz)

openQA Infrastructure - action #76876: Find a better (automated) way to inform infra about hanging (arm) workers (Resolved, cdywan)

action #76990: Improve documentation for redundant/load-balancing webui deployments of openQA (Workable)

coordination #77698: [epic] synchronous qemu based system level test in pull request CI runs, e.g. standalone isotovideo or openQA tests (New)

action #77905: CI pipeline proof-of-concept running isotovideo (Resolved, okurz)

openQA Infrastructure - coordination #78206: [epic] 2020-11-18 nbg power outage aftermath (Blocked, okurz)

openQA Infrastructure - action #80540: idea: Conduct "power outage drills", e.g. once every half-year? (Workable)

openQA Infrastructure - action #80542: Configure "automatic power-on" after power loss for openqaworker1 (Workable)

openQA Infrastructure - action #80544: Ensure that IPMI for powerqaworker-qam works reliably (Workable)

openQA Infrastructure - action #78218: [openQA][worker] Almost all openQA workers become offline (Resolved, okurz)

action #78390: Worker is stuck in "broken" state due to unavailable cache service (was: and even continuously fails to (re)connect to some configured web UIs) (Feedback, mkittler)

action #87898: Add grafana alert for "broken workers" as reported by openQA (In Progress, okurz)

openQA Infrastructure - action #78438: openQA webui entry "Assigned worker" shows ip instead of names as formerly - manual cleanup work (Resolved, okurz)

coordination #80150: [epic] Scale out openQA: Easier openQA setup (Blocked, okurz)

action #76978: How to run an openQA test in 5 minutes (Workable)

action #80382: Provide installation recipes for automatic installations of openQA worker machines (Workable)

openQA Infrastructure - action #80482: qa-power8-5-kvm has been down for days, use more robust filesystem setup (In Progress, cdywan)

action #80908: [epic] Continuous deployment (package upgrade or config update) without interrupting currently running openQA jobs (Feedback, okurz)

action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobs (In Progress, mkittler)

action #80986: terminate worker process after executing all currently assigned jobs based on config/env variable (In Progress, mkittler)

action #81060: Create a helm chart to deploy web UI in kubernetes (Workable)


Related issues

Related to openQA Project - action #80466: docker: Base the webUI and worker Dockerfiles in Tumbleweed (Workable, 2020-11-26)

History

#1 Updated by okurz about 2 months ago

  • Subject changed from [saga][epic] redundant/load-balancing webui deployments of openQA to [saga][epic] redundant/load-balancing webui deployments of openQA (container on kubernetes)
  • Status changed from Workable to Blocked
  • Assignee set to okurz

I created this saga because the whole "container" feature block was apparently not visible enough for some stakeholders.

Some subtasks are there, ready to be worked on. Setting to "Blocked" by subtasks.

#2 Updated by okurz about 2 months ago

  • Subject changed from [saga][epic] redundant/load-balancing webui deployments of openQA (container on kubernetes) to [saga][epic] Scale out openQA: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
  • Description updated (diff)

#3 Updated by okurz about 2 months ago

  • Tracker changed from action to coordination

#4 Updated by ilausuch about 2 months ago

  • Related to action #80466: docker: Base the webUI and worker Dockerfiles in Tumbleweed added

#5 Updated by cdywan about 1 month ago

  • AC4: documentation exists for simple single-instance setups, e.g. "get your openQA tests to run in less than 5 minutes"

This seems oddly phrased considering how the Motivation states:

Single instances of the webui can cause longer downtimes and make upgrades of OS more risky

I would expect that, as a side effect of using kubernetes, users decide what resources they want to use.
