action #89731
coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
coordination #89842: [epic] Scalable and streamlined docker-compose based openQA setup
containers: The deploy using docker-compose is not stable and eventually fails
Description
Motivation
The command 'docker-compose up' normally completes without errors, but sometimes some of the containers fail shortly after docker-compose has finished.
$ docker-compose up -d
Creating webui_db_1 ... done
Creating webui_nginx_1 ... done
Creating webui_data_1 ... done
Creating webui_scheduler_1 ... done
Creating webui_webui_1 ... done
Creating webui_webui_2 ... done
Creating webui_gru_1 ... done
Creating webui_websockets_1 ... done
Creating webui_livehandler_1 ... done
$ echo $?
0
$ docker-compose ps
Name                  Command                          State      Ports
----------------------------------------------------------------------------------------------------------------------------------------
webui_data_1          /bin/sh -c /usr/bin/tail - ...   Up
webui_db_1            docker-entrypoint.sh postgres    Up         5432/tcp
webui_gru_1           /root/run_openqa.sh              Up         443/tcp, 80/tcp, 9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp
webui_livehandler_1   /root/run_openqa.sh              Up         443/tcp, 80/tcp, 9526/tcp, 9527/tcp, 0.0.0.0:9528->9528/tcp, 9529/tcp
webui_nginx_1         /entrypoint.sh                   Up         0.0.0.0:9526->9526/tcp
webui_scheduler_1     /root/run_openqa.sh              Exit 255
webui_websockets_1    /root/run_openqa.sh              Up         443/tcp, 80/tcp, 9526/tcp, 0.0.0.0:9527->9527/tcp, 9528/tcp, 9529/tcp
webui_webui_1         /root/run_openqa.sh              Up         443/tcp, 80/tcp, 0.0.0.0:32789->9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp
webui_webui_2         /root/run_openqa.sh              Up         443/tcp, 80/tcp, 0.0.0.0:32790->9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp
The scheduler container logs the following errors:
scheduler_1 | failed to run SQL in /usr/share/openqa/script/../dbicdh/PostgreSQL/deploy/90/001-auto-__VERSION.sql: DBIx::Class::DeploymentHandler::DeployMethod::SQL::Translator::try {...} (): DBI Exception: DBD::Pg::db do failed: ERROR: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
scheduler_1 | DETAIL: Key (typname, typnamespace)=(dbix_class_deploymenthandler_versions_id_seq, 2200) already exists. at inline delegation in DBIx::Class::DeploymentHandler for deploy_method->deploy (attribute declared in /usr/lib/perl5/vendor_perl/5.26.1/DBIx/Class/DeploymentHandler/WithApplicatorDumple.pm at line 51) line 18
scheduler_1 | (running line 'CREATE TABLE dbix_class_deploymenthandler_versions ( id serial NOT NULL, version character varying(50) NOT NULL, ddl text, upgrade_sql text, PRIMARY KEY (id), CONSTRAINT dbix_class_deploymenthandler_versions_version UNIQUE (version) )') at /usr/lib/perl5/vendor_perl/5.26.1/DBIx/Class/DeploymentHandler/DeployMethod/SQL/Translator.pm line 263.
scheduler_1 | DBIx::Class::Storage::TxnScopeGuard::DESTROY(): A DBIx::Class::Storage::TxnScopeGuard went out of scope without explicit commit or error. Rolling back. at /usr/share/openqa/script/openqa-scheduler line 0
scheduler_1 | DBIx::Class::Storage::TxnScopeGuard::DESTROY(): A DBIx::Class::Storage::TxnScopeGuard went out of scope without explicit commit or error. Rolling back. at /usr/share/openqa/script/openqa-scheduler line 0
The problem is that every container that uses the openqa_webui image (webui_webui, webui_websockets, webui_scheduler, webui_livehandler) tries to initialize the DB tables. Because all of these containers start at the same time, their schema deployments run concurrently and conflict with each other.
Acceptance Criteria
- AC 1: All containers remain up after executing docker-compose up
- AC 2: Expand the docker-compose CI test to include this case
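A CI check for AC 2 could look roughly like the sketch below. It assumes the classic docker-compose table output shown in the Motivation section (a "Name" header row followed by a dashed line); the function names, the SETTLE_SECONDS variable, and the 30-second default are hypothetical and not taken from the existing CI test.

```shell
#!/bin/sh
# Print the names of containers whose State column is not "Up".
# Reads `docker-compose ps` table output (classic format) on stdin.
not_up_containers() {
    awk '
        $1 == "Name" || /^-+$/ { next }            # skip the two header lines
        NF == 0 { next }                           # skip blank lines
        $0 !~ /[ \t]Up([ \t]|$)/ { print $1 }      # no standalone "Up" token
    '
}

# Hypothetical CI step: start the stack, wait so that late failures
# have time to show up, then fail if any container has stopped.
check_stack() {
    docker-compose up -d || return 1
    sleep "${SETTLE_SECONDS:-30}"
    failed=$(docker-compose ps | not_up_containers)
    if [ -n "$failed" ]; then
        echo "containers not up: $failed" >&2
        return 1
    fi
}
```

Calling check_stack in the CI job would turn a case like the 'Exit 255' scheduler above into a test failure instead of relying on the exit code of 'docker-compose up -d' alone. The match on "Up" is a text heuristic and would need adjusting for other docker-compose output formats.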
Suggestions
- Use dependencies (depends_on) based on health-checks to order the startup of all the containers.
- Check the current solution in https://github.com/os-autoinst/openQA/pull/3755
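The depends_on suggestion might translate into a compose fragment along these lines. This is a sketch under assumptions, not the content of the linked pull request: service names are inferred from the container names above, the db image and all health-check commands and timings are guesses, and 'scheduler' is picked arbitrarily as the single service that deploys the schema before the others start. Note that 'condition: service_healthy' is only accepted by compose file format 2.1 and later 2.x versions or by the Compose Specification, not by the 3.x file format.

```yaml
# Sketch only: images, health-check commands, and intervals are assumptions.
version: "2.4"

services:
  db:
    image: postgres
    healthcheck:
      # pg_isready ships with the official postgres image
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 10

  scheduler:
    image: openqa_webui
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      # assumption: curl exists in the image and the scheduler only
      # listens on 9529 after the schema has been deployed
      test: ["CMD-SHELL", "curl -s -o /dev/null http://localhost:9529"]
      interval: 5s
      timeout: 5s
      retries: 20

  # Every other openqa_webui-based service waits for the scheduler,
  # so only one container ever runs the initial deployment.
  webui:
    image: openqa_webui
    depends_on:
      scheduler:
        condition: service_healthy

  websockets:
    image: openqa_webui
    depends_on:
      scheduler:
        condition: service_healthy

  livehandler:
    image: openqa_webui
    depends_on:
      scheduler:
        condition: service_healthy
```

The point of the chain db, then scheduler, then everything else is that the 'CREATE TABLE dbix_class_deploymenthandler_versions' statement from the log is executed by exactly one container. Making a scaled service such as webui the initializer would not help, since its replicas would still race each other.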