action #91046

closed

CI: "webui-docker-compose" seems to eventually fail again

Added by ilausuch almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version: -
Start date: 2021-04-13
Due date:
% Done: 0%
Estimated time:

Description

Motivation

In #89731 we introduced an initial webui container in charge of initializing the database. We have a test run where the health check failed:
https://github.com/os-autoinst/openQA/pull/3838/checks?check_run_id=2329551052

The problem is that docker-compose exits with an error because the health check of the webui_db_init container failed

         Name                       Command                   State                                        Ports                                 
------------------------------------------------------------------------------------------------------------------------------------------------
webui_db_1              docker-entrypoint.sh postgres    Up (healthy)     5432/tcp                                                              
webui_webui_db_init_1   sh -c chmod -R a+rwX /data ...   Up (unhealthy)   443/tcp, 80/tcp, 0.0.0.0:49153->9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp
make: *** [Makefile:306: test-containers-compose] Error 1

The healthcheck is this one
https://github.com/os-autoinst/openQA/blob/abd9a2297430377cd9876c3cbcec8b2cb4302722/container/webui/docker-compose.yaml#L116

Take the DB error lines into consideration

db_1             | 2021-04-13 02:43:08.038 UTC [98] ERROR:  relation "api_keys" does not exist at character 15
db_1             | 2021-04-13 02:43:08.038 UTC [98] STATEMENT:  select * from api_keys;
db_1             | 2021-04-13 02:43:10.441 UTC [100] ERROR:  relation "dbix_class_deploymenthandler_versions" does not exist at character 24
db_1             | 2021-04-13 02:43:10.441 UTC [100] STATEMENT:  SELECT me.version FROM dbix_class_deploymenthandler_versions me ORDER BY id DESC LIMIT $1
db_1             | 2021-04-13 02:43:10.446 UTC [100] ERROR:  relation "dbix_class_deploymenthandler_versions" does not exist at character 24

Acceptance criteria

  • AC 1: Determine the cause of the failure
  • AC 2: Fix the problem

Related issues 2 (0 open, 2 closed)

Related to openQA Project - action #89731: containers: The deploy using docker-compose is not stable and eventually fails (Resolved, ilausuch, 2021-03-09)

Related to openQA Project - action #90614: CI test webui-docker-compose failed but PR was merged anyway (Resolved, ilausuch, 2021-04-01, 2021-04-23)

Actions #1

Updated by ilausuch almost 3 years ago

  • Description updated (diff)
Actions #2

Updated by ilausuch almost 3 years ago

  • Description updated (diff)
Actions #3

Updated by ilausuch almost 3 years ago

  • Related to action #89731: containers: The deploy using docker-compose is not stable and eventually fails added
Actions #4

Updated by ilausuch almost 3 years ago

Some discoveries

The healthcheck for the DB
(https://github.com/os-autoinst/openQA/blob/abd9a2297430377cd9876c3cbcec8b2cb4302722/container/webui/docker-compose.yaml#L133)
runs:

echo 'select * from api_keys;' | psql -U openqa openqa

This check is not valid because psql does not return a non-zero exit code when the query fails

{"Status":"unhealthy","FailingStreak":0,"Log":[{"Start":"2021-04-13T12:01:03.499631783+02:00","End":"2021-04-13T12:01:03.698939023+02:00","ExitCode":0,"Output":"ERROR:  relation \"api_keys\" does not exist\nLINE 1: select * from api_keys;\n                      ^\n"}]}

So, in spite of this error, the healthcheck is OK, and docker-compose continues with the rest of the script.

However, this is an undetected deadlock: the table cannot exist until webui_init starts, and webui_init cannot start until the DB has this table. So the proposed solution is:

  • Change the healthcheck to something correct for the defined workflow
  • Check if healthchecks are used on dependencies
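
To illustrate the first point, here is a minimal sketch of a readiness wait that does not depend on any table created later by the init container. The `wait_for` helper is hypothetical and not part of openQA; the `pg_isready` call in the usage comment is an assumption about what a suitable DB check could look like, not what the compose file actually uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of openQA): retry a readiness check up to
# $1 times, sleeping $2 seconds between attempts.
# Returns 0 on the first successful check, 1 if every attempt fails.
wait_for() {
    retries=$1
    interval=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$retries" ]; do
        # Only the exit status of the check matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    return 1
}

# Assumed usage (needs a reachable PostgreSQL server):
#   wait_for 30 1 pg_isready -U openqa -d openqa
```

A check like this only answers "is the server accepting connections", so it can pass before the schema exists, which avoids the circular wait described above.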
Actions #5

Updated by ilausuch almost 3 years ago

More investigation

I discovered that psql doesn't exit with a non-zero status when the SQL command fails

root@cd6126e970f3:/# echo 'select * from api_keys2;' | psql -U openqa openqa
ERROR:  relation "api_keys2" does not exist
LINE 1: select * from api_keys2;
                      ^
root@cd6126e970f3:/# echo  $?
0
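
One way to make such a check fail is sketched below, under stated assumptions. The `strict_check` wrapper is hypothetical and not taken from the openQA repository: it treats any output line starting with `ERROR:` as a failure, even when the wrapped command exits with 0. The psql invocations in the comments are untested here and only show where the wrapper, or psql's own `ON_ERROR_STOP` variable, might fit.

```shell
#!/bin/sh
# Hypothetical helper (not part of openQA): run a command and fail when it
# either exits non-zero or prints a line starting with "ERROR:".
strict_check() {
    # Capture stdout and stderr together; $? is the command's exit status.
    output=$("$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        return "$status"
    fi
    if printf '%s\n' "$output" | grep -q '^ERROR:'; then
        return 1
    fi
    return 0
}

# Assumed usage inside the db container (needs psql and a running server):
#   strict_check psql -U openqa openqa -c 'select * from api_keys;'
# Alternative for piped input, using psql's ON_ERROR_STOP variable:
#   echo 'select * from api_keys;' | psql -v ON_ERROR_STOP=1 -U openqa openqa
```

With `ON_ERROR_STOP` set, psql is documented to exit with status 3 when a statement in the script fails, so the wrapper would not be needed in that case.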
Actions #7

Updated by ilausuch almost 3 years ago

Although the sequence seems correct now, I am still getting the same DB errors

db_1             | 2021-04-13 12:00:30.838 UTC [75] LOG:  database system was shut down at 2021-04-13 12:00:30 UTC
db_1             | 2021-04-13 12:00:30.843 UTC [1] LOG:  database system is ready to accept connections
db_1             | 2021-04-13 12:00:32.556 UTC [83] ERROR:  relation "dbix_class_deploymenthandler_versions" does not exist at character 24
db_1             | 2021-04-13 12:00:32.556 UTC [83] STATEMENT:  SELECT me.version FROM dbix_class_deploymenthandler_versions me ORDER BY id DESC LIMIT $1
db_1             | 2021-04-13 12:00:32.559 UTC [83] ERROR:  relation "dbix_class_deploymenthandler_versions" does not exist at character 24
db_1             | 2021-04-13 12:00:32.559 UTC [83] STATEMENT:  SELECT COUNT( * ) FROM dbix_class_deploymenthandler_versions me
db_1             | 2021-04-13 12:00:52.597 UTC [137] ERROR:  relation "mojo_migrations" does not exist at character 21
db_1             | 2021-04-13 12:00:52.597 UTC [137] STATEMENT:  SELECT version FROM mojo_migrations WHERE name = $1

I am not sure if this is directly related. However, it doesn't seem to affect the docker-compose workflow

        Name                       Command                  State                                       Ports
----------------------------------------------------------------------------------------------------------------------------------------------
webui_db_1              docker-entrypoint.sh postgres    Up (healthy)   5432/tcp
webui_gru_1             sh -c /root/run_openqa.sh| ...   Up (healthy)   443/tcp, 80/tcp, 9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp
webui_livehandler_1     /root/run_openqa.sh              Up (healthy)   443/tcp, 80/tcp, 9526/tcp, 9527/tcp, 0.0.0.0:9528->9528/tcp, 9529/tcp
webui_nginx_1           /entrypoint.sh                   Up (healthy)   0.0.0.0:9526->9526/tcp
webui_scheduler_1       /root/run_openqa.sh              Up (healthy)   443/tcp, 80/tcp, 9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp
webui_websockets_1      /root/run_openqa.sh              Up (healthy)   443/tcp, 80/tcp, 9526/tcp, 0.0.0.0:9527->9527/tcp, 9528/tcp, 9529/tcp
webui_webui_1           /root/run_openqa.sh              Up (healthy)   443/tcp, 80/tcp, 0.0.0.0:32793->9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp
webui_webui_2           /root/run_openqa.sh              Up (healthy)   443/tcp, 80/tcp, 0.0.0.0:32792->9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp
webui_webui_db_init_1   sh -c chmod -R a+rwX /data ...   Up (healthy)   443/tcp, 80/tcp, 0.0.0.0:32791->9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp

Actions #8

Updated by ilausuch almost 3 years ago

  • Related to action #90614: CI test webui-docker-compose failed but PR was merged anyway added
Actions #9

Updated by livdywan almost 3 years ago

ilausuch wrote:

I created this PR https://github.com/os-autoinst/openQA/pull/3840

PR got merged

@ilausuch The ticket is still New, did you want to update that?

Actions #10

Updated by ilausuch almost 3 years ago

  • Status changed from New to Resolved