action #91046
closed
CI: "webui-docker-compose" seems to eventually fail again
Description
Motivation
In #89731 we introduced an initial webui container in charge of initializing the database. In one test run, a health check failed:
https://github.com/os-autoinst/openQA/pull/3838/checks?check_run_id=2329551052
The problem is that docker-compose exits with an error because the health check of the webui_db_init container failed:
Name Command State Ports
------------------------------------------------------------------------------------------------------------------------------------------------
webui_db_1 docker-entrypoint.sh postgres Up (healthy) 5432/tcp
webui_webui_db_init_1 sh -c chmod -R a+rwX /data ... Up (unhealthy) 443/tcp, 80/tcp, 0.0.0.0:49153->9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp
make: *** [Makefile:306: test-containers-compose] Error 1
The health check in question is defined here:
https://github.com/os-autoinst/openQA/blob/abd9a2297430377cd9876c3cbcec8b2cb4302722/container/webui/docker-compose.yaml#L116
Note the following errors in the DB log:
db_1 | 2021-04-13 02:43:08.038 UTC [98] ERROR: relation "api_keys" does not exist at character 15
db_1 | 2021-04-13 02:43:08.038 UTC [98] STATEMENT: select * from api_keys;
db_1 | 2021-04-13 02:43:10.441 UTC [100] ERROR: relation "dbix_class_deploymenthandler_versions" does not exist at character 24
db_1 | 2021-04-13 02:43:10.441 UTC [100] STATEMENT: SELECT me.version FROM dbix_class_deploymenthandler_versions me ORDER BY id DESC LIMIT $1
db_1 | 2021-04-13 02:43:10.446 UTC [100] ERROR: relation "dbix_class_deploymenthandler_versions" does not exist at character 24
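The unhealthy state in the output above can also be watched outside of the Makefile target by polling Docker's health status. This is a minimal sketch and not the actual `test-containers-compose` implementation: the `wait_healthy` helper and its 120 second default limit are assumptions. It relies on `docker inspect --format '{{.State.Health.Status}}'`, which reports `starting`, `healthy` or `unhealthy` for containers that define a health check.

```shell
# Hypothetical helper (not part of the openQA Makefile): poll a container's
# health status until it is "healthy", fail fast on "unhealthy", and give
# up after a time limit in seconds (default 120, an assumed value).
wait_healthy() {
    container="$1"
    limit="${2:-120}"
    elapsed=0
    while [ "$elapsed" -lt "$limit" ]; do
        # Exit code 2 if docker inspect itself fails (e.g. unknown container).
        state=$(docker inspect --format '{{.State.Health.Status}}' "$container") || return 2
        case "$state" in
            healthy) return 0 ;;
            unhealthy) echo "$container is unhealthy" >&2; return 1 ;;
        esac
        # Still "starting": wait a second and poll again.
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "timed out waiting for $container" >&2
    return 1
}
```

For example, `wait_healthy webui_webui_db_init_1 300` would have returned 1 as soon as the init container reached the `Up (unhealthy)` state shown above.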
Acceptance criteria
- AC 1: Determine the cause of the failure
- AC 2: Fix the problem
Updated by ilausuch over 3 years ago
- Related to action #89731: containers: The deploy using docker-compose is not stable and eventually fails added
Updated by ilausuch over 3 years ago
Some discoveries
The health check for the DB (https://github.com/os-autoinst/openQA/blob/abd9a2297430377cd9876c3cbcec8b2cb4302722/container/webui/docker-compose.yaml#L133) runs:
echo 'select * from api_keys;' | psql -U openqa openqa
This check is not valid because it never returns an exit code other than 0, even when the query fails:
{"Status":"unhealthy","FailingStreak":0,"Log":[{"Start":"2021-04-13T12:01:03.499631783+02:00","End":"2021-04-13T12:01:03.698939023+02:00","ExitCode":0,"Output":"ERROR: relation \"api_keys\" does not exist\nLINE 1: select * from api_keys;\n ^\n"}]}
So despite this error the health check passes, and docker-compose continues with the rest of the script.
However, this hides an undetected deadlock: the table cannot exist until webui_init has run, and webui_init cannot start until the DB has this table. The next steps are therefore:
- Change the health check to something that is correct for the defined workflow
- Check whether health checks are used in the service dependencies
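A sketch of what the first point could look like, assuming the DB service only needs to accept connections while the schema check moves to the container that creates the schema. Both function names are hypothetical and not taken from the openQA repository. `pg_isready` is the readiness probe shipped with PostgreSQL, and `to_regclass()` returns NULL for a relation that does not exist, so the query prints `t` or `f` instead of raising an error:

```shell
# Hypothetical health check helpers (not from the openQA repository).
# The openqa user and database names are taken from the existing check.

# DB service: healthy as soon as PostgreSQL accepts connections, without
# depending on tables that only the init container creates.
db_ready() {
    pg_isready -U openqa -d openqa
}

# Init service: healthy once the given table (default: api_keys) exists.
# to_regclass() yields NULL for a missing relation instead of an error,
# so the query prints "t" or "f"; any psql failure also counts as unhealthy.
schema_ready() {
    table="${1:-api_keys}"
    result=$(psql -U openqa -d openqa -tA \
        -c "SELECT to_regclass('public.$table') IS NOT NULL") || return 1
    [ "$result" = "t" ]
}
```

Splitting the checks this way breaks the cycle: the DB can become healthy on an empty database, and only the services that need the schema wait for the init container.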
Updated by ilausuch over 3 years ago
More investigation
I discovered that psql does not exit with a non-zero status when the SQL command fails:
root@cd6126e970f3:/# echo 'select * from api_keys2;' | psql -U openqa openqa
ERROR: relation "api_keys2" does not exist
LINE 1: select * from api_keys2;
^
root@cd6126e970f3:/# echo $?
0
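One way to make such a failure visible to the caller is psql's `ON_ERROR_STOP` variable: with `-v ON_ERROR_STOP=1`, psql stops at the first failing statement of a script read from stdin and exits with status 3 instead of 0. A minimal sketch; the `run_sql` wrapper is hypothetical, and the `openqa` user and database are taken from the session above:

```shell
# Hypothetical wrapper: feed SQL on stdin to psql so that an SQL error
# produces a non-zero exit status (3) instead of 0.
run_sql() {
    psql -v ON_ERROR_STOP=1 -U openqa openqa
}

# Usage, mirroring the session above:
#   echo 'select * from api_keys2;' | run_sql
#   echo $?   # 3, because the relation does not exist
```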
Updated by ilausuch over 3 years ago
I created this PR https://github.com/os-autoinst/openQA/pull/3840
Updated by ilausuch over 3 years ago
Although the sequence now seems correct, I am still getting the same DB errors:
db_1 | 2021-04-13 12:00:30.838 UTC [75] LOG: database system was shut down at 2021-04-13 12:00:30 UTC
db_1 | 2021-04-13 12:00:30.843 UTC [1] LOG: database system is ready to accept connections
db_1 | 2021-04-13 12:00:32.556 UTC [83] ERROR: relation "dbix_class_deploymenthandler_versions" does not exist at character 24
db_1 | 2021-04-13 12:00:32.556 UTC [83] STATEMENT: SELECT me.version FROM dbix_class_deploymenthandler_versions me ORDER BY id DESC LIMIT $1
db_1 | 2021-04-13 12:00:32.559 UTC [83] ERROR: relation "dbix_class_deploymenthandler_versions" does not exist at character 24
db_1 | 2021-04-13 12:00:32.559 UTC [83] STATEMENT: SELECT COUNT( * ) FROM dbix_class_deploymenthandler_versions me
db_1 | 2021-04-13 12:00:52.597 UTC [137] ERROR: relation "mojo_migrations" does not exist at character 21
db_1 | 2021-04-13 12:00:52.597 UTC [137] STATEMENT: SELECT version FROM mojo_migrations WHERE name = $1
I am not sure whether this is directly related. However, it does not seem to affect the docker-compose workflow:
Name Command State Ports
----------------------------------------------------------------------------------------------------------------------------------------------
webui_db_1 docker-entrypoint.sh postgres Up (healthy) 5432/tcp
webui_gru_1 sh -c /root/run_openqa.sh| ... Up (healthy) 443/tcp, 80/tcp, 9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp
webui_livehandler_1 /root/run_openqa.sh Up (healthy) 443/tcp, 80/tcp, 9526/tcp, 9527/tcp, 0.0.0.0:9528->9528/tcp, 9529/tcp
webui_nginx_1 /entrypoint.sh Up (healthy) 0.0.0.0:9526->9526/tcp
webui_scheduler_1 /root/run_openqa.sh Up (healthy) 443/tcp, 80/tcp, 9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp
webui_websockets_1 /root/run_openqa.sh Up (healthy) 443/tcp, 80/tcp, 9526/tcp, 0.0.0.0:9527->9527/tcp, 9528/tcp, 9529/tcp
webui_webui_1 /root/run_openqa.sh Up (healthy) 443/tcp, 80/tcp, 0.0.0.0:32793->9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp
webui_webui_2 /root/run_openqa.sh Up (healthy) 443/tcp, 80/tcp, 0.0.0.0:32792->9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp
webui_webui_db_init_1 sh -c chmod -R a+rwX /data ... Up (healthy) 443/tcp, 80/tcp, 0.0.0.0:32791->9526/tcp, 9527/tcp, 9528/tcp, 9529/tcp
Updated by ilausuch over 3 years ago
- Related to action #90614: CI test webui-docker-compose failed but PR was merged anyway added
Updated by livdywan over 3 years ago
ilausuch wrote:
I created this PR https://github.com/os-autoinst/openQA/pull/3840
The PR got merged.
@ilausuch The ticket is still New; did you want to update that?