action #167524
closedtest fails in multiple_container_webui - psql error: column d.daticulocale does not exist size:S
Description
Observation¶
openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_from_containers@64bit-2G fails in
multiple_container_webui
Output:
Waiting for DB creation
psql: ERROR: column d.daticulocale does not exist
LINE 8: d.daticulocale as "ICU Locale",
The webui-docker-compose test in our CI runs into the same problem:
https://github.com/os-autoinst/openQA/actions/runs/11071019461/job/30761910367?pr=5954
Reproducible¶
Fails since (at least) Build :TW.31628
Expected result¶
Last good: :TW.31627 (or more recent)
Further details¶
Always latest result in this scenario: latest
Suggestions¶
- Add a workaround DONE, see https://github.com/os-autoinst/openQA/pull/5957 and https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/212
- Consider the easiest option which is to replace
psql -h db -U openqa --list | grep -qe openqa
withpsql -h db -U openqa openqa -c 'select 1'
- Reconsider the healthcheck logic in the container / docker-compose setup
- Remove workarounds again in both openQA and os-autoinst-distri-openqa, see PRs above
Updated by tinita 3 months ago
Looks like the webui-docker-compose failures are about the same problem:
https://github.com/os-autoinst/openQA/actions/runs/11071019461/job/30761910367?pr=5954
Updated by mkittler 3 months ago
- Status changed from In Progress to New
- Assignee deleted (
mkittler)
Looks like this test setup is hard to reproduce locally but within openQA one cannot debug it nicely. I also tried the compose setup but it failed getting the PostgeSQL container:
Trying to pull registry.opensuse.org/postgres:latest...
Error: initializing source docker://registry.opensuse.org/postgres:latest: pinging container registry registry.opensuse.org: received unexpected HTTP status: 504 Gateway Time-out
Leading to further errors:
[webui-db-init] | Waiting for DB creation
[webui-db-init] | psql: error: could not translate host name "db" to address: Name or service not known
I have to go soon so I'll unassign for now.
Probably the PostgreSQL container we use here and in the compose setup has changed and we need to adapt our code. Considering I couldn't find any mentioning of daticulocale
in our code the container might also just be completely broken.
Updated by tinita 3 months ago
The d.daticulocale
comes from psql, apparently:
This is what I get for a registry.opensuse.org/opensuse/postgres:16
:
podman run -it --rm --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d registry.opensuse.org/opensuse/postgres:16
...
% podman exec -it some-postgres bash
# su postgres
postgres@933967b14711:/> psql --version
psql (PostgreSQL) 16.4
postgres@933967b14711:/> psql --list -E
********* QUERY **********
SELECT
d.datname as "Name",
pg_catalog.pg_get_userbyid(d.datdba) as "Owner",
pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding",
CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS "Locale Provider",
d.datcollate as "Collate",
d.datctype as "Ctype",
d.daticulocale as "ICU Locale", # <-------------------------
d.daticurules as "ICU Rules",
pg_catalog.array_to_string(d.datacl, E'\n') AS "Access privileges"
FROM pg_catalog.pg_database d
ORDER BY 1;
...
and for registry.opensuse.org/opensuse/postgres:17
:
postgres@3104b2222fc9:/> psql --version
psql (PostgreSQL) 17rc1
postgres@3104b2222fc9:/> psql --list -E
/******** QUERY *********/
SELECT
d.datname as "Name",
pg_catalog.pg_get_userbyid(d.datdba) as "Owner",
pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding",
CASE d.datlocprovider WHEN 'b' THEN 'builtin' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS "Locale Provider",
d.datcollate as "Collate",
d.datctype as "Ctype",
d.datlocale as "Locale", # <-------------------------
d.daticurules as "ICU Rules",
CASE WHEN pg_catalog.cardinality(d.datacl) = 0 THEN '(none)' ELSE pg_catalog.array_to_string(d.datacl, E'\n') END AS "Access privileges"
FROM pg_catalog.pg_database d
ORDER BY 1;
So that changed between versions, like you guessed.
Since it's not our code, but code generated by psql, we need to make sure both the server and the psql client are running on the same version.
It's a bit hard to find out which version we have because the image pulling only tells us a sha, for example in https://github.com/os-autoinst/openQA/actions/runs/11071019461/job/30761910367?pr=5954 it's
2024-09-27T12:56:41.5810503Z e5ff51eeda62 Pull complete
2024-09-27T12:56:41.5873499Z db Pulled
and from the zypper command elsewhere I see postgresql16 postgresql16-server
.
I wasn't able to find the e5ff51eeda62 sha anywhere on the suse registry or docker hub. But from the symptom we're running a 16 client and a 17 server.
I found the reverse problem mentioned here:
https://www.postgresql.org/message-id/f80433139cdac73dd65c6c8841ebe70f007891f7.camel%40j-davis.com
Updated by tinita 3 months ago
- Status changed from New to Feedback
- Assignee set to tinita
- Priority changed from Urgent to High
https://github.com/os-autoinst/openQA/pull/5957 ci: Set postgres container version to 16 to ensure compatibility
Updated by okurz 3 months ago
I assume that problem can generally always happen on major releases of PostgreSQL. Maybe we better use the client from the database server container to be sure to always use a consistent version. Or instead of installing a psql client using zypper we should also use an upstream psql client container
Updated by tinita 3 months ago
https://github.com/os-autoinst/openQA/pull/5957 merged.
https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/212 Set postgres container version to 16 to ensure compatibility
Updated by tinita 3 months ago
okurz wrote in #note-7:
Maybe we better use the client from the database server container to be sure to always use a consistent version. Or instead of installing a psql client using zypper we should also use an upstream psql client container
The problem is that the psql --list
is executed inside of the webui container, as part of the entrypoint.
We would have to ssh into the postgres container instead. For that we need sshd.
Updated by tinita 3 months ago
- Status changed from Feedback to Workable
- Assignee deleted (
tinita)
okurz wrote in #note-11:
Sounds good. Also why should we need ssh? It's a local container, isn't it? So e.g.
podman exec …
should work
So you mean running podman from within a webui docker container?
Maybe that whole check can be improved; the docker-compose setup seems quite complicated and was done to avoid that multiple containers accessing the database would try to create the openqa database at the same time, if I get it right.
The checking might be a bit easier in the openqa-in-openqa tests, but we need to think about both cases.
Unassigning myself as it's not urgent anymore and maybe there are better ideas.
Updated by tinita 3 months ago
- Status changed from Workable to New
Just to give some starting points for others not yet familiar:
In distri-openqa we call:
https://github.com/os-autoinst/os-autoinst-distri-openQA/blob/261b2a4f69502e0d26a91817100548ca98460c96/tests/containers/single_container_webui.pm#L11
In our ci we call docker compose
:
https://github.com/os-autoinst/openQA/blob/master/container/webui/docker-compose.yaml#L120
So either we call run_openqa.sh explicitly or we run the default entrypoint:
https://github.com/os-autoinst/openQA/blob/master/container/webui/Dockerfile#L54
And in
https://github.com/os-autoinst/openQA/blob/master/container/webui/run_openqa.sh#L6
we call psql --list
to wait for the openqa database to be present:
function wait_for_db_creation() {
echo "Waiting for DB creation"
while ! su geekotest -c 'PGPASSWORD=openqa psql -h db -U openqa --list | grep -qe openqa'; do sleep .1; done
}
Setting to new as it wasn't estimated yet.
Updated by livdywan 3 months ago
- Subject changed from test fails in multiple_container_webui - psql error: column d.daticulocale does not exist to test fails in multiple_container_webui - psql error: column d.daticulocale does not exist size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by jbaier_cz 3 months ago
The docker healthcheck should make sure the db is up and running before the webui container is even started: https://github.com/os-autoinst/openQA/pull/5976
Updated by okurz 3 months ago
- Status changed from In Progress to Resolved
https://github.com/os-autoinst/openQA/pull/5976 merged. As the change is already passing CI tests in the PR we shouldn't need to wait more.
Updated by jbaier_cz 3 months ago
Yeah... but we still should at least remove the workaround: https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/213
Updated by okurz 3 months ago
right. Great that you are more diligent than me :) https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/213 merged.