Project

General

Profile

Actions

action #167524

closed

test fails in multiple_container_webui - psql error: column d.daticulocale does not exist size:S

Added by tinita 4 months ago. Updated 4 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2024-09-27
Due date:
% Done:

0%

Estimated time:

Description

Observation

openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_from_containers@64bit-2G fails in
multiple_container_webui

Output:
Waiting for DB creation
psql: ERROR:  column d.daticulocale does not exist
LINE 8:   d.daticulocale as "ICU Locale",

The webui-docker-compose test in our CI runs into the same problem:
https://github.com/os-autoinst/openQA/actions/runs/11071019461/job/30761910367?pr=5954

Reproducible

Fails since (at least) Build :TW.31628

Expected result

Last good: :TW.31627 (or more recent)

Further details

Always latest result in this scenario: latest

Suggestions

Actions #1

Updated by okurz 4 months ago

  • Priority changed from High to Urgent
Actions #2

Updated by mkittler 4 months ago

  • Status changed from New to In Progress
  • Assignee set to mkittler
Actions #3

Updated by tinita 4 months ago

Looks like the webui-docker-compose failures are about the same problem:
https://github.com/os-autoinst/openQA/actions/runs/11071019461/job/30761910367?pr=5954

Actions #4

Updated by mkittler 4 months ago

  • Status changed from In Progress to New
  • Assignee deleted (mkittler)

Looks like this test setup is hard to reproduce locally but within openQA one cannot debug it nicely. I also tried the compose setup but it failed getting the PostgeSQL container:

Trying to pull registry.opensuse.org/postgres:latest...
Error: initializing source docker://registry.opensuse.org/postgres:latest: pinging container registry registry.opensuse.org: received unexpected HTTP status: 504 Gateway Time-out

Leading to further errors:

[webui-db-init] | Waiting for DB creation
[webui-db-init] | psql: error: could not translate host name "db" to address: Name or service not known

I have to go soon so I'll unassign for now.


Probably the PostgreSQL container we use here and in the compose setup has changed and we need to adapt our code. Considering I couldn't find any mentioning of daticulocale in our code the container might also just be completely broken.

Actions #5

Updated by tinita 4 months ago

The d.daticulocale comes from psql, apparently:
This is what I get for a registry.opensuse.org/opensuse/postgres:16:

podman run -it --rm --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d registry.opensuse.org/opensuse/postgres:16
...
% podman exec -it some-postgres bash
# su postgres
postgres@933967b14711:/> psql --version
psql (PostgreSQL) 16.4
postgres@933967b14711:/> psql --list -E
********* QUERY **********
SELECT
  d.datname as "Name",
  pg_catalog.pg_get_userbyid(d.datdba) as "Owner",
  pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding",
  CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS "Locale Provider",
  d.datcollate as "Collate",
  d.datctype as "Ctype",
  d.daticulocale as "ICU Locale",           # <-------------------------
  d.daticurules as "ICU Rules",
  pg_catalog.array_to_string(d.datacl, E'\n') AS "Access privileges"
FROM pg_catalog.pg_database d
ORDER BY 1;
...

and for registry.opensuse.org/opensuse/postgres:17:

postgres@3104b2222fc9:/> psql --version
psql (PostgreSQL) 17rc1
postgres@3104b2222fc9:/> psql --list -E
/******** QUERY *********/
SELECT
  d.datname as "Name",
  pg_catalog.pg_get_userbyid(d.datdba) as "Owner",
  pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding",
  CASE d.datlocprovider WHEN 'b' THEN 'builtin' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS "Locale Provider",
  d.datcollate as "Collate",
  d.datctype as "Ctype",
  d.datlocale as "Locale",           # <-------------------------
  d.daticurules as "ICU Rules",
  CASE WHEN pg_catalog.cardinality(d.datacl) = 0 THEN '(none)' ELSE pg_catalog.array_to_string(d.datacl, E'\n') END AS "Access privileges"
FROM pg_catalog.pg_database d
ORDER BY 1;

So that changed between versions, like you guessed.

Since it's not our code, but code generated by psql, we need to make sure both the server and the psql client are running on the same version.
It's a bit hard to find out which version we have because the image pulling only tells us a sha, for example in https://github.com/os-autoinst/openQA/actions/runs/11071019461/job/30761910367?pr=5954 it's

2024-09-27T12:56:41.5810503Z  e5ff51eeda62 Pull complete
2024-09-27T12:56:41.5873499Z  db Pulled

and from the zypper command elsewhere I see postgresql16 postgresql16-server.

I wasn't able to find the e5ff51eeda62 sha anywhere on the suse registry or docker hub. But from the symptom we're running a 16 client and a 17 server.

I found the reverse problem mentioned here:
https://www.postgresql.org/message-id/f80433139cdac73dd65c6c8841ebe70f007891f7.camel%40j-davis.com

Actions #6

Updated by tinita 4 months ago

  • Status changed from New to Feedback
  • Assignee set to tinita
  • Priority changed from Urgent to High

https://github.com/os-autoinst/openQA/pull/5957 ci: Set postgres container version to 16 to ensure compatibility

Actions #7

Updated by okurz 4 months ago

I assume that problem can generally always happen on major releases of PostgreSQL. Maybe we better use the client from the database server container to be sure to always use a consistent version. Or instead of installing a psql client using zypper we should also use an upstream psql client container

Actions #8

Updated by tinita 4 months ago

Actions #9

Updated by tinita 4 months ago

okurz wrote in #note-7:

Maybe we better use the client from the database server container to be sure to always use a consistent version. Or instead of installing a psql client using zypper we should also use an upstream psql client container

The problem is that the psql --list is executed inside of the webui container, as part of the entrypoint.

We would have to ssh into the postgres container instead. For that we need sshd.

Actions #10

Updated by tinita 4 months ago

But we could use a different check.
Instead of

psql -h db -U openqa --list | grep -qe openqa

we could

psql -h db -U openqa openqa -c 'select 1'
Actions #11

Updated by okurz 4 months ago

Sounds good. Also why should we need ssh? It's a local container, isn't it? So e.g. podman exec … should work

Actions #12

Updated by tinita 4 months ago

  • Status changed from Feedback to Workable
  • Assignee deleted (tinita)

okurz wrote in #note-11:

Sounds good. Also why should we need ssh? It's a local container, isn't it? So e.g. podman exec … should work

So you mean running podman from within a webui docker container?

Maybe that whole check can be improved; the docker-compose setup seems quite complicated and was done to avoid that multiple containers accessing the database would try to create the openqa database at the same time, if I get it right.
The checking might be a bit easier in the openqa-in-openqa tests, but we need to think about both cases.

Unassigning myself as it's not urgent anymore and maybe there are better ideas.

Actions #13

Updated by tinita 4 months ago

  • Status changed from Workable to New

Just to give some starting points for others not yet familiar:

In distri-openqa we call:
https://github.com/os-autoinst/os-autoinst-distri-openQA/blob/261b2a4f69502e0d26a91817100548ca98460c96/tests/containers/single_container_webui.pm#L11

In our ci we call docker compose:
https://github.com/os-autoinst/openQA/blob/master/container/webui/docker-compose.yaml#L120

So either we call run_openqa.sh explicitly or we run the default entrypoint:
https://github.com/os-autoinst/openQA/blob/master/container/webui/Dockerfile#L54

And in
https://github.com/os-autoinst/openQA/blob/master/container/webui/run_openqa.sh#L6
we call psql --list to wait for the openqa database to be present:

function wait_for_db_creation() {
    echo "Waiting for DB creation"
    while ! su geekotest -c 'PGPASSWORD=openqa psql -h db -U openqa --list | grep -qe openqa'; do sleep .1; done
}

Setting to new as it wasn't estimated yet.

Actions #14

Updated by tinita 4 months ago

  • Description updated (diff)
Actions #15

Updated by livdywan 4 months ago

  • Subject changed from test fails in multiple_container_webui - psql error: column d.daticulocale does not exist to test fails in multiple_container_webui - psql error: column d.daticulocale does not exist size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #16

Updated by jbaier_cz 4 months ago

  • Status changed from Workable to In Progress
  • Assignee set to jbaier_cz
Actions #17

Updated by jbaier_cz 4 months ago

The docker healthcheck should make sure the db is up and running before the webui container is even started: https://github.com/os-autoinst/openQA/pull/5976

Actions #18

Updated by okurz 4 months ago

  • Status changed from In Progress to Resolved

https://github.com/os-autoinst/openQA/pull/5976 merged. As the change is already passing CI tests in the PR we shouldn't need to wait more.

Actions #19

Updated by jbaier_cz 4 months ago

Yeah... but we still should at least remove the workaround: https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/213

Actions #20

Updated by okurz 4 months ago

right. Great that you are more diligent than me :) https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/213 merged.

Actions

Also available in: Atom PDF