action #95374

closed

All pull request openQA CI checks fail now in "webui-docker-compose": "Container is unhealthy" size:M

Added by okurz over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2021-07-12
Due date:
2021-07-27
% Done:

0%

Estimated time:

Description

Observation

All pull request openQA CI checks fail now in "webui-docker-compose", e.g. see
https://github.com/os-autoinst/openQA/pull/4032/checks?check_run_id=3044565112#step:3:2083

ERROR: for webui  Container "3dc2c3dbe671" is unhealthy.
Encountered errors while bringing up the project.

Error executing a job

Steps to reproduce

DONE: Check whether this reproduces locally -> reproducible, confirmed by ilausuch

make test-containers-compose

or find failed examples in any open openQA pull requests

Expected result

Problem

The "last good" PR seems to be https://github.com/os-autoinst/openQA/pull/3987 . Maybe that PR introduced a problem that only took effect after the merge, once our package in devel:openQA was rebuilt, so that all subsequent tests were impacted in a harmful way.

Suggestions

  • DONE: Try to reproduce locally -> ilausuch confirmed that it is reproducible
  • If not, then reproduce in CI
  • Check whether reverting https://github.com/os-autoinst/openQA/pull/3987 helps. If it does not, try using an older version of the packages as the baseline in the docker-compose test, if this is applicable at all

Impact

All PR checks fail hence blocking clean merges of new changes.


Related issues 3 (2 open, 1 closed)

Related to openQA Project (public) - action #92092: containers: openQA test eventually fails because of timeouts (New, 2021-05-03)
Related to openQA Project (public) - action #95296: openQA-in-openQA container tests fail with "/root/run_openqa.sh: line 6: This: command not found" (Resolved, dheidler, 2021-07-09)
Copied to openQA Project (public) - action #95437: The "webui-docker-compose" CI check should fail if the package is impacted by the PR itself in a harmful way (New)

Actions #1

Updated by okurz over 3 years ago

  • Related to action #92092: containers: openQA test eventually fails because of timeouts added
Actions #2

Updated by ilausuch over 3 years ago

  • Status changed from New to In Progress
  • Assignee set to ilausuch
Actions #3

Updated by okurz over 3 years ago

  • Subject changed from All pull request openQA CI checks fail now in "webui-docker-compose": "Container is unhealthy" to All pull request openQA CI checks fail now in "webui-docker-compose": "Container is unhealthy" size:M
  • Description updated (diff)
Actions #4

Updated by okurz over 3 years ago

As dcermak is unavailable for some time, and because I suspect that this ticket might also be caused by https://github.com/os-autoinst/openQA/pull/3987, I merged the revert PR https://github.com/os-autoinst/openQA/pull/4033

Actions #5

Updated by ilausuch over 3 years ago

Reason for the failure:

webui_db_init_1  | /root/run_openqa.sh: line 6: This: command not found
Actions #6

Updated by ilausuch over 3 years ago

Continuing the investigation: the line that is failing in run_openqa.sh is

webui_db_init_1  | ++ su geekotest -c 'PGPASSWORD=openqa psql -h db -U openqa --list | grep -qe openqa'
webui_db_init_1  | + This account is currently not available.
webui_db_init_1  | /root/run_openqa.sh: line 7: This: command not found
webui_db_init_1  | + sleep .1
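
The trace above suggests that the text printed by su ended up being executed as a command. A minimal sketch of how that can happen, assuming the script wraps the check in a command substitution (the actual content of run_openqa.sh is not shown in this ticket; `fake_su`, `broken_check` and `safe_check` are hypothetical names used only for illustration):

```shell
#!/bin/sh
# Stand-in for `su geekotest -c ...` when the account has a nologin shell:
# it prints a message and exits non-zero instead of running the command.
fake_su() {
    echo "This account is currently not available."
    return 1
}

# Assumed anti-pattern: the output of the command substitution is itself
# run as a command, so the shell looks for a program called "This".
broken_check() {
    $(fake_su)
}

# Safer form: rely on the exit status only and never execute the output.
safe_check() {
    fake_su >/dev/null 2>&1
}
```

With the safer form, a failing su makes the check return its non-zero exit status, and the "command not found" message no longer hides the real error.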
Actions #7

Updated by okurz over 3 years ago

  • Related to action #95296: openQA-in-openQA container tests fail with "/root/run_openqa.sh: line 6: This: command not found" added
Actions #8

Updated by okurz over 3 years ago

Same as in #95296 then.

Actions #9

Updated by ilausuch over 3 years ago

The problem is related to the geekotest user.
I removed the psql call that runs as geekotest and it worked, but it then fails at the next command that runs as this user:

webui_db_init_1  | + su geekotest -c /usr/share/openqa/script/openqa-webui-daemon
webui_db_init_1  | This account is currently not available.
Actions #10

Updated by ilausuch over 3 years ago

We cannot use su to run commands as a user that has no login shell:

webui_db_init_1  | geekotest:x:479:479:openQA user:/dev/null:/sbin/nologin
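
For illustration, a small sketch of how the login shell can be read from a passwd entry, together with one possible workaround. It assumes the script runs as root and that util-linux su is in use, where `-s` selects an explicit shell. The helper names are hypothetical, and this is not necessarily how the problem was resolved in openQA:

```shell
#!/bin/sh
# Print the login shell (7th field) of a passwd-format line.
login_shell_of_entry() {
    printf '%s\n' "$1" | cut -d: -f7
}

# Succeed if the given shell path is one that refuses logins.
is_nologin_shell() {
    case "$1" in
        */nologin|*/false) return 0 ;;
        *) return 1 ;;
    esac
}

entry='geekotest:x:479:479:openQA user:/dev/null:/sbin/nologin'
shell=$(login_shell_of_entry "$entry")

if is_nologin_shell "$shell"; then
    # As root, an explicit shell can be requested instead, for example:
    #   su -s /bin/sh geekotest -c /usr/share/openqa/script/openqa-webui-daemon
    echo "geekotest has no login shell ($shell); 'su -c' alone will be refused"
fi
```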
Actions #11

Updated by ilausuch over 3 years ago

I created this PR to provide more information when docker-compose up fails:
https://github.com/os-autoinst/openQA/pull/4038
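
As a rough idea of what such extra diagnostics can look like (a sketch only, not the content of that PR; the function name and the `COMPOSE` variable are assumptions made for this example):

```shell
#!/bin/sh
# Bring the project up and, if that fails, print the container states and
# recent logs so the CI output shows why a container became unhealthy.
# COMPOSE is a hypothetical override for the compose command.
up_with_diagnostics() {
    compose=${COMPOSE:-docker-compose}
    if ! $compose up -d; then
        $compose ps
        $compose logs --tail=100
        return 1
    fi
}
```

Calling `up_with_diagnostics` in place of a plain `docker-compose up -d` keeps the failing exit status while adding the per-container logs to the CI output.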

Actions #12

Updated by okurz over 3 years ago

Actions #13

Updated by ilausuch over 3 years ago

Thanks,
It could help in the future
https://github.com/os-autoinst/openQA/pull/4039

Actions #14

Updated by okurz over 3 years ago

  • Priority changed from Urgent to Normal

Multiple PRs have now been merged after I retriggered some failed jobs, which supports my hypothesis.

@ilausuch I consider the ticket done as we defined it originally. You can still have it and try to find improvements but we can definitely reduce prio now.

Actions #15

Updated by openqa_review over 3 years ago

  • Due date set to 2021-07-27

Setting due date based on mean cycle time of SUSE QE Tools

Actions #16

Updated by okurz over 3 years ago

  • Copied to action #95437: The "webui-docker-compose" CI check should fail if the package is impacted by the PR itself in a harmful way added
Actions #17

Updated by ilausuch over 3 years ago

  • Status changed from In Progress to Resolved

okurz wrote:

Multiple PRs have now been merged after I retriggered some failed jobs, which supports my hypothesis.

@ilausuch I consider the ticket done as we defined it originally. You can still have it and try to find improvements but we can definitely reduce prio now.

Yes, let's close this ticket. Similar failures may come up in the future, but they would have other causes.
