action #160814


coordination #160808: [epic] BCI image stat collection

Create dedicated BCI database for image stats

Added by ph03nix about 1 month ago. Updated 4 days ago.

Status: Workable
Priority: Normal
Assignee: -
Target version: -
Start date: 2024-05-23
Due date:
% Done: 0%
Estimated time:

Description

Currently we collect all image stats in the same "sizemon" database. We should create a new "bci" database for the BCI image stats to avoid overloading a single database.

Acceptance criteria

  • Create new "bci" database on the k2 host
  • Update the Grafana dashboard to use this database instead of "sizemon"
  • Update the openQA jobs to push the container stats to "bci" and not to "sizemon"
  • Update the k2 documentation to include the new schema

Files

SLES-15-SP5-x86_64-containers_qcow2.png (37.8 KB) - curl to k2 postgres db alternated pass/fail - mdati, 2024-06-25 17:51

Related issues: 1 (1 open, 0 closed)

Blocks Containers - action #160817: Collect BCI package stats (Blocked, 2024-05-23)

Actions #1

Updated by ph03nix about 1 month ago

Actions #2

Updated by mdati about 1 month ago

  • Assignee set to mdati
Actions #3

Updated by ph03nix about 1 month ago

FYI, you can connect to the PostgreSQL database on k2 via

podman exec -ti -u postgres img-size-db psql --user sizemon
Actions #4

Updated by ph03nix about 1 month ago

  • Description updated (diff)
Actions #5

Updated by mdati about 1 month ago · Edited

The openQA tests currently pushing data to the Postgres URL http://k2.qe.suse.de:8080/size are the YAML schedules for JeOS and containers/[latest | bci/[15sp*|app|lang]]_images.yaml, but only for the podman runtime on a SLE-15-SP5 host, and only when IMAGE_STORE_DATA=1 is set.

The routines to push or check the data are in os-autoinst-distri-opensuse/lib/db_utils.pm: push_image_data_to_db and check_postgres_db.

A simple liveness check of the db can be done from any host (in the VPN): curl -ILf http://k2.qe.suse.de:8080/size

Note: while working on this yesterday and running the check N times, I always got a regular alternation of one OK reply followed by no reply from the host. I therefore suggest adding a timeout (curl option -m 10) and running the check in a loop, i.e.:
curl -ILf -m 10 http://k2.qe.suse.de:8080/size

More investigation on this behavior would be useful, too.

On this, see Felix's PR https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/19404

Actions #6

Updated by mdati 30 days ago · Edited

Noted that:

when connecting to the k2 server on port 8080, e.g.:
curl -I -m 10 http://k2.qe.suse.de:8080/size
the result regularly alternates: the first run replies OK, the next one gets no reply;

in k2:/var/log/nginx/ we see the corresponding messages for the two requests in the logs:

a) img-mon.error (nginx error log):

...
2024/05/31 12:32:53 [info] 1247#1247: *449 client 10.149.209.54 closed keepalive connection 
2024/05/31 12:33:06 [info] 1247#1247: *451 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 10.149.209.54, server: image-monitoring.suse.de, request: "HEAD /size HTTP/1.1", upstream: "http://[::1]:5444/size", host: "k2.qe.suse.de:8080"

b) img-mon.access (nginx access log):

...
10.149.209.54 - - [31/May/2024:13:01:46 +0000] "HEAD /size HTTP/1.1" 200 0 "-" "curl/8.7.1" 
10.149.209.54 - - [31/May/2024:13:01:59 +0000] "HEAD /size HTTP/1.1" 499 0 "-" "curl/8.7.1"
...

always alternating results.

Looking at the upstream port 5444 shown in log (a): if we run the same requests directly against that port (http://k2.qe.suse.de:5444/size), the result is always OK, but nothing is written to logs a) or b).

Actions #7

Updated by mdati 16 days ago

Work somewhat delayed due to other urgent tasks.

Actions #8

Updated by mdati 5 days ago · Edited

Referring to https://progress.opensuse.org/issues/160814#note-6: port 8080 of the Postgres db still shows the alternating-reply issue when queried or pushed to (one pass, then one fail, in a loop), and this makes the push-to-db step of jobs fail even though the same push passes elsewhere.

See e.g. the results in this job list (the main test passes; only the postgres db URL is the issue here):

FAIL 25/6 h18:47 https://openqa.suse.de/tests/14729250#step/bci_version_check/35
PASS 24/6 h22:47 https://openqa.suse.de/tests/14720942#step/bci_version_check/34

PASS 25/6 h18:22 https://openqa.suse.de/tests/14729240#step/bci_version_check/34
FAIL 24/5 h22.21 https://openqa.suse.de/tests/14720932#step/bci_version_check/35

FAIL 24/6 h23:31 https://openqa.suse.de/tests/14721268#step/bci_version_check/35
FAIL 24/6 h11.39 https://openqa.suse.de/tests/14716407#step/bci_version_check/35

FAIL 07/6 h14 https://openqa.suse.de/tests/14554346#step/bci_version_check/35
PASS 12/6 h18.30 https://openqa.suse.de/tests/14582125#step/bci_version_check/34

Note: all such tests have POSTGRES_PORT=8080 set, while the test code's default port is 5444.

Then I checked the URL http://k2.qe.suse.de:8080/size from my local computer in the VPN and got alternating FAIL / OK (using option -I for a simple header reply):

date; curl -m5 -I  http://k2.qe.suse.de:8080/size;date
Tue 25 Jun 19:35:39 CEST 2024
curl: (28) Operation timed out after 5001 milliseconds with 0 bytes received
Tue 25 Jun 19:35:44 CEST 2024

mdati@susepc21ktr9:~[] date; curl -m5 -I  http://k2.qe.suse.de:8080/size;date
Tue 25 Jun 19:35:45 CEST 2024
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Tue, 25 Jun 2024 17:35:45 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Range: 0-18811/*
Content-Location: /size

Tue 25 Jun 19:35:46 CEST 2024

mdati@susepc21ktr9:~[] date; curl -m5 -I  http://k2.qe.suse.de:8080/size;date
Tue 25 Jun 19:35:47 CEST 2024
curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
Tue 25 Jun 19:35:52 CEST 2024
...

Then I paused a test job and ran the same curl sequence from the VNC terminal: same problem, see the attached screenshot.

The same behavior also shows up in browsers.

Actions #9

Updated by mdati 4 days ago

  • Assignee deleted (mdati)
Actions #10

Updated by ph03nix 4 days ago

I would narrow the issue down to IPv6. Using IPv4 I could query the instance 100/100 times. With IPv6, every second request fails (see the comparison sketch after the output below).

phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Wed, 26 Jun 2024 12:49:11 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Range: 0-99/*
Content-Location: /size

phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
^C
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Wed, 26 Jun 2024 12:49:35 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Range: 0-99/*
Content-Location: /size

phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
^C
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Wed, 26 Jun 2024 12:50:30 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Range: 0-99/*
Content-Location: /size

phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
^C
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Wed, 26 Jun 2024 12:51:28 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Range: 0-99/*
Content-Location: /size

phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
^C
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Wed, 26 Jun 2024 12:51:30 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Range: 0-99/*
Content-Location: /size

phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
^C
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Wed, 26 Jun 2024 12:51:33 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Range: 0-99/*
Content-Location: /size
Actions #11

Updated by ph03nix 4 days ago

I now also see the issue with IPv4.
