action #160814 (child of coordination #160808: [epic] BCI image stat collection)
Create dedicated BCI database for image stats
Description
Currently we collect all image stats in the same sizemon database. We should create a new "bci" database for BCI image stats to avoid overloading a single database.
Acceptance criteria
- Create new "bci" database on the k2 host
- Update the grafana dashboard to use this database instead of "sizemon"
- Update the openQA jobs to push the container stats to "bci" and not to "sizemon"
- Update the k2 documentation to include the new schema
References
Updated by ph03nix about 1 month ago
- Blocks action #160817: Collect BCI package stats added
Updated by ph03nix about 1 month ago
FYI, you can connect to the PostgreSQL database on k2 via:
podman exec -ti -u postgres img-size-db psql --user sizemon
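For the first acceptance criterion, a minimal sketch of creating the new database from that same container could look like this (assuming the container's default postgres superuser role is usable and that the existing sizemon role should own the new database; both are assumptions, not verified on k2):
# create the new "bci" database and verify it is listed
podman exec -ti -u postgres img-size-db psql --user postgres -c "CREATE DATABASE bci OWNER sizemon;"
podman exec -ti -u postgres img-size-db psql --user sizemon -d bci -c "\l bci"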
Updated by mdati about 1 month ago · Edited
The openQA tests currently pushing data to this PostgreSQL endpoint (http://k2.qe.suse.de:8080/size) are the jeos and containers/[latest | bci/[15sp*|app|lang]]_images.yaml YAML schedules, but only for the podman runtime on an SLE-15-SP5 host, and only when IMAGE_STORE_DATA=1 is set.
The routines that push or check the data are in os-autoinst-distri-opensuse/lib/db_utils.pm: push_image_data_to_db and check_postgres_db.
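To reproduce one of these pushes manually, one hedged option is to clone an existing job with the setting enabled; <JOB_ID> below is only a placeholder for any suitable job:
openqa-clone-job --within-instance https://openqa.suse.de <JOB_ID> IMAGE_STORE_DATA=1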
A simple liveness check of the DB can be done from any host in the VPN:
curl -ILf http://k2.qe.suse.de:8080/size
Note that while working on this yesterday and running the check several times, I always got regularly alternating results: one OK reply, then no reply from the host. I therefore suggested also adding a timeout (curl option -m 10) to improve the check when it is run in a loop, i.e.:
curl -ILf -m 10 http://k2.qe.suse.de:8080/size
More investigation of this behavior would be useful, too.
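As a small sketch of such a loop check (the try count is arbitrary):
for i in $(seq 1 20); do
    curl -ILf -m 10 -s -o /dev/null http://k2.qe.suse.de:8080/size && echo "$i: ok" || echo "$i: no reply"
done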
On this, see also Felix's PR https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/19404.
Updated by mdati 30 days ago · Edited
Noted that when connecting to the k2 server on port 8080, e.g.:
curl -I -m 10 http://k2.qe.suse.de:8080/size
the replies regularly alternate: the first run gets an OK reply, the next one gets no reply. In k2:/var/log/nginx/ we see the corresponding messages for the two requests in the logs:
a) img-mon.access:
...
2024/05/31 12:32:53 [info] 1247#1247: *449 client 10.149.209.54 closed keepalive connection
2024/05/31 12:33:06 [info] 1247#1247: *451 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 10.149.209.54, server: image-monitoring.suse.de, request: "HEAD /size HTTP/1.1", upstream: "http://[::1]:5444/size", host: "k2.qe.suse.de:8080"
b) img-mon.error:
...
10.149.209.54 - - [31/May/2024:13:01:46 +0000] "HEAD /size HTTP/1.1" 200 0 "-" "curl/8.7.1"
10.149.209.54 - - [31/May/2024:13:01:59 +0000] "HEAD /size HTTP/1.1" 499 0 "-" "curl/8.7.1"
...
always alternating results.
Log (a) shows the upstream port 5444; if we run the same connections directly against that port (http://k2.qe.suse.de:5444/size), the result is always OK, but then nothing is written to logs a) or b).
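Since log (a) shows nginx proxying to an IPv6 upstream (http://[::1]:5444), one way to narrow this down on k2 itself could be the following (a sketch; the exact nginx layout on k2 may differ):
# dump the effective nginx config and locate the /size upstream
sudo nginx -T | grep -n -B2 -A4 proxy_pass
# talk to the upstream directly over IPv6 vs IPv4 loopback
curl -g -I -m 5 'http://[::1]:5444/size'
curl -I -m 5 http://127.0.0.1:5444/size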
Updated by mdati 5 days ago · Edited
Referring to https://progress.opensuse.org/issues/160814#note-6: port 8080 of the postgres DB still shows the alternating reply issue when queried or pushed, one pass then one fail in a loop, and this makes the push-to-db step of jobs fail even though the same push passed fine elsewhere.
See e.g. the results in this job list (the main test passes; only the postgres DB URL is the issue here):
FAIL 25/6 h18:47 https://openqa.suse.de/tests/14729250#step/bci_version_check/35
PASS 24/6 h22:47 https://openqa.suse.de/tests/14720942#step/bci_version_check/34
PASS 25/6 h18:22 https://openqa.suse.de/tests/14729240#step/bci_version_check/34
FAIL 24/5 h22:21 https://openqa.suse.de/tests/14720932#step/bci_version_check/35
FAIL 24/6 h23:31 https://openqa.suse.de/tests/14721268#step/bci_version_check/35
FAIL 24/6 h11:39 https://openqa.suse.de/tests/14716407#step/bci_version_check/35
FAIL 07/6 h14 https://openqa.suse.de/tests/14554346#step/bci_version_check/35
PASS 12/6 h18:30 https://openqa.suse.de/tests/14582125#step/bci_version_check/34
Note: all these tests set POSTGRES_PORT=8080, but the test code's default port is 5444.
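Since direct access on port 5444 looked reliable in the earlier comment, a possible cross-check would be to clone one of the failing jobs with the port overridden back to the default; whether that avoids the flakiness is only an assumption at this point (job id taken from the FAIL list above):
openqa-clone-job --within-instance https://openqa.suse.de 14729250 POSTGRES_PORT=5444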
I then checked the URL http://k2.qe.suse.de:8080/size from my local computer in the VPN and got alternating FAIL / OK (using option -I for a simple header reply):
date; curl -m5 -I http://k2.qe.suse.de:8080/size;date
Tue 25 Jun 19:35:39 CEST 2024
curl: (28) Operation timed out after 5001 milliseconds with 0 bytes received
Tue 25 Jun 19:35:44 CEST 2024
mdati@susepc21ktr9:~[] date; curl -m5 -I http://k2.qe.suse.de:8080/size;date
Tue 25 Jun 19:35:45 CEST 2024
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Tue, 25 Jun 2024 17:35:45 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Range: 0-18811/*
Content-Location: /size
Tue 25 Jun 19:35:46 CEST 2024
mdati@susepc21ktr9:~[] date; curl -m5 -I http://k2.qe.suse.de:8080/size;date
Tue 25 Jun 19:35:47 CEST 2024
curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
Tue 25 Jun 19:35:52 CEST 2024
...
Then I paused a test job and ran the same curl sequence from its VNC terminal: same problem, see the attached screenshot. The same behavior shows up in browsers, too.
Updated by ph03nix 4 days ago
I would pin the issue down to IPv6. Using IPv4 I could query the instance 100/100 times; over IPv6, every second request fails.
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Wed, 26 Jun 2024 12:49:11 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Range: 0-99/*
Content-Location: /size
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
^C
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Wed, 26 Jun 2024 12:49:35 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Range: 0-99/*
Content-Location: /size
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
^C
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Wed, 26 Jun 2024 12:50:30 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Range: 0-99/*
Content-Location: /size
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
^C
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Wed, 26 Jun 2024 12:51:28 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Range: 0-99/*
Content-Location: /size
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
^C
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Wed, 26 Jun 2024 12:51:30 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Range: 0-99/*
Content-Location: /size
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
^C
phoenix@racetrack-7290:~> curl -6 -I http://k2.qe.suse.de:8080/size
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Wed, 26 Jun 2024 12:51:33 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Range: 0-99/*
Content-Location: /size
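Until the IPv6 path is fixed, a hedged workaround for ad-hoc checks is to force IPv4 with curl -4, which was reliable above. A small sketch to quantify the difference (try count and timeout are arbitrary):
# count successful HEAD requests over IPv4 vs IPv6, 20 tries each
for opt in -4 -6; do
    ok=0
    for i in $(seq 1 20); do
        curl "$opt" -sfI -m 5 -o /dev/null http://k2.qe.suse.de:8080/size && ok=$((ok+1))
    done
    echo "curl $opt: $ok/20 ok"
done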