action #72139
openQA services on OSD failed to connect to database (status: closed)
All openQA services which use the database showed connection errors. That's the first error logged by PostgreSQL:
2020-09-30 12:30:53.437 CEST openqa geekotest [7311]FATAL: remaining connection slots are reserved for non-replication superuser connections
On the openQA side, the errors look like this:
Sep 30 12:47:45 openqa openqa[32459]: [error] [vJyMDc-a] DBIx::Class::Storage::DBI::catch {...} (): DBI Connection failed: DBI connect('dbname=openqa','geekotest',...) failed: FATAL: remaining connection slots are reserved for non-replication superuser connections at /usr/lib/perl5/vendor_perl/5.26.1/DBIx/Class/Storage/ line 1517. at /usr/share/openqa/script/../lib/OpenQA/ line 172
This led to various alerts being triggered (Minion jobs alert, HTTP Response alert, Workers alert). A restart of the main openqa-webui service and the postgresql service helped to fix the error. (The restart of openqa-webui was likely unnecessary, considering the other services could restore themselves without a restart.)
I also retried the failed Minion jobs and all of them passed. So there shouldn't be any active warnings anymore.
The question is what caused the connection limit to be exceeded. Theoretically we have a fixed number of services using a fixed number of connections.
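To narrow down which client is holding the connections, one option is to compare the number of open connections per role and application against the slots available to non-superusers. PostgreSQL raises the error quoted above once regular roles have used up max_connections minus superuser_reserved_connections. The following is only a rough sketch of that bookkeeping: the values 100 and 3 are the PostgreSQL defaults and not necessarily what OSD is configured with, the function names are made up for illustration, and the sample rows are hypothetical stand-ins for the output of `SELECT usename, application_name FROM pg_stat_activity;`.

```python
from collections import Counter


def regular_slots(max_connections: int, superuser_reserved: int) -> int:
    """Slots usable by non-superuser roles such as geekotest."""
    return max_connections - superuser_reserved


def connections_by_client(rows):
    """Count open connections per (role, application_name) pair."""
    return Counter((role, app) for role, app in rows)


# Hypothetical snapshot; on a real host these rows would come from
# pg_stat_activity (columns usename and application_name).
sample_rows = [
    ("geekotest", "webui"),
    ("geekotest", "webui"),
    ("geekotest", "gru"),
    ("postgres", "psql"),
]

usage = connections_by_client(sample_rows)

# Assumed PostgreSQL defaults; check the real values with
# SHOW max_connections; and SHOW superuser_reserved_connections;
limit = regular_slots(max_connections=100, superuser_reserved=3)

# Only connections from regular roles count against the limit.
# "postgres" is assumed to be the only superuser role here.
non_superuser = sum(n for (role, _), n in usage.items() if role != "postgres")

for (role, app), count in usage.most_common():
    print(f"{role:12} {app:12} {count}")
print(f"{non_superuser} of {limit} regular slots in use")
```

Running such a summary periodically (or when the alert fires) would show whether one service grows beyond its expected fixed number of connections.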