action #60926
closed
postgres error "duplicate key value violates unique constraint "screenshots_filename", Key (filename)=(8ca/3c9/98a00d8bb2ccba5a2de1d403b5.png) already exists. INSERT INTO screenshots …"
Description
Observation
There is an error message repeated multiple times per second in the postgres error logs. The first occurrence of this big repetition on osd is in postgresql-2019-10-23_000000.log:
2019-10-23 08:29:38.235 CEST openqa geekotest [30552]ERROR: duplicate key value violates unique constraint "screenshots_filename"
2019-10-23 08:29:38.235 CEST openqa geekotest [30552]DETAIL: Key (filename)=(68c/0ee/3ceb7071b7b9364aa721b3421e.png) already exists.
2019-10-23 08:29:38.235 CEST openqa geekotest [30552]STATEMENT: INSERT INTO screenshots ( filename, t_created) VALUES ( $1, $2 ) RETURNING id
On o3, grep 'duplicate key value violates unique constraint "screenshots_filename"' * | less
shows that the error was already present in the oldest still preserved log file, currently postgresql-2019-09-08_071605.log, but since 2019-10-17 03:00 it appears far more often. As the increase started at 03:00 I assume an openQA change caused this, not os-autoinst or openQA-worker, which we both install on workers about 30 minutes later (I think).
The openQA package update that was installed, according to /var/log/zypp/history:
2019-10-16 03:00:38|install|openQA|4.6.1571122761.67cc75da9-lp151.1919.1|noarch||devel_openQA|f8e591a5a2176268263689cc99a5e95c3421187c30c3ed368b8e9a02349f1713|
2019-10-17 03:02:01|install|openQA|4.6.1571253176.1a322744e-lp151.1926.1|noarch||devel_openQA|84a559884b250b25372631df6fc542c8f5318b2ed69dcb99a8d501f71befcf4d|
so 67cc75da9..1a322744e with the git log:
$ git log1 --no-merges 67cc75da9..1a322744e
916c45f5c (kraih/local_postgres) PostgreSQL errors can be localized, so just use the name of the unique constraint
ce83ab943 worker: Do not treat reconnect attempts as errors but with warning only
8811ad46c Remove wrong error handling code when sending ws messages
7506e0ae4 Prevent potential deadlocks in scheduler and ws server
81d318dd5 Hide old job templates editor for new groups
3172996fd (kraih/screenshots_resultset) Handle unique constraint correctly
51967db7f Add missing resultset for screenshots and make a few small optimizations
94afcda64 Drop -v flag on test runs and avoid noisy job "name"
e7c3f3cff clone job: Support specifying a port in host URL
A commit named "Handle unique constraint correctly" looks very suspicious in this regard :)
Therefore I blame https://github.com/os-autoinst/openQA/pull/2406.
Updated by okurz almost 5 years ago
- Copied from action #60923: [alert] /srv about to run full, postgres logs very big due to repeated error "duplicate key value violates unique constraint "screenshots_filename", Key (filename)=(8ca/3c9/98a00d8bb2ccba5a2de1d403b5.png) already exists. INSERT INTO screenshots …" added
Updated by coolo almost 5 years ago
- Priority changed from Urgent to Normal
- Target version changed from Current Sprint to Ready
Not urgent at all - it's just postgresql logs. We catch the error.
Updated by coolo almost 5 years ago
it's not the only constraint filling the logs btw
Updated by okurz almost 5 years ago
coolo wrote:
Not urgent at all - it's just postgresql logs. We catch the error.
I was expecting this from you, and it is one more reason why I created this ticket as a copy of the infrastructure one. It's a valid opinion to consider it less important. I compare it to global warming: just because you have not done anything about it yet does not make it less urgent ;) For tickets with "Normal" priority I work with the expectation of "months" until they are done. That just means more work for us on the admin side to handle the huge logs. A proper fix should address the problem at the source and maybe even help with performance: multiple log messages per second also have some I/O impact.
Updated by kraih almost 5 years ago
- Priority changed from Normal to Low
The code in question should be here. And I'm not even sure there is much we can do with DBIx::Class; that error is expected and handled correctly on the openQA side.
Updated by okurz almost 5 years ago
Do you have a suggestion for how we handle the error filling up GBs of logs on the servers then?
I am not sure you both understood the implications when setting the priority to Normal/Low. If we do not do anything about this error then either the postgresql logs will grow to a large steady state (I estimate 200GB) or partitions will run full. My time expectation for "Low" tickets is "years". I don't think we have that long, rather weeks.
Updated by mkittler almost 5 years ago
- Priority changed from Low to High
That there's no obvious solution doesn't mean the prio is low. I think we should handle this case correctly and without flooding the logs. It is not a good idea to wait until the production instance runs out of disk space.
Maybe we can use manual SQL using UPSERT here? We don't use the DBIx object here for anything else but the ID.
it's not the only constraint filling the logs btw
That's likely true. There are also still places in the code which are racy. If we handle more of them in the way @kraih pointed out this problem will be even more notable. That actually calls for a more general solution. It would be nice if we could turn off logging for these errors specifically.
Updated by kraih almost 5 years ago
I'm not sure, will check with the DBIx::Class folks if they have a better way to handle this. If we used raw SQL we could just append an ON CONFLICT screenshots_filename DO NOTHING to the insert query.
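For reference, the complete PostgreSQL syntax needs either the conflicting column or the constraint name, e.g. ON CONFLICT (filename) DO NOTHING or ON CONFLICT ON CONSTRAINT screenshots_filename DO NOTHING. A minimal sketch of what such a query could look like via plain DBI; $dbh and $filename are placeholders and this is not the actual openQA code:

# Sketch only: insert the screenshot and silently skip duplicates.
# RETURNING id yields no row when the conflict branch is taken, so a
# separate SELECT would still be needed to get the id of an existing row.
my ($id) = $dbh->selectrow_array(
    'INSERT INTO screenshots (filename, t_created) VALUES (?, now())
     ON CONFLICT (filename) DO NOTHING RETURNING id',
    undef, $filename,
);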
Updated by kraih almost 5 years ago
Maybe we can use manual SQL using UPSERT here? We don't use the DBIx object here for anything else but the ID.
Maybe not on this one, but I remember working on another of our unique constraints and it triggered a lot of Perl code on insert in the related Result classes.
Updated by mkittler almost 5 years ago
I know, in some cases we do too much DBIx magic which would be annoying to do manually. Here we are likely fine; of course we should check before just replacing the code with raw SQL. But maybe DBIx even allows one to mix in custom SQL like ON CONFLICT screenshots_filename DO NOTHING.
Updated by okurz almost 5 years ago
- Status changed from New to In Progress
- Assignee set to okurz
- Target version changed from Ready to Current Sprint
I will try out something.
Updated by kraih almost 5 years ago
Maybe we can use manual SQL using UPSERT here? We don't use the DBIx object here for anything else but the ID.
Maybe not on this one, but I remember working on another of our unique constraints and it triggered a lot of Perl code on insert in the related Result classes.
I think this was the one I remembered.
Updated by kraih almost 5 years ago
I know, in some cases we do too much DBIx magic which would be annoying to do manually. Here we are likely fine; of course we should check before just replacing the code with raw SQL. But maybe DBIx even allows one to mix in custom SQL like ON CONFLICT screenshots_filename DO NOTHING.
I looked through the code a bit. On the SQL::Abstract layer it seems possible by exploiting the returning feature, but in DBIx::Class you have to go down all the way to the storage layer to reach the SQL generating part. So I don't think it is possible without monkey patching DBIx::Class (but will double check).
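If raw SQL turned out to be acceptable, one way around patching DBIx::Class itself would be to drop down to the storage layer's DBI handle for just this query. A rough sketch under that assumption; the helper name screenshot_id_for is hypothetical and the real code may need more than the id:

# Hypothetical helper: create or fetch a screenshot row without producing
# a unique constraint error in the PostgreSQL log.
sub screenshot_id_for {
    my ($schema, $filename) = @_;
    my $dbh = $schema->storage->dbh;
    my ($id) = $dbh->selectrow_array(
        'INSERT INTO screenshots (filename, t_created) VALUES (?, now())
         ON CONFLICT (filename) DO NOTHING RETURNING id',
        undef, $filename,
    );
    # The conflict branch returns no row, so look up the existing entry instead.
    ($id) = $dbh->selectrow_array(
        'SELECT id FROM screenshots WHERE filename = ?', undef, $filename,
    ) unless defined $id;
    return $id;
}

Note that this bypasses any DBIx::Class Result hooks that would normally run on insert, which is exactly the concern raised above.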
Updated by okurz almost 5 years ago
- Status changed from In Progress to Resolved
Merged and deployed to o3 immediately. No more errors in the postgres log about the screenshot insert conflict.