action #64096
closed
partition /srv was nearly depleted but now fixed (itself?)
Description
Observation
Received an alert email notification at Mon Mar 2 06:52:02 UTC 2020, see http://mailman.suse.de/mailman/private/osd-admins/2020-March/000958.html
https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?fullscreen&edit&tab=alert&panelId=74&orgId=1&from=1580815452576&to=1583211256414 shows a rapid increase in space usage starting 2020-02-17 until we hit the alert threshold. On 2020-03-02 22:00 the used space was suddenly gone again, maybe a cleanup job took care of it. Still, this looks worrisome and should be investigated from logs on osd.
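A minimal investigation sketch for osd, assuming the cleanup runs via systemd timers or services and left traces in the journal (the grep terms below are only guesses):

# list periodic jobs that could have freed the space around 2020-03-02 22:00
systemctl list-timers --all
# look for cleanup/rotation activity in the journal around the time of the drop
journalctl --since "2020-03-02 20:00" --until "2020-03-03 02:00" | grep -iE 'clean|rotate|vacuum'
# see which directories currently dominate /srv
du --max-depth=2 -BG /srv | sort -n | tail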
Updated by okurz about 5 years ago
- Related to action #60923: [alert] /srv about to run full, postgres logs very big due to repeated error "duplicate key value violates unique constraint "screenshots_filename", Key (filename)=(8ca/3c9/98a00d8bb2ccba5a2de1d403b5.png) already exists. INSERT INTO screenshots …" added
Updated by okurz about 5 years ago
This seems to also affect the system journal log retention period, as I could not find a long history within journalctl -u logrotate, which impacted me when trying to debug #62306.
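To verify the retention situation, a quick check like the following could help (a sketch; the actual journald configuration on osd is not known from this ticket):

# how much space the persistent journal currently occupies
journalctl --disk-usage
# check whether an explicit size or time limit is configured
grep -E '^(SystemMaxUse|MaxRetentionSec)' /etc/systemd/journald.conf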
openqa:/srv # du --max-depth=3 -BG | grep -v '^[01]G'
5G ./log/journal/84f4f4f356b525388b60f0ae547597e0
5G ./log/journal
6G ./log
31G ./PSQL10/data/base
24G ./PSQL10/data/log
54G ./PSQL10/data
54G ./PSQL10
61G .
So again postgres logs growing big?
The logs are full of entries like:
2020-03-08 03:12:02.042 CET openqa geekotest [12783]ERROR: duplicate key value violates unique constraint "job_modules_job_id_name_category_script"
2020-03-08 03:12:02.042 CET openqa geekotest [12783]DETAIL: Key (job_id, name, category, script)=(3967446, pthread_barrier_init_3-1, kernel, tests/kernel/run_ltp.pm) already exists.
2020-03-08 03:12:02.042 CET openqa geekotest [12783]STATEMENT: INSERT INTO job_modules ( always_rollback, category, fatal, important, job_id, milestone, name, script, t_created, t_updated) VALUES ( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 ) RETURNING id
2020-03-08 03:12:02.045 CET openqa geekotest [12783]ERROR: duplicate key value violates unique constraint "job_modules_job_id_name_category_script"
2020-03-08 03:12:02.045 CET openqa geekotest [12783]DETAIL: Key (job_id, name, category, script)=(3967446, pthread_barrier_init_4-1, kernel, tests/kernel/run_ltp.pm) already exists.
2020-03-08 03:12:02.045 CET openqa geekotest [12783]STATEMENT: INSERT INTO job_modules ( always_rollback, category, fatal, important, job_id, milestone, name, script, t_created, t_updated) VALUES ( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 ) RETURNING id
2020-03-08 03:12:02.053 CET openqa geekotest [12783]ERROR: duplicate key value violates unique constraint "job_modules_job_id_name_category_script"
2020-03-08 03:12:02.053 CET openqa geekotest [12783]DETAIL: Key (job_id, name, category, script)=(3967446, pthread_barrier_wait_1-1, kernel, tests/kernel/run_ltp.pm) already exists.
2020-03-08 03:12:02.053 CET openqa geekotest [12783]STATEMENT: INSERT INTO job_modules ( always_rollback, category, fatal, important, job_id, milestone, name, script, t_created, t_updated) VALUES ( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 ) RETURNING id
2020-03-08 03:12:02.058 CET openqa geekotest [12783]ERROR: duplicate key value violates unique constraint "job_modules_job_id_name_category_script"
2020-03-08 03:12:02.058 CET openqa geekotest [12783]DETAIL: Key (job_id, name, category, script)=(3967446, pthread_barrier_wait_2-1, kernel, tests/kernel/run_ltp.pm) already exists.
2020-03-08 03:12:02.058 CET openqa geekotest [12783]STATEMENT: INSERT INTO job_modules ( always_rollback, category, fatal, important, job_id, milestone, name, script, t_created, t_updated) VALUES ( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 ) RETURNING id
This is the second most common entry, already mentioned in #60923#note-3.
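A rough way to rank error types by frequency in the postgres logs (a sketch; the log directory is taken from the du output above, the *.log file name pattern is an assumption):

# count occurrences per violated constraint across all postgres server log files
grep -ho 'violates unique constraint "[^"]*"' /srv/PSQL10/data/log/*.log | sort | uniq -c | sort -rn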
Updated by okurz about 5 years ago
- Copied to action #64298: postgres error "duplicate key value violates unique constraint "job_modules_job_id_name_category_script" ... INSERT INTO job_modules" filling up postgres server log files quickly added
Updated by okurz about 5 years ago
- Status changed from Workable to Blocked
- Assignee set to okurz
- Priority changed from Urgent to Low
Reported the problem in #64298; will see if there are any remaining alerts or if the postgres log rotation prevents /srv depletion.
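For the log rotation part, the relevant PostgreSQL settings could be inspected like this (a sketch; the values configured on osd are not known from this ticket):

# PostgreSQL rotates its own server log based on these settings
sudo -u postgres psql -c "SHOW log_rotation_age;"
sudo -u postgres psql -c "SHOW log_rotation_size;"
sudo -u postgres psql -c "SHOW log_truncate_on_rotation;"

Note that rotation only caps individual files; whether old files accumulate depends on log_filename and any external cleanup.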
Updated by okurz about 5 years ago
- Due date set to 2020-03-20
- Status changed from Blocked to Feedback
Fix for #64298 merged and showing a good effect on o3. Waiting for deployment on osd tomorrow.
Updated by okurz about 5 years ago
- Status changed from Feedback to Resolved