tickets #121993
closed - unable to process held_messages (reject, discard) in mailman
100%
Description
Hello team,
this has been the situation for the past two weeks or so.
I'm the person handling any potential source-dvd requests for openSUSE Leap:
https://en.opensuse.org/Source_code
For the past two or three weeks I have not been able to reject or discard the numerous spam messages that we're getting on a daily basis.
That increases a chance that I could eventually miss a valid request.
https://lists.opensuse.org/manage/lists/sourcedvd.lists.opensuse.org/held_messages
I've tried both discard and reject, on single and multiple messages; however, any action leads to ~30 seconds of waiting and ends with a 502 Bad Gateway error.
Could you please look into this?
Files
Updated by pjessen about 2 years ago
- Is duplicate of tickets #116084: lists.opensuse.org / mailing list web archive - timeouts, sluggishness, nonresponsive, nginx timeout etc added
Updated by pjessen about 2 years ago
Looking at the logs in /var/log/nginx/, I see e.g.
/var/log/nginx/error.log-20221014.xz:2022/10/13 13:43:35 [error] 2062#2062: *624197 client intended to send too large body: 5071396 bytes, client: 127.0.0.1, server: lists.opensuse.org, request: "POST /archives/api/mailman/archive HTTP/1.1", host: "localhost"
This started on 14 October and has been going on ever since. Was some limit reset? As far as I can tell, we have "client_max_body_size 400M;", but that obviously does not work, somehow.
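For context, the directive sits in the vhost configuration; a sketch of the relevant part (only the directive itself is taken from this ticket, the surrounding server{} lines are illustrative):

```nginx
# /etc/nginx/vhosts.d/lists.opensuse.org.conf (sketch)
server {
    server_name lists.opensuse.org;
    # Requests with a body larger than this are rejected with
    # "413 Request Entity Too Large" before reaching the upstream.
    client_max_body_size 400M;
}
```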
Updated by pjessen about 2 years ago
I also see messages like this (when trying to discard a message):
2022/12/14 13:05:36 [error] 8935#8935: *5233 upstream prematurely closed connection while reading response header from upstream, client: 2a03:7520:4c68:1:ff99:ffff:0:98fc, server: lists.opensuse.org, request: "POST /manage/lists/sourcedvd.lists.opensuse.org/held_messages HTTP/1.1", upstream: "http://127.0.0.1:8000/manage/lists/sourcedvd.lists.opensuse.org/held_messages", host: "lists.opensuse.org", referrer: "https://lists.opensuse.org/manage/lists/sourcedvd.lists.opensuse.org/held_messages"
Upstream being http://127.0.0.1:8000 - that is gunicorn.
In /var/log/postorius/gunicorn.log, I see numerous "[CRITICAL] WORKER TIMEOUT".
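Those timeout entries are easy to tally; a minimal sketch in Python (the log path is from this ticket, the sample lines are illustrative):

```python
import re

# gunicorn logs worker-timeout kills as "[CRITICAL] WORKER TIMEOUT".
TIMEOUT_RE = re.compile(r"\[CRITICAL\] WORKER TIMEOUT")

def count_worker_timeouts(lines):
    """Count worker-timeout entries in an iterable of log lines."""
    return sum(1 for line in lines if TIMEOUT_RE.search(line))

# Illustrative sample; in practice:
#   with open("/var/log/postorius/gunicorn.log") as f:
#       print(count_worker_timeouts(f))
sample = [
    "[2022-12-14 13:05:36 +0000] [8935] [CRITICAL] WORKER TIMEOUT (pid:8970)",
    "[2022-12-14 13:05:37 +0000] [8935] [INFO] Booting worker with pid: 8992",
]
print(count_worker_timeouts(sample))  # → 1
```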
Updated by pjessen about 2 years ago
pjessen wrote:
This started on 14 October and has been going on ever since. Was some limit reset? As far as I can tell, we have "client_max_body_size 400M;", but that obviously does not work, somehow.
I have reduced it to "client_max_body_size 10M;" and this seems to work??
Updated by pjessen about 2 years ago
Have changed the gunicorn timeout to 0, on the command line, via mailman-web.service.d/timeout.conf. Instead of a 502, I'm now getting a 504, which suggests it is nginx timing out.
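The drop-in would look roughly like this; the gunicorn binary path and the rest of the ExecStart line are assumptions, only --timeout=0 (0 disables the worker timeout) is from this ticket:

```ini
# /etc/systemd/system/mailman-web.service.d/timeout.conf (sketch)
[Service]
# Clear the inherited ExecStart, then restate it with the new flag.
ExecStart=
ExecStart=/usr/bin/gunicorn --timeout=0 --bind 127.0.0.1:8000 mailman_web.wsgi
```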
Updated by pjessen about 2 years ago
By default nginx has a 60 second timeout - I have added this to the http{} section in /etc/nginx/nginx.conf:
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
This does the trick - why discarding a message takes about 90 seconds, I have no idea.
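For reference, here is what each of the three directives governs, per standard nginx semantics (the http{} wrapper is just context):

```nginx
# /etc/nginx/nginx.conf (fragment)
http {
    # Timeout for transmitting the request to the proxied upstream
    # (between two successive write operations).
    proxy_send_timeout 600;
    # Timeout between two successive reads from the upstream;
    # this is the one a slow gunicorn response runs into.
    proxy_read_timeout 600;
    # Timeout for transmitting the response back to the client.
    send_timeout 600;
}
```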
Updated by pjessen about 2 years ago
- Due date set to 2022-12-22
- Status changed from New to In Progress
- Assignee set to pjessen
- Priority changed from High to Normal
- % Done changed from 0 to 30
Three changes:
- nginx - /etc/nginx/vhosts.d/lists.opensuse.org.conf -
client_max_body_size 10M;
I don't see any reason why this should work any better than the previous "client_max_body_size 400M;".
- gunicorn - mailman-web.service.d/timeout.conf -
--timeout=0
- nginx - /etc/nginx/nginx.conf - the three timeouts as above.
I'll leave this for now and review next week.
Updated by pjessen about 2 years ago
pjessen wrote:
I'll leave this for now and review next week.
So far it is looking good. The "worker timeout" messages have gone from the gunicorn log, but there are still some "upstream prematurely closed connection" messages in the nginx error log, interestingly all on exports of archives in mailbox format.
Updated by pjessen about 2 years ago
luc14n0 wrote:
Use MySQL for redirection based on table look-ups, instead of Nginx.
I don't see nginx and the huge redirection maps as being the culprit here, but it ought to be easy to prove/disprove.
Just a little earlier, I wanted to remove some non-member addresses from users.lists:
- After hitting "delete", it took 2min16sec for the Confirm page to appear
- After hitting "Confirm", it took 2min12sec to return to the list page.
Updated by pjessen about 2 years ago
pjessen wrote:
luc14n0 wrote:
Use MySQL for redirection based on table look-ups, instead of Nginx.
I don't see nginx and the huge redirection maps as being the culprit here, but it ought to be easy to prove/disprove.
I removed the include for the mail redirects, almost 3 million lines, did an "nginx reload" and tried to remove a non-member again:
- After hitting "delete", it took 2min8sec for the Confirm page to appear.
- After hitting "Confirm", it took 2min7sec to return to the list page.
Updated by pjessen over 1 year ago
luc14n0 wrote:
Use MySQL for redirection based on table look-ups, instead of Nginx.
FWIW, I wrote a small proxy daemon for that, see #101842. It reduced the memory footprint, but did nothing for this issue. Discarding a held message currently takes 3 minutes and 15 seconds.
Updated by pjessen over 1 year ago
- Related to tickets #129463: mailman3 - the admin-auto list has 10000 held messages added
Updated by pjessen over 1 year ago
- Status changed from In Progress to Resolved
- % Done changed from 30 to 100
It looks like all that was needed was a clean-up - with too many held messages, processing time increased to three minutes and more.
It is clearly a poor design when such a relatively small amount of data can affect the processing in this way. Even with 1000 held messages, it was very noticeable - once I had the total down to double digits, the UI was responding in single-digit seconds.
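Such a clean-up can also be scripted against the Mailman core REST API instead of clicking through Postorius: GET the list's held messages, then POST action=discard for each. A minimal sketch; the host/port and the credentials handling are assumptions, the endpoints follow the documented held-message API:

```python
import json
import urllib.request

BASE = "http://localhost:8001/3.1"  # assumption: default core REST port

def held_url(base, list_id, request_id=None):
    """Endpoint for a list's held messages (optionally one message)."""
    url = f"{base}/lists/{list_id}/held"
    return url if request_id is None else f"{url}/{request_id}"

def make_opener(user, password, base):
    """Opener with HTTP Basic auth for the core REST credentials."""
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, base, user, password)
    return urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(mgr))

def discard_all(opener, base, list_id):
    """Fetch all held messages and POST action=discard for each."""
    with opener.open(held_url(base, list_id)) as resp:
        entries = json.load(resp).get("entries", [])
    for entry in entries:
        req = urllib.request.Request(
            held_url(base, list_id, entry["request_id"]),
            data=b"action=discard",  # form-encoded, per the REST API
        )
        opener.open(req)
    return len(entries)

# Usage (assumed credentials):
#   opener = make_opener("restadmin", "restpass", BASE)
#   discard_all(opener, BASE, "sourcedvd.lists.opensuse.org")
```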
I know, I know - I ought to open a bug with the mailman3 project .....