tickets #104899: moderation of lists.opensuse.org very slow / timing out - openSUSE admin - openSUSE Project Management Tool

Actions

Copy link

tickets #104899

closed

moderation of lists.opensuse.org very slow / timing out

Added by meissner@suse.de about 3 years ago. Updated almost 3 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

pjessen

Category:

Mailing lists

Target version:

Start date:

2022-01-14

Due date:

% Done:

100%

Estimated time:

Description

hi,

when trying to moderating emails via the lists.opensuse.org interface
i only get timeouts, as if it runs into database locks / timeouts.

readonly is quick and fine, but any write operation is timing out.

Ciao, Marcus

Related issues 2 (0 open — 2 closed)

Actions

Copy link

Updated by pjessen about 3 years ago

Category set to Mailing lists
Private changed from Yes to No

Confirm, I saw this too in the last couple of days, when trying to discard messages from the moderation queue.

Actions

Copy link

Updated by pjessen almost 3 years ago

From /var/log/mailman/uwsgi-error.log:

Mon Jan 17 15:51:10 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (100/100) ***
Mon Jan 17 15:51:12 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (100/100) ***
Mon Jan 17 15:51:14 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (100/100) ***
Mon Jan 17 15:51:15 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (101/100) ***
Mon Jan 17 15:51:16 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (101/100) ***
Mon Jan 17 15:51:17 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (101/100) ***
Mon Jan 17 15:51:18 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (101/100) ***
Mon Jan 17 15:51:19 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (101/100) ***
Mon Jan 17 15:51:20 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (100/100) ***
Mon Jan 17 15:51:21 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (101/100) ***
Mon Jan 17 15:51:22 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (100/100) ***
Mon Jan 17 15:51:23 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (101/100) ***
Mon Jan 17 15:51:24 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (101/100) ***
Mon Jan 17 15:51:25 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (100/100) ***
Mon Jan 17 15:51:26 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (101/100) ***
Mon Jan 17 15:51:27 2022 - *** uWSGI listen queue of socket "0.0.0.0:8000" (fd: 8) full !!! (100/100) ***

Actions

Copy link

Updated by pjessen almost 3 years ago

FWIW, discarding 6 held messages from packaging.lists.o.o just took 1m35s.

Actions

Copy link

Updated by pjessen almost 3 years ago

Tuesday 18 Jan I added a robots.txt file to restrict some robots from accessing the archives. I had a hunch that too frequent archive access might cause performance issue. Granted, it's a shot in the dark.

/var/lib/mailman_webui/static/robots.txt :

User-Agent: AhrefsBot
Disallow: /archives/
User-Agent: YisouSpider
Disallow: /archives/
User-Agent: Bytespider
Disallow: /archives/
User-Agent: SeznamBot
Disallow: /archives/
User-Agent: SemrushBot
Disallow: /archives/
User-Agent: magpie-crawler
Disallow: /archives/
User-Agent: *
Crawl-Delay: 10

Actions

Copy link

Updated by msmeissn almost 3 years ago

it feels a bit faster. but it is still slow and i still run into timeouts

Actions

Copy link

Updated by pjessen almost 3 years ago

msmeissn wrote:

it feels a bit faster. but it is still slow and i still run into timeouts

Thanks for the feedback - that is my experience too. I have a feeling it is not the discarding of messages that is slow, but the re-fetching of the list afterwards. I have tried to spot any long-running database trx, but no luck.

Actions

Copy link

Updated by pjessen almost 3 years ago

pjessen wrote:

I have a feeling it is not the discarding of messages that is slow, but the re-fetching of the list afterwards.

Nah, ignore that. It simply takes 15-20seconds to discard one message. I have just timed it for 1-2-3-4.

Actions

Copy link

Updated by msmeissn almost 3 years ago

i would expect more database slowness or application slowness than overload by bots ... are there perhaps a lot (thousands) of emails in other moderation queues?

Actions

Copy link

Updated by pjessen almost 3 years ago

msmeissn wrote:

i would expect more database slowness or application slowness than overload by bots ...

I agree, but I can't find any evidence. I simply don't know what the time is spent on.

are there perhaps a lot (thousands) of emails in other moderation queues?

Many lists have a few hundred messages in queues, yes.

Actions

Copy link

#10

Updated by pjessen almost 3 years ago

On 'users-hu', we currently have 529 held messages and it takes 50-60seconds to discard one.

Actions

Copy link

#11

Updated by cboltz almost 3 years ago

~500 messages should be nothing for the database server ("let's talk again when we have 5000000" ;-)

OTOH, I wouldn't be surprised if the mailman developers only test with only 5 held messages because they expect everybody to moderate quickly, and never even thought about the performance in this area. (And, to be honest, with 500 mails in the queue for one list it might be a better idea to switch to reject-nonmembers or even reject-what-would-be-moderated. That's better than deleting these mails a month later.)

Since the overall mailman performance isn't that bad, my guess is that the terrible slowdown is caused by a query or maybe "just" a missing index around the held messages.

Is there a way to find out which queries run for a) deleting a held message and b) getting the list of held messages? And what does EXPLAIN $query *) say for them?

*) Disclaimer: I only have some experience with debugging MySQL queries, and hope that similar methods work on Postgres. And while I can probably give some hints, I'll need someone who is familiar with Postgres to actually do whatever is needed.

BTW: Are you sure that it's really discarding a message that is so slow? Or is it (re)loading the list of held messages afterwards?

Actions

Copy link

#12

Updated by pjessen almost 3 years ago

cboltz wrote:

~500 messages should be nothing for the database server ("let's talk again when we have 5000000" ;-)

Agree.

OTOH, I wouldn't be surprised if the mailman developers only test with only 5 held messages because they expect everybody to
moderate quickly, and never even thought about the performance in this area. (And, to be honest, with 500 mails in the queue
for one list it might be a better idea to switch to reject-nonmembers or even reject-what-would-be-moderated.
That's better than deleting these mails a month later.

Generally I agree, although we would still end up with a couple of lists open to all (a board decision IIRC?)

Since the overall mailman performance isn't that bad, my guess is that the terrible slowdown is caused by a query or maybe
"just" a missing index around the held messages.

Been there, had that suspicion. There are some slow queries, but I cannot make them coincide with discarding any held messages.
With strace, all I see is lots of communication (sendto, recvfrom) with something.

Is there a way to find out which queries run for a) deleting a held message and b) getting the list of held messages? And what does EXPLAIN $query *) say for them?

It really does not appear to be a database issue. Besides, a 50sec delay for 500 rows?

BTW: Are you sure that it's really discarding a message that is so slow? Or is it (re)loading the list of held messages afterwards?

I was asking myself that, see my comment nr. 7 above. For each message, it took 15-20sec.

Actions

Copy link

#13

Updated by pjessen almost 3 years ago

Related to tickets #101848: Poor performance of lists.opensuse.org - web ui, running very slowly. added

Actions

Copy link

#14

Updated by pjessen almost 3 years ago

Just one observation - yesterday I discarded a single message from a moderation queue of only one, it also took 20-30 seconds.

Actions

Copy link

#15

Updated by pjessen almost 3 years ago

Today, I am able to discard messages in blocks of 10, only takes 5sec apiece.

Actions

Copy link

#16

Updated by pjessen almost 3 years ago

Status changed from New to Resolved
Assignee set to pjessen
% Done changed from 0 to 100

pjessen wrote:

Today, I am able to discard messages in blocks of 10, only takes 5sec apiece.

Same now, takes about 1 minute to discard 10 messages from the moderation queue. This is still very poor performance, but I doubt if we can do much about that. I am closing as resolved, but feel free to re-open.

Actions

Copy link

#17

Updated by pjessen almost 3 years ago

Status changed from Resolved to New

Re-opening. Discarding a message now takes about 20sek.

Actions

Copy link

#18

Updated by pjessen almost 3 years ago

% Done changed from 100 to 0

Currently, it takes 25-30 seconds to discard a single held message from the xfce list. Back in Feb, I did a count of held messages, and got to about 18'000, for all lists.

Actions

Copy link

#19

Updated by pjessen almost 3 years ago

Status changed from New to In Progress

I have managed to figure out, more or less, how to use the mailman3 REST API via wget and I can now discard messages "in batch", ie. without human interaction. It still takes 20-25-30 seconds per message, but I can now leave it to run unattended overnight or just in the background while I do something else.
The basic idea is:

on mailman3, get a list of held messages for a list, "wget -O list.json http://localhost:8001/3.0/lists/kde@lists.opensuse.org/held"
the resulting json data can be easily processed, use whatever your favourite tool is. I used php.
I printed out the request_ids and the email subjects, easy to grep.
use a list of request_ids and produce urls: http://localhost:8001/3.0/lists/kde@lists.opensuse.org/held/REQID
wget with --post-data="action=discard"

To anyone who is python-literate, I'm sure this is backwards and stupid, but for me, it does the job.

Actions

Copy link

#20

Updated by pjessen almost 3 years ago

There must be a leak or some such, somewhere in mailman. I think the mailman "runner" that services the REST API at port 8001 was restarted this morning and now discard requests are processed in 4-5 seconds instead of 25-30 seconds yesterday. I am continuing clearing out held messages, but the general problem will not fix itself.

Actions

Copy link

#21

Updated by pjessen almost 3 years ago

Status changed from In Progress to Resolved
% Done changed from 0 to 100

I have cleared out about 90% of the 20K held messages, there is now 2118 left. As I wrote earlier, I think this is a bug in postorious, so the slowness will likely reappear. Feel free to re-open.

Actions

Copy link

#22

Updated by pjessen over 2 years ago

Related to tickets #113111: lists.opensuse.org - webserver backand timeouts - nginx 504 gateway timeout added

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

openSUSE admin

Tags

Custom queries

tickets #104899

moderation of lists.opensuse.org very slow / timing out

Updated by pjessen about 3 years ago

Updated by pjessen almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by msmeissn almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by msmeissn almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by cboltz almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by pjessen almost 3 years ago

Updated by pjessen over 2 years ago