Ticket #135785 - Assess raising matrix.i.o.o resources
Status: Closed (0% done)
Description
I have been watching matrix.i.o.o's resource consumption over the last few days, and we should assess raising its RAM, at least:
luc14n0@matrix:~> free -h
               total        used        free      shared  buff/cache   available
Mem:            14Gi        13Gi       525Mi       0.0Ki       784Mi       961Mi
Swap:          255Mi       255Mi       0.0Ki
It's been like that for about as long as I've been watching it -- I started around three days ago, when I took a closer look at the Hookshot bridge failures.
The average load has been reasonably OK whenever I've looked, but sometimes there are some crazy CPU spikes:
luc14n0@matrix:~> w
21:25:38 up 10 days, 8:44, 3 users, load average: 5.76, 11.07, 23.69
Like the one I saw today -- but I can't say exactly what is causing them, or how frequently they happen.
Files
Updated by pjessen about 1 year ago
256Mb swap is ridiculous, in my opinion, but no more ridiculous than 14Gb for exchanging a few messages 😱
Any chance this is yet-another-memory-leak? Something that needs tidying up every so often? I can't login to matrix.i.o.o to have a look, so just thinking out loud.
Updated by luc14n0 about 1 year ago
pjessen wrote in #note-2:
256Mb swap is ridiculous, in my opinion, but no more ridiculous than 14Gb for exchanging a few messages 😱
(Un)Fortunately we're doing much more than exchanging a few messages nowadays. I can't say exact numbers, Jacob knows better, but we have dozens of rooms. I imagine at least some are (very) busy, at times. We do have thousands of users, most of which are bots. We have 17 workers, if I'm counting them correctly. We have three bridges (Discord, Telegram, and Hookshot).
All of that doesn't scale well with Synapse, which is written in Python.
Any chance this is yet-another-memory-leak? Something that needs tidying up every so often? I can't login to matrix.i.o.o to have a look, so just thinking out loud.
I don't think so. Looking at it right now, it's still sitting at around 13.2G/14.4G -- in htop:
matrix (matrix.o.o):~ # free --mega
              total        used        free      shared  buff/cache   available
Mem:          15506       14584         207           0        1079         921
Swap:           268         268           0
Just like yesterday. But I'm going to reboot the VM and watch it over time.
Updated by luc14n0 about 1 year ago
Right now the load is below 100%, though:
21:06:29 up 11 days, 8:25, 4 users, load average: 4.88, 4.68, 4.60
Updated by hellcp about 1 year ago
Before you restart, you could run zypper dup on the machine, as there's a pending synapse update.
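For completeness, the suggested sequence would roughly be (just a sketch, assuming the usual zypper workflow on this host):

zypper refresh     # refresh repository metadata
zypper dup         # distribution upgrade, which pulls in the pending synapse update
systemctl reboot   # then restart so everything comes back cleanly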
Updated by luc14n0 about 1 year ago
OK. I did a zypper dup, then rebooted.
When the system came back up it sat at around 3.8G. Now, after a while, it has gone higher. Let's see if it keeps climbing tomorrow.
matrix (matrix.o.o):~ # free --mega
              total        used        free      shared  buff/cache   available
Mem:          15506        6334        6817           0        2719        9171
Swap:           268           0         268
matrix (matrix.o.o):~ # w
00:55:57 up 2:27, 1 user, load average: 1.19, 0.47, 0.42
Updated by luc14n0 about 1 year ago
BTW, I left htop open the whole time, and it shows there was a RAM usage spike (around 8G, judging by the bar in the attached picture). Just an FYI.
Updated by pjessen about 1 year ago
luc14n0 wrote in #note-3:
pjessen wrote in #note-2:
256Mb swap is ridiculous, in my opinion, but no more ridiculous than 14Gb for exchanging a few messages 😱
(Un)Fortunately we're doing much more than exchanging a few messages nowadays. I can't say exact numbers, Jacob knows better, but we have dozens of rooms.
I imagine at least some are (very) busy, at times. We do have thousands of users, most of which are bots. We have 17 workers, if I'm counting them correctly.
We have three bridges (Discord, Telegram, and Hookshot).
I may be old fashioned, but that still sounds like "exchanging a few messages" 🙂
Any chance this is yet-another-memory-leak? Something that needs tidying up every so often? I can't login to matrix.i.o.o to have a look, so just thinking out loud.
I don't think so. Looking at it right now, it's still sitting at around 13.2G/14.4G -- in htop:
I only mentioned it because that was the case with mailman3. (also python ...)
Restarting the gunicorn workers after X requests produced a significant reduction in memory footprint. (--max-requests=500)
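For reference, a minimal sketch of that mitigation (the flags are real gunicorn options; the WSGI module name below is only a placeholder, since this concerned mailman3 rather than Synapse):

# Recycle each worker after ~500 requests to bound slow leaks; the jitter
# keeps all workers from restarting at the same moment.
gunicorn --workers 4 --max-requests 500 --max-requests-jitter 50 example_app.wsgi:application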
Updated by luc14n0 about 1 year ago
- Status changed from New to Feedback
pjessen wrote in #note-8:
luc14n0 wrote in #note-3:
pjessen wrote in #note-2:
256Mb swap is ridiculous, in my opinion, but no more ridiculous than 14Gb for exchanging a few messages 😱
(Un)Fortunately we're doing much more than exchanging a few messages nowadays. I can't say exact numbers, Jacob knows better, but we have dozens of rooms.
I imagine at least some are (very) busy, at times. We do have thousands of users, most of which are bots. We have 17 workers, if I'm counting them correctly.
We have three bridges (Discord, Telegram, and Hookshot).
I may be old fashioned, but that still sounds like "exchanging a few messages" 🙂
Well, in a way it might very well be. However, in general there are more things Synapse is doing in the background than, let's say, an IRC server. There are so many cogwheels spinning day in, day out.
Any chance this is yet-another-memory-leak? Something that needs tidying up every so often? I can't login to matrix.i.o.o to have a look, so just thinking out loud.
I don't think so. Looking at it right now, it's still sitting at around 13.2G/14.4G -- in htop:
I only mentioned it because that was the case with mailman3. (also python ...)
Restarting the gunicorn workers after X requests produced a significant reduction in memory footprint. (--max-requests=500)
Taking another look:
matrix (matrix.o.o):~ # free --mega
              total        used        free      shared  buff/cache   available
Mem:          15506        6651        6180           0        3039        8854
Swap:           268           0         268
matrix (matrix.o.o):~ # uptime
23:08:48 up 1 day 0:40, 1 user, load average: 0.15, 0.23, 0.43
And you know what? At this point I won't say that there isn't a memory leak, or something similar -- like a worker spawning too many children. I'm going to keep an eye on the monitors for a while.
Updated by luc14n0 about 1 year ago
I believe my initial proposal when opening this ticket was a hasty judgement, and I've changed my mind about raising the resources allocated to the matrix.i.o.o VM. However, I'm going to keep this ticket in Feedback status while I keep watching.
Updated by luc14n0 about 1 year ago
Indeed Per, you do have a point.
I've kept matrix.i.o.o under watch, and day by day the RAM consumption has been rising, little by little. I have a suspicion it has to do with our currently broken federation with matrix.org.
luc14n0@matrix:~> free --mega
              total        used        free      shared  buff/cache   available
Mem:          15506        9053         552           0        6265        6452
Swap:           268           1         266
The system load, however, has stayed tame ever since the last system reboot:
luc14n0@matrix:~> uptime
00:08:59 up 13 days 1:40, 1 user, load average: 0.23, 0.33, 0.40
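On the federation suspicion, one possible spot check -- an assumption on my part, since it relies on the Synapse federation admin API being available in the running version and on having an admin access token at hand -- would be to list remote destinations and their retry/backoff state:

# List federation destinations via the Synapse admin API (assumes the client
# listener on localhost:8008 and an admin token in $TOKEN; adjust both as needed).
curl -s -H "Authorization: Bearer $TOKEN" \
  'http://localhost:8008/_synapse/admin/v1/federation/destinations?limit=10'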
Since there are incoming updates for element-web, and soon there will be one for Synapse as well, I'm thinking of creating a script to try to find out more about possible culprits after updating the system -- there is a kernel update pending too. I do see lots of federation-related processes.
If anyone knows a handy script that would fit the job of monitoring RAM usage and process spawning, please speak up. In the meanwhile, I'm going to take a look at openSUSE's System Analysis and Tuning Guide, more specifically the System Monitoring part.
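In lieu of a ready-made tool, a rough sketch of the kind of watcher I have in mind (untested; the log path and interval are arbitrary) could be:

#!/bin/sh
# Every 5 minutes, append overall memory figures and the top memory consumers
# to a log so growth can be compared across days.
LOG=/var/log/matrix-mem-watch.log
while true; do
    {
        date '+%F %T'
        free --mega
        ps -eo pid,user,rss,%mem,args --sort=-rss | head -6
        echo '---'
    } >> "$LOG"
    sleep 300
done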
Updated by crameleon about 1 year ago
If anyone knows a handy script that would fit the job of monitoring RAM usage
Is it not monitored by nrpe, visible in Icinga?
Updated by luc14n0 about 1 year ago
crameleon wrote in #note-12:
If anyone knows a handy script that would fit the job of monitoring RAM usage
Is it not monitored by nrpe, visible in Icinga?
Yes, it is. And here is the graph for the last 13 days.
I don't think we're going to need any over-engineering here:
luc14n0@matrix:~> ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -4
    PID    PPID CMD                          %MEM %CPU
   1232       1 /usr/bin/python3 -m synapse  18.2 18.5
   1924       1 /usr/bin/python3 -m synapse   3.6  7.2
   1921       1 /usr/bin/python3 -m synapse   3.6  7.6
luc14n0@matrix:~> ps -e -o pid,user,%mem,cmd --sort=-%mem | head -4
    PID USER     %MEM CMD
   1232 synapse  18.2 /usr/bin/python3 -m synapse.app.homeserver --config-path=/etc/matrix-synapse/homeserver.yaml --config-path=/etc/matrix-synapse/conf.d/
   1924 synapse   3.6 /usr/bin/python3 -m synapse.app.generic_worker --config-path=/etc/matrix-synapse/homeserver.yaml --config-path=/etc/matrix-synapse/conf.d/ --config-path=/etc/matrix-synapse/workers/federation_sender2.yaml
   1921 synapse   3.6 /usr/bin/python3 -m synapse.app.generic_worker --config-path=/etc/matrix-synapse/homeserver.yaml --config-path=/etc/matrix-synapse/conf.d/ --config-path=/etc/matrix-synapse/workers/federation_sender1.yaml
luc14n0@matrix:~> python3 -c 'print(15506 * 0.182)'
2822.092
So the main Synapse process is using about 2,822 MB (~2.8G) of RAM at this moment. I'm going to update the machine this Saturday, as there's a Synapse update available (the most recent upstream release should be available next week), and let's see how things go from there.
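Side note: ps can also report resident set size directly (in KiB), which avoids deriving the figure from %MEM times total memory:

# Same top consumers, but with RSS in KiB as an absolute figure.
ps -eo pid,user,rss,%mem,args --sort=-rss | head -4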
Updated by luc14n0 about 1 year ago
For posterity's sake -- the graph will go away at some point:
Updated by luc14n0 about 1 year ago
- Status changed from Feedback to Closed
Alright, I'm going to close this one now, since there's nothing actionable here in the context of what this ticket was opened for.