tickets #135527
closedmatrix.i.o.o hookshot bridge failures and CPU spikes
100%
Description
Lately I've noticed that sometimes when I enter Element on my phone that animation of syncing never stops and rooms don't sync latest messages.
Today I logged in on matrix.i.o.o and checked the system logs, and this loop was all over the place:
Sep 11 14:04:13 matrix systemd[1]: hookshot.service: Main process exited, code=exited, status=1/FAILURE
Sep 11 14:04:13 matrix systemd[1]: hookshot.service: Failed with result 'exit-code'.
Sep 11 14:04:13 matrix systemd[1]: hookshot.service: Consumed 1.576s CPU time.
Sep 11 14:04:14 matrix systemd[1]: hookshot.service: Scheduled restart job, restart counter is at 260306.
Sep 11 14:04:14 matrix systemd[1]: Stopped Matrix Appservice hookshot.
Sep 11 14:04:14 matrix systemd[1]: hookshot.service: Consumed 1.576s CPU time.
Sep 11 14:04:14 matrix systemd[1]: Started Matrix Appservice hookshot.
Sep 11 14:04:15 matrix node[14486]: node:internal/modules/cjs/loader:1051
Sep 11 14:04:15 matrix node[14486]: throw err;
Sep 11 14:04:15 matrix node[14486]: ^
Sep 11 14:04:15 matrix node[14486]: Error: Cannot find module 'rss-parser'
Sep 11 14:04:15 matrix node[14486]: Require stack:
Sep 11 14:04:15 matrix node[14486]: - /data/var/lib/matrix-synapse/hookshot/lib/feeds/FeedReader.js
Sep 11 14:04:15 matrix node[14486]: - /data/var/lib/matrix-synapse/hookshot/lib/Bridge.js
Sep 11 14:04:15 matrix node[14486]: - /data/var/lib/matrix-synapse/hookshot/lib/App/BridgeApp.js
Sep 11 14:04:15 matrix node[14486]: at Module._resolveFilename (node:internal/modules/cjs/loader:1048:15)
Sep 11 14:04:15 matrix node[14486]: at Module._load (node:internal/modules/cjs/loader:901:27)
Sep 11 14:04:15 matrix node[14486]: at Module.require (node:internal/modules/cjs/loader:1115:19)
Sep 11 14:04:15 matrix node[14486]: at require (node:internal/modules/helpers:130:18)
Sep 11 14:04:15 matrix node[14486]: at Object.<anonymous> (/data/var/lib/matrix-synapse/hookshot/lib/feeds/FeedReader.js:11:38)
Sep 11 14:04:15 matrix node[14486]: at Module._compile (node:internal/modules/cjs/loader:1233:14)
Sep 11 14:04:15 matrix node[14486]: at Module._extensions..js (node:internal/modules/cjs/loader:1287:10)
Sep 11 14:04:15 matrix node[14486]: at Module.load (node:internal/modules/cjs/loader:1091:32)
Sep 11 14:04:15 matrix node[14486]: at Module._load (node:internal/modules/cjs/loader:938:12)
Sep 11 14:04:15 matrix node[14486]: at Module.require (node:internal/modules/cjs/loader:1115:19)
Sep 11 14:04:15 matrix node[14486]: at require (node:internal/modules/helpers:130:18)
Sep 11 14:04:15 matrix node[14486]: at Object.<anonymous> (/data/var/lib/matrix-synapse/hookshot/lib/Bridge.js:34:22)
Sep 11 14:04:15 matrix node[14486]: at Module._compile (node:internal/modules/cjs/loader:1233:14)
Sep 11 14:04:15 matrix node[14486]: at Module._extensions..js (node:internal/modules/cjs/loader:1287:10)
Sep 11 14:04:15 matrix node[14486]: at Module.load (node:internal/modules/cjs/loader:1091:32)
Sep 11 14:04:15 matrix node[14486]: at Module._load (node:internal/modules/cjs/loader:938:12) {
Sep 11 14:04:15 matrix node[14486]: code: 'MODULE_NOT_FOUND',
Sep 11 14:04:15 matrix node[14486]: requireStack: [
Sep 11 14:04:15 matrix node[14486]: '/data/var/lib/matrix-synapse/hookshot/lib/feeds/FeedReader.js',
Sep 11 14:04:15 matrix node[14486]: '/data/var/lib/matrix-synapse/hookshot/lib/Bridge.js',
Sep 11 14:04:15 matrix node[14486]: '/data/var/lib/matrix-synapse/hookshot/lib/App/BridgeApp.js'
Sep 11 14:04:15 matrix node[14486]: ]
Sep 11 14:04:15 matrix node[14486]: }
Sep 11 14:04:15 matrix node[14486]: Node.js v20.5.1
Sep 11 14:04:15 matrix systemd[1]: hookshot.service: Main process exited, code=exited, status=1/FAILURE
Sep 11 14:04:15 matrix systemd[1]: hookshot.service: Failed with result 'exit-code'.
Sep 11 14:04:15 matrix systemd[1]: hookshot.service: Consumed 1.515s CPU time.
Sep 11 14:04:16 matrix systemd[1]: Stopped Matrix Appservice hookshot.
Sep 11 14:04:16 matrix systemd[1]: hookshot.service: Consumed 1.515s CPU time.
The hookshot bridge was failing and retrying, failing and retrying, due to Error: Cannot find module 'rss-parser'
. What are we using the hookshot bridge for?
The system load was ~3 ~3 ~3 (higher than seen in the attached image) before I stopped hookshot.service
. And after observing Htop for a while the load average is still high -- more than 2, and less than 3.
Files
Updated by luc14n0 12 months ago
Another thing I noticed, is this normal?
luc14n0@matrix:~> ping -c4 opensuse.org
PING opensuse.org (195.135.221.140) 56(84) bytes of data.
From scar.infra.opensuse.org (192.168.47.252) icmp_seq=1 Destination Port Unreachable
From scar.infra.opensuse.org (192.168.47.252) icmp_seq=2 Destination Port Unreachable
From scar.infra.opensuse.org (192.168.47.252) icmp_seq=3 Destination Port Unreachable
From scar.infra.opensuse.org (192.168.47.252) icmp_seq=4 Destination Port Unreachable
--- opensuse.org ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3003ms
Updated by luc14n0 12 months ago
BTW, how are we updating things over there, more specifically that particular bridge?
P.S.: I spoke too soon. From Bash history, I see sudo -u synapse yarn run build
and sudo -u synapse npm run build
, I suppose that's run under /var/lib/matrix-synapse/hookshot
after doing sudo -u synapse git foo
.
Updated by crameleon 12 months ago
Another thing I noticed, is this normal?
It's not normal, but I couldn't quite figure out what the issue is by inspecting /etc/nftables.conf on scar.i.o.o. Other hosts in the network are allowed ICMP/to ping just fine. But I couldn't notice it causing any issues with the applications on matrix.i.o.o either.
From Bash history, I see sudo -u synapse yarn run build and sudo -u synapse npm run build, I suppose that's run under /var/lib/matrix-synapse/hookshot after doing sudo -u synapse git foo.
That was just me debugging what is usually executed by Salt, sorry for the confusion.
Updated by luc14n0 12 months ago
crameleon wrote in #note-8:
Another thing I noticed, is this normal?
It's not normal, but I couldn't quite figure out what the issue is by inspecting /etc/nftables.conf on scar.i.o.o. Other hosts in the network are allowed ICMP/to ping just fine. But I couldn't notice it causing any issues with the applications on matrix.i.o.o either.
OK. I'm not sure it causes any issue with the app either.
From Bash history, I see sudo -u synapse yarn run build and sudo -u synapse npm run build, I suppose that's run under /var/lib/matrix-synapse/hookshot after doing sudo -u synapse git foo.
That was just me debugging what is usually executed by Salt, sorry for the confusion.
Thanks for the clarification.
Updated by luc14n0 12 months ago
I believe I was able to hopefully fix the Hookshot bridge by using (probably) unorthodox means, with my limited NPM/Yarn skills.
Since rss-parser
wasn't installed, which by the way is odd because if I got this right:
matrix (matrix.o.o):/var/lib/matrix-synapse/hookshot # grep -n rss-parser node_modules/.yarn-integrity
./node_modules/.yarn-integrity:59: "rss-parser@^3.12.0",
./node_modules/.yarn-integrity:921: "rss-parser@^3.12.0": "https://registry.yarnpkg.com/rss-parser/-/rss-parser-3.12.0.
tgz#b8888699ea46304a74363fbd8144671b2997984c",
Does that means that at some point it was installed, somehow?! I can't say.
$ yarn help | grep integrity
--skip-integrity-check run install without checking if node_modules is installed
So, what did I do? First I added rss-parser
with Yarn:
matrix (matrix.o.o):/var/lib/matrix-synapse/hookshot # sudo -u synapse yarn add rss-parser
This recompiled all the Node modules successfully, but failed with:
...
$ tsc --project tsconfig.json
error TS5109: Option 'moduleResolution' must be set to 'node16' (or left unspecified) when option 'module' is set to 'node16'.
So I edited tsconfig.json
:
diff --git a/tsconfig.json b/tsconfig.json
index 7d9a92a..839c386 100644
--- a/tsconfig.json
+++ b/tsconfig.json
@@ -1,6 +1,8 @@
{
"extends": "@tsconfig/node18/tsconfig.json",
"compilerOptions": {
+ "module": "NodeNext",
+ "moduleResolution": "NodeNext",
"incremental": true,
"declaration": false,
"outDir": "./lib",
And tried that failed step again with:
matrix (matrix.o.o):/var/lib/matrix-synapse/hookshot # sudo -u synapse yarn run tsc --project tsconfig.json
yarn run v1.22.19
$ /data/var/lib/matrix-synapse/hookshot/node_modules/.bin/tsc --project tsconfig.json
Done in 10.42s.
And then I restarted the hookshot.service
. It seems fine:
Sep 11 22:39:44 matrix systemd[1]: Started Matrix Appservice hookshot.
Sep 11 22:39:45 matrix node[6394]: INFO 22:39:45:720 [ListenerService] Listening on http://127.0.0.1:9005 for webhooks
Sep 11 22:39:45 matrix node[6394]: INFO 22:39:45:724 [Appservice] Initialising memory storage
Sep 11 22:39:45 matrix node[6394]: INFO 22:39:45:733 [Bridge] Starting up
Sep 11 22:39:45 matrix node[6394]: INFO 22:39:45:733 [UserTokenStore] Loading token key file passkey.pem
Sep 11 22:39:45 matrix node[6394]: INFO 22:39:45:737 [Bridge] Ensuring homeserver can be reached...
Sep 11 22:39:45 matrix node[6394]: INFO 22:39:45:795 [BotUsersManager] Ensuring bot users are set up...
Sep 11 22:39:45 matrix node[6394]: INFO 22:39:45:815 [BotUsersManager] Getting joined rooms...
Sep 11 22:39:45 matrix node[6394]: INFO 22:39:45:821 [Config] Prefilling room membership for permissions for 0 rooms
Sep 11 22:39:49 matrix node[6394]: ERROR 22:39:49:375 [ConnectionManager] User @_hookshot:opensuse.org is disallowed to manage state for generic in !NXyYLakNWAIiSBMuWD:opensuse.org
Sep 11 22:39:51 matrix node[6394]: INFO 22:39:51:918 [Bridge] All connections loaded
Sep 11 22:39:51 matrix node[6394]: INFO 22:39:51:921 [Bridge] Bridge is now ready. Found 4 connections
Sep 11 22:41:39 matrix node[6394]: INFO 22:41:39:380 [Appservice] Processing transaction 4645132
Sep 11 22:41:39 matrix node[6394]: INFO 22:41:39:382 [Appservice] Processing event of type m.room.member
Sep 11 22:41:39 matrix node[6394]: INFO 22:41:39:387 [Appservice] 127.0.0.1 - - [11/Sep/2023:22:41:39 +0000] "PUT /_matrix/app/v1/transactions/4645132?access_token=%3Credacted%3E HTTP/1.1" 200 2 "-" "Synapse/1.90.0"
Sep 11 22:41:39 matrix node[6394]: INFO 22:41:39:412 [Appservice] Processing transaction 4645138
Sep 11 22:41:39 matrix node[6394]: INFO 22:41:39:412 [Appservice] Processing event of type m.room.member
Sep 11 22:41:39 matrix node[6394]: INFO 22:41:39:413 [Appservice] 127.0.0.1 - - [11/Sep/2023:22:41:39 +0000] "PUT /_matrix/app/v1/transactions/4645138?access_token=%3Credacted%3E HTTP/1.1" 200 2 "-" "Synapse/1.90.0"
Sep 11 22:41:39 matrix node[6394]: INFO 22:41:39:441 [Appservice] Processing transaction 4645190
Sep 11 22:41:39 matrix node[6394]: INFO 22:41:39:441 [Appservice] Processing event of type m.room.member
Sep 11 22:41:39 matrix node[6394]: INFO 22:41:39:442 [Appservice] 127.0.0.1 - - [11/Sep/2023:22:41:39 +0000] "PUT /_matrix/app/v1/transactions/4645190?access_token=%3Credacted%3E HTTP/1.1" 200 2 "-" "Synapse/1.90.0"
About whether we need to report something to upstream. I'm just not sure.
Updated by luc14n0 12 months ago
- Status changed from New to Resolved
- % Done changed from 50 to 100
The Hookshot bridge seems to be fixed. Of course at this moment other issues started to crop up and make the homeserver unstable.
And about the CPU spikes I mention in the title of the ticket, I realized that the spikes I saw at the time I was taking a look at the Hookshot bridge were normal after all, and within the range of what we could call "normal" average load (~50%) for a 6-core system.
So I'd say we can close this ticket.