tickets #181646

open

code.o.o (Pagure) causes DoS against id.o.o (Ipsilon)

Added by crameleon about 1 month ago. Updated 6 days ago.

Status: New
Priority: High
Assignee:
Category: Pagure
Target version: -
Start date: 2025-04-30
Due date:
% Done: 0%
Estimated time:

Description

This is just from the last 22.5 hours:

ldap-proxy (idp proxy server):~ # grep -c 2a07:de40:b27e:1206::a /var/log/apache2/error_log
1098630

This fills the session cache on Ipsilon with session and lock files so quickly that the disk runs full before Ipsilon cleans them up.
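To put the count in perspective, the average rate can be worked out from the two figures in the description. This is only a rough sketch: the function name is made up for illustration, and it assumes the matching error_log lines are spread evenly over the 22.5 hours.

```python
def average_rate_per_second(count: int, hours: float) -> float:
    """Average number of events per second over a window given in hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return count / (hours * 3600)


# Figures from the grep in the description: 1098630 matching lines in 22.5 hours.
rate = average_rate_per_second(1098630, 22.5)
print(f"{rate:.1f} matching log lines per second on average")
```

That is roughly 13.6 matching log lines per second, sustained, from a single source address.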


Related issues 2 (1 open, 1 closed)

Related to openSUSE admin - tickets #181751: login to code.o.o not possible (Rejected, 2025-05-03)
Related to openSUSE admin - tickets #182324: State of pagure01.i.o.o / code.o.o (New, Pharaoh_Atem, 2025-05-13)

Actions #1

Updated by crameleon about 1 month ago

  • Category set to Pagure
  • Assignee set to Pharaoh_Atem
  • Priority changed from Normal to High
  • Private changed from Yes to No
Actions #2

Updated by crameleon about 1 month ago · Edited

I have now blocked pagure01.i.o.o from reaching id.o.o.

Please mitigate this situation and let me know when I can remove the ban again.

Actions #3

Updated by crameleon about 1 month ago · Edited

I marked code.o.o as down on status.o.o, as it seems to be serving 502 errors as a result.

Edit: it seems to work occasionally, only login does not, so I changed it to a partial outage.

Actions #4

Updated by crameleon about 1 month ago

This also does not help: https://pagure.io/ipsilon/issue/262. ;-)

Actions #5

Updated by crameleon about 1 month ago

Actions #6

Updated by sfalken@cloverleaf-linux.org 28 days ago

I'm not a hero, and don't have the right access to look, but can somebody hit me with some logs, so I can have a look and see if I can't fix this issue?

Actions #7

Updated by crameleon 28 days ago

If you let me know what you're interested in, sure. On pagure01 I find it still trying to query id.o.o constantly:

May 07 02:47:05 pagure01 gunicorn[4049]: 2025-05-07 02:47:05,959 [WARNING] pagure.ui.flask_fas_openid: Error fetching XRDS document: Remote end closed connection without response
pagure01 (pagure):~ # journalctl -t gunicorn -g XRDS -S '1d ago' --no-pager |wc -l
277045

(that warning only appears because pagure01 is no longer allowed to reach id.o.o)

There's nothing that seems relevant leading up to the message in the journal.

In /var/log/pagure/ I find an access_web.log, which seems to be the web server access log, and an error_web.log, which shows this over and over again:

[2025-05-07 15:56:16 +0000] [923] [CRITICAL] WORKER TIMEOUT (pid:19664)
[2025-05-07 15:56:16 +0000] [923] [CRITICAL] WORKER TIMEOUT (pid:19847)
[2025-05-07 15:56:17 +0000] [19943] [INFO] Booting worker with pid: 19943
[2025-05-07 15:56:17 +0000] [19946] [INFO] Booting worker with pid: 19946
pagure01 (pagure):~ # grep -c '^\[2025-05-07.*TIMEOUT' /var/log/pagure/error_web.log
811

Seems somewhat broken too but probably not related.
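To see whether the timeouts cluster at particular times rather than just totalling them with grep, the log could be bucketed per hour. A minimal sketch, assuming every timeout line has exactly the format shown above; the function name and the sample list are illustrative only.

```python
import re
from collections import Counter

# Matches lines such as:
# [2025-05-07 15:56:16 +0000] [923] [CRITICAL] WORKER TIMEOUT (pid:19664)
TIMEOUT_RE = re.compile(
    r"^\[(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2} [+-]\d{4}\] "
    r"\[\d+\] \[CRITICAL\] WORKER TIMEOUT \(pid:\d+\)"
)


def timeouts_per_hour(lines):
    """Count WORKER TIMEOUT lines per 'YYYY-MM-DD HH:00' bucket."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            counts[f"{match['date']} {match['hour']}:00"] += 1
    return counts


# Sample input copied from the error_web.log excerpt above.
SAMPLE = [
    "[2025-05-07 15:56:16 +0000] [923] [CRITICAL] WORKER TIMEOUT (pid:19664)",
    "[2025-05-07 15:56:16 +0000] [923] [CRITICAL] WORKER TIMEOUT (pid:19847)",
    "[2025-05-07 15:56:17 +0000] [19943] [INFO] Booting worker with pid: 19943",
    "[2025-05-07 15:56:17 +0000] [19946] [INFO] Booting worker with pid: 19946",
]

for hour, count in sorted(timeouts_per_hour(SAMPLE).items()):
    print(hour, count)
```

For the real log, the list can be replaced by an open file handle on /var/log/pagure/error_web.log, since the function accepts any iterable of lines.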

Actions #8

Updated by crameleon 22 days ago

@Pharaoh_Atem Any update?

Actions #9

Updated by crameleon 22 days ago

Actions #10

Updated by Pharaoh_Atem 20 days ago

I am looking into it, I am just not sure yet what's going on.

Actions #11

Updated by crameleon 6 days ago

Considering

  • there still being no maintainer solution in either Pagure or Ipsilon
  • a community member having shown interest in hosting a Forgejo and syncing with Fedora on the migration tooling in a reasonable time frame

and my interest in making the situation usable in the meantime, I went through a myriad of hacks and eventually made Ipsilon serve a static XRDS file instead of generating one itself.
I will have to monitor whether this keeps the load low enough to avoid an outage, but it means that login is possible again for now.
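One way such monitoring could look is a small check of how many files sit in the session cache and how full its disk is. This is only a sketch under assumptions: the function name is hypothetical, and the real session cache path on the Ipsilon host is not stated in this ticket, so it has to be passed in.

```python
import os
import shutil


def cache_pressure(path):
    """Return (files found under path, percent used of the disk holding path)."""
    # Counts every non-directory entry, which covers session and lock files alike.
    file_count = sum(len(files) for _root, _dirs, files in os.walk(path))
    usage = shutil.disk_usage(path)
    percent_used = 100 * usage.used / usage.total if usage.total else 0.0
    return file_count, percent_used


# Usage, with a placeholder path (not the real one):
# files, percent = cache_pressure("/path/to/ipsilon/session/cache")
# print(f"{files} files, disk {percent:.1f}% used")
```

Run periodically, this would show whether the file count still grows faster than Ipsilon's cleanup removes files.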
