tickets #154051
open
Large amount of SMTP connections in CLOSE-WAIT on atlas2
Description
The CPU usage on atlas2 (aka the proxy for mx2.o.o) is permanently at 100%. This seems to be the reason:
atlas2 (Proxy):~ # ss -t state CLOSE-WAIT '( dport 25 )' |wc -l
1508
In contrast, on atlas1, only a few connections are in the CLOSE-WAIT state at a time.
Using
atlas2 (Proxy):~ # ss -Hnto sport 25|awk '{ split($5, con, /:[[:digit:]]{4,5}$/); print con[1] }'|sort | uniq -d --count|sort -h
one can find that there are indeed a few addresses with an oddly large number of connections, while others have only a few at a time.
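For reference, the same per-address count can be sketched as a small reusable function. This is only a sketch under assumptions: the name `count_peers` is made up for illustration, and it expects `ss -Hnt`-style lines on stdin with the State column present, so that the peer `address:port` is the fifth field. It strips the trailing port with a plain `sub()` instead of an interval expression, which should also work with awk implementations that lack `{4,5}` support.

```shell
# Hypothetical helper: count connections per peer address.
# Input (stdin): lines shaped like `ss -Hnt` output WITH the State column,
# i.e. "STATE RECV-Q SEND-Q LOCAL:PORT PEER:PORT".
# Output: "<count> <peer address>", sorted by count, lowest first.
count_peers() {
    awk '{
        peer = $5
        sub(/:[0-9]+$/, "", peer)   # drop the trailing :port
        count[peer]++
    }
    END {
        for (p in count) print count[p], p
    }' | sort -n
}
```

Assuming the State column is kept (no `state` filter on the `ss` call), it could be used as `ss -Hnt sport = :25 | awk '$1 == "CLOSE-WAIT"' | count_peers` to list only the peers holding CLOSE-WAIT connections.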
I know that secondary MXs are often a "target", but given that both MX records have the same priority, I find it odd that this behavior occurs only on atlas2. I'm not sure how to assess whether it's an issue with our HAProxy/Postfix configurations or actually malicious connections.
Killing them using ss's --kill clears the entries, but HAProxy continues to hog the CPU until the service is restarted. Doing this seems to only help for a few hours.
Any ideas?