tickets #125897

open

pagure01.infra.o.o / code-o-o - redis killed due to OOM, system RAM and swap maxed out

Added by lkocman over 1 year ago. Updated about 1 month ago.

Status: Workable
Priority: Normal
Assignee:
Category: Git(lab|hub)
Target version: -
Start date: 2023-03-13
Due date:
% Done: 0%
Estimated time:
Description

Hello team,

I tried to reject this particular issue: https://code.opensuse.org/leap/features/issue/105
However, I always received a Fatal Error (500).

Could you please look at it?

Thank you


Related issues 1 (0 open, 1 closed)

Related to openSUSE admin - tickets #125687: code.o.o error when creating new repos (Resolved, crameleon, 2023-03-09)

Actions #1

Updated by lkocman over 1 year ago

I just figured out that the same 500 error happens if I try to comment on the issue.

Actions #2

Updated by lkocman over 1 year ago

The logs mention that the connection to Redis failed.

Actions #3

Updated by Pharaoh_Atem over 1 year ago

This is what I'm seeing at a cursory glance...

pagure01 (pagure):~ # systemctl status redis@default
× redis@default.service - Redis instance: default
     Loaded: loaded (/usr/lib/systemd/system/redis@.service; enabled; vendor preset: disabled)
     Active: failed (Result: signal) since Wed 2023-03-08 16:38:42 UTC; 4 days ago
    Process: 25073 ExecStart=/usr/sbin/redis-server /etc/redis/default.conf (code=killed, signal=KILL)
   Main PID: 25073 (code=killed, signal=KILL)
     Status: "Redis is loading..."

Mar 08 16:38:42 pagure01 systemd[1]: redis@default.service: Main process exited, code=killed, status=9/KILL
Mar 08 16:38:42 pagure01 systemd[1]: redis@default.service: Failed with result 'signal'.
Mar 08 16:38:42 pagure01 systemd[1]: Failed to start Redis instance: default.
Mar 08 16:38:42 pagure01 systemd[1]: redis@default.service: Scheduled restart job, restart counter is at 2183.
Mar 08 16:38:42 pagure01 systemd[1]: Stopped Redis instance: default.
Mar 08 16:38:42 pagure01 systemd[1]: redis@default.service: Start request repeated too quickly.
Mar 08 16:38:42 pagure01 systemd[1]: redis@default.service: Failed with result 'signal'.
Mar 08 16:38:42 pagure01 systemd[1]: Failed to start Redis instance: default.
pagure01 (pagure):~ # systemctl start redis@default
Job for redis@default.service failed because a fatal signal was delivered to the control process.
See "systemctl status redis@default.service" and "journalctl -xeu redis@default.service" for details.

Actions #4

Updated by Pharaoh_Atem over 1 year ago

Looks like redis is getting killed because we're out of memory. Trying to restart service to free up RAM.
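
For reference, one way to check the out-of-memory suspicion is to look for OOM-killer messages in the kernel log. This is only a minimal sketch: `oom_lines` is a hypothetical helper name, and the patterns are an assumption about typical kernel wording, which can differ between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that look like OOM-killer activity.
# It reads log text on stdin, so it works with any log source.
# The patterns are assumptions about common kernel message wording.
oom_lines() {
    grep -Ei 'out of memory|oom-kill|invoked oom-killer'
}

# Example usage on a live system (reading the kernel log may need privileges):
#   journalctl -k --no-pager | oom_lines
#   free -m    # current RAM and swap usage in MiB
```

If the filter prints lines naming `redis-server` around the time of the failure, that would support the OOM explanation; an empty result does not rule it out, since the kernel log may have rotated.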

Actions #5

Updated by Pharaoh_Atem over 1 year ago

  • Subject changed from code-o-o - unable to reject ticket to code-o-o - redis killed due to OOM, system RAM and swap maxed out

It looks like we're maxed out on RAM and swap, can we double the amount of RAM for the pagure VM?

Actions #6

Updated by Pharaoh_Atem over 1 year ago

  • Private changed from Yes to No

Actions #7

Updated by Pharaoh_Atem over 1 year ago

  • Subject changed from code-o-o - redis killed due to OOM, system RAM and swap maxed out to pagure01.infra.o.o / code-o-o - redis killed due to OOM, system RAM and swap maxed out

Actions #9

Updated by crameleon over 1 year ago

  • Status changed from New to In Progress
  • Assignee set to crameleon

Actions #10

Updated by crameleon over 1 year ago

  • Status changed from In Progress to Workable
  • Assignee deleted (crameleon)

Thanks for the SD ticket, Lubos.

Memory change is done.

I recommend that the application owner configure the maxmemory and maxmemory-policy options in Redis to mitigate such issues.
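
As an illustration of what setting these two options could look like at runtime, here is a hedged sketch using `redis-cli`. The helper name `set_redis_limit` and both example values are placeholders, not recommendations for code.o.o; the sketch also assumes `redis-cli` can reach the instance with its default connection settings.

```shell
#!/bin/sh
# Assumption: redis-cli reaches the instance with default connection settings.
# REDIS_CLI can point at another command name, e.g. a wrapper script.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# set_redis_limit BYTES POLICY   (hypothetical helper name)
# Applies both settings to the running instance; stops if the first call fails.
set_redis_limit() {
    "$REDIS_CLI" CONFIG SET maxmemory "$1" &&
    "$REDIS_CLI" CONFIG SET maxmemory-policy "$2"
}

# Example with placeholder values (1 GiB limit, errors on writes once full):
#   set_redis_limit 1073741824 noeviction
#   redis-cli CONFIG GET maxmemory
#   redis-cli CONFIG REWRITE    # write the running config back to the config file
```

`CONFIG SET` only changes the running instance. To survive a restart, the values would also need to go into the instance's config file (`/etc/redis/default.conf` according to the unit shown earlier), either by hand or via `CONFIG REWRITE`.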

Actions #11

Updated by pjessen over 1 year ago

Actions #12

Updated by crameleon 12 months ago

  • Assignee set to Pharaoh_Atem

Hi @Pharaoh_Atem, any comments on my suggestion?

Actions #13

Updated by wombelix about 1 month ago

crameleon wrote in #note-10:

I recommend that the application owner configure the maxmemory and maxmemory-policy options in Redis to mitigate such issues.

Sounds reasonable to me; I just don't know what a sane value for Redis on code.o.o would be. Is there any monitoring data you can use to identify an average value over the last few months that would make sense to set?
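
If no ready-made monitoring graph is at hand, a rough figure could be derived from periodic samples of Redis memory usage. The sketch below is only an assumption about how such samples might be summarised: `mem_stats` is a hypothetical helper that expects one numeric sample per line (for example `used_memory` in bytes) and reports the average and the peak.

```shell
#!/bin/sh
# Hypothetical helper: summarise memory samples given one number per line.
# Blank lines are skipped; nothing is printed if there are no samples.
mem_stats() {
    awk 'NF { sum += $1; if ($1 > max) max = $1; n++ }
         END { if (n) printf "samples=%d avg=%.0f peak=%.0f\n", n, sum / n, max }'
}

# Example: collecting one sample from a running instance (the file name is
# a placeholder), then summarising all samples gathered so far:
#   redis-cli INFO memory | tr -d '\r' |
#       awk -F: '$1 == "used_memory" { print $2 }' >> redis-mem-samples.txt
#   mem_stats < redis-mem-samples.txt
```

The peak is reported next to the average because a limit set at the average would, by definition, be exceeded by part of the observed samples.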

Actions #14

Updated by crameleon about 1 month ago

Hi,

Identifying an average to set as the maximum is no problem; I mostly wonder which policy suits Pagure best. These are the possible choices:

# MAXMEMORY POLICY: how Redis will select what to remove when maxmemory
# is reached. You can select one from the following behaviors:
#
# volatile-lru -> Evict using approximated LRU, only keys with an expire set.
# allkeys-lru -> Evict any key using approximated LRU.
# volatile-lfu -> Evict using approximated LFU, only keys with an expire set.
# allkeys-lfu -> Evict any key using approximated LFU.
# volatile-random -> Remove a random key having an expire set.
# allkeys-random -> Remove a random key, any key.
# volatile-ttl -> Remove the key with the nearest expire time (minor TTL)
# noeviction -> Don't evict anything, just return an error on write operations.
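
Since the `volatile-*` policies only consider keys with an expire set, one input to the decision could be how many keys in the instance actually carry a TTL. The following is a sketch under that assumption: `expire_share` is a hypothetical helper that reads the output of `redis-cli INFO keyspace` and sums the `keys` and `expires` counters across databases. It does not say which policy suits Pagure.

```shell
#!/bin/sh
# Hypothetical helper: sum total keys and keys with an expire across all
# databases, reading "INFO keyspace" output on stdin. Lines are expected to
# look like: db0:keys=10,expires=2,avg_ttl=1234
expire_share() {
    tr -d '\r' | awk -F'[:=,]' '
        /^db[0-9]+:/ { total += $3; volatile += $5 }
        END { printf "keys=%d with_expire=%d\n", total, volatile }'
}

# Example usage against a running instance:
#   redis-cli INFO keyspace | expire_share
```

If `with_expire` turns out to be zero or very small, the `volatile-*` policies would have little or nothing to evict, which narrows the realistic choices to the `allkeys-*` policies or `noeviction`.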