tickets #95756

closed

factory mailing list archives incomplete

Added by boombatower over 3 years ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Mailing lists
Target version:
-
Start date:
2021-07-21
Due date:
% Done:

0%

Estimated time:


Related issues 1 (0 open, 1 closed)

Related to openSUSE admin - tickets #103911: Mailing list archives broken? (Resolved, 2021-12-13)

Actions #1

Updated by pjessen over 3 years ago

  • Category set to Mailing lists
  • Private changed from Yes to No
Actions #2

Updated by pjessen over 3 years ago

  • Subject changed from mailing list archives incomplete to factory mailing list archives incomplete

I can confirm, the factory list mbox archive for July 2021 stops at around 14 July.
Looking at e.g. https://lists.opensuse.org/archives/list/users@lists.opensuse.org/export/users@lists.opensuse.org-2021-08.mbox.gz?start=2021-07-01&end=2021-08-01 or https://lists.opensuse.org/archives/list/heroes@lists.opensuse.org/export/heroes@lists.opensuse.org-2021-08.mbox.gz?start=2021-07-01&end=2021-08-01, they both seem to be complete. Maybe this only affects the factory list? I expect those mbox.gz exports are created on-demand, so that is probably the place to start looking.

Actions #3

Updated by blackbrook about 3 years ago

I would like to emphasize that the behavior is non-deterministic, which should be a big clue as to the cause. If you run that curl command multiple times and look at the size of the resulting .gz file (in bytes), you will see that it is different each time. Some ranges may produce a file that looks complete (and may be usable), but there is usually still a slight variation in the size in bytes, which suggests to me that in those apparently successful cases different amounts of whitespace are just being harmlessly truncated.

Also, it is not the .gz that is being truncated but the .mbox file going into the archive.
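
For anyone who wants to repeat the size comparison without curl, here is a minimal Python sketch. The helper names (`fetch_size`, `size_distribution`, `is_deterministic`) are made up for illustration; the only assumption about the server is that an aborted export shows up as either a shorter body or an incomplete read.

```python
from collections import Counter
from http.client import IncompleteRead
from typing import Callable
from urllib.request import urlopen


def fetch_size(url: str) -> int:
    """Download url once and return the number of bytes received."""
    with urlopen(url) as response:
        try:
            return len(response.read())
        except IncompleteRead as exc:
            # The server closed the stream early; count what did arrive.
            return len(exc.partial)


def size_distribution(url: str, attempts: int = 5,
                      fetch: Callable[[str], int] = fetch_size) -> Counter:
    """Fetch url repeatedly and count how often each size occurs."""
    return Counter(fetch(url) for _ in range(attempts))


def is_deterministic(sizes: Counter) -> bool:
    """True when every attempt produced the same number of bytes."""
    return len(sizes) <= 1
```

Calling `size_distribution()` on one of the export URLs from this ticket and printing the result should show more than one distinct size if the export is still being cut short at varying points.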

Actions #4

Updated by hellcp about 3 years ago

Wow, sorry for the inaction on this. I had no idea this ticket existed until I looked at the snapshot review site to see why I wasn't getting notified of new releases. I will bring this up with upstream and see what can be done to resolve it.

Actions #5

Updated by hellcp about 3 years ago

It seems that for the archive that Jimmy brought up specifically, the 14th of July always fails to load, so our limit on how large mails can be and still be accepted on the mailing list may be too high.

As to why this currently happens, it's likely caused by the current timeout value in uwsgi. We could increase that, since archives like this are quite large and take a while to process. We also need to optimize hyperkitty in general, because it times out on much smaller pages, but that's an issue that's already reported in another ticket and needs to be addressed there.

Would it be possible to use smaller increments of the archives instead of a full month? Instead of trying to load https://lists.opensuse.org/archives/list/factory@lists.opensuse.org/export/factory@lists.opensuse.org-2021-07.mbox.gz?start=2021-07-01&end=2021-08-01, maybe try https://lists.opensuse.org/archives/list/factory@lists.opensuse.org/export/factory@lists.opensuse.org-2021-07.mbox.gz?start=2021-07-01&end=2021-07-15 and https://lists.opensuse.org/archives/list/factory@lists.opensuse.org/export/factory@lists.opensuse.org-2021-07.mbox.gz?start=2021-07-15&end=2021-08-01 and combine the results, while we work on a more permanent solution.
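
A rough sketch of how the smaller increments could be generated, in case it helps with scripting the workaround. `split_range` and `export_urls` are hypothetical helpers, and the sketch assumes the `end` parameter is exclusive, which the URLs above suggest but which has not been verified here.

```python
from datetime import date, timedelta


def split_range(start: date, end: date, days: int) -> list[tuple[date, date]]:
    """Split [start, end) into consecutive chunks of at most `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    chunks = []
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=days), end)
        chunks.append((cursor, upper))
        cursor = upper
    return chunks


def export_urls(mailing_list: str, start: date, end: date, days: int) -> list[str]:
    """Build one export URL per chunk, following the URL shape used above."""
    base = (
        "https://lists.opensuse.org/archives/list/"
        f"{mailing_list}/export/{mailing_list}-{start:%Y-%m}.mbox.gz"
    )
    return [
        f"{base}?start={lo.isoformat()}&end={hi.isoformat()}"
        for lo, hi in split_range(start, end, days)
    ]
```

For example, `export_urls("factory@lists.opensuse.org", date(2021, 7, 1), date(2021, 8, 1), 14)` yields three URLs, the first of which covers 2021-07-01 to 2021-07-15 like the one suggested above. The chunks would still need to be downloaded and joined in order.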

Actions #6

Updated by pjessen about 3 years ago

Actions #7

Updated by crameleon 6 months ago

Requesting the problematic URL prints the following when the request aborts:

[2024-07-14 11:17:45 +0000] [53809] [ERROR] Error handling request
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 183, in handle_request
    for item in respiter:
  File "/usr/lib/python3.11/site-packages/hyperkitty/views/mlist.py", line 391, in stream_mbox
    yield compressor.compress(email.as_bytes())
                              ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/hyperkitty/models/email.py", line 200, in as_bytes
    msg = self.as_message()
          ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/hyperkitty/models/email.py", line 190, in as_message
    msg.add_attachment(attachment.get_content(), maintype=mimetype[0],
  File "/usr/lib64/python3.11/email/message.py", line 1184, in add_attachment
    self._add_multipart('mixed', *args, _disp='attachment', **kw)
  File "/usr/lib64/python3.11/email/message.py", line 1172, in _add_multipart
    part.set_content(*args, **kw)
  File "/usr/lib64/python3.11/email/message.py", line 1199, in set_content
    super().set_content(*args, **kw)
  File "/usr/lib64/python3.11/email/message.py", line 1129, in set_content
    content_manager.set_content(self, *args, **kw)
  File "/usr/lib64/python3.11/email/contentmanager.py", line 37, in set_content
    handler(msg, obj, *args, **kw)
TypeError: set_text_content() got an unexpected keyword argument 'maintype'
Actions #8

Updated by crameleon 6 months ago

  • Status changed from New to In Progress
  • Assignee set to crameleon

The issue occurs when an attachment's content is a string instead of a bytes object:

2024-07-14 11:22:20,102 54137 hyperkitty.models.email <class 'bytes'>
ERROR 2024-07-14 11:22:20,124 54137 hyperkitty.models.email <class 'str'>

I will prepare a patch.
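
The type mismatch can be reproduced with the standard library alone, independent of hyperkitty. This is only a sketch to illustrate the failure, not the patch; `try_attach` is a hypothetical helper, and the `subtype` argument is an assumption since the traceback cuts off after `maintype`.

```python
from email.message import EmailMessage


def try_attach(content, maintype="text", subtype="plain"):
    """Attach content to a fresh message.

    Returns None on success, or the TypeError text on failure.
    """
    msg = EmailMessage()
    try:
        # For str content the default content manager dispatches to
        # set_text_content(), which does not accept a maintype argument.
        msg.add_attachment(content, maintype=maintype, subtype=subtype)
    except TypeError as exc:
        return str(exc)
    return None


print(try_attach(b""))  # bytes content is accepted
print(try_attach(""))   # str content fails on the 'maintype' keyword
```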

Actions #10

Updated by crameleon 6 months ago

  • Status changed from In Progress to Blocked
Actions #11

Updated by crameleon 5 months ago

  • Status changed from Blocked to In Progress

Upstream has better suggestions regarding the issue; I will investigate further.

Actions #12

Updated by crameleon 5 months ago

  • Status changed from In Progress to Blocked

The issue turned out to be a single email containing a 0-byte attachment. A patch was merged upstream that returns a bytes object instead of a string in this situation: https://gitlab.com/mailman/hyperkitty/-/merge_requests/635. A new patch has been submitted downstream: https://build.opensuse.org/request/show/1188489.
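
To illustrate the general idea of handing bytes rather than a string to `add_attachment()`, here is a minimal sketch. It is not the code from the merge request; `content_as_bytes` and `attach` are hypothetical names, and the UTF-8 default is an assumption.

```python
from email.message import EmailMessage


def content_as_bytes(content, encoding="utf-8"):
    """Return attachment content as bytes, encoding str input if needed."""
    if isinstance(content, bytes):
        return content
    if isinstance(content, str):
        # An empty string becomes b"", which add_attachment() accepts
        # together with maintype/subtype.
        return content.encode(encoding)
    raise TypeError(f"unsupported attachment content type: {type(content)!r}")


def attach(msg: EmailMessage, content, maintype: str, subtype: str) -> None:
    """Attach content to msg, normalizing it to bytes first."""
    msg.add_attachment(content_as_bytes(content),
                       maintype=maintype, subtype=subtype)
```

With this guard, attaching an empty string to a message no longer raises the `TypeError` shown in the traceback above, because the content reaches the email library as `b""`.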

Actions #13

Updated by crameleon 5 months ago

  • Status changed from Blocked to Resolved

Deployed, link tested and confirmed to now download correctly.
