tickets #95756: factory mailing list archives incomplete (closed)
Description
https://github.com/boombatower/tumbleweed-review/issues/18#issuecomment-883809536
TL;DR
curl 'https://lists.opensuse.org/archives/list/factory@lists.opensuse.org/export/factory@lists.opensuse.org-2021-07.mbox.gz?start=2021-07-01&end=2021-08-01' --output 2021-07.gz
The downloaded file ends mid-email instead of containing a complete mailbox archive.
This blocks the Tumbleweed reviewer.
Updated by pjessen over 3 years ago
- Category set to Mailing lists
- Private changed from Yes to No
Updated by pjessen over 3 years ago
- Subject changed from mailing list archives incomplete to factory mailing list archives incomplete
I can confirm, the factory list mbox archive for July 2021 stops at around 14 July.
Looking at e.g. https://lists.opensuse.org/archives/list/users@lists.opensuse.org/export/users@lists.opensuse.org-2021-08.mbox.gz?start=2021-07-01&end=2021-08-01 or https://lists.opensuse.org/archives/list/heroes@lists.opensuse.org/export/heroes@lists.opensuse.org-2021-08.mbox.gz?start=2021-07-01&end=2021-08-01, they both seem to be complete. Maybe this only affects the factory list? I expect those mbox.gz exports are created on-demand, so that is probably the place to start looking.
Updated by blackbrook about 3 years ago
I would like to emphasize that the behavior is non-deterministic, which should be a big clue as to the cause. If you run that curl command multiple times and compare the size in bytes of the resulting .gz file, you will see that it differs each time. Some ranges may produce a file that looks complete (and may be usable), but there is usually still a slight variation in size, which suggests to me that in those seemingly successful cases only varying amounts of trailing whitespace are being harmlessly truncated.
Also, it is not the gz that is being truncated but the .mbox file going into the archive.
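For anyone who wants to compare downloads beyond the raw file size, here is a minimal Python sketch. The function name `inspect_mbox_gz` and the sample data are made up for illustration. It decompresses as much of a file as it can and reports whether the gzip stream ended cleanly, how many bytes of mbox came out, and how many "From " separator lines were seen. It assumes a single-member gzip file and does not account for "From " lines that appear unescaped inside message bodies.

```python
import gzip
import zlib


def inspect_mbox_gz(data: bytes) -> dict:
    """Decompress as much of a (possibly truncated) .mbox.gz as possible."""
    decomp = zlib.decompressobj(wbits=31)  # wbits=31: expect a gzip header
    mbox = decomp.decompress(data)         # no error on incomplete input
    # In mbox format every message starts with a "From " separator line.
    messages = sum(1 for line in mbox.split(b"\n") if line.startswith(b"From "))
    return {
        "gzip_complete": decomp.eof,  # True only if the gzip trailer was reached
        "mbox_bytes": len(mbox),
        "messages": messages,
    }


# Synthetic two-message mailbox, only to demonstrate the report.
sample = (
    b"From a@example.org Thu Jul  1 00:00:00 2021\nSubject: one\n\nbody\n\n"
    b"From b@example.org Fri Jul  2 00:00:00 2021\nSubject: two\n\nbody\n\n"
)
full = gzip.compress(sample)
print(inspect_mbox_gz(full))
print(inspect_mbox_gz(full[: len(full) // 2]))  # simulated cut-off download
```

Running it on several downloads of the same range (for example the `2021-07.gz` from the curl command in the description) and comparing `mbox_bytes` and `messages` between runs should show whether the mailbox content itself varies.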
Updated by hellcp about 3 years ago
Wow, sorry for the inaction on this. I had no idea this ticket existed until I looked at the snapshot review site to see why I wasn't getting notified of new releases. I will bring this up with upstream and see what can be done to resolve it.
Updated by hellcp about 3 years ago
It seems that for the archive Jimmy brought up specifically, the 14th of July always fails to load, so our limit on how large mails can be to be accepted on the mailing list may be too high.
As to why this currently happens, it's likely caused by the current timeout value in uwsgi. We could increase that, since archives like this are quite large and take a while to process. We also need to optimize hyperkitty in general, because it times out on much smaller pages, but that's an issue that's already reported in another ticket and needs to be addressed there.
Would it be possible to use smaller increments of the archives instead of a full month? Instead of trying to load https://lists.opensuse.org/archives/list/factory@lists.opensuse.org/export/factory@lists.opensuse.org-2021-07.mbox.gz?start=2021-07-01&end=2021-08-01, maybe try https://lists.opensuse.org/archives/list/factory@lists.opensuse.org/export/factory@lists.opensuse.org-2021-07.mbox.gz?start=2021-07-01&end=2021-07-15 and https://lists.opensuse.org/archives/list/factory@lists.opensuse.org/export/factory@lists.opensuse.org-2021-07.mbox.gz?start=2021-07-15&end=2021-08-01 and combine the results, while we work on a more permanent solution.
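A rough Python sketch of that workaround, in case it helps: it splits a date range into smaller windows, builds one export URL per window following the URL pattern quoted above, and joins the decompressed parts. The helper names `export_urls` and `combine_mbox_parts` are hypothetical, and the sketch assumes the `end` parameter is exclusive; if it turns out to be inclusive, mails on the boundary day would show up twice and need de-duplicating. Downloading is left out.

```python
import gzip
from datetime import date, timedelta

# URL pattern copied from the export links in this ticket.
URL = (
    "https://lists.opensuse.org/archives/list/{addr}/export/"
    "{addr}-{month}.mbox.gz?start={start}&end={end}"
)


def export_urls(addr: str, start: date, end: date, days: int = 14) -> list[str]:
    """Build export URLs covering [start, end) in windows of at most `days` days."""
    urls = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days), end)
        urls.append(
            URL.format(
                addr=addr,
                month=start.strftime("%Y-%m"),
                start=cur.isoformat(),
                end=nxt.isoformat(),
            )
        )
        cur = nxt
    return urls


def combine_mbox_parts(parts: list[bytes]) -> bytes:
    """Decompress each downloaded .mbox.gz part and concatenate them in order."""
    return b"".join(gzip.decompress(part) for part in parts)


for url in export_urls("factory@lists.opensuse.org", date(2021, 7, 1), date(2021, 8, 1)):
    print(url)
```

With the default 14-day window, the first URL printed is the same as the first half-month link above. Smaller windows mean more requests, but each one has less to process before the timeout.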
Updated by pjessen about 3 years ago
- Related to tickets #103911: Mailing list archives broken? added
Updated by crameleon 6 months ago
When the problematic export is requested, the following is printed as the request aborts:
[2024-07-14 11:17:45 +0000] [53809] [ERROR] Error handling request
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 183, in handle_request
    for item in respiter:
  File "/usr/lib/python3.11/site-packages/hyperkitty/views/mlist.py", line 391, in stream_mbox
    yield compressor.compress(email.as_bytes())
                              ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/hyperkitty/models/email.py", line 200, in as_bytes
    msg = self.as_message()
          ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/hyperkitty/models/email.py", line 190, in as_message
    msg.add_attachment(attachment.get_content(), maintype=mimetype[0],
  File "/usr/lib64/python3.11/email/message.py", line 1184, in add_attachment
    self._add_multipart('mixed', *args, _disp='attachment', **kw)
  File "/usr/lib64/python3.11/email/message.py", line 1172, in _add_multipart
    part.set_content(*args, **kw)
  File "/usr/lib64/python3.11/email/message.py", line 1199, in set_content
    super().set_content(*args, **kw)
  File "/usr/lib64/python3.11/email/message.py", line 1129, in set_content
    content_manager.set_content(self, *args, **kw)
  File "/usr/lib64/python3.11/email/contentmanager.py", line 37, in set_content
    handler(msg, obj, *args, **kw)
TypeError: set_text_content() got an unexpected keyword argument 'maintype'
Updated by crameleon 6 months ago
- Status changed from New to In Progress
- Assignee set to crameleon
The issue occurs when an attachment's content is a str instead of a bytes object:
2024-07-14 11:22:20,102 54137 hyperkitty.models.email <class 'bytes'>
ERROR 2024-07-14 11:22:20,124 54137 hyperkitty.models.email <class 'str'>
I will prepare a patch.
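The str versus bytes difference can be reproduced with the standard library alone, without HyperKitty. This is a minimal sketch: the helper `try_attach` and the `application/octet-stream` type are chosen only for illustration. `EmailMessage.add_attachment()` picks its content handler by the Python type of the content, and the handler for str does not accept a `maintype` keyword.

```python
from email.message import EmailMessage


def try_attach(content) -> str:
    """Attach `content` to a fresh message and report what happened."""
    msg = EmailMessage()
    try:
        # Same keyword style as the call shown in the traceback.
        msg.add_attachment(content, maintype="application", subtype="octet-stream")
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"


print("bytes:", try_attach(b""))  # handled by the bytes content handler
print("str:  ", try_attach(""))   # handled by the text content handler
```

The bytes case should go through, while the str case should report a TypeError naming `maintype`, like the one in the traceback.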
Updated by crameleon 6 months ago
Submitted upstream: https://gitlab.com/mailman/hyperkitty/-/merge_requests/633
Updated by crameleon 6 months ago
- Status changed from In Progress to Blocked
Submitted downstream: https://build.opensuse.org/request/show/1187378
Updated by crameleon 5 months ago
- Status changed from In Progress to Blocked
The issue was traced to a single email containing a 0-byte attachment. A patch was merged upstream that returns a bytes object instead of a string in this situation: https://gitlab.com/mailman/hyperkitty/-/merge_requests/635. A new patch was submitted downstream: https://build.opensuse.org/request/show/1188489.
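To illustrate the idea only (this is not the upstream patch, and the helper names `as_attachment_bytes` and `build_message` are hypothetical), a sketch of normalizing attachment content to bytes before handing it to `add_attachment()`, so that an empty attachment no longer takes the str code path:

```python
from email.message import EmailMessage


def as_attachment_bytes(content, charset: str = "utf-8") -> bytes:
    """Return attachment content as bytes, encoding str input (including "")."""
    if isinstance(content, str):
        return content.encode(charset)
    return bytes(content)


def build_message(attachments) -> EmailMessage:
    """Build a message with a text body and (filename, content) attachments."""
    msg = EmailMessage()
    msg["Subject"] = "example"
    msg.set_content("body")
    for filename, content in attachments:
        msg.add_attachment(
            as_attachment_bytes(content),
            maintype="application",
            subtype="octet-stream",
            filename=filename,
        )
    return msg


# One regular attachment and one empty attachment whose content is a str.
msg = build_message([("data.bin", b"\x00\x01"), ("empty.txt", "")])
raw = msg.as_bytes()  # serializes without raising
print(len(list(msg.iter_attachments())), "attachments,", len(raw), "bytes")
```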