factory mailing list archives incomplete
The downloaded file ends mid-email instead of containing a complete mailbox archive.
Block Tumbleweed reviewer.
- Subject changed from mailing list archives incomplete to factory mailing list archives incomplete
I can confirm, the factory list mbox archive for July 2021 stops at around 14 July.
Looking at e.g. https://firstname.lastname@example.orgemail@example.com?start=2021-07-01&end=2021-08-01 or https://firstname.lastname@example.orgemail@example.com?start=2021-07-01&end=2021-08-01, they both seem to be complete. Maybe this only affects the factory list? I expect those mbox.gz exports are created on-demand, so that is probably the place to start looking.
#3 Updated by blackbrook 6 months ago
I would like to emphasize that the behavior is non-deterministic, which should be a big clue as to the cause. If you run that curl command multiple times and compare the size in bytes of the resulting .gz file, you will see that it differs each time. Some ranges may produce a file that looks complete (and may be usable), but there is usually still a slight variation in size, which suggests to me that in the apparently successful cases only varying amounts of whitespace are being harmlessly truncated.
Also, it is not the gz that is being truncated but the .mbox file going into the archive.
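For anyone who wants to check this on their own downloads, here is a minimal sketch in Python. It assumes you already have the bytes of one or more .gz downloads in memory (fetching is left out on purpose). The names `inspect_export` and `same_content` are made up for illustration, and the "ends with a newline" check is only a rough heuristic for a cut-off mbox, not a real validator. Content is compared after decompression because the gzip header carries a timestamp field, so the compressed bytes can differ even when the mbox inside is identical.

```python
import gzip
import hashlib
import zlib


def inspect_export(payload: bytes) -> dict:
    """Summarize one downloaded .mbox.gz payload (hypothetical helper)."""
    report = {"gz_bytes": len(payload)}
    try:
        mbox = gzip.decompress(payload)
    except (EOFError, OSError, zlib.error):
        # The gzip container itself is cut off or corrupt.
        report["gzip_intact"] = False
        return report
    report["gzip_intact"] = True
    report["mbox_bytes"] = len(mbox)
    report["mbox_sha256"] = hashlib.sha256(mbox).hexdigest()
    # mbox messages start with a "From " separator line.
    report["messages"] = sum(
        1 for line in mbox.splitlines() if line.startswith(b"From ")
    )
    # Heuristic: a file cut mid-email usually does not end on a newline.
    report["ends_with_newline"] = mbox.endswith(b"\n")
    return report


def same_content(reports: list) -> bool:
    """True if every report describes the same decompressed mbox."""
    return len({r.get("mbox_sha256") for r in reports}) == 1
```

A valid gzip stream with a different `mbox_bytes` or `messages` count from run to run would match what is described above: the container is fine, but the mbox written into it was incomplete.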
It seems that for the archive Jimmy brought up specifically, the 14th of July always fails to load, so the limit on how large a mail can be to be accepted on the mailing list may be too high.
As to why this currently happens, it's likely caused by the current timeout value in uwsgi. We could increase that, since archives like this are quite large and take a while to process. We also need to optimize hyperkitty in general, because it times out on much smaller pages, but that's an issue that's already reported in another ticket and needs to be addressed there.
Would it be possible to use smaller increments of the archive instead of a full month? Instead of trying to load https://firstname.lastname@example.orgemail@example.com?start=2021-07-01&end=2021-08-01, maybe try https://firstname.lastname@example.orgemail@example.com?start=2021-07-01&end=2021-07-15 and https://firstname.lastname@example.orgemail@example.com?start=2021-07-15&end=2021-08-01 and combine the results, while we work on a more permanent solution.
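A rough sketch of that workaround in Python, in case it helps. It only builds the per-window URLs and merges payloads that have already been downloaded. The base URL in the usage example is a placeholder, not the real export endpoint. The helper names are hypothetical. It also assumes the export treats `start` as inclusive and `end` as exclusive, which has not been verified here; if that is wrong, mails on the boundary dates could be duplicated or missed.

```python
import gzip
from datetime import date, timedelta
from urllib.parse import urlencode


def split_range(start: date, end: date, days: int):
    """Yield contiguous (start, end) windows of at most `days` days."""
    cursor = start
    while cursor < end:
        nxt = min(cursor + timedelta(days=days), end)
        yield cursor, nxt
        cursor = nxt


def export_urls(base_url: str, start: date, end: date, days: int = 15) -> list:
    """Build one export URL per window, using the start/end query parameters."""
    return [
        f"{base_url}?{urlencode({'start': s.isoformat(), 'end': e.isoformat()})}"
        for s, e in split_range(start, end, days)
    ]


def combine_exports(gz_payloads: list) -> bytes:
    """Decompress each downloaded .mbox.gz and concatenate them in order."""
    parts = []
    for payload in gz_payloads:
        mbox = gzip.decompress(payload)
        # Make sure the next "From " separator starts on its own line.
        if mbox and not mbox.endswith(b"\n"):
            mbox += b"\n"
        parts.append(mbox)
    return b"".join(parts)
```

For example, `export_urls("https://lists.example.invalid/export.mbox.gz", date(2021, 7, 1), date(2021, 8, 1))` gives three URLs covering 1-16 July, 16-31 July and 31 July-1 August. A smaller `days` value gives smaller requests, which may stay under the timeout.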