tickets #77278

closed

Redirect lists.opensuse.org to /archives/

Added by hellcp over 3 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Email
Target version: -
Start date: 2020-11-10
Due date:
% Done: 0%
Estimated time:

Description

We have migrated to mailman3, so the homepage should now point to mailman.

Actions #1

Updated by cboltz over 3 years ago

  • Category set to Email
  • Status changed from New to Resolved

Done, / now redirects to /archives/ (using a RedirectMatch ^/$ on baloo) - and the "old" archives are still reachable if you (or $search_engine) know where to find them.
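
As a rough illustration of what a root-only rule like this does, here is a minimal Python sketch. It assumes the rule behaves like `RedirectMatch ^/$ /archives/`; the exact directive on baloo is not quoted in this ticket, and `redirect_target` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: the Apache rule is equivalent to `RedirectMatch ^/$ /archives/`.
ROOT_ONLY = re.compile(r"^/$")


def redirect_target(path: str) -> Optional[str]:
    """Return the redirect target for a request path, or None if the rule does not match."""
    return "/archives/" if ROOT_ONLY.match(path) else None


for path in ("/", "/opensuse-factory/", "/opensuse-factory/2020-11/msg00009.html"):
    print(path, "->", redirect_target(path))
```

Only the bare `/` matches, so the old per-list paths are left untouched by this rule.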

Long(er) term, we probably should move over the old archives to the mailman3 VM and shut down baloo.

Actions #2

Updated by pjessen over 3 years ago

  • Status changed from Resolved to New
  • Private changed from Yes to No

Long(er) term, we probably should move over the old archives to the mailman3 VM and shut down baloo.

I thought the archives were already moved over? Looking at e.g. factory.lists, that is certainly the case.
Looking at the apache logs on baloo, ignoring most bots and wget, there are still many accesses to /listname/..... apparently there are people out there who like to read the lists by reading the archives, and they are confused because they are no longer updated.

I am thinking of creating "permanently moved" redirects from /oldlistname/whatever to https://lists.opensuse.org/archives/list/newlistname@lists.opensuse.org/ ? maybe with a stop-over on a page explaining the move ?
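
A minimal sketch of that idea in Python, assuming a lookup table from old to new list names. The table content (`opensuse-factory` to `factory`) and the helper name `permanent_redirect` are illustrative assumptions, not the real mapping.

```python
import re
from typing import Optional, Tuple

# Hypothetical old-name -> new-name table; the real mapping is not given in this ticket.
OLD_TO_NEW = {"opensuse-factory": "factory"}

NEW_BASE = "https://lists.opensuse.org/archives/list/{name}@lists.opensuse.org/"


def permanent_redirect(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, new_url) for a path under a known old list name, else None."""
    match = re.match(r"^/([^/]+)(?:/.*)?$", path)
    if not match:
        return None
    new_name = OLD_TO_NEW.get(match.group(1))
    if new_name is None:
        return None
    # Everything below /oldlistname/ collapses to the new list's archive page.
    return 301, NEW_BASE.format(name=new_name)


print(permanent_redirect("/opensuse-factory/whatever"))
```

Note that this variant sends every path below the old list name to the same new page, so links to individual mails lose their target.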

Actions #3

Updated by pjessen over 3 years ago

What about the RSS feed, what do we do about that one?

Actions #4

Updated by cboltz over 3 years ago

pjessen wrote:

Long(er) term, we probably should move over the old archives to the mailman3 VM and shut down baloo.

I thought the archives were already moved over? Looking at e.g. factory.lists, that is certainly the case.

Yes and no ;-)

The content of the archives was indeed moved over to mailman3, but it's available under a completely new URL.

Looking at the apache logs on baloo, ignoring most bots and wget, there are still many accesses to /listname/..... apparently there are people out there who like to read the lists by reading the archives, and they are confused because they are no longer updated.

Are those people really accessing the lists.o.o/$listname/ overview pages, or are they accessing specific mails like https://lists.opensuse.org/opensuse-factory/2020-11/msg00009.html ?

There are lots of places out there that still link to the old URLs like (random example) https://lists.opensuse.org/opensuse-factory/2020-11/msg00009.html - these places can be links included in mailinglist posts, search engine results, bookmarks, ... and I'd prefer not to break all the links that exist out there. Redirecting each and every single mail to the new mailman location would be very hard, therefore I'd recommend to keep the "old" archives available under their well-known location.

(Technical sidenote: currently haproxy splits lists.o.o between baloo and mailman3 based on the URL, so either we host the old archives on mailman3 to simplify that, or we host them on narwal* (aka static.o.o) to keep the mailman3 setup clean. Both variants have pros and cons, and so far I'm undecided which way is better.)
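
To picture the URL-based split, here is a small Python sketch of the routing decision. The prefix list is an assumption (only `/archives/` is known from this ticket to live on mailman3); the real haproxy rules are not shown here, and `choose_backend` is a hypothetical name.

```python
# Assumption: paths under /archives/ go to mailman3, everything else stays on baloo.
MAILMAN3_PREFIXES = ("/archives/",)


def choose_backend(path: str) -> str:
    """Pick the backend name for a request path based on its prefix."""
    return "mailman3" if path.startswith(MAILMAN3_PREFIXES) else "baloo"


for path in ("/archives/list/factory@lists.opensuse.org/",
             "/opensuse-factory/2020-11/msg00009.html"):
    print(path, "->", choose_backend(path))
```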

I am thinking of creating "permanently moved" redirects from /oldlistname/whatever to https://lists.opensuse.org/archives/list/newlistname@lists.opensuse.org/ ? maybe with a stop-over on a page explaining the move ?

Doing those redirects for the overview page would be fine, but please ensure that the deep links to specific mails keep working.
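
One way to tell the two cases apart, sketched in Python. It assumes old deep links end in `msgNNNNN.html` as in the example URL above, and that an overview page is just `/listname/`; `classify_old_url` is a hypothetical helper and expects URLs from the old archive only.

```python
import re
from urllib.parse import urlsplit

# Assumption: individual mails in the old archive are named like msg00009.html.
MESSAGE_PAGE = re.compile(r"^msg\d+\.html$")


def classify_old_url(url: str) -> str:
    """Classify an old-archive URL as 'overview', 'message' or 'other'."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) == 1:
        return "overview"  # e.g. /opensuse-factory/ : safe to redirect
    if segments and MESSAGE_PAGE.match(segments[-1]):
        return "message"   # deep link to a single mail: must keep working
    return "other"         # month indexes and anything else


print(classify_old_url("https://lists.opensuse.org/opensuse-factory/2020-11/msg00009.html"))
```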

Another option would be to add a big warning box with a link to the new location at the top of each overview page (somewhat similar to the "deprecation note" on lizards.o.o). We can probably half-automate that for all lists by using a patch (ideally updated with the new list location for each list).

For the RSS feed - maybe replace it with something similar to https://status.opensuse.org/rss ?

Thinking about things to keep from baloo: It's probably a good idea to keep all subscribe logs, just in case someone claims we spam him/her without having consent.

Actions #5

Updated by pjessen over 3 years ago

cboltz wrote:

pjessen wrote:

Long(er) term, we probably should move over the old archives to the mailman3 VM and shut down baloo.

I thought the archives were already moved over? Looking at e.g. factory.lists, that is certainly the case.

Yes and no ;-)

The content of the archives was indeed moved over to mailman3, but it's available under a completely new URL.

Yep.

Looking at the apache logs on baloo, ignoring most bots and wget, there are still many accesses to /listname/..... apparently there are people out there who like to read the lists by reading the archives, and they are confused because they are no longer updated.

Are those people really accessing the lists.o.o/$listname/ overview pages, or are they accessing specific mails like https://lists.opensuse.org/opensuse-factory/2020-11/msg00009.html ?

There aren't really enough of them to say :-) I think mostly the latter, but apparently also some that start with the overview. I think we are talking a very small minority, certainly in comparison to the bots.

There are lots of places out there that still link to the old URLs like (random example) https://lists.opensuse.org/opensuse-factory/2020-11/msg00009.html - these places can be links included in mailinglist posts, search engine results, bookmarks, ... and I'd prefer not to break all the links that exist out there. Redirecting each and every single mail to the new mailman location would be very hard, therefore I'd recommend to keep the "old" archives available under their well-known location.

Yeah, in a way I would like that too, but it would mean keeping some 47 GB of archive and some software that is growing increasingly old. I think it might be sensible to look at how much use we really get out of the old archives and then decide?

(Technical sidenote: currently haproxy splits lists.o.o between baloo and mailman3 based on the URL, so either we host the old archives on mailman3 to simplify that, or we host them on narwal* (aka static.o.o) to keep the mailman3 setup clean. Both variants have pros and cons, and so far I'm undecided which way is better.)

Maybe keep in mind that we get occasional requests to have stuff deleted from the archives ... (I have so far not done any deletions).

I am thinking of creating "permanently moved" redirects from /oldlistname/whatever to https://lists.opensuse.org/archives/list/newlistname@lists.opensuse.org/ ? maybe with a stop-over on a page explaining the move ?

Doing those redirects for the overview page would be fine, but please ensure that the deep links to specific mails keep working.

If we ignore the bots, and the use of the last 30 days is minimal, maybe reconsider ?

Another option would be to add a big warning box with a link to the new location at the top of each overview page (somewhat similar to the "deprecation note" on lizards.o.o). We can probably half-automate that for all lists by using a patch (ideally updated with the new list location for each list).

It should certainly be announced, yes. I think I would prefer the plain redirect, KISS principle.

For the RSS feed - maybe replace it with something similar to https://status.opensuse.org/rss ?

I only brought it up because I saw rss requests, but I have not looked into it.

Thinking about things to keep from baloo: It's probably a good idea to keep all subscribe logs, just in case someone claims we spam him/her without having consent.

Heh, those are long gone anyway. mlmmj.operation.log usually only goes back a few months, and the mail logs likewise.

Actions #6

Updated by cboltz over 3 years ago

pjessen wrote:

cboltz wrote:

There are lots of places out there that still link to the old URLs like (random example) https://lists.opensuse.org/opensuse-factory/2020-11/msg00009.html - these places can be links included in mailinglist posts, search engine results, bookmarks, ... and I'd prefer not to break all the links that exist out there. Redirecting each and every single mail to the new mailman location would be very hard, therefore I'd recommend to keep the "old" archives available under their well-known location.

Yeah, in a way I would like that too, but it would mean keeping some 47 GB of archive and some software that is growing increasingly old.

I'd guess we don't need to keep the old software (since the old archives won't get updated anymore) - "just" the archives as static HTML.

Maybe keep in mind that we get occasional requests to have stuff deleted from the archives ... (I have so far not done any deletions).

Right, that makes things more painful :-( - if you start to delete something ;-)

If we ignore the bots, and the use of the last 30 days is minimal, maybe reconsider ?

We'll still break lots of existing links out there (even if not used/clicked too often), so if the only "problem" is disk space, I'd really prefer to keep the old archives.

Actions #7

Updated by pjessen over 3 years ago

cboltz wrote:

pjessen wrote:

cboltz wrote:

There are lots of places out there that still link to the old URLs like (random example) https://lists.opensuse.org/opensuse-factory/2020-11/msg00009.html - these places can be links included in mailinglist posts, search engine results, bookmarks, ... and I'd prefer not to break all the links that exist out there. Redirecting each and every single mail to the new mailman location would be very hard, therefore I'd recommend to keep the "old" archives available under their well-known location.

Yeah, in a way I would like that too, but it would mean keeping some 47 GB of archive and some software that is growing increasingly old.

I'd guess we don't need to keep the old software (since the old archives won't get updated anymore) - "just" the archives as static HTML.

Ah yes, you're right, they are static, I forgot.

Maybe keep in mind that we get occasional requests to have stuff deleted from the archives ... (I have so far not done any deletions).

Right, that makes things more painful :-( - if you start to delete something ;-)

Hmm, maybe it's a Good Thing (R) I haven't even tried - rebuilding the archives afterwards would change the article numbering and screw up the links ....

Actions #8

Updated by hellcp about 3 years ago

Assuming we manage to recover all of the emails from opensuse-2007-06, we will be missing the following emails:

Keep in mind: opensuse-board and opensuse-maintenance were migrated to private mailing lists (basically merged with board and maintenance respectively), so we don't really have a place to redirect those. Those emails seem like they were never really meant to be public anyway, but for some reason they are.

I didn't bother with the opensuse-test archives, since that's really not a list that needs a complete set of archives anyway. The opensuse-pt and limal-devel archives were corrupted/missing, and if you have a look at the opensuse-ru and opensuse messages, you will understand why I didn't bother with those.

Actions #9

Updated by hellcp about 3 years ago

I added miscs.rewritemap, which contains those addresses, so that we redirect them to the best place we can. I also submitted a change to salt to reflect the addition of miscs.rewritemap: https://gitlab.infra.opensuse.org/infra/salt/-/merge_requests/482
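
For readers unfamiliar with rewrite maps, here is a minimal Python sketch of how a plain-text map of the "key value" kind can be read and queried. The sample entry is invented for illustration and is not taken from the real miscs.rewritemap; `parse_rewritemap` is a hypothetical helper, and keeping the first entry for a duplicate key is an assumption.

```python
from typing import Dict


def parse_rewritemap(text: str) -> Dict[str, str]:
    """Parse 'key value' lines, skipping blank lines and lines starting with '#'."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) >= 2:
            # Assumption: the first entry for a key wins.
            mapping.setdefault(fields[0], fields[1])
    return mapping


# Hypothetical content, only to show the shape of such a file.
SAMPLE = """\
# old path                                new location
/opensuse-example/2007-06/msg00001.html   /archives/list/example@lists.opensuse.org/
"""

rewrites = parse_rewritemap(SAMPLE)
print(rewrites.get("/opensuse-example/2007-06/msg00001.html", "(no entry)"))
```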

Actions #10

Updated by hellcp about 3 years ago

We could technically switch over now and retire baloo

Actions #11

Updated by pjessen about 3 years ago

hellcp wrote:

We could technically switch over now and retire baloo

Umm, I think not. Still no headers to be seen in the hyperkitty version.

Actions #12

Updated by hellcp about 3 years ago

Which headers would you like to see? Keep in mind that HyperKitty stores every email as a processed database entry and not as an mbox, so for some headers the data is either not currently available or not exposed yet.

Actions #13

Updated by pjessen about 3 years ago

hellcp wrote:

Which headers would you like to see? Keep in mind that HyperKitty stores every email as a processed database entry and not as an mbox, so for some headers the data is either not currently available or not exposed yet.

I need all of the headers. Received, DKIM, Spamassassin - all of them. Only having the contents available will make debugging impossible.
I want the whole email, just like we used to have. How Hyperkitty stores them is none of my concern :-)

Actions #14

Updated by hellcp about 3 years ago

pjessen wrote:

I need all of the headers. Received, DKIM, Spamassassin - all of them. Only having the contents available will make debugging impossible.
I want the whole email, just like we used to have. How Hyperkitty stores them is none of my concern :-)

I will enable the prototype archiver then, which will dump every email to /var/lib/mailman/archives/prototype. I suspect that will fill up the space on the vm fairly quickly though. Should I enable that archive for mailing lists that currently don't get archived at all as well? Or are the current public lists enough?

Actions #15

Updated by hellcp about 3 years ago

hellcp wrote:

I will enable the prototype archiver then, which will dump every email to /var/lib/mailman/archives/prototype.

It's enabled now, it includes all the headers and collects mail from all the same lists hyperkitty archives. You can find the mails in /var/lib/mailman/archives/prototype/$list@address.org/new/
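
For debugging, the stored mails can be inspected with a few lines of Python. This is only a sketch: it assumes each file in the `new/` directory is one raw RFC 5322 message, and the header names in `WANTED` (including SpamAssassin's `X-Spam-Status`) as well as the helper name `debug_headers` are illustrative choices.

```python
from email.parser import BytesParser
from pathlib import Path
from typing import Dict, Iterator, List, Tuple

# Headers of interest for delivery debugging; adjust as needed.
WANTED = ("Received", "DKIM-Signature", "X-Spam-Status")


def debug_headers(directory: str) -> Iterator[Tuple[str, Dict[str, List[str]]]]:
    """Yield (filename, {header: [values]}) for every message file in `directory`."""
    parser = BytesParser()
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            msg = parser.parse(fh, headersonly=True)  # skip parsing the body
        yield path.name, {name: msg.get_all(name, []) for name in WANTED}
```

A call would look like `for name, headers in debug_headers("/var/lib/mailman/archives/prototype/$list@address.org/new/"): print(name, headers)`, with `$list@address.org` replaced by the list in question.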

Actions #16

Updated by hellcp over 1 year ago

  • Status changed from New to Resolved

I will close this now, since all of the issues seem to have been addressed.
