Mailman is a very popular mailing list manager.
Unfortunately, one feature Mailman doesn't provide is searching its archives. Mailman can be integrated with Google search, but this method is discouraged: it normally takes several weeks for Googlebot to crawl your new posts.
However, Mailman can be easily integrated with existing open-source indexing systems, like Swish-e, which we will document here.
This HOWTO assumes:
- both Mailman and Swish-e will be installed on the same machine,
- your Mailman setup/list is already running,
- Swish-e is already installed (if your system doesn't provide a packaged version of Swish-e, see the Swish-e documentation for more info),
- this setup reflects the Mailman/Swish-e installation on http://lists.wpkg.org
Swish-e uses Perl; Mailman uses Python. This means that we probably need to tell Apache how to parse .cgi files. Your Apache needs mod_perl. To parse Perl .cgi files, you need to add +ExecCGI to the Mailman directory options. Apache also needs to know that it has to parse some or all HTML files (later, you will decide whether to put the search form on all Mailman pages or only on some of them).
See http://httpd.apache.org/docs/2.2/howto/ssi.html#configuring for more info on configuring Server Side Includes (SSI) in Apache.
- The first step is to create a working directory for Swish-e. This is where you will keep its configuration and index files.
- Next, create a config file. Save it in the /srv/www/vhosts/wpkg.org/swish directory. For a full description of the directives used, see http://www.swish-e.org/docs/swish-config.html.
- Now we can index our Mailman archive.
Web search configuration
Indexing is done. Now it's time to set up search on your Mailman pages.
- First, copy
- Then, create a config file for swish.cgi and save it as /srv/www/vhosts/wpkg.org/swish/swishcgi.conf. I didn't want such an advanced search form, so I've hidden some entries:
- I also didn't want to search by date. As my Mailman archive was first created out of an mbox file, the dates of the .html files did not match the dates when posts were sent to the list. Here's a patch that removes date searching, basically just commenting out the date range fields:
(Note that if you do not comment that code out, and date options still don't show up on the search page, you may be missing the Date::Calc module required by swish.cgi - see http://swish-e.org/docs/swish.cgi.html - you can test this from the command line with perl -e 'require Date::Calc' which should have no output.)
- swish.cgi needs to know where to look for the configuration file - open it with your favourite editor, and change the location of
- Now it's time to see if the search works. Open swish.cgi in a web browser and search for some terms. For me, the address is http://lists.wpkg.org/swish.cgi. Depending on your server configuration, it can be located somewhere else, e.g. under http://server/mailman/swish.cgi
Integrating the search with Mailman's pages
If search works, congratulations. Now it's time to integrate the search form with some of Mailman's pages. We will do it with a simple Server Side Include (SSI): <!--#include virtual="/swish_mm.cgi" --> added to Mailman pages. Did you notice swish_mm.cgi here? It is there for a reason.
swish.cgi generates a whole HTML page, that is, with all the <html>, <body> etc. tags. As Mailman's pages already include these tags, we have to make sure they are not added again by swish_mm.cgi, which requires these changes:
- As you probably noticed, you will also need to edit one more file (well, make a copy of the original first, just in case):
- In this patch, we comment out all unnecessary <body>, <html> etc. tags, rename its Perl package to SWISH::TemplateDefault_MM and send results to a separate swish.cgi (we can't use swish_mm.cgi, as it doesn't contain the <body>, <html> etc. tags):
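To see why the wrapper tags have to go, it can help to look at what an SSI fragment should contain. The sketch below is only an illustration in Python, not the Perl patch described above: the to_fragment helper and its regular expression are hypothetical, and a regex is a rough tool for HTML that is adequate only for simple, well-formed pages.

```python
import re

# Matches the document-level wrapper markup that an included fragment must not repeat:
# the doctype, the whole <head> section, and the opening/closing <html> and <body> tags.
WRAPPER_MARKUP = re.compile(
    r"</?(?:html|body)\b[^>]*>|<!DOCTYPE[^>]*>|<head\b.*?</head>",
    re.IGNORECASE | re.DOTALL,
)


def to_fragment(page: str) -> str:
    """Return the page with its document-level wrapper markup removed (hypothetical helper)."""
    return WRAPPER_MARKUP.sub("", page).strip()


if __name__ == "__main__":
    full_page = (
        "<html><head><title>Search</title></head>"
        '<body><form action="/swish.cgi"></form></body></html>'
    )
    # Only the form survives; this is the kind of output an included script should emit.
    print(to_fragment(full_page))
```

Including an unmodified full page inside a Mailman template would produce a document with two <html> and <body> pairs, which is why the included script has to emit only the inner markup.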
Now it's time to edit the Mailman template files so that Mailman pages include a search form. If you just want a search form on date.html, you need to add <!--#include virtual="/swish_mm.cgi" --> to three Mailman templates, including archtocnombox.html. It is very important that you do NOT edit the templates in MAILMANDIR/templates/en (because you would lose your changes later if you upgraded Mailman). Instead, create a directory at MAILMANDIR/templates/site/en, copy the templates you want to update to this new directory and edit the site files.
If you use the default English language in Mailman, you will find these files in the templates/en directory of your Mailman installation. The change is simple - an example below:
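A minimal sketch of the copy-and-edit step in Python, under stated assumptions: the helper names are hypothetical, and placing the include directly after the opening <body> tag is only one plausible position (put it wherever the form should appear on the page). The script never touches the stock templates in templates/en; it copies a template to templates/site/en and patches the copy.

```python
import re
import shutil
from pathlib import Path

SSI_INCLUDE = '<!--#include virtual="/swish_mm.cgi" -->'


def add_search_include(html: str, include: str = SSI_INCLUDE) -> str:
    """Insert the SSI directive after the opening <body> tag (assumed placement)."""
    if include in html:
        return html  # already patched, nothing to do
    match = re.search(r"<body[^>]*>", html, flags=re.IGNORECASE)
    if match is None:
        return html  # no <body> tag found, leave the template unchanged
    end = match.end()
    return html[:end] + "\n" + include + html[end:]


def override_template(mailman_dir: Path, name: str, lang: str = "en") -> Path:
    """Copy templates/<lang>/<name> to templates/site/<lang>/<name> and patch the copy."""
    source = mailman_dir / "templates" / lang / name
    target_dir = mailman_dir / "templates" / "site" / lang
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    if not target.exists():
        shutil.copy2(source, target)
    # latin-1 round-trips any byte sequence, so the rest of the file is preserved as-is.
    text = target.read_text(encoding="latin-1")
    target.write_text(add_search_include(text), encoding="latin-1")
    return target
```

For example, override_template(Path("/usr/lib/mailman"), "archtocnombox.html") would create and patch the site copy; the /usr/lib/mailman path is a placeholder for your own MAILMANDIR.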
If you also want to have a search form on every archived message page, make a similar change in
Once you have made the changes to the templates, you MUST restart the Mailman process, since ArchRunner keeps a cache of the templates.
Recreating Mailman's archive
If you already have a list archive, you will need to recreate it to apply all these changes. To do this, you need the mbox file which is created by Mailman. An example is below:
If you executed the above command as root, make sure to restore proper permissions:
That's it! Now check if search is integrated with your Mailman pages.
Adding crontab entries
You will want to crawl your archive periodically. Also, if you only want to have the search form on date.html pages, you have to add the execute bit to them.
How often you do it will depend on the size of your list and the traffic it gets.
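The execute-bit step on its own can be sketched in Python as follows. This is not one of the commands referred to in this section, just an illustration: the add_execute_bit helper is hypothetical, and setting only the user-execute bit assumes the bit is there for Apache's XBitHack directive, which treats HTML files with that bit set as server-parsed.

```python
import stat
from pathlib import Path


def add_execute_bit(archive_root: Path, filename: str = "date.html") -> list:
    """Set the user-execute bit on every <filename> below archive_root.

    Returns the list of files whose mode was actually changed, so the
    function is safe to call repeatedly (for example from cron).
    """
    changed = []
    for path in sorted(archive_root.rglob(filename)):
        mode = path.stat().st_mode
        new_mode = mode | stat.S_IXUSR  # assumption: the user bit is the one that matters
        if new_mode != mode:
            path.chmod(new_mode)
            changed.append(path)
    return changed
```

Calling add_execute_bit(Path("/path/to/archives/listname")) with your own archive directory (the path here is a placeholder) touches only files named date.html and leaves every other archive page alone. Newly created monthly index pages will not have the bit set, which is why the step needs to be repeated periodically.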
I run these two commands every hour (note: these are not crontab entries, just the commands you need to run from crontab):
Also, you will probably need to add such entries to Mailman's default cron file - otherwise: