My Ultimate Email Configuration

Travis H.

26 Aug 2009

Abstract

This paper is for system administrators who are also email power users. It is an attempt to document everything I’ve done to deal with a massive volume of mail: over 300k emails and 62 mailing lists. My spam rate is also quite acceptable; I count approximately one spam per 5000 emails. I will attempt to explain why my setup is the way it is, and why you will want to set up your own system this way. I will also attempt to present the material in a way that helps the reader set up a similar configuration.

1 Related Work

2 Domain Name

2.1 Why You Don’t Want to Use Your ISP’s Domain

Many of my relatives who are not as technically inclined use the default mailbox that their Internet Service Provider gives them. This is an acceptable solution for some, but suffers from a number of drawbacks from my perspective:
  1. If you change ISPs, you have to distribute your new email address to everyone. Undoubtedly, many will continue to use your old email address, and you will never see that mail. Also, you’ll need to update your email address at every mailing list that you subscribed to. If you’ve forgotten your password to the mailing list software, you may be unable to do this, since it will only send that password to the email address it has on file - the one you’re likely no longer using. If you didn’t move all your mailing list subscriptions over to the new address before cancelling your old ISP accounts, you’re SOL.
  2. ISPs primarily provide Internet access, and they vary widely in their ability to run mail servers. I have had ISPs whose email systems regularly dropped email, or whose mail servers were unavailable, sometimes for days at a time.
  3. ISPs sometimes change the rules on their email accounts. My parents had two email accounts with their ISP. The ISP performed some system upgrades and migrated only one of the accounts; the secondary account stopped working. When the owner of that account complained, the ISP effectively said that secondary email accounts were no longer part of its offering; he was SOL, and had to migrate over to Gmail.
  4. ISPs vary in the amount and type of spam filtering that they do. In some cases, they maintain their own blacklists, but do it poorly. When I got my own servers, the IPs I was given had previously belonged to other systems. Apparently some of these systems had been used to send spam (perhaps not voluntarily; spammers routinely break into systems to send spam from them). This meant that my IP address was on the blacklists of several ISPs, and I had to go through a manual process to get it removed. In some cases, the delisting instructions they gave were out of date and no longer worked, so I simply couldn’t email certain recipients.
  5. The most effective spam filtering options involve identifying and stopping spam during the SMTP transaction. Once the spam has been accepted, you have only a few options: you can delete it silently, which may cause you to miss important email without ever knowing it; you can send a bounce back to the sender, who is probably some innocent schmuck who has been joe jobbed (http://en.wikipedia.org/wiki/Joe_job); or you can put it in some spam folder that you hardly ever read. With an ISP mailbox, you don’t have the ability to reject the spam during the SMTP session, which is a far superior solution. More on this later.

2.2 Gmail

Some people use Gmail accounts to solve this problem, and that may be a fine solution for them. I find it unsatisfactory; it means Google owns my email, which effectively means that they can do anything they want with it (possession being nine-tenths of the law and all that). Also, I find that I can do much more sophisticated filtering by running my own mail filters; more on this later. More importantly, there are transfer-time anti-spam countermeasures that you can put in place, like graylisting, that are much more effective than filtering mail after it has been accepted.

2.3 Picking A Domain Name

This part of the process is a little tricky. When I pick a domain name, I’m picking something I’ll probably be using for the rest of my life. I created a list of cool words, and checked for available domains that used those words. I used the command-line WHOIS tool to find out whether a name was registered, but the WHOIS servers appear to have started rate-limiting lookups to prevent spammers from harvesting them for email addresses. However, domain registrars now have search tools on their sites that let you check the availability of domain names in the common top-level domains (.com, .net, .org, etc.).
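For anyone who still wants to script the check, WHOIS is a very simple protocol (RFC 3912): open a TCP connection to port 43, send the query followed by CRLF, and read until the server closes the connection. The sketch below assumes Verisign’s server, whois.verisign-grs.com, which answers for .com and .net; other TLDs use other servers, and the “No match for” wording it looks for is an assumption about that server’s reply format, not part of any standard. The rate limits mentioned above still apply.

```python
import socket


def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10.0):
    """Send one WHOIS query (RFC 3912) and return the raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # The whole protocol: send the query terminated by CRLF...
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        # ...then read until the server closes the connection.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def looks_unregistered(reply):
    """Heuristic (assumed reply format): 'No match for' means not registered."""
    return "no match for" in reply.lower()


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        status = "possibly available" if looks_unregistered(whois_query(name)) else "taken"
        print(f"{name}: {status}")
```

A registrar’s search page is still the authoritative answer; this only saves some typing when checking a long list of candidate words.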
I suggest a few properties for your domain name:
memorable: It had to be something easy to remember. It turns out that for anyone who has seen more than a few Star Trek episodes, “subspacefield” is easy to remember.
easy to spell: I didn’t want a funny spelling of a common word, because if I had to tell someone my domain name orally, I’d have to spell it for them, and they might still get it wrong.
common TLD: Everyone on the Internet is familiar with .com as a top-level domain, and .net is also quite common, but as you stray from those, people seem to have trouble believing they got your email address right. I find that techies don’t have a problem with .org, but some lay people do. I can only imagine that .info and other TLDs would be even more problematic.
short: I didn’t think of this when I picked this domain name, but you’ll often be filling out forms (paper or online) that limit your email address to a certain number of characters, and if you’re over that, you’re SOL. As such, “subspacefield.org” takes up a few too many.

2.4 Registering Your Domain

There are quite a few registrars, and it might seem that they are all alike; however, that is not the case. For example, GoDaddy (http://www.godaddy.com/) is a popular and inexpensive registrar in the US, but (if I recall correctly; please feel free to correct me) it could decide against your owning the domain if the name violated a US trademark, and you would lose the rights to the domain name. For this reason, some people prefer international registrars such as Gandi (http://www.gandi.net/), which some claim give you more property rights over your domain name.
But I have a further requirement. When you use a normal domain registrar, the information you give about technical and administrative contacts (name, email, etc.) is stored in the WHOIS database for anyone to query. And although I am not a lawyer, I seem to recall that a law may have been passed requiring this contact information to be accurate.
So what do you do if you don’t want your personal information in the WHOIS database, but you want to remain legal? As far as I know, you have two options. The first is Domains By Proxy (http://www.domainsbyproxy.com/); you can read about their offering on their site. However, when I was researching them, I seem to recall a case where someone objected to some content being offered on the web, sent them a letter (not a subpoena), and they disclosed their customer’s information.
So I opted for KatzGlobal (http://www.katzglobal.com/hosting/anonymous-domains.html). This appears to be an offshore company, not subject to US laws. They essentially hold the domain on your behalf while you control it. This seems to work rather well; however, I do have a couple of problems with their implementation. First, they publish a GPG key on their web site so you can encrypt your communication with them, which is a definite plus. However, the response I got back, which quoted part of my message, was unencrypted, which rather defeats the purpose of using encryption in the first place. Second, it sometimes takes a few emails to make a change to your domain; their customer service can be a bit disorganized at times.
Since I intend to keep my domains forever, I generally register them for five years at a time. This saves me the headache of having to remember to renew them every year, and the possible catastrophe of losing the domain name.

3 Servers

Now it is time to pick out servers on which to host your domains. If you want to run your own DNS, it is recommended that you have at least two geographically separated servers (see Section 4). You will also want at least one backup MX server, in case the primary goes down, and it seems prudent to locate it in a different geographical area as well (see Section 5).
When renting a server, you generally have your choice of:
shared servers: You share the machine with other users, do not have root access, and thus can’t modify the system settings. The upside is that they are cheap.
dedicated servers: You basically “own” the box, have root, can change the root password to lock the system support team out, and so on. These tend to be more expensive.
colocated servers: These are basically dedicated servers where you provide the hardware. They tend to be more troublesome, because if there’s a hardware failure, it’s up to you to ship the provider a replacement, whereas with dedicated servers they usually have spare parts on hand.
virtual servers: These are an emerging middle ground, where you are root on a virtual machine that coexists with other virtual machines on the same host OS. I haven’t looked at prices on these, but I would guess they fall between shared and dedicated servers in price.
You’ll also have to decide what level of service you want. You can get self-managed servers, where you’re expected to be the sysadmin (except in extreme cases like hardware failures or failure to boot), or pay more for managed servers. At these higher support levels, the support team will do more for you, but you’ll have to give them root-level access to the box. There are also limits to what they’ll help you with; if you want them to code you a web site, or troubleshoot some unusual software, you might have to pay extra. If you need this kind of support, you might want to consider a company like Rackspace (http://www.rackspace.com/).
When picking a hosting provider, you might also want to consider these factors:
What I recommend is that you get two dedicated self-managed servers, one in the US and one in the EU.

4 Domain Name Servers

Mail is routed based on domain names. When an email is sent to an SMTP server, the server does a DNS lookup on the domain name of the destination address (i.e. your domain name). The lookup first finds the name servers for the top-level domain (TLD), such as .com, and then asks them for the authoritative name servers for your domain. Thus, when you register your domain, you need to specify your DNS servers. You can do this by IP, in which case you provide your registrar with the two IP addresses you’ll be using, or by name, such as “ns1.yourdomain.com” and “ns2.yourdomain.com”. Naming them inside your own domain presents a chicken-and-egg problem: how is a resolver supposed to find the IP addresses of those machines when it needs them to look up your domain in the first place? Fortunately, the TLD zone can hold “glue records”: address (A) records for those name servers, which you register with your registrar. The TLD servers return them alongside the NS records in the referral for your domain, so a resolver can reach your name servers without first having to look up their names.
Next, the MTA needs to look up the Mail Exchange (MX) records for your domain. These tell other MTAs which hosts they should deliver your email to. Note that an MX record maps to a host name, not an IP address, and that host name must have its own address (A) record; it is not supposed to be a CNAME (RFC 2181 forbids MX records that point to an alias).
Obviously, if the MTA can’t contact your domain name servers, it can’t look up these MX records and won’t be able to deliver the email; after retrying for a while, it will “bounce” it back to the sender.
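To make the glue-record idea concrete, here is a toy model with no network access. The hypothetical .com zone holds NS records for yourdomain.com plus glue A records for its in-domain name servers, so the referral alone is enough to reach them; a second hypothetical domain has no glue and cannot be bootstrapped. All names and addresses are placeholders (the addresses come from the RFC 5737 documentation ranges).

```python
from dataclasses import dataclass, field


@dataclass
class Referral:
    """What a TLD server hands back when asked about a delegated domain."""
    ns_names: list
    glue: dict = field(default_factory=dict)  # NS host name -> IPv4 address


# Hypothetical .com zone contents (placeholder names and addresses).
COM_ZONE = {
    "yourdomain.com": Referral(
        ns_names=["ns1.yourdomain.com", "ns2.yourdomain.com"],
        glue={"ns1.yourdomain.com": "192.0.2.1",
              "ns2.yourdomain.com": "198.51.100.2"},
    ),
    # Same kind of delegation, but no glue was ever registered.
    "noglue.com": Referral(ns_names=["ns1.noglue.com"]),
}


def name_server_addresses(domain):
    """Return the IPs a resolver can use to reach the domain's name servers."""
    referral = COM_ZONE[domain]
    addresses = []
    for ns in referral.ns_names:
        if ns in referral.glue:
            addresses.append(referral.glue[ns])
        elif ns == domain or ns.endswith("." + domain):
            # Resolving ns1.noglue.com would require asking ns1.noglue.com:
            # the chicken-and-egg problem that glue records exist to break.
            raise LookupError(f"no glue for {ns}; cannot bootstrap {domain}")
        else:
            # Name servers outside the domain are simply looked up separately.
            raise NotImplementedError(f"{ns} needs its own lookup")
    return addresses


print(name_server_addresses("yourdomain.com"))  # ['192.0.2.1', '198.51.100.2']
```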
This is highly undesirable, and there are two solutions for it:
I prefer the second option for a number of reasons:
  1. If you ever need to make the serial numbers in your domain lower, it is easier for you to clear out the slave server’s cached zones and force it to reload your zone with the new, lower zone serials.
  2. You control the servers, so you can protect them against being hacked and having them serve up bogus data for your domain.
  3. You can run tinydns instead of BIND on your name servers.
There are basically two main options for DNS server software: BIND or djbdns (tinydns).

4.1 Berkeley Internet Name Daemon

BIND is the de facto standard software for serving domain names. It is a fully featured server, capable of being configured in a number of ways. It can also act as a recursive resolver, so your DNS client software can use it to look up arbitrary domain names.
  1. Fully featured, including dynamic DNS updates and support for DNSSEC
  2. Requires special configuration to prevent it from being used as a recursive resolver by external hosts.
  3. Big code base; potentially slow or buggy.
  4. Allows for symbolic references, such as CNAMEs
  5. Uses an obscure and error-prone syntax for zone files.

4.1.1 Zone File Syntax

Each zone file has a “serial number”, which must increase every time the zone changes if you want your slave servers to stay in sync. The common convention is to use YYYYMMDDXX as the format: numerical year, month, and day, followed by a two-digit per-day revision number. This works fine unless you update your zone file more than 99 times in a day.
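Computing the next YYYYMMDDXX serial by hand is easy to get wrong. A minimal sketch, assuming the per-day counter starts at 00 (some people start at 01):

```python
import datetime


def next_serial(old_serial, today):
    """Return the next YYYYMMDDXX zone serial after old_serial."""
    base = int(today.strftime("%Y%m%d")) * 100  # today's date with XX = 00
    if old_serial < base:
        return base  # first change today
    candidate = old_serial + 1
    if candidate >= base + 100:
        # Either all 100 revisions were used today, or old_serial is from a
        # later date; both would break the "serial always increases" rule.
        raise ValueError(f"cannot issue a serial above {old_serial} for {today}")
    return candidate


day = datetime.date(2009, 8, 26)
print(next_serial(2009082507, day))  # 2009082600
print(next_serial(2009082600, day))  # 2009082601
```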
The most common problem when updating the zone files is that you forget to update the zone serial number.
I have a solution for this: a tool that generates the header for a zone file and uses the current timestamp (seconds since the epoch) as the serial number for the zone. I borrowed this idea from TinyDNS.
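The tool isn’t reproduced here; the following is a hypothetical sketch of the same idea, emitting $ORIGIN, $TTL, and an SOA record with the epoch time as the serial. The refresh, retry, expire, and minimum values are illustrative, not recommendations, and the hostmaster conversion assumes the mailbox’s local part contains no dots (those would need escaping).

```python
import time


def zone_header(origin, primary_ns, hostmaster, ttl=86400, now=None):
    """Return a zone-file header whose serial is seconds since the epoch."""
    serial = int(time.time() if now is None else now)
    # In an SOA record the responsible mailbox is written as a domain name:
    # hostmaster@example.com becomes hostmaster.example.com.
    rname = hostmaster.replace("@", ".", 1)
    return (
        f"$ORIGIN {origin}.\n"
        f"$TTL {ttl}\n"
        f"@ IN SOA {primary_ns}. {rname}. (\n"
        f"        {serial} ; serial (seconds since the epoch)\n"
        f"        3600       ; refresh\n"
        f"        900        ; retry\n"
        f"        1209600    ; expire\n"
        f"        86400 )    ; minimum (negative-caching TTL)\n"
    )


print(zone_header("example.com", "ns1.example.com",
                  "hostmaster@example.com", now=1251244800))
```

One caveat: an epoch serial in 2009 (around 1.25 billion) is smaller than a YYYYMMDDXX serial (around 2.0 billion), so switching an existing zone from the date convention to timestamps lowers the serial, which is exactly the case where slave servers must be forced to reload the zone.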

4.2 TinyDNS

This is part of Dan Bernstein’s djbdns suite. It doesn’t have nearly as many features as BIND, and it differs from BIND in some significant ways:
  1. It does only one thing, serve authoritative queries, and it does it quickly and easily.
  2. It uses a much more sensible format for zone files.
  3. It does not support dynamic DNS updates
  4. It does not support IPv6 addresses (AAAA records)
  5. It supports CNAMEs (the “C” line type in its data file), but for aliases that map to addresses you control, you can also avoid them entirely with some sort of pre-processor that emits plain address records instead.
  6. It automatically creates reverse zones for you
  7. It does not act as a recursive resolver; there is a separate program for that (dnscache). Since both recursive and authoritative queries are sent to the same port (53), you must run dnscache on a different IP address than tinydns. This is often done with IP aliases.
  8. It generates the zone serial numbers automatically based on the time at which the zone database is compiled into its on-disk format.
  9. Dan has a very nice architecture for running his programs as daemons (daemontools), but it differs from the operating system’s usual conventions. Dan insists that nobody modify his programs to run in the more conventional ways (e.g. /etc/init.d scripts), because he doesn’t want to have to support and answer questions about multiple possible configurations.
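As an illustration of the pre-processor idea, here is a hypothetical sketch that turns a small host table and alias table into tinydns-data lines. It relies on two line types from the tinydns data format: “=fqdn:ip:ttl” creates an A record plus the matching PTR record (which is how the reverse zones get built automatically), while “+fqdn:ip:ttl” creates an A record only, so aliases don’t produce duplicate PTR records. The names and addresses are placeholders.

```python
# Hypothetical input tables (placeholder names and addresses).
HOSTS = {"mail.example.com": "192.0.2.10", "www.example.com": "192.0.2.20"}
ALIASES = {"webmail.example.com": "mail.example.com"}


def tinydns_lines(hosts, aliases, ttl=86400):
    """Expand hosts and aliases into tinydns-data lines."""
    lines = []
    for name, ip in sorted(hosts.items()):
        lines.append(f"={name}:{ip}:{ttl}")  # A record plus PTR record
    for alias, target in sorted(aliases.items()):
        if target not in hosts:
            raise KeyError(f"alias {alias} points at unknown host {target}")
        lines.append(f"+{alias}:{hosts[target]}:{ttl}")  # A record only
    return lines


print("\n".join(tinydns_lines(HOSTS, ALIASES)))
```

The output would be appended to the data file before running tinydns-data to compile it.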

5 Mail Transport Agent

The Mail Transport Agent, or MTA, is the program that speaks the Simple Mail Transfer Protocol (SMTP) to receive and send email over the Internet. When a person sends an email, it goes to an MTA, and that MTA does DNS lookups until it gets the list of MX records for the destination domain. MX records have “weights” (preference values) assigned to them, and the sending MTA attempts to contact the MTAs listed in the MX records in ascending order of weight. If none of them accepts the message, it queues the message and retries later, and eventually bounces the message back to the sender. For this reason, I suggest running MTAs on two geographically separated servers (see Section 3).
You will want two of these for your domain, with the second configured to handle your domain as a “relay domain”. That way, if the primary MX server is down, the sending MTA will deliver the email to the backup MX server, which will accept it and queue it for delivery to the primary MX server once it is back up.
For a long time, the dominant MTA was sendmail. However, sendmail has had an extremely poor security history and is difficult to configure, so I do not recommend it.
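The ordering rule can be sketched in a few lines. This follows RFC 5321 (section 5.1), which also says that MX hosts with equal preference should be tried in random order to spread the load. The host names are placeholders, and a real MTA would also resolve each host’s address records and queue the message for retry if every host fails.

```python
import random


def mx_delivery_order(records, rng=random):
    """Order (preference, host) pairs the way a sending MTA tries them."""
    by_preference = {}
    for preference, host in records:
        by_preference.setdefault(preference, []).append(host)
    order = []
    for preference in sorted(by_preference):  # lowest preference value first
        hosts = list(by_preference[preference])
        rng.shuffle(hosts)  # spread load across equal-preference hosts
        order.extend(hosts)
    return order


mx = [(10, "mx1.example.com"), (10, "mx2.example.com"), (20, "backup.example.net")]
for host in mx_delivery_order(mx):
    print("try", host)  # a real MTA stops at the first host that accepts
```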
The two mailers which seem the most secure to me are postfix and qmail.

5.1 Qmail

This is another of Dan Bernstein’s creations. It is likely to be very secure; he has an outstanding cash bounty for the first person to find a remotely exploitable bug, and he is a well known personality in computer security circles.
There are, however, a number of complaints I have heard from others:
  1. In the log files, each email is assigned an identifier, but that identifier is only unique while the email is being processed (qmail names each queued message after the inode number of its queue file, and inode numbers get reused). So you may find the identifier you are interested in, grep through the logs for it, and discover that it was also used for other messages processed at different times. This makes analyzing mailer failures more challenging than it needs to be.
  2. I have heard that Dan has not updated the code base in a long time (the last release, qmail 1.03, dates from 1998); as a result, there is a family of patches out there that add new functionality but are not integrated into the code base.
  3. In some configurations (e.g. when run under Cpanel), qmail is invoked directly rather than by the tcpd wrapper - this means that no IP addresses are logged. This can make analyzing failures much more challenging than it needs to be. Note that this is a problem in configuration and not with qmail itself.
  4. Some of my co-workers have found it difficult to troubleshoot. For example, recently one of my co-workers found that it was not processing the aliases file, but he could not figure out how to troubleshoot why.
I do not use qmail on my production systems, so I am not in a good position to recommend it; I do not have sufficient experience running it.
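Since message numbers get reused, one workaround for the log problem is to split the log into separate message lifetimes, starting a new group at each “new msg” line. A minimal sketch, assuming qmail-send’s usual log lines (“new msg N”, “info msg N: ...”, “starting delivery D: msg N ...”, “delivery D: ...”, “end msg N”); any timestamp prefix is ignored because the patterns are searched for anywhere in the line.

```python
import re
from collections import defaultdict

NEW_MSG = re.compile(r"\bnew msg (\d+)")
START = re.compile(r"\bstarting delivery (\d+): msg (\d+)")
DELIVERY = re.compile(r"\bdelivery (\d+): ")
ANY_MSG = re.compile(r"\bmsg (\d+)")


def group_qmail_log(lines):
    """Group qmail-send log lines by (message number, nth use of it)."""
    groups = defaultdict(list)  # (msg number, occurrence) -> log lines
    uses = defaultdict(int)     # msg number -> "new msg" lines seen so far
    current = {}                # msg number -> group key in use right now
    deliveries = {}             # delivery number -> group key
    for line in lines:
        if m := NEW_MSG.search(line):
            msg = m.group(1)
            uses[msg] += 1
            current[msg] = (msg, uses[msg])
            groups[current[msg]].append(line)
        elif m := START.search(line):  # must be checked before DELIVERY
            key = current.get(m.group(2))
            if key:
                deliveries[m.group(1)] = key
                groups[key].append(line)
        elif m := DELIVERY.search(line):
            key = deliveries.get(m.group(1))
            if key:
                groups[key].append(line)
        elif m := ANY_MSG.search(line):  # "info msg N", "end msg N", ...
            key = current.get(m.group(1))
            if key:
                groups[key].append(line)
                if "end msg" in line:
                    del current[m.group(1)]  # the number is free for reuse
    return dict(groups)
```

Each key pairs the reused number with an occurrence count, so (“457”, 1) and (“457”, 2) stay apart even though grep would mix them together.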

5.2 Postfix

Postfix was originally written by, and is currently maintained by, Wietse Venema. It has a number of features that I find very attractive:
  1. It is very easy to configure (especially by comparison to sendmail!)
  2. It requires very little configuration to get it working - “it just works”.
  3. It is very secure; Wietse Venema is well known in the security community and follows strong secure coding practices. Instead of one large, privileged daemon that does everything, postfix consists of a set of cooperating processes, each with limited privileges and duties.
  4. It is available as a package for most OSes
  5. When installed from package, it plays well with system startup scripts, launching itself in a manner consistent with the rest of the OS.
  6. It has a large number of anti-spam checks that it can do during the SMTP conversation, and they are easy to configure. It is also possible to whitelist certain hosts which may not otherwise pass the spam checks.
TODO: Give a link to a sample postfix configuration that others can use

5.3 Address Extensions

If you have the following entry in your postfix main.cf, you can do some interesting things:
recipient_delimiter = +
Although the implications of how this interacts with other features are far beyond what I intend to describe here, it does have one important function. Namely, if someone sends email to “[email protected]”, postfix will ignore the delimiter and what follows it in the local part, so that the message is delivered to the user travis on subspacefield.org. This has a number of interesting uses:
First, when I give my email address to a web site foo.com, I tell them my email address is “[email protected]”. Now, if I receive email to that address, I know it is from that web site. If it is unrelated spam, I know which web site sold my email address to spammers. This is similar to the old trick of giving companies different middle names to determine if they sold your account information to mass mailers.
Second, when I sign up for a mailing list about Python, for example, I sign up using the email address “travis+ml-python@subspacefield.org”. Then, in my procmail config (see Section 7), I filter email sent to that address into a particular folder. That way, I can properly filter list email, even if it was blind carbon copied (BCC) to the list, or if it’s a response to one of my emails to the list; it all goes to the list folder. I sometimes wonder if the responses shouldn’t go into my INBOX, but I am quite sure that I want the BCC’d list traffic in the list folder.
However, this has one drawback: when sending email to such a list, I must change my From: line to have the appropriate “+ml-python”, or else the mailing list software will treat my email as if it came from someone not subscribed to the list. TODO: I should really automate this in my mutt configuration. I have it partially automated; when replying to an email from the list, a script called muttedit figures out which address the email was sent to, and uses that in the From: line.
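The way the local part is split can be sketched as follows. This is an approximation of postfix’s behavior with a single-character recipient_delimiter, assuming the split happens at the first delimiter; it ignores any special cases postfix may apply and does no address validation.

```python
def split_extension(address, delimiter="+"):
    """Split 'user+ext@domain' into (user, ext, domain); ext is None if absent."""
    local, _, domain = address.rpartition("@")
    user, sep, extension = local.partition(delimiter)  # first delimiter wins
    return user, (extension if sep else None), domain


print(split_extension("travis+ml-python@subspacefield.org"))
# ('travis', 'ml-python', 'subspacefield.org')
```

The user part selects the mailbox, while the extension is what ends up in procmail’s $EXTENSION for filtering.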
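The muttedit script isn’t reproduced here. The following hypothetical sketch shows one way to pick the From: address for a reply: look through the headers of the message being replied to for an address of the form travis+extension@subspacefield.org, and fall back to the plain address. It assumes the delivering MTA recorded the extended address in a Delivered-To: header, and it also checks To: and Cc: in case the address appears there.

```python
from email import message_from_string
from email.utils import getaddresses


def reply_from_address(raw_message, user="travis", domain="subspacefield.org"):
    """Return the extended address this message was sent to, if any."""
    msg = message_from_string(raw_message)
    headers = (msg.get_all("Delivered-To", []) + msg.get_all("To", [])
               + msg.get_all("Cc", []))
    for _name, addr in getaddresses(headers):
        local, _, dom = addr.rpartition("@")
        if dom.lower() == domain and local.startswith(user + "+"):
            return addr
    return f"{user}@{domain}"  # no extension found: use the plain address
```

A mutt hook or editor wrapper could call this on the message being replied to and rewrite the From: line with the result.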
If I had to do it again, I would choose a recipient delimiter other than +, because several poorly-coded web applications seem to think that “+” is not a valid character in an email address. It might be more useful to use “.” or “-” instead.

5.4 Graylisting

I believe graylisting is the single most effective anti-spam technique. There are a few ways that you can implement graylisting:

5.4.1 OpenBSD’s spamd

OpenBSD has amazing graylisting functionality. There are a number of ways to configure it; let me attempt to explain the default mode.
Essentially, the first time a sending MTA connects to the SMTP port, the packet filter (pf) redirects the connection to spamd, which listens on a higher port number (8025 on my system). No matter what happens, spamd will not accept the email.
First, it “stutters” at the host, sending its responses one character at a time, very slowly. Many spammers run software that is impatient with slow hosts, since they want to send as much spam as quickly as possible, so they may disconnect at this point.
Then, after the sending MTA announces itself and tells spamd whom the message is from and to (the envelope information), it attempts to send the email itself using the DATA command. At this point, spamd replies with:
451 Temporary failure, please try again later.
At this point, spamd records a tuple containing certain information about this email (sending IP, HELO/EHLO name, envelope from, envelope to) in a database on the file system. However, there is nothing the sending MTA can do to force it to accept the email, and eventually the MTA will quit the connection. Spammers usually never try again.
But legitimate MTAs will attempt to resend the email periodically. When a sender retries the same email after passtime minutes (25 by default), spamd notices that it has been waiting patiently, and whitelists it by adding its IP address to a table used in a packet filter (pf) rule that allows access to the real MTA.
Thus, the next time it connects, it reaches the real MTA and delivers the message. Its IP address stays in the whitelist table until the entry expires, so future connections bypass spamd and the 25-minute delay.
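The decision logic described above can be modeled in a few lines. This is a simplified sketch, not spamd’s code: it keys on the same four-part tuple, uses spamd’s default timers (passtime 25 minutes, greyexp 4 hours, whiteexp 864 hours), and, as described above, the retry that earns whitelisting still receives a 451; only the following connection would be passed to the real MTA. The clock is injectable so the behavior can be tested without waiting.

```python
import time

PASSTIME = 25 * 60      # spamd default: a retry must come 25 minutes later
GREYEXP = 4 * 3600      # spamd default: forget unretried tuples after 4 hours
WHITEEXP = 864 * 3600   # spamd default: whitelist entries last 864 hours


class Greylist:
    """Toy model of spamd-style greylisting (not spamd's actual code)."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.grey = {}   # (ip, helo, mail_from, rcpt_to) -> first-seen time
        self.white = {}  # ip -> time its whitelist entry expires

    def check(self, ip, helo, mail_from, rcpt_to):
        """Return 'pass' if pf would send this to the real MTA, else '451'."""
        now = self.clock()
        if self.white.get(ip, 0) > now:
            return "pass"
        key = (ip, helo, mail_from, rcpt_to)
        first_seen = self.grey.get(key)
        if first_seen is None or now - first_seen > GREYEXP:
            self.grey[key] = now  # new (or expired) tuple: start the clock
        elif now - first_seen >= PASSTIME:
            del self.grey[key]
            self.white[ip] = now + WHITEEXP  # the next connection gets through
        return "451"  # spamd itself never accepts the message
```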

5.4.2 Postgrey

There is a Debian package called “postgrey”, a postfix policy server, which achieves essentially the same effect as OpenBSD’s spamd, but without using the packet filter.
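Postgrey plugs into postfix through the policy delegation protocol: for each recipient, postfix sends a block of name=value lines (client_address, sender, recipient, and so on) ending with an empty line, and the policy server answers with an action=... line followed by an empty line. A simplified sketch of that exchange is below; it is not postgrey’s code. It keys on the exact client address, the delay is an illustrative parameter, and the socket server that would carry the requests is omitted.

```python
import time


def parse_policy_request(text):
    """Parse postfix's name=value lines, stopping at the empty line."""
    attributes = {}
    for line in text.splitlines():
        if not line:
            break  # an empty line terminates the request
        name, _, value = line.partition("=")
        attributes[name] = value
    return attributes


class PolicyGreylister:
    """Simplified greylisting policy decision (not postgrey's code)."""

    def __init__(self, delay=300, clock=time.time):
        self.delay = delay    # seconds a new triplet must wait
        self.clock = clock
        self.first_seen = {}  # (client, sender, recipient) -> timestamp

    def respond(self, request_text):
        attrs = parse_policy_request(request_text)
        triplet = (attrs.get("client_address", ""),
                   attrs.get("sender", ""),
                   attrs.get("recipient", ""))
        now = self.clock()
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.delay:
            # Temporary rejection, but only if the mail would otherwise pass.
            return "action=DEFER_IF_PERMIT Greylisted, please try again later\n\n"
        return "action=DUNNO\n\n"  # no opinion: let other restrictions decide
```

In postfix, such a service is referenced from smtpd_recipient_restrictions with check_policy_service, so the other SMTP-time checks still run alongside it.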

6 Mailbox Format

Once you start using Maildir, you’ll never go back. I highly recommend it over the old Berkeley mbox format: Maildir stores each message in its own file, so delivery needs no mailbox locking.
I have a small script I use to make Maildir-format mailboxes, called mkmd.
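mkmd itself isn’t shown; a hypothetical equivalent only has to create the directory and the three subdirectories that make it a Maildir (tmp, new, and cur). Python’s standard library can also do this via mailbox.Maildir(path, create=True).

```python
import os
import sys


def make_maildir(path, mode=0o700):
    """Create path as a Maildir: the directory plus tmp/, new/ and cur/."""
    # Modes are still filtered through the umask; 0o700 keeps mail private.
    os.makedirs(path, mode=mode, exist_ok=True)
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(path, sub), mode=mode, exist_ok=True)


if __name__ == "__main__":
    for target in sys.argv[1:]:  # e.g. mkmd ~/Maildir/python
        make_maildir(os.path.expanduser(target))
```

Because it uses exist_ok, running it twice on the same folder is harmless.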

7 Mail Delivery Agent

To properly pass all the information that postfix knows about to procmail, I use the following postfix configuration line:
mailbox_command = /usr/local/bin/procmail -t -a "$EXTENSION" -a "$USER" -a "$DOMAIN" -a "$LOCAL"
I like to keep my $HOME/.procmailrc rather simple; it looks like this:
PATH=/bin:/usr/bin:/usr/local/bin
MAILDIR=$HOME/Maildir
LOGFILE=$HOME/.maillog
DEFAULT=$MAILDIR/INBOX/
INCLUDERC=$HOME/.procmailrc.local
INCLUDERC=$HOME/.procmailrc.test
Then, in my $HOME/.procmailrc.local, I have these lines to capture the values back into variables:
EXTENSION="$1"
USER="$2"
DOMAIN="$3"
LOCAL="$4"
...
custom stuff here
...
INCLUDERC=$HOME/.procmailrc.mlists
That file included on the last line is how I filter my mailing list traffic into mailboxes. It has several stanzas of the following form:
:0
* ? test "$EXTENSION" = "ml-python"
python/
I autogenerate these using a Python script, mkmlists (see Section 9).

8 Mail User Agent

9 mkmlists

When I subscribe to a mailing list, I need to do several things
I do this automatically using a Python script called mkmlists, which takes an input file (~/.mlists) of the following form:
ml-nanog nanog [email protected] [email protected]
This says that email to “travis+ml-nanog” gets filtered into the nanog folder, that there are two addresses for the list, and that ml-nanog is an alias for the first address for the mailing list (namely, [email protected]).
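mkmlists itself isn’t reproduced here; this hypothetical sketch reads the ~/.mlists format described above and prints the procmail stanzas used in Section 7, plus a mutt alias for each list’s first address. Treating lines beginning with # as comments is an assumption, as is the exact alias output.

```python
import os


def parse_mlists(text):
    """Parse lines of 'extension folder address [address ...]'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        extension, folder, *addresses = line.split()
        if not addresses:
            raise ValueError(f"no list address in line: {line!r}")
        entries.append((extension, folder, addresses))
    return entries


def procmail_stanza(extension, folder):
    """File mail sent to travis+<extension> into the Maildir folder."""
    return f':0\n* ? test "$EXTENSION" = "{extension}"\n{folder}/\n'


def mutt_alias(extension, addresses):
    """Make the extension a mutt alias for the list's first address."""
    return f"alias {extension} {addresses[0]}"


if __name__ == "__main__":
    with open(os.path.expanduser("~/.mlists")) as f:
        entries = parse_mlists(f.read())
    # Both outputs go to stdout here; redirect each into the right file.
    print("".join(procmail_stanza(ext, folder) for ext, folder, _ in entries))
    print("\n".join(mutt_alias(ext, addrs) for ext, _, addrs in entries))
```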

10 Mail Search Engine

11 IMAP Server

An IMAP server is needed if you want to use a graphical email client. To avoid password-guessing attacks, I make it available only over SSH or a VPN.
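A minimal sketch of the SSH variant, assuming the IMAP server listens on port 143 on the mail host’s loopback interface only, and that local port 1143 is free: ssh’s -L option forwards the local port to the server’s own localhost:143, so the IMAP server is never exposed to the Internet. The host name, user, and ports are placeholders.

```python
import imaplib


def ssh_tunnel_command(mail_host, local_port=1143, remote_port=143):
    """argv for an SSH tunnel; 'localhost' is resolved on the mail host."""
    # -N: run no remote command, just forward the port.
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", mail_host]


def list_mailboxes(user, password, local_port=1143):
    """List mailboxes through an already-running tunnel."""
    conn = imaplib.IMAP4("localhost", local_port)
    try:
        conn.login(user, password)
        _typ, data = conn.list()
        return [line.decode("utf-8", "replace")
                for line in data if isinstance(line, bytes)]
    finally:
        conn.logout()
```

The tunnel would be started first (for example with subprocess.Popen(ssh_tunnel_command("mail.example.org"))), and a graphical client is then simply pointed at localhost port 1143.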

12 Graphical Email Client

I don’t usually need graphics, but every once in a while relatives send me pictures, so I fire one up and access my inbox via IMAP over SSH or a VPN.