My Ultimate Email Configuration
Travis H.
26 Aug 2009
Abstract
This paper is for system administrators who are also email power users.
It is an attempt to document everything I've done to deal with a massive
volume of mail; over 300k emails and 62 mailing lists. My spam rate
is also quite acceptable; I count approximately one spam per 5000
emails. I will attempt to explain why it is the way it is, and why
you will want to set your own system up this way. I will also attempt
to present the material in a way that aids the reader in setting up
a similar configuration.
1 Related Work
- The Ultimate Email Setup (http://danielwebb.us/software/email/).
The author of this web page lays out some guidelines for what he wants
out of his mail system, and I share many of his goals.
- Inbox Heaven (http://putthingsoff.com/inbox-heaven/).
The author of this web page takes a completely different approach,
using gmail for everything. I have a gmail account, but it merely
forwards to a folder on my "real" system. I hardly ever read
it because it gets far too much spam. And it doesn't give nearly the
control that I wanted.
2 Domain Name
2.1 Why You Don't Want to Use Your ISP's Domain
Many of my relatives who are not as technically inclined use the default
mailbox that their Internet Service Provider gives them. This is an
acceptable solution for some, but suffers from a number of drawbacks
from my perspective:
- If you change ISPs, you have to distribute your new email address
to everyone. Undoubtedly, many will continue to use your old email
address, and you will never see that mail. Also, you'll need to update
your email address at every mailing list that you subscribed to. If
you've forgotten your password to the mailing list software, you may
be unable to do this, since it will only send that password to the
email address it has on file - the one you're likely no longer using.
If you didn't move all your mailing list subscriptions over to the
new address before cancelling your old ISP accounts, you're SOL.
- ISPs primarily provide Internet access, but many vary in their ability
to run mail servers. I have had ISPs whose email systems regularly
dropped email, or whose mail servers were unavailable, sometimes days
at a time.
- ISPs sometimes change the rules on their email accounts. My parents
had two email accounts with their ISP. They performed some system
upgrades, and only migrated one of the accounts over; the secondary
account stopped working. When the owner of this account complained,
they effectively said that secondary email accounts were no longer
part of their offering; he was SOL, and had to migrate over to gmail.
- ISPs vary in the amount and type of spam filtering that they do. In
some cases, they may maintain their own blacklists, but do it poorly.
When I got my own servers, the IPs I was given had previously belonged
to other systems. Apparently some of these systems were used to send
spam (perhaps not voluntarily; spammers routinely hack into systems
to use them to send spam). This meant that my IP address was on the
blacklist of several ISPs, and I had to go through a manual process
to remove it. In some cases, the information they gave about how to
get off the blacklist was out of date and no longer worked, so I simply
couldn't email certain recipients.
- The most effective spam filtering options involve identifying and
stopping spam during the SMTP transaction. Once it gets to you, you
only have a few options; you can delete the spam silently, which may
cause you to silently miss important email, you can send a bounce
back to the sender, who is probably some innocent schmuck who has
been joe jobbed (http://en.wikipedia.org/wiki/Joe_job), or
you can put it in some spam folder that you hardly ever read. You
don't have the ability to reject the spam during the SMTP session,
which is a far superior solution. More on this later.
Some people use gmail accounts to solve this problem, and that may
be a fine solution for them. I find that solution unsatisfactory;
it means Google owns my email, and it effectively means that they
can do anything they want to with it (ownership being nine-tenths
of the law and all that). Also, I find that I can do much more sophisticated
filtering of email by running my own mail filters; more on this later.
And more importantly, there are transfer-time anti-spam countermeasures
that you can put in place, like graylisting, that are much more effective
than filtering after it has been accepted.
2.3 Picking A Domain Name
This is part of the process that is a little tricky. When I pick a
domain name, I'm picking something I'll probably be using for the
rest of my life. I created a list of cool words, and tried checking
for open domains that used those words. I used the command-line WHOIS
tool to find out if something was registered, but it appears they have
started to rate-limit WHOIS lookups to prevent spammers from harvesting
them for email addresses. However, domain registrars now have search
engines on their sites that allow you to check for availability of
domain names in the common top-level domains (.com, .net, .org, etc.).
I suggest a few properties for your domain name:
- memorable
- It had to be something easy to remember. It turns out
that for anyone who has seen more than a few Star Trek episodes, "subspacefield"
is easy to remember.
- easy to spell
- I didn't want a funny spelling of a common word,
because if I had to tell someone my domain name orally, I'd have to
spell it for them, and they still may get it wrong.
- common TLD
- Everyone on the Internet is familiar with .com as
a top level domain, and .net is also quite common, but as you stray
from that, people seem to have a little trouble believing they got
your email address right. I find that techies don't have a problem
with .org, but some lay people do. I can only imagine that .info and
other TLDs would be even more problematic.
- short
- I didn't think of this when I picked this domain name,
but oftentimes you'll be filling out forms (either on paper or the
online kind) where they limit your email address to a certain number
of characters, and if you're over that, you're SOL. As such, "subspacefield.org"
takes up a few too many.
2.4 Registering Your Domain
There are quite a few registrars, and it would seem that they are
all alike; however, that is not the case. For example, GoDaddy (http://www.godaddy.com/)
is a popular and inexpensive registrar in the US, but (if I recall
correctly - please feel free to correct me), they could decide against
you owning the domain if it violated a US trademark, and you would
lose the rights to the domain name. For this reason, some people prefer
international registrars such as Gandi (http://www.gandi.net/),
who some claim give you more property rights over your domain name.
But I have a further requirement; when using a normal domain registrar,
the information you give about technical and administrative contacts
(name, email, etc.) is stored in the WHOIS database for anyone to
query. And although I am not a lawyer, I seem to recall a law that
may have passed that requires the information for the domain to be
correct.
So what do you do if you don't want your personal information in the
WHOIS database, but you want to remain legal? Well, as far as I know,
you have two options. The first is Domains By Proxy (http://www.domainsbyproxy.com/);
you can go to their site and read about their offering. However, when
I was researching them, I seem to recall a case where someone objected
to some content being offered on the web, sent them a letter (not
a subpoena), and they disclosed their customer information.
So I opted for KatzGlobal (http://www.katzglobal.com/hosting/anonymous-domains.html).
This appears to be an offshore company, not subject to US laws. They
essentially hold the domain while you control it. This seems to work
rather well; however, I do have a few problems with their implementation.
First off, they have a GPG key on their web site so you can encrypt
your communication with them - a definite plus. However, the response
I got back, which quoted part of my message, was unencrypted - which
sort of defeats the point of using encryption in the first place.
Secondly, it sometimes takes a few emails to make a change to your
domain - their customer service can be a bit disorganized at times.
Since I intend to keep my domains forever, I generally register them
for five years at a time. This saves me the headache of having to
remember to renew them every year, and the possible catastrophe of
losing the domain name.
3 Servers
Now it comes time to pick out servers on which to host your domains.
If you want to run your own DNS, it is recommended that you have at
least two geographically redundant servers (see 4). You
will also want at least one backup MX server, in case one goes down.
It seems prudent to have it be located in a different geographical
area as well (see 5).
When renting a server, you generally have your choice of:
- shared servers
- where you share the machine with other users,
do not have root access, and thus can't modify the system settings.
The upside of this is that they are cheap.
- dedicated servers
- where you basically "own" the box, have
root, can change the root password to lock the system support team
out, and so on. These tend to be more expensive.
- colocated servers
- are basically dedicated servers where you
provide the hardware. These tend to be more troublesome, because if
there's a hardware failure, it's up to you to ship them a replacement,
whereas with dedicated servers they usually have spare parts on hand.
- virtual servers
- are an emerging middle ground, where you
are root on a given virtual machine, and that virtual machine co-exists
on the same host OS as other virtual machines. I haven't looked at
prices on these, but I would guess they fall between shared and dedicated
servers in price.
You'll also have to decide what level of service you want. You can
get self-managed servers, where you're expected to be the sysadmin
(except in extreme cases like hardware failures or failure to boot),
or pay more for managed servers. At these higher support levels, the
support team will do more for you, but you'll have to give them root
level access to the box. There are also limits to the sphere of support
for which they'll help you - if you want them to code you a web site,
or troubleshoot some unusual software, you might have to pay extra.
If you need this kind of support, you might want to consider a company
like Rackspace (http://www.rackspace.com/).
When picking a hosting provider, you might also want to consider these
factors:
- What speed is my upstream connection? These tend to be ethernet connections,
so could be 10Mbps, 100Mbps, or 1000Mbps. Most are hooked into switched
environments so this usually will be a full-duplex connection.
- What peering arrangements does this provider have for network connectivity?
The more peers, the less chance of your server being unavailable due
to a network outage.
- What is the currency exchange rate between you and the country in
which the hosting provider is located? Currently US purchasers must
pay significantly more in Euros due to the exchange rates.
- What is the legal system in the country like when it comes to running
servers? What sort of privacy can you expect, running a server there?
- What OS offerings do they have? I personally prefer OpenBSD and Ubuntu
Server. The former is more secure, more stable, more paranoid, and
the latter seems to have most software packages you would want in
their repositories.
What I recommend is that you get two dedicated self-managed servers,
one in the US and one in the EU.
4 Domain Name Servers
Mail is routed based on domain names. When an email is sent off to
an SMTP server, the server does a DNS lookup on the destination address's
domain name (i.e. your domain name). It does this by first querying
the name servers for the top-level domain (TLD), such as .com, which
refer it to the authoritative name servers for your domain. Thus, when you register
your domain, you often want to specify your DNS servers. You can do
this by IP, in which case you provide your registrar with the two
IP addresses you'll be using, or you can do it by name, such as "ns1.yourdomain.com"
and "ns2.yourdomain.com". This presents a bit of a chicken-and-egg
problem, because how is it supposed to find IP addresses for those
machines to query for your domain name servers? Well, fortunately,
the TLD name servers can provide something called "glue records",
where they will look up those address (A) records and return them
with the request for your DNS servers. I'm a bit unclear on how this
works, so perhaps a reader could email me to clue me in.
Now, the MTA needs to look up Mail Exchange (MX) records for your
domain. This is how you tell other MTAs which hosts they need to deliver
email to. Note that MX records are not supposed to point to CNAMEs;
they should point to hostnames that have address (A) records.
Obviously, if the MTA can't contact your domain name servers, it can't
look up these MX records and won't be able to deliver the email -
it will eventually "bounce" back to the sender.
This is highly undesirable, and there are two solutions for it:
- You can pay a DNS provider to serve up your domain as a "slave
server". This means they will transfer your DNS zones from your
master server, and clients can query for your records from them or
from you. Companies such as Dreamhost, Google, Namesecure, and Verisign
can do this. You can also sometimes have your hosting provider do
it. In fact, there is a configuration called the "hidden master"
configuration, which I recommend, where all clients access only the
slave servers, and the master server is only accessed by slave servers.
In this case, you want your name server (NS) records for your domain
to point to the slaves only.
- The other, "do it yourself" option, is that you run DNS software
on both of your geographically-distributed hosts (see 3).
I prefer the second option for a number of reasons:
- If you ever need to make the serial numbers in your domain lower,
it is easier for you to clear out the slave server's cached zones
and force it to reload your zone with the new, lower zone serials.
- You control the servers, so you can protect them against being hacked
and having them serve up bogus data for your domain.
- You can run tinydns instead of BIND on your name servers.
There are basically two main options for DNS server software; you
can run BIND or DJBDNS (tinydns):
4.1 Berkeley Internet Name Domain (BIND)
BIND is the de-facto standard software for serving up domain names.
It is essentially a fully-featured server, capable of being configured
in a number of ways. It also acts as a DNS resolver, so your DNS client
software can use it to look up arbitrary domain names.
- Fully featured, including dynamic DNS updates and support for DNSSEC
- Requires special configuration to prevent it from being used as a
recursive resolver by external hosts.
- Big code base; potentially slow or buggy.
- Allows for symbolic references, such as CNAMEs
- Uses an obscure and error-prone syntax for zone files.
4.1.1 Zone File Syntax
Each zone file has a "serial number", which must always increase
monotonically if you want your slave servers to stay in sync. The
common convention is to use YYYYMMDDXX as the format, with numerical
year, month, day, and a two digit per-day serial number. This works
fine unless you update your zone file more than 99 times in a day.
The most common problem when updating the zone files is that you forget
to update the zone serial number.
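The YYYYMMDDXX convention is easy to get wrong by hand, so here is a minimal sketch of how the next serial could be computed. The function name next_serial is hypothetical, and the sketch assumes the zone already uses this convention.

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Return the next YYYYMMDDXX serial after `current` for a change made on `today`."""
    base = int(today.strftime("%Y%m%d")) * 100  # e.g. 2009082600
    if current < base:
        # First change of the day: start the per-day counter at 00.
        return base
    if current >= base + 99:
        # The serial must never decrease, and today's two-digit counter is used up.
        raise ValueError("no serial left for today under the YYYYMMDDXX convention")
    return current + 1
```

Because the function only ever returns a value larger than the current serial, forgetting the date and forgetting the counter both collapse into the same single step.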
I have a solution for this; I have a tool that generates the header
for a zone file, and it uses the current timestamp (seconds since
epoch) as the serial number for the zone. I borrowed this idea from
TinyDNS.
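As a rough illustration of the idea (this is not my actual tool), a header generator along these lines would do. The function name zone_header, the example.com names, and the timer values are all placeholders chosen for the sketch.

```python
import time


def zone_header(origin, primary_ns, hostmaster, serial=None):
    """Build a BIND-style zone file header using seconds-since-epoch as the serial."""
    if serial is None:
        serial = int(time.time())  # changes every second, so it never needs manual bumping
    return (
        f"$ORIGIN {origin}\n"
        "$TTL 86400\n"
        f"@ IN SOA {primary_ns} {hostmaster} (\n"
        f"    {serial} ; serial (seconds since epoch)\n"
        "    7200 ; refresh (placeholder value)\n"
        "    3600 ; retry (placeholder value)\n"
        "    1209600 ; expire (placeholder value)\n"
        "    3600 ) ; negative-caching TTL (placeholder value)\n"
    )
```

Calling zone_header("example.com.", "ns1.example.com.", "hostmaster.example.com.") each time the zone is regenerated gives a fresh serial. Note that, for some years yet, a timestamp is numerically smaller than a YYYYMMDDXX serial, so moving an existing zone from that convention to this one lowers the serial.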
4.2 TinyDNS
This is part of Dan Bernstein's djbdns suite. It doesn't have nearly
the features that BIND has, but it has some significant differences
from BIND:
- It does only one thing, serve authoritative queries, and it does it
quickly and easily.
- It uses a much more sensible format for zone files.
- It does not support dynamic DNS updates
- It does not support IPv6 addresses (AAAA records)
- It does not support CNAMEs; every record must map to an IP address.
In theory, you could use some sort of pre-processor to do this for
CNAMEs that map to addresses you control.
- It automatically creates reverse zones for you
- It does not act as a recursive resolver; there is a separate program
for that (dnscache). Since both recursive and authoritative queries
are sent to the same port, this means you must run dnscache on a different
IP address than tinydns. This is often done with IP aliases.
- It generates the zone serial numbers automatically based on the time
at which the zone database is compiled into its on-disk format.
- Dan has a very nice architecture for how his programs are run as daemons,
but it differs from the way the OS normally starts services. Dan is insistent that nobody modify
his programs to run in the more conventional ways (e.g. /etc/init.d
scripts), because he doesn't want to have to support and answer queries
in the presence of multiple possible configurations.
5 Mail Transport Agent
The Mail Transport Agent, or MTA, is the program that speaks the Simple
Mail Transfer Protocol (SMTP) to receive and send email over the Internet.
When a person sends an email, it goes to an MTA, and that MTA does
domain name lookups until it gets an appropriate list of MX records
for the destination. MX records have "weights" assigned to them,
and the sending MTA attempts to contact MTAs listed in MX records
in ascending order of weight. If it runs out of MTAs, it will eventually
bounce the message back to the sender. For this reason, I suggest running
MTAs on two geographically separated servers (see 3).
You will want to have two of these for your domain, one of which is
configured to handle it as a "relay domain". That way, if the
first MX server is down, it will deliver the email to the backup MX
server, which will accept it and queue it for sending along to the
primary MX server.
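The fallback from primary to backup MX can be sketched as follows. This is a simplified model, not how any particular MTA is implemented: the host names are made up, try_host stands in for a real SMTP delivery attempt, and real MTAs also queue and retry before giving up.

```python
def delivery_order(mx_records):
    """Order (weight, hostname) pairs the way a sending MTA tries them: lowest weight first."""
    return [host for _weight, host in sorted(mx_records)]


def deliver(mx_records, try_host):
    """Try each MX host in order; return the host that accepted, or None if all failed."""
    for host in delivery_order(mx_records):
        if try_host(host):
            return host
    return None  # every MX was unreachable; the message would be queued or bounced


# Hypothetical records: the primary has the lower weight, the backup the higher.
EXAMPLE_MX = [(20, "mx2.example.com"), (10, "mx1.example.com")]
```

With the primary down, deliver(EXAMPLE_MX, lambda host: host == "mx2.example.com") falls through to the backup, which is exactly the case the relay domain configuration is there to handle.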
For a long time, the only MTA available was sendmail. However, sendmail
has had an extremely poor security history, is difficult to configure,
and thus I do not recommend it.
The two mailers which seem the most secure to me are postfix and qmail.
5.1 qmail
qmail is another of Dan Bernstein's creations. It is likely to be very
secure; he has an outstanding cash bounty for the first person to
find a remotely exploitable bug, and he is a well known personality
in computer security circles.
There are, however, a number of complaints I have heard from others:
- In the log files, each email is assigned a unique identifier, but
that identifier is only unique while the email is being processed.
That is, you may be looking through the logs and find the identifier
you are interested in, and when grepping through the logs for that
identifier, you usually find that it is reused for another email message
processed at a different time. This makes analyzing mailer failures
more challenging than it needs to be.
- I have heard that Dan has not updated the code base in a long period
of time; as a result, there is a family of patches out there to get
new functionality from it that are not integrated into the code base.
- In some configurations (e.g. when run under Cpanel), qmail is invoked
directly rather than by the tcpd wrapper - this means that no IP addresses
are logged. This can make analyzing failures much more challenging
than it needs to be. Note that this is a problem in configuration
and not with qmail itself.
- Some of my co-workers have found it difficult to troubleshoot. For
example, recently one of my co-workers found that it was not processing
the aliases file, but he could not figure out how to troubleshoot
why.
I do not use qmail on my production systems so am not in a good position
to recommend it, as I do not have sufficient experience running it.
5.2 Postfix
Postfix was originally written and is currently maintained by Wietse
Venema. It has a number of features that I find very attractive:
- It is very easy to configure (especially by comparison to sendmail!)
- It requires very little configuration to get it working - "it
just works".
- It is very secure; Wietse Venema is well-known in the security community
and has strong secure coding practices. Instead of one large, privileged
daemon that does everything, postfix consists of a set of cooperating
processes each with limited privileges and duties.
- It is available as a package for most OSes
- When installed from package, it plays well with system startup scripts,
launching itself in a manner consistent with the rest of the OS.
- It has a large number of anti-spam checks that it can do during the
SMTP conversation, and they are easy to configure. It is also possible
to whitelist certain hosts which may not otherwise pass the spam checks.
TODO: Give a link to a sample postfix configuration that others can
use
5.3 Address Extensions
If you have the following entry in your postfix main.cf, you can
do some interesting things:
recipient_delimiter = +
Although the implications of how this interacts with other features
are far beyond what I intend to describe here, it does have one important
function. Namely, if someone sends email to "travis+web@subspacefield.org",
it will ignore the delimiter and what follows (in the local-part),
so that it delivers to the user travis on subspacefield.org. This
has a number of interesting uses:
First, when I give my email address to a web site foo.com, I tell
them my email address is "travis+w-foo.com@subspacefield.org".
Now, if I receive email to that address, I know it is from that web
site. If it is unrelated spam, I know which web site sold my email
address to spammers. This is similar to the old trick of giving companies
different middle names to determine if they sold your account information
to mass mailers.
Second, when I sign up for a mailing list about python, for example,
I sign up using the email address "travis+ml-python@subspacefield.org".
Then, in my procmail config (see 7), I have it filter
email sent to that address to a particular folder. That way, I can
properly filter emails, even if they were blind carbon copied (BCC)
to a list, or if it's a response to one of my emails to the list;
they all go to the list folder. I sometimes wonder if the responses
shouldn't go into my INBOX, but I am definitely sure that I want the
former.
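To make the mechanics concrete, here is a small sketch of how an extended address breaks down into user, extension, and domain. It is only an illustration of the idea, not postfix's own code; the function name split_address is hypothetical, and the sketch assumes the split happens at the first delimiter in the local-part.

```python
def split_address(address, delimiter="+"):
    """Split user+extension@domain into (user, extension, domain).

    The extension is an empty string when the local-part has no delimiter.
    """
    local, _, domain = address.rpartition("@")
    user, _, extension = local.partition(delimiter)
    return user, extension, domain
```

For example, split_address("travis+ml-python@subspacefield.org") yields the user travis, the extension ml-python, and the domain subspacefield.org; the extension is what the filtering rules key on.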
However, this has one drawback; when sending email to such a list,
I must change my From: line to have the appropriate "+ml-python",
or else the mailing list software will treat my email as if it came
from someone not subscribed to the list. TODO: I should really automate
this in my mutt configuration. I have it partially automated; when
replying to an email from the list, I have a script called muttedit
that will figure out what email address was the recipient, and use
that in the From line.
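The reply case can be sketched roughly like this. This is not the muttedit script itself; the function name reply_from and the choice of headers to inspect (Delivered-To, To, Cc) are assumptions, and which header actually carries the extended address depends on how the MTA and the list software are set up.

```python
from email import message_from_string
from email.utils import getaddresses


def reply_from(raw_message, user, domain, delimiter="+"):
    """Pick the From address for a reply: the extended address the message was sent to."""
    msg = message_from_string(raw_message)
    # Assumed header order; adjust to whatever your MTA actually records.
    headers = (
        msg.get_all("Delivered-To", [])
        + msg.get_all("To", [])
        + msg.get_all("Cc", [])
    )
    for _name, addr in getaddresses(headers):
        local, _, addr_domain = addr.rpartition("@")
        if addr_domain.lower() == domain and local.partition(delimiter)[0] == user:
            return addr
    # No matching recipient found: fall back to the plain address.
    return f"{user}@{domain}"
```

The fallback matters for mail that reached me some other way (for instance, through a forward), where no header names one of my own addresses.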
If I had to do it again, I would choose a recipient delimiter other
than +, because several poorly-coded web applications seem to think
that "+" is not a valid character in an email address. It might
be more useful to use "." or "-" instead.
5.4 Graylisting
I believe graylisting is the single most effective anti-spam technique.
There are a few ways that you can implement graylisting:
5.4.1 OpenBSD's spamd
OpenBSD has an amazing graylisting functionality. There are a number
of ways to configure it; let me attempt to explain the default mode.
Essentially what it does is the first time a sending MTA attempts
to connect to the SMTP port, the packet filter redirects that communication
to spamd, which is running on a higher port number (8025 on my system).
No matter what happens, this program will not accept the email.
First, it "stutters" at the host, by sending one character at
a time very slowly. Many spammers run software that is impatient with
slow hosts, since they want to send as much spam as quickly as possible,
so they may choose to disconnect at this point.
Then, after the sending MTA announces itself, and tells spamd who
the message is from and to (the envelope information), the sending
MTA attempts to send the email itself using the DATA command. At this
point, spamd replies with:
451 Temporary failure, please try again later.
At this point, spamd will log a tuple containing certain information
about this email (sending IP, HELO/EHLO information, envelope from,
envelope to) to a database on the file system. However, there is nothing
the sending MTA can do to force it to accept the email. Eventually
the MTA will quit the connection. In the case of spammers, they usually
never attempt it again.
But legitimate MTAs will attempt to re-send the email periodically.
After passtime minutes (default of 25), when it connects again,
trying to send the same email, spamd will notice that it has been
waiting patiently, and will whitelist it by loading its IP address
in a table that is used in a packet filter (pf) rule that allows access
to the real MTA.
Thus, the next time it connects, it will connect to the real MTA and
deliver the message. And its IP address stays in the whitelist table,
so future connections do not go through spamd again, bypassing the
25 minute delay in receiving an email.
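The decision logic described above can be modeled in a few lines. This is a toy simulation of the default mode as I have described it, not spamd's actual code: the class name Graylist is hypothetical, the stuttering and the expiry of old entries are left out, and an in-memory dict and set stand in for the on-disk database and the pf table.

```python
TEMPFAIL = "451 Temporary failure, please try again later."
PASS = "pass to real MTA"


class Graylist:
    def __init__(self, passtime=25 * 60):
        self.passtime = passtime  # seconds a sender must wait before being whitelisted
        self.first_seen = {}      # (ip, helo, envelope from, envelope to) -> first attempt time
        self.whitelist = set()    # IPs that bypass graylisting from now on

    def check(self, ip, helo, mail_from, rcpt_to, now):
        """Return what happens to a connection attempt made at time `now` (in seconds)."""
        if ip in self.whitelist:
            return PASS
        key = (ip, helo, mail_from, rcpt_to)
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.passtime:
            # The sender retried after waiting long enough: whitelist its IP.
            # This attempt is still refused; the next one reaches the real MTA.
            self.whitelist.add(ip)
        return TEMPFAIL
```

A sender that retries once after the waiting period is refused that time but whitelisted, so its following attempt goes straight through; a sender that never retries stays in first_seen and never gets in.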
There is a Debian package called "postgrey" which interacts
with postfix and achieves essentially the same effect as OpenBSD's
spamd, but without using the packet filter.
6 Mailbox Format
Once you start using Maildir, you'll never go back. I highly recommend
it over the old Berkeley mbox format.
I have a small script I use to make Maildir-format mailboxes, called
mkmd.
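A Maildir is just a directory containing three subdirectories (tmp, new, and cur), so a script of this kind can be very short. The following is a minimal sketch of the idea rather than my actual mkmd; the function name make_maildir is hypothetical.

```python
import os


def make_maildir(path):
    """Create a Maildir at `path` with the tmp, new, and cur subdirectories."""
    for sub in ("tmp", "new", "cur"):
        # exist_ok makes the call safe to repeat on an existing mailbox;
        # mode 0o700 is requested for the leaf directory, subject to the umask.
        os.makedirs(os.path.join(path, sub), mode=0o700, exist_ok=True)
    return path
```

For instance, make_maildir(os.path.expanduser("~/Maildir/python")) would create a folder that a delivery rule ending in "python/" can write to.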
7 Mail Delivery Agent
To properly pass all the information that postfix knows about to procmail,
I use the following postfix configuration line:
mailbox_command = /usr/local/bin/procmail -t -a "$EXTENSION" -a "$USER" -a "$DOMAIN" -a "$LOCAL"
I like to keep my $HOME/.procmailrc rather simple; it looks like
this:
PATH=/bin:/usr/bin:/usr/local/bin
MAILDIR=$HOME/Maildir
LOGFILE=$HOME/.maillog
DEFAULT=$MAILDIR/INBOX/
INCLUDERC=$HOME/.procmailrc.local
INCLUDERC=$HOME/.procmailrc.test
Then, in my $HOME/.procmailrc.local, I have these lines to capture
the values back into variables:
EXTENSION="$1"
USER="$2"
DOMAIN="$3"
LOCAL="$4"
...
custom stuff here
...
INCLUDERC=$HOME/.procmailrc.mlists
That file included on the last line is how I filter my mailing list
traffic into mailboxes. It has several stanzas of the following form:
:0
* ? test "$EXTENSION" = "ml-python"
python/
I autogenerate these using a python script mkmlists (see 9).
8 Mail User Agent
9 mkmlists
When I subscribe to a mailing list, I need to do several things
- I need to call mkmd to make the mailing list's folder.
- I need to update my .procmailrc.mlists to filter email to the list
into the list folder
- I need to update my /.mutt/aliases.lists to make
the alias (e.g.) "ml-python" map to the list's email address
(e.g. "python-list@python.org")
- I need to update /.mutt/lists to tell mutt that the
list's address (e.g. "python-list@python.org") is the email
address associated with a mailing list I've subscribed to
- I need to update /.mutt/mailboxes to tell mutt that
the mailing list's folder is a mailbox that I can switch to
I do this automatically using a python script I have called mkmlists,
which takes an input file (/.mlists) of the following
form:
ml-nanog nanog nanog@nanog.org nanog@merit.edu
This says that email to "travis+ml-nanog" gets filtered to the
nanog folder, that there are two addresses for the list, and that
ml-nanog is an alias for the first address for the mailing list (namely,
nanog@nanog.org).
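The core of such a script might look like the sketch below. It is not the real mkmlists: the function names are hypothetical, and the mutt lines it emits (alias, subscribe, mailboxes) are my assumption about what the three mutt files need to contain. Only the procmail stanza follows a form shown earlier in this paper.

```python
def parse_mlists_line(line):
    """Split one input line into (extension, folder, list addresses)."""
    extension, folder, *addresses = line.split()
    return extension, folder, addresses


def procmail_stanza(extension, folder):
    """Emit a procmail recipe that files mail for this extension into the folder."""
    return f':0\n* ? test "$EXTENSION" = "{extension}"\n{folder}/\n'


def mutt_lines(extension, folder, addresses):
    """Emit one line for each of the three mutt files (assumed syntax)."""
    return {
        "alias": f"alias {extension} {addresses[0]}",  # alias maps to the first address
        "lists": "subscribe " + " ".join(addresses),   # all addresses belong to the list
        "mailboxes": f"mailboxes +{folder}",           # folder relative to mutt's $folder
    }
```

Running these over every line of the input file and writing the results to the corresponding files, then creating each folder, covers the five steps listed above.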
10 Mail Search Engine
11 IMAP Server
Needed if you want a graphical email client. To avoid password-guessing
attacks, I make it only available over SSH/VPN.
12 Graphical Email Client
I don't usually need graphics but every once in a while relatives
send me pictures, so I fire up one and access my inbox over IMAP over
SSH/VPN.