The war on SPAM: a review of the real-world tools

Definitions

MTA : Mail Transfer Agent : the software that actually delivers mail. It listens on port 25 and answers SMTP commands. Some common MTAs : Postfix, Exim.
MX : MX records are DNS entries that identify the servers responsible for receiving mail for a domain.
RBL : Realtime Blackhole List : a list of blacklisted IPs that should be considered spam sources. It is called Realtime because it is constantly updated.
RFC : Request For Comments : proposals for standards, some of which become actual standards.
IPS : Intrusion Prevention System : a "smart firewall" that blocks malicious requests, often based on behavioral rules.

The goals

My personal goals in the war on SPAM are pretty short:

Minimum false positives: receiving a spam is better than missing an important mail, so try to keep "permanent bashing" as low as possible
Hit harder on reoffending: returning spammers should be slapped harder
Internet neutrality: try not to encourage big mail farms, and let fair little providers do their business

Available techniques

Blacklist / Blackhole Lists

Blacklists, Blackhole Lists (or RBLs) are the oldest and most common measures for reducing spam. They are still pretty accurate, but:

Everything depends A LOT on the quality of the list. Trashy lists are very common, and they end up sucking resources for no result or, worse, blocking legitimate emails
They are only accurate when updated often. I mean, very often (the R in RBL).
You should not use them directly on the MTA: these lists are an aggressive artefact of the past, from a time when spam did not come from mail farms.
Very few support IPv6. I agree that IPv6 spam is still anecdotal, but that will probably change pretty soon (believe me)
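
For readers who have never looked at how an RBL is actually queried, here is a minimal sketch in Python. A DNS-based list is queried by reversing the IP and appending the zone of the list, and an answer in 127.0.0.0/8 means "listed". The zone dnsbl.example.org is a placeholder, not a real list, and the function names are mine.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for `ip` in the list `zone`."""
    addr = ipaddress.ip_address(ip)
    # reverse_pointer yields "4.3.2.1.in-addr.arpa" (IPv4) or reversed
    # nibbles ending in ".ip6.arpa" (IPv6); drop the last two labels.
    reversed_part = addr.reverse_pointer.rsplit(".", 2)[0]
    return f"{reversed_part}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the list answers with a 127.x.x.x address.

    Needs working DNS; any resolution failure is treated as "not listed",
    which is a deliberate fail-open choice for this sketch.
    """
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return answer.startswith("127.")


if __name__ == "__main__":
    # Hypothetical zone, for illustration only.
    print(dnsbl_query_name("192.0.2.25", "dnsbl.example.org"))
    print(dnsbl_query_name("2001:db8::1", "dnsbl.example.org"))
```

Failing open on DNS errors matches the first goal above: a dead or slow list should never cost you a legitimate mail.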

Greylist

Greylisting means answering the first delivery attempt with a temporary failure code (an SMTP 4xx), forcing the foreign server to keep the message and send it again later. It aims at increasing the "cost per mail" for spam farms, as they cannot "hit and run" as fast as before.

The main drawback is that some providers do not handle it well (hello Facebook), so it needs an educated whitelist to work properly.

Some spammers also re-send the same mail several times in case of failure, which makes them look like a legitimate mail server and makes Greylisting ineffective.

It also hurts the speed of mail transmission, as the retry may occur tens of minutes later, slowing mail delivery with little control over the delay.

Also, some legitimate mail farms (hello OVH) distribute their retries across several servers, making Greylisting inapplicable without fully whitelisting them.
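
The mechanism can be sketched in a few lines of Python. This is an illustration, not how any particular greylisting daemon works: the class name, the 300 second delay and the reply strings are my assumptions. The key is the classic (client IP, sender, recipient) triplet.

```python
import time
from typing import Dict, Iterable, Optional, Tuple

Triplet = Tuple[str, str, str]


class Greylist:
    """Toy in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay: float = 300.0, whitelist: Iterable[str] = ()):
        self.min_delay = min_delay          # seconds before a retry is accepted
        self.whitelist = set(whitelist)     # client IPs that skip greylisting
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, client_ip: str, sender: str, recipient: str,
              now: Optional[float] = None) -> str:
        """Return the SMTP reply to give for this delivery attempt."""
        now = time.time() if now is None else now
        if client_ip in self.whitelist:
            return "250 OK"
        key = (client_ip, sender.lower(), recipient.lower())
        # Remember the first attempt; later attempts reuse that timestamp.
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            return "451 4.7.1 Greylisted, please try again later"
        return "250 OK"


if __name__ == "__main__":
    g = Greylist()
    print(g.check("192.0.2.10", "a@example.com", "b@example.org", now=0))
    print(g.check("192.0.2.10", "a@example.com", "b@example.org", now=600))
    # A retry coming from another server of the same farm starts from scratch.
    print(g.check("192.0.2.11", "a@example.com", "b@example.org", now=600))
```

The last call shows the mail farm problem: because the IP is part of the key, a retry from a sibling server is treated as a brand new attempt.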

RFC compliance

Spammers often run special software to send their crafted emails, so tightening RFC compliance may be a good way to kick them out.

It can be as simple as requiring a HELO in the SMTP dialogue, or trickier, like checking the mail headers or enforcing a valid reverse DNS record.

In the majority of cases it is very effective, but:

Some home-made servers, especially Synology or Windows servers, may be blocked while they are sending legitimate email. These servers are often run by people who don't know or care how to set up a mail server correctly. These buggy setups are more common than you think, and their owners are often legitimate senders who have no clue what is wrong and are not willing to fix it (did I mention banking companies?).
In the vast majority of cases, IPv6-ready servers have no reverse record on their IPv6 addresses, or the IPv6 reverse is wrong.
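
As an example of a cheap compliance check, here is a sketch of a HELO sanity test in Python. It only looks at syntax: a fully qualified host name or a bracketed address literal is accepted, a bare IP or a single-label name is not. The function name is mine, and the literal parsing is deliberately lenient; a real MTA has its own, more complete restrictions.

```python
import ipaddress
import re

# One DNS label: letters, digits and hyphens, not starting or ending with "-".
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def _is_ip(text: str) -> bool:
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def helo_looks_valid(helo: str) -> bool:
    """Syntax-only check of a HELO/EHLO argument (no DNS lookup)."""
    helo = helo.strip()
    if helo.startswith("[") and helo.endswith("]"):
        # Address literal such as [192.0.2.1] or [IPv6:2001:db8::1].
        literal = helo[1:-1]
        if literal.lower().startswith("ipv6:"):
            literal = literal[5:]
        return _is_ip(literal)
    if _is_ip(helo):
        # A bare IP without brackets is not a valid HELO argument.
        return False
    labels = helo.rstrip(".").split(".")
    return len(labels) >= 2 and all(_LABEL.fullmatch(label) for label in labels)


if __name__ == "__main__":
    for name in ("mail.example.com", "localhost", "192.0.2.1", "[192.0.2.1]"):
        print(name, helo_looks_valid(name))
```

Given the buggy but legitimate senders described above, a failed check is probably better used as a scoring hint than as an outright rejection.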

SPF and DKIM

SPF and DKIM are anti-spoofing techniques. They do not guarantee that a mail is legitimate, but if the checks show an anomaly, the mail is very likely to be spam (or worse : a scam or a social engineering attempt).

The SPF technique is based on a whitelist of IPs or sending domains: the sending email domain publishes a list of servers allowed to send its email, along with a hint on what is expected when a check does not pass (soft or hard fail).
DKIM is much more complex, as the sending mail server has to cryptographically sign every message with a domain-specific private key, whose public counterpart is published in a dedicated DNS record.
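
To make the SPF side concrete, here is a deliberately incomplete evaluator in Python. It takes the text of an SPF record that has already been fetched, and only understands the ip4, ip6 and all mechanisms; everything that needs DNS (a, mx, include, redirect...) is skipped. The function name and the sample record are mine.

```python
import ipaddress

# Qualifier prefix of a mechanism -> SPF result when that mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def evaluate_spf(record: str, client_ip: str) -> str:
    """Evaluate the ip4/ip6/all mechanisms of an SPF record for one IP."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"                      # default qualifier is "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            # Membership is always False when IP versions differ.
            if ip in ipaddress.ip_network(value, strict=False):
                return QUALIFIERS[qualifier]
        elif name == "all":
            return QUALIFIERS[qualifier]
        # Other mechanisms and modifiers need DNS lookups: ignored here.
    return "neutral"


if __name__ == "__main__":
    record = "v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 -all"
    print(evaluate_spf(record, "192.0.2.55"))
    print(evaluate_spf(record, "198.51.100.7"))
```

The "-all" at the end is the hard fail hint mentioned above; "~all" would be the soft one.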

SPF proved effective in its early days, but today many spammers just use stolen email accounts or custom domains that do not publish any SPF record, making it less relevant.

DKIM is, like SPF, an anti-spoofing technique, and spammers are likely to sign mail from their custom domains if it becomes a necessity. As with SPF, a signed mail is not necessarily a clean mail. By the way, signature problems are pretty common, and trashing mail with a failing DKIM signature may not be the right behavior: Yahoo broke a lot of mailing lists when they enforced their DMARC policy.

To conclude, DKIM is relevant if you need to certify outgoing mail or if you are enforcing policies inside your company (to block spoofed email targeting your organisation), but it is definitely not an efficient anti-spam measure. And recipient servers may decide to simply ignore your painfully configured DKIM headers.

Bayesian filters

Bayesian filters are frequency-based spam detection mechanisms. The idea is to sort ham and spam by hand for a short period of time, so that the filter "learns" what spam is made of and can then do efficient content-based detection. It has the benefit of being organisation-specific, as what is ham and what is spam may vary from one company to another (a company selling drugs may not be willing to filter out every message containing the word "pill").

But the learning process has to be taken seriously, and many end-users just delete spam instead of marking it to feed the learning. By the way, the learning process typically relies on IMAP folders to sort mails, and it will not work properly if all users are using POP mailboxes.
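
To show what "frequency-based" means, here is a toy naive Bayes classifier in Python. It is a sketch of the general idea, not what SpamAssassin or any real filter does internally: the class name, the tokenizer and the smoothing choice are my assumptions, and real filters are far more careful about tokens and headers.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Word-frequency spam classifier with add-one smoothing."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text: str):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text: str, label: str) -> None:
        """Feed one hand-sorted message; label is "spam" or "ham"."""
        self.words[label].update(self.tokens(text))
        self.messages[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total = sum(self.messages.values())
        scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior, then smoothed per-word likelihoods, in log space.
            score = math.log((self.messages[label] + 1) / (total + 2))
            n = sum(self.words[label].values())
            for tok in self.tokens(text):
                score += math.log((self.words[label][tok] + 1) / (n + vocab + 1))
            scores[label] = score
        return 1.0 / (1.0 + math.exp(scores["ham"] - scores["spam"]))


if __name__ == "__main__":
    nb = TinyBayes()
    nb.train("cheap pills buy now", "spam")
    nb.train("pill stock report for the pharmacy team", "ham")
    print(round(nb.spam_probability("buy cheap pills now"), 3))
```

The train() calls are exactly what users skip when they delete spam instead of marking it: without them the counters stay empty and every message scores the same.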

Spammers' techniques

Or "today's weapons in this war".

Here is a short review of the spammers' techniques I know, and some countermeasures.

Address guessing

Some spammers take random web domains from their crawling, and then try to send their mails to commonly used address patterns. It can be webmaster@, ceo@ or whatever. These are easy to spot in log files, and a well-configured MTA can take countermeasures to lock out these guesses. Free (French ISP) actually implements this: postmaster.free.fr/index_en.html

Unfortunately, legitimate emails are sometimes sent to non-existent addresses (typo, user deleted, etc.), and legitimate MTAs sometimes get blocked if the ban threshold is too low.

Moreover, today's spammers distribute their guesses through zombie machines to stay under the thresholds, making them hard to spot.
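
The countermeasure boils down to counting rejected recipients per client IP over a sliding window. Here is a sketch in Python; the class name and the default numbers (5 unknown recipients per hour) are arbitrary assumptions, and picking them is exactly the trade-off described above.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class GuessTracker:
    """Flag client IPs that hit too many unknown recipients in a time window."""

    def __init__(self, max_unknown: int = 5, window: float = 3600.0):
        self.max_unknown = max_unknown   # tolerated unknown recipients per window
        self.window = window             # window length in seconds
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def record_unknown_recipient(self, client_ip: str, now: float) -> bool:
        """Record one "user unknown" rejection; return True if the IP should be banned."""
        timestamps = self.hits[client_ip]
        timestamps.append(now)
        # Forget rejections that fell out of the window.
        while timestamps and now - timestamps[0] > self.window:
            timestamps.popleft()
        return len(timestamps) > self.max_unknown


if __name__ == "__main__":
    tracker = GuessTracker(max_unknown=2, window=60.0)
    for second in range(4):
        print(second, tracker.record_unknown_recipient("203.0.113.9", float(second)))
```

Since the counter is per IP, guesses spread over a botnet stay under the limit on every single machine, which is why this trick alone is no longer enough.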

Botnets

A lot of spammers use zombie machines to send a large amount of mail in a short amount of time. Commercial ISPs take the problem seriously, and many block or filter port 25 on their dynamic ranges, which makes it impossible to run a custom mail server at home but also prevents infected machines from making direct-to-SMTP connections.

Direct to SMTP connections, ignoring MXes

Some spammers just scan IP ranges and talk directly to any MTA they find, even if these MTAs are internal and no MX record points to them. You may think that a strict firewall is the neat solution, but it may be worth collecting these feisty IPs and feeding them to an IPS, to protect the network from their guesses on the real MTAs.

Fake bounces and backscatter

Sometimes you get an Undelivered Message notice (DSN) for a mail you never sent. This is probably backscatter.
Backscatter comes from spam sent to a non-existent address with a forged but valid "from" address. An incorrectly configured mail server accepts the mail first and then sends a failure notice to the "from" address, instead of rejecting the mail during the SMTP session and letting the foreign server do the dirty job.

These configuration errors are pretty common, sometimes even in default or example configurations, but they can be easily avoided (look for documentation about backscatter for your MTA, and test whether your server is vulnerable).

Real-life advice

I am sorry, I don't have a magic wand to stop all spam. A good spam fighting solution is always a combination of techniques. SpamAssassin, for instance, combines scores from RBLs with a Bayesian filter and SPF checks.
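
Combining techniques usually means scoring: every check adds or removes points, and the mail is flagged once the total crosses a threshold. A minimal sketch in Python follows. The check names and weights are invented for the example and are not SpamAssassin's rules; the 5.0 threshold is only borrowed from its default required score.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical checks and weights, for illustration only.
WEIGHTS: Dict[str, float] = {
    "rbl_listed": 2.5,
    "bayes_spam": 3.0,
    "spf_fail": 2.0,
    "helo_invalid": 1.0,
    "dkim_valid": -0.5,
}


def classify(hits: Iterable[str], weights: Dict[str, float] = WEIGHTS,
             threshold: float = 5.0) -> Tuple[float, bool]:
    """Sum the weights of the checks that fired; flag as spam at the threshold."""
    score = sum(weights[name] for name in hits)
    return score, score >= threshold


if __name__ == "__main__":
    print(classify(["rbl_listed", "bayes_spam"]))
    print(classify(["spf_fail", "dkim_valid"]))
```

No single check can condemn a mail on its own with these weights, which keeps one trashy list or one broken signature from causing a false positive.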

I think that constant monitoring is important. Not just automated monitoring, but also clever log reading and an open mind about what could be a better solution for each problem. You never know what can happen in a spammer's mind, and what works today may not work tomorrow. The Internet of Things is already a game changer.

I also advise you to be careful. Some of our implementation decisions may really hurt the Internet. Locking whole countries out is not without consequences, and the rise of the IPv6 Internet has to be taken into consideration from now on.

Things are often not that pretty in mail servers, and the temptation of a default blocking policy is great (hello MailInBlack). But as sysadmins, it is our responsibility not to abuse our position and not to hurt the smallest actors in the market (mail not coming from big farms). That may be a big word, but in my opinion, the freedom of the Internet also depends on our neutrality in mail processing.

Thanks for reading.
