Filtering No Silver Bullet


Link: Fooling Bayes (Joho the Blog)

David Weinberger is surprised a spam message got through his Bayesian filter and into his mailbox. I'm not.

New anti-spam methods can be very effective early on. But, like an evil deranged mutant virus, spammers adapt and attack. For instance, now that SpamAssassin has become so popular, spammers have adapted to it. They test messages before sending and tweak them to get past the filters.

This is easy to do with rule-based filtering such as SpamAssassin. The rules are fixed. You simply push a message through, see which rules add points to its spam score, and adjust accordingly.
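To make the point concrete, here is a minimal sketch of that test-and-tweak loop against a fixed rule set. The rule names, patterns, scores, and threshold are all invented for illustration; they are not SpamAssassin's actual rules.

```python
import re

# Hypothetical rules and scores, invented for illustration; these are not
# SpamAssassin's actual rules.
RULES = [
    ("FREE_IN_SUBJECT", re.compile(r"^subject:.*\bfree\b", re.I | re.M), 2.0),
    ("ALL_CAPS_WORDS", re.compile(r"\b[A-Z]{5,}\b"), 1.5),
    ("CLICK_HERE", re.compile(r"click here", re.I), 2.5),
]
THRESHOLD = 5.0  # assumed cutoff: at or above this, the message is flagged


def score(message):
    """Return the total score and the names of the rules that fired."""
    hits = [(name, pts) for name, pattern, pts in RULES if pattern.search(message)]
    return sum(pts for _, pts in hits), [name for name, _ in hits]


original = "Subject: FREE offer\n\nCLICK HERE to claim your PRIZE today"
tweaked = "Subject: A gift for you\n\nFollow the link to claim your reward today"

for label, msg in (("original", original), ("tweaked", tweaked)):
    total, fired = score(msg)
    verdict = "flagged" if total >= THRESHOLD else "passes"
    print(label, total, fired, verdict)
```

Because the rules never change, the list of rules that fired tells the sender exactly what to reword.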

This is tougher for Bayesian filters, which are adaptive. The filter actually trains on your own spam (and non-spam--sometimes called ham) stream. Unfortunately, I fear even this system can be gamed. Spammers can train their own Bayesian filters and use them to develop a vocabulary that avoids spam classification.
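A rough sketch of how that gaming might work, using a toy naive Bayes classifier with add-one smoothing. The class name, the training messages, and the test messages are all made up for illustration, and a real filter tokenizes and weights words far more carefully. The idea is that a spammer who trains a similar filter on similar mail can keep rewording until the message scores as ham.

```python
import math
from collections import Counter


class TinyBayes:
    """A toy naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.words[label].update(text.lower().split())
        self.docs[label] += 1

    def _log_score(self, label, tokens):
        # Vocabulary size across both classes, used for smoothing.
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total = sum(self.words[label].values())
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for tok in tokens:
            score += math.log((self.words[label][tok] + 1) / (total + vocab))
        return score

    def is_spam(self, text):
        tokens = text.lower().split()
        return self._log_score("spam", tokens) > self._log_score("ham", tokens)


f = TinyBayes()
f.train("spam", "cheap pills buy now")
f.train("spam", "buy cheap watches now")
f.train("ham", "meeting notes attached for review")
f.train("ham", "lunch on friday to review the notes")

print(f.is_spam("buy cheap pills"))                          # blunt pitch
print(f.is_spam("notes from the friday meeting"))            # ordinary mail
print(f.is_spam("review the attached notes on cheap pills")) # pitch padded with ham words
```

With this toy training data, the blunt pitch is classified as spam, while the padded pitch is classified as ham because the ham-like words outweigh the two spammy ones.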

That's why I think people who believe filtering is a silver bullet solution are being naive. Doesn't mean you shouldn't use filtering technologies--I certainly do. Just means you shouldn't think they will vanquish the spam problem.

Remember, spam has nothing to do with content. It's about permission and delivery. Content filtering may exploit some weak heuristics that may identify spam, but ultimately cannot be depended upon as the solution.

Comments


re: Filtering No Silver Bullet

I agree that filtering is no silver bullet. I wonder, though: do you suppose that filtering could get good enough, and ubiquitous enough, to raise the marginal cost of each spam message that successfully penetrates to the point that sending spam becomes un-economical?

Right now, about 10% of spam gets through to me. I can imagine better filtering mechanisms improving that quite a bit. If spam advertisers start getting 1% of the response rate they previously got, they'll only be willing to pay spammers 1% as much for the same number of addresses mailed. Spammers can then try to increase their address lists by a factor of 100 and invest in even more bandwidth/hardware, or they can decide the revenue doesn't justify continued operation.

I'm just speculating here, mind you.
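The arithmetic behind that speculation can be sketched as follows. The 10% pass rate is the figure from the comment above; the 0.1% improved rate and the cost per message sent are assumed numbers chosen only to reproduce the factor of 100, and the function names are hypothetical.

```python
from fractions import Fraction


def volume_multiplier(old_pass_rate, new_pass_rate):
    """How many times more mail must be sent to land the same number of
    messages in inboxes once the pass rate drops."""
    return Fraction(old_pass_rate) / Fraction(new_pass_rate)


def cost_per_delivered(cost_per_sent, pass_rate):
    """Sending cost divided by the fraction of messages that get through."""
    return Fraction(cost_per_sent) / Fraction(pass_rate)


# 10% gets through today; assume better filters cut that to 0.1%.
print(volume_multiplier("0.10", "0.001"))

# Assumed sending cost of 0.0001 (in any currency unit) per message.
print(float(cost_per_delivered("0.0001", "0.10")))
print(float(cost_per_delivered("0.0001", "0.001")))
```

Under these assumptions the spammer must send 100 times as much mail, and the cost of each message that actually lands rises by the same factor.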

re: Filtering No Silver Bullet

I wonder, though: do you suppose that filtering could get good enough, and ubiquitous enough, to raise the marginal cost of each spam message

Good question. That is, after all, the intent of mandatory labelling (e.g. "ADV" in the subject) for spam. If that happens, then filtering will be optimized: nearly all spam can be blocked.

I have observed that spammers have responded to improved filtering by raising their sending rates. My fear is that as filtering gets better, spammers are going to transmit more, leading to less mail in your mailbox but meltdown of the servers. (Not to mention the poor schmoes on dialups that have to download hundreds of spam messages just to filter them away.)

That, of course, is speculation on my part. I think your question is a fair one. My concern is too many people are presuming an answer to it.

re: Filtering No Silver Bullet

I feel strangely compelled to mention TarProxy at this point. It seemed like such a good idea when it was mentioned on Slashdot a few months back.

Of course it only works if you run your own server - your ISP doesn't care how slowly you retrieve stuff that's already in your inbox.

re: Filtering No Silver Bullet

Interesting, Jim. Especially like the writeup: "we all want to cause spammers pain."

re: Filtering No Silver Bullet

(Not to mention the poor schmoes on dialups that have to download hundreds of spam messages just to filter them away.)

We "poor schmoes" should then use a program that only looks at mail headers on the server and doesn't download the messages themselves, e.g. IIRC MailWasher or ERemover.
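For the curious, the headers-only approach can be sketched with Python's standard `poplib`. This is not how MailWasher or ERemover are implemented, just an illustration of the idea. It assumes the server supports the optional POP3 TOP command, and `open_mailbox`, `peek_headers`, `delete_unwanted`, and the `looks_like_spam` callback are hypothetical names.

```python
import poplib
from email.parser import BytesHeaderParser


def open_mailbox(host, user, password):
    """Log in to a POP3 server over SSL (host and credentials are yours)."""
    conn = poplib.POP3_SSL(host)
    conn.user(user)
    conn.pass_(password)
    return conn


def peek_headers(conn, msg_num):
    """Fetch only the headers of one message using the POP3 TOP command."""
    # top(which, howmuch) returns (response, lines, octets); howmuch=0
    # asks for the headers and zero lines of the body.
    _resp, lines, _octets = conn.top(msg_num, 0)
    return BytesHeaderParser().parsebytes(b"\r\n".join(lines))


def delete_unwanted(conn, looks_like_spam):
    """Mark messages for deletion on the server without downloading bodies."""
    count, _size = conn.stat()
    deleted = []
    for n in range(1, count + 1):
        if looks_like_spam(peek_headers(conn, n)):
            conn.dele(n)
            deleted.append(n)
    return deleted
```

A caller would pass its own test, for example `lambda h: (h["Subject"] or "").startswith("ADV")`, and then call `conn.quit()` so the server commits the deletions.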