Spam Economics and CAPTCHA

Almost everybody hates spam, but the simple economics make it work. The cost of sending spam is so low that it pays off even with a minuscule success rate. This applies not only to email spam, but also to blog and web form spam.

The goal of conventional email spam is different from the goal of blog comment (and trackback) spam. Conventional spam is intended to lead to a direct sale. Follow the link in your email and buy some pills to increase your manhood. The primary goal of blog spam is not sales, but to increase the number of links to the spammer's web site. The blog spammer is trying to game the search engines. That's because search engines tend to treat web sites with more inbound links as popular and authoritative, and thus make them appear more prominently in result listings.

There are tools to automate both spam email and spam blog comments. Often the spam is sent from a network of zombie computers that have been infected and now operate under remote control. Thus the cost of spam is minuscule, and the incremental cost of one additional email message or blog comment is nearly immeasurable.

CAPTCHA (yes, all capitalized; it's an acronym) is one solution to the problem. A CAPTCHA is a puzzle—often a graphic of fuzzy letters—that you have to solve. The idea is that the puzzle should be easy for a human to solve, but hard for a machine.

CAPTCHA is an economic solution to the spam problem. It makes a web site resistant to automated tools. (In theory. In practice, CAPTCHA can be broken.) If the automated tools don't work, then a human is needed, and the incremental cost of spam is raised from "nearly immeasurable" to an amount paid by the hour.
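To put rough numbers on that argument, here is a minimal Python sketch of the break-even arithmetic. The function name and every figure in it are invented for illustration; they are assumptions, not measurements of any real spam operation.

```python
def break_even_success_rate(cost_per_message, payoff_per_success):
    """Return the fraction of messages that must pay off to cover costs.

    Both arguments are in the same currency unit. This is a toy model:
    it ignores fixed costs, cleanup by site owners, and everything else.
    """
    if payoff_per_success <= 0:
        raise ValueError("payoff_per_success must be positive")
    return cost_per_message / payoff_per_success


# Hypothetical figures, chosen only to show the shape of the argument.
PAYOFF = 20.0            # assumed value of one "successful" message
AUTOMATED_COST = 0.00001  # assumed cost per message sent by a bot
HUMAN_COST = 0.01         # assumed cost per message typed by paid labor

automated = break_even_success_rate(AUTOMATED_COST, PAYOFF)
manual = break_even_success_rate(HUMAN_COST, PAYOFF)

print(f"automated break-even rate: {automated:.8f}")
print(f"manual break-even rate:    {manual:.8f}")
```

Whatever the real numbers are, the break-even rate scales directly with the cost per message, so forcing a human into the loop raises the success rate the spammer needs by the same factor as the cost increase.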

Spammers have responded to this with ways to tip the economics back in their favor. One way is hiring cheap overseas labor to break CAPTCHAs.

Or, at least, I assume that's what was happening. I run several web sites (including this one) that have been seeing increasing amounts of comment spam, even though they have a simple math puzzle (like "what's 8 + 4?") on the comment form.

At first, I thought somebody had automated an attack on the math puzzle in the Drupal CAPTCHA module. It would be trivial to do in any scripting language that supports regular expressions, which these days is all of them. If that were true, then a few simple changes to the puzzle would foil the 'bot. I tried that, but the spam continued.
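To show just how trivial, here is a minimal Python sketch that answers a puzzle worded like the example above. The pattern and the function name are my own assumptions; I have not checked them against the exact text the Drupal module emits, and a real 'bot would also need to fetch and submit the form.

```python
import re

# Hypothetical puzzle format: two integers separated by +, - or *.
# The real module's wording may differ; this pattern is an assumption.
PUZZLE = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)")


def solve_math_puzzle(question):
    """Return the answer to a simple 'a op b' puzzle, or None if no match."""
    match = PUZZLE.search(question)
    if match is None:
        return None
    left, op, right = match.groups()
    a, b = int(left), int(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    return a * b  # the only remaining operator the pattern accepts


print(solve_math_puzzle("what's 8 + 4?"))  # 12
```

A change of wording (spelling out the numbers, say) breaks a pattern like this one, which is why tweaking the puzzle is a reasonable test for whether a script or a person is on the other end.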

This was pretty convincing evidence that actual human beings behind keyboards were generating the spam.

A recent article over at ZDNet entitled "Inside India’s CAPTCHA solving economy" confirms that CAPTCHA solving has become its own industry.

The bad news is that this suggests that, short of mandatory registration or closing off comments, there isn't any way to stop blog spam completely. The good news is that, because sending spam now has a real cost, its volume can be limited. As long as the cost of cleaning up the spam is less than the value of open comments, the economics will work for me.