Fighting Link Spam with Redacted RSS Entries

The link spam problem has been getting worse on the Austin Bloggers web site, just as it has everywhere else. When the site opened two years ago, we provided an open posting form. Anybody was welcome to submit whatever they blogged about Austin. Last month we moved the form into the member portal, behind a login screen.

On the front page, the link spam has been only a minor annoyance. All I need to do is twiddle a few bits in the database, and it's gone. We'd typically get a few spam postings in the wee hours of the morning, so I could clean them out over my morning coffee, long before most people saw them.

The link spam problem for RSS, on the other hand, has been significant, because once spam got into our RSS feed, it stayed there. If you were reading Austin Bloggers with an RSS reader, it probably looked very spammy to you.

An RSS reader periodically fetches the feed and stores away its articles. When I clean up the spam, it comes off the front page, but once your RSS reader has downloaded the entry, I can't remove it from there.

I now have a solution that should fix that. When an entry is removed from the web site, rather than deleting it from the RSS feed, I replace it with a redacted entry.

This solution required upgrading our RSS feed from version 0.91 to 2.0, because RSS 2.0 adds a <guid> element. The Globally Unique Identifier (GUID) is a string that uniquely identifies an RSS entry. By default it is a permalink, often the same URL as the <link> element, but it need not be: when its isPermaLink attribute is "false", the reader must treat it as an opaque identifier rather than a URL. That is how I'm using it, to assign a unique identifier to every entry in the feed.
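A minimal sketch of building such a GUID in Python, using only the standard library's xml.etree.ElementTree. It assumes the entry's database ID is the unique key and that the GUID takes the address-like form entry:<id>@austinbloggers.org, which is my reading of the feed example in this post; the names make_guid and guid_element are hypothetical, not the site's actual code.

```python
import xml.etree.ElementTree as ET

# Assumed domain suffix that makes the identifier globally unique.
SITE_DOMAIN = "austinbloggers.org"


def make_guid(entry_id: int) -> str:
    """Build an opaque, stable GUID from the entry's database ID.

    The value depends only on the entry ID, not on the submitted link,
    so it stays the same if the entry is later edited or redacted.
    """
    # Assumed format: entry:<id>@<domain>
    return f"entry:{entry_id}@{SITE_DOMAIN}"


def guid_element(entry_id: int) -> ET.Element:
    """Return a <guid> element explicitly marked as not a permalink."""
    guid = ET.Element("guid", isPermaLink="false")
    guid.text = make_guid(entry_id)
    return guid


if __name__ == "__main__":
    print(ET.tostring(guid_element(1615), encoding="unicode"))
```

Setting isPermaLink="false" matters because RSS 2.0 defaults that attribute to true, and a string like this is not a URL.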

When an article posts to Austin Bloggers, the RSS will look something like:

  <item>
    <title>Get a Bigger Nose</title>
    <link>http://www.example.net/nose-enhancer.html</link>
    <description>Blah blah blah ...</description>
    <guid isPermaLink="false">entry:1615@austinbloggers.org</guid>
  </item>

When the entry is later removed from the site, its RSS item is replaced with a redacted entry that keeps the same GUID. The redacted entry for the example above would be:

  <item>
    <title>(this entry has been removed)</title>
    <guid isPermaLink="false">entry:1615@austinbloggers.org</guid>
  </item>
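Here is a sketch of how a feed generator might emit either form of the item from a single record. The Entry class, its removed flag, and render_item are hypothetical names for illustration, not the site's actual code, and the GUID uses the same assumed entry:<id>@austinbloggers.org form.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SITE_DOMAIN = "austinbloggers.org"  # assumed GUID domain suffix
REDACTED_TITLE = "(this entry has been removed)"


@dataclass
class Entry:
    """Hypothetical record for one submitted post."""
    entry_id: int
    title: str
    link: str
    description: str
    removed: bool = False


def render_item(entry: Entry) -> ET.Element:
    """Render an RSS 2.0 <item>; removed entries become redacted stubs.

    Both variants carry the same GUID, so a reader can match the
    redacted stub to the copy it downloaded earlier.
    """
    item = ET.Element("item")
    if entry.removed:
        # Keep only a placeholder title: no link, no description.
        ET.SubElement(item, "title").text = REDACTED_TITLE
    else:
        ET.SubElement(item, "title").text = entry.title
        ET.SubElement(item, "link").text = entry.link
        ET.SubElement(item, "description").text = entry.description
    guid = ET.SubElement(item, "guid", isPermaLink="false")
    guid.text = f"entry:{entry.entry_id}@{SITE_DOMAIN}"  # assumed format
    return item


if __name__ == "__main__":
    spam = Entry(1615, "Get a Bigger Nose",
                 "http://www.example.net/nose-enhancer.html",
                 "Blah blah blah ...")
    print(ET.tostring(render_item(spam), encoding="unicode"))
    spam.removed = True
    print(ET.tostring(render_item(spam), encoding="unicode"))
```

Deriving the GUID from the entry ID rather than from the link is what lets the redacted stub drop the spammer's URL entirely while keeping its identity.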

Because the GUID matches, the RSS reader should recognize this as an update of a previously posted entry, discard the original, and show this version in its place.
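On the reader side, the behavior I'm counting on amounts to keying stored items by GUID. This is a minimal sketch of that idea, not how Bloglines or any particular reader actually works; merge_feed and the two sample feeds are hypothetical, and items without a GUID are simply skipped here, where a real reader would need some other way to identify them.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed snapshots: before and after the entry is removed.
FIRST = """<rss version="2.0"><channel><title>Austin Bloggers</title>
<item><title>Get a Bigger Nose</title>
<link>http://www.example.net/nose-enhancer.html</link>
<guid isPermaLink="false">entry:1615@austinbloggers.org</guid></item>
</channel></rss>"""

LATER = """<rss version="2.0"><channel><title>Austin Bloggers</title>
<item><title>(this entry has been removed)</title>
<guid isPermaLink="false">entry:1615@austinbloggers.org</guid></item>
</channel></rss>"""


def merge_feed(store: dict[str, dict[str, str]], feed_xml: str) -> None:
    """Merge one fetched RSS 2.0 document into a local item store.

    Items are keyed by <guid>, so an item whose GUID was seen before
    replaces the stored copy instead of being added alongside it.
    """
    for item in ET.fromstring(feed_xml).iter("item"):
        guid = item.findtext("guid")
        if guid is None:
            continue  # this sketch ignores items without a GUID
        # Overwrite any earlier copy with this item's child elements.
        store[guid] = {child.tag: (child.text or "") for child in item}


if __name__ == "__main__":
    store: dict[str, dict[str, str]] = {}
    merge_feed(store, FIRST)
    merge_feed(store, LATER)
    print(store)
```

After both fetches the store holds a single item, the redacted one, because the second fetch overwrote the first under the same GUID.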

You can see this in action at the Bloglines view of our site. Currently, one redacted article, from February 6, is showing there.

I hope this technique makes the link spam a little less annoying for our readers, and a lot less effective for the spammers.