Blogger Finds its 404

The way the hypertext transfer protocol (HTTP) works is that your browser sends a request to the web server, and the server returns a response. The response includes a numeric status code followed by some data such as an HTML document. When all goes well the server sends you a 200 status code. When the page you requested cannot be found, the server is supposed to return a 404 status code.
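Here is a rough sketch of that structure in Python. The post itself contains no code, so the language is a choice made here. The two responses are hand-written for illustration. They carry identical HTML and differ only in the numeric code on the first line, which is the part a program checks. This is a toy parser: it ignores headers, chunked encoding, and malformed input, all of which a real client such as Python's `http.client` handles.

```python
def parse_response(raw: bytes) -> tuple[int, bytes]:
    """Split a raw HTTP/1.x response into (status code, body).

    Toy parser for illustration only: headers are skipped, and chunked
    transfer encoding and malformed input are not handled.
    """
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line ends the headers
    status_line = head.split(b"\r\n", 1)[0]     # e.g. b"HTTP/1.1 404 Not Found"
    code = int(status_line.split(b" ", 2)[1])   # second field is the numeric code
    return code, body


# Hand-written example responses: the same HTML page, different status codes.
SORRY_PAGE = b"<html><body><h1>Whups! So sorry!</h1></body></html>"
OK_RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n" + SORRY_PAGE
NOT_FOUND_RESPONSE = (
    b"HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n" + SORRY_PAGE
)

print(parse_response(OK_RESPONSE)[0])         # 200
print(parse_response(NOT_FOUND_RESPONSE)[0])  # 404
```

A browser shows both pages the same way. Only the status code tells a program that the second one is an error.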

Some blog hosting services fail to serve a 404 error. Instead, when you request a page that doesn't exist, they return a success code along with an HTML document that says an error occurred and maybe offers a navigation index or some other helpful tool.

Well, that's wrong. Worse, it's stupid. You can serve up an HTML document along with the 404 error. The sites that don't send 404 errors don't have to change a thing about the HTML they return; they just need to set the damn status code correctly.
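Here is a minimal sketch of what "just set the status code" looks like, using the `http.server` module from Python's standard library. The paths, page contents, and the names `PAGES`, `lookup`, `BlogHandler`, and `make_server` are made up for illustration. Real blog platforms run their own server stacks. The missing-page branch sends the same friendly HTML a site would show anyway. The only difference is the 404 on the status line.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content: request path -> HTML body.
PAGES = {"/": b"<html><body><h1>Front page</h1></body></html>"}

# The friendly "sorry" page, complete with a navigation hint.
NOT_FOUND_HTML = (
    b"<html><body><h1>Whups! So sorry!</h1>"
    b'<p>Try the <a href="/">front page</a>.</p></body></html>'
)


def lookup(path: str) -> tuple[int, bytes]:
    """Return (status code, HTML body) for a requested path."""
    if path in PAGES:
        return 200, PAGES[path]
    # Same helpful HTML as before, but paired with the correct status code.
    return 404, NOT_FOUND_HTML


class BlogHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = lookup(self.path)
        self.send_response(status)  # writes the status line, e.g. "404 Not Found"
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    """Build (but do not start) a localhost server; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), BlogHandler)
```

To try it, run `make_server().serve_forever()` and request any path other than `/`. The browser still shows the sorry page, but the status code is now 404.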

Providers that do not serve 404 errors are breaking the web—and causing me a lot of grief. I see this regularly on the Austin Bloggers and Holidailies portal sites. These sites aggregate weblog and journal postings. Users submit URLs of their posted articles to the portal. When they do, the portal verifies that the URL they submitted works. If not, it throws an error and asks them to correct the problem.

These portals check the URL by requesting the page and checking the status result. If a server returns a 404 error, then the portal knows to flag the URL. If the server returns a success status code, however, the portal is going to accept the URL, even if the server sent back a web page that says, "Whups! So sorry!"
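A portal-style check along these lines can be sketched with Python's `urllib.request`. This is a hypothetical illustration, not the actual Austin Bloggers or Holidailies code, and `url_status` and `accept_submission` are invented names. Note that `urlopen` raises `HTTPError` for 4xx and 5xx responses, so the code is read from the exception. It also follows redirects, so a bad URL that bounces to a home page returning 200 gets accepted as well.

```python
import urllib.error
import urllib.request


def url_status(url: str, timeout: float = 10.0) -> int:
    """Request url and return the final HTTP status code (redirects are followed).

    Network-level failures (DNS errors, refused connections) are not caught
    here; they propagate as urllib.error.URLError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx, but the status code is still available.
        return err.code


def accept_submission(url: str) -> bool:
    """Hypothetical portal rule: accept the URL only on a 2xx response."""
    return 200 <= url_status(url) < 300
```

A server that answers a missing page with a 200 passes `accept_submission`, which is exactly the problem described above.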

Blogspot, the Blogger hosting service, has traditionally been the biggest problem. I noticed that in this year's edition of Holidailies, I'm not getting complaints about broken Blogspot entries like I used to. I checked it out and found that sometime over the past year they implemented 404 errors. Kudos to them for doing so.

I sampled the Holidailies portal to look at some of the other blogging services and determine how many still fail to return proper status codes. I ran my test by taking the URL for a sample entry at each site, corrupting the URL by inserting a single character, requesting the now broken URL, and checking the response. The web server should have responded with a 404 error status code. Some services are still not doing that right.
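The test procedure can be sketched roughly as follows. Where the extra character goes is an assumption made here (just before the last character of the path); the post only says a single character was inserted. `fetch_status` is a hypothetical stand-in for whatever function actually performs the request and returns the status code, which keeps the sketch independent of any HTTP library. The "Other" label is also an addition, covering codes that fit neither of the post's two lists.

```python
from typing import Callable
from urllib.parse import urlsplit, urlunsplit


def corrupt_url(url: str, char: str = "x") -> str:
    """Return url with one extra character inserted into its path.

    Assumption: the character goes just before the last character of the
    path; scheme, host, and query string are left untouched.
    """
    parts = urlsplit(url)
    if len(parts.path) < 2:
        raise ValueError("need a path like /entry.html to corrupt")
    path = parts.path[:-1] + char + parts.path[-1]
    return urlunsplit(parts._replace(path=path))


def classify(status: int) -> str:
    """Label a host by how it answered a request for a page that doesn't exist."""
    if status == 404:
        return "Correct"
    if 200 <= status < 300:
        return "Broken"  # success code for a missing page
    return "Other"       # e.g. 410 or 500; not covered by the post's two lists


def probe(sample_url: str, fetch_status: Callable[[str], int]) -> str:
    """Corrupt a known-good entry URL, request it, and classify the answer."""
    return classify(fetch_status(corrupt_url(sample_url)))
```

For example, `corrupt_url("http://example.com/2005/12/my-post.html")` yields `http://example.com/2005/12/my-post.htmxl`, and with a stub that always answers 404, `probe(...)` returns `"Correct"`.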

Here are my results:

Correct Response List:
www.blogspot.com
www.diary-x.com
www.journalscape.com
www.lightblog.com
www.livejournal.com
www.myspace.com
www.typepad.com
www.xanga.com

Broken Response List:
www.blog-city.com
www.blogsome.com
www.bravejournal.com
www.diaryland.com
www.inknoise.com

The "Correct" sites responded to the request with an error 404 status code. The "Broken" sites responded to the bad request with a success status code.

It seems like the number of sites with broken 404 response handling is shrinking. I hope the list will be even smaller when Holidailies 2006 comes around.

Comments


re: Blogger Finds its 404

dude no matter what i do i cant get to myspace i even clicked ur link to myspace cuz i didnt kno what u where talkin about