Another Day, Another Portal

I was particularly excited when one participant asked for an RSS feed of the Holidailies. Journal writers are often stereotyped as technologically backwards (as well as big margarita drinkers). As this request demonstrates, that's not fair, and I was pleased to satisfy it. I should remember that whenever I do a project that involves topically or chronologically organized quanta of information, there should be an RSS version.

I am particularly frustrated by goddamn non-standard Microsoft character sets. I suspect people are writing their entry synopses in Microsoft Word and cut-n-pasting into the web form. When they do this, they end up using character codes that are not legal in the international character sets used on the web. I'm tempted to change the portal code to just bounce entries with these characters, but the technically inept journal writers (ooops! there's that ugly stereotype again) will never figure it out.

The solution I'm heading towards is to accept the input but, when rendering the output, pass the text through a lookup table that converts codes in the illegal character positions to the Unicode equivalents of the windows-1252 glyphs. For instance, “smart quotes” are replaced with the numeric entities &#8220; and &#8221;.
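As a rough illustration of that lookup-table idea, here is a minimal Python sketch. It is not the portal's actual code: the table name CP1252_TO_UNICODE, the function to_numeric_entities, and the choice of Python are all assumptions made for the example, and the table covers only a handful of the windows-1252 positions. It also assumes the submitted text has been read one byte per character, so a pasted smart quote shows up as character code 0x93 or 0x94.

```python
# Hypothetical, partial lookup table: windows-1252 codes in the 0x80-0x9F
# range mapped to the Unicode code points of the same glyphs.
CP1252_TO_UNICODE = {
    0x80: 0x20AC,  # euro sign
    0x85: 0x2026,  # horizontal ellipsis
    0x91: 0x2018,  # left single quotation mark
    0x92: 0x2019,  # right single quotation mark
    0x93: 0x201C,  # left double quotation mark
    0x94: 0x201D,  # right double quotation mark
    0x96: 0x2013,  # en dash
}


def to_numeric_entities(text):
    """Replace characters found in the table with decimal numeric entities.

    Characters that are not in the table pass through unchanged.
    """
    out = []
    for ch in text:
        mapped = CP1252_TO_UNICODE.get(ord(ch))
        if mapped is None:
            out.append(ch)
        else:
            out.append("&#%d;" % mapped)
    return "".join(out)


print(to_numeric_entities("\x93smart quotes\x94"))
# prints: &#8220;smart quotes&#8221;
```

Doing the conversion at render time, rather than at submission time, keeps the stored entries exactly as they were typed, so the table can be extended later without touching old data.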

So, if you'll excuse me, I need to find my "Will Code Portals for Food" sign and head on down to the Drag.

Comments

re: Another Day, Another Portal

Uhm.. Its "godam'n'".. :)

re: Another Day, Another Portal

Actually, it should be "goddamn fucking."

(Seriously, thanks. I fixed the typo.)

re: Another Day, Another Portal

The issue with character encoding took on added urgency when somebody nicely set up a LiveJournal feed of the site, but it puked on the malformed XML.

Fortunately, I was able to find a very nice, definitive mapping from cp-1252 to Unicode.

I ran into one more problem. The 8-bit characters (that is, character codes in the range 0x80 through 0xFF) were rejected as invalid. I ended up converting them to numeric entities too. This problem has me puzzled, because I thought by declaring encoding="UTF-8" those character codes would be accepted. I guess I have more research to do.
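One possible explanation, offered as a guess rather than a diagnosis of the site's code: declaring encoding="UTF-8" tells the XML parser how to interpret the bytes, but it does not convert them. In UTF-8, every character above 0x7F is written as a multi-byte sequence, so a lone single-byte code such as 0x93 is not valid. The Python sketch below shows the difference; the helper name is_valid_utf8 and the sample bytes are made up for the example.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Bytes as they might arrive from a windows-1252 paste (assumed sample).
raw = b"\x93quoted\x94"

print(is_valid_utf8(raw))  # prints: False

# Option 1: transcode to real UTF-8 so the encoding declaration is true.
as_utf8 = raw.decode("cp1252").encode("utf-8")
print(is_valid_utf8(as_utf8))  # prints: True

# Option 2: keep the output pure ASCII by using numeric entities.
as_entities = raw.decode("cp1252").encode("ascii", "xmlcharrefreplace")
print(as_entities)  # prints: b'&#8220;quoted&#8221;'
```

Either option should give a feed that a UTF-8 parser accepts. The numeric-entity route has the side benefit of working no matter which encoding the feed declares.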

re: Another Day, Another Portal

you can use my sign.

re: Another Day, Another Portal

Uh, isn't having a blogger set up their portal and RSS feed not really helping their "technically inept" reputation? "Yeah, those online journallers are so ept that they had to get a weblogger to set up their site." "Maybe they were just too drunk to do it themselves."

(And since I'm a big geek, are you stripping out "<!--" without "-->"s? Cause I've broken more of the web applications that I've had to write that way than with any other trick in my big book of testing.)

re: Another Day, Another Portal

Blake - Since I hang with the Austin journal writing crowd and even attended Journalcon, I believe I qualify for honorary journal writer status.

With regard to the hanging comment, the Holidailies site accepts plain text only. You can type in <!-- (or any tag) and it will be rendered exactly as you type it.
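For anyone curious, "plain text only" can be sketched like this in Python. This is an illustration, not the Holidailies code: the function name render_plain_text is hypothetical, and it leans on the standard library's html.escape to turn markup characters into entities before the text goes into the page.

```python
import html


def render_plain_text(user_input):
    """Escape &, <, > and quotes so the browser shows the text literally."""
    return html.escape(user_input)


print(render_plain_text("<!-- <b>hi</b>"))
# prints: &lt;!-- &lt;b&gt;hi&lt;/b&gt;
```

Because the opening "<" never reaches the browser as markup, an unclosed comment cannot swallow the rest of the page.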

re: Another Day, Another Portal

I wanted to thank you, by the way, for adding the journal name and author's name to the RSS items. It's making it much easier for me to figure out which URLs go with which people.