Lately, I've been getting a lot of spam in my blog comments. Some people are getting a handful, but I've gotten at least a hundred, including a blitzkrieg a week ago.
I hate spam in my email inbox, but I really loathe spam in my blog comments, and now I've done something about it.
The October issue of the eminent hacker magazine Dr. Dobb's Journal had an article by Paul Tremblett outlining a way to use the Java 2D API in a servlet to render dynamically created images of a sequence of letters and numbers for use in web form validation. This has been gaining popularity as a way to deter bots from submitting information to web forms.

Unfortunately, the magazine did not include the proper source code either in the article or in its online resource center, but Mr. Tremblett was kind enough to dig up some mostly working source code for his application and email it to me. I was able to use it to get a rough version of the application described in his article working. I then added an XMLRPC wrapper to it, obtained his permission to release it as open source under the BSD license, and dubbed it Sapience.
Next I started in on a hack for Movable Type comments to use Sapience for comment form validation. The hack I came up with works like this:
1. Place the Sapience.pm file in
2. In the MT admin interface, edit the comment templates to include <!--SAPIENCE--> within your comment form HTML.
3. Edit the file
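The hack itself is Perl living inside MT, and its details aren't reproduced here. As a rough illustration of the flow these steps imply, here is a minimal Python sketch: the <!--SAPIENCE--> marker in the comment template is swapped for a verification image plus a hidden challenge ID, and on submission the typed code is checked against the stored one and then discarded so it can't be replayed. The `ChallengeStore` class, the field names `sapience_id` and `sapience_code`, and the image URL are all hypothetical, not taken from Sapience or Movable Type.

```python
import secrets

MARKER = "<!--SAPIENCE-->"


class ChallengeStore:
    """Hypothetical in-memory store mapping challenge IDs to expected codes."""

    def __init__(self):
        self._codes = {}

    def issue(self, code):
        # An unguessable ID goes into the form; the code itself stays server-side.
        challenge_id = secrets.token_hex(8)
        self._codes[challenge_id] = code
        return challenge_id

    def verify(self, challenge_id, answer):
        # pop() makes each challenge single-use, so one answer can't be replayed.
        expected = self._codes.pop(challenge_id, None)
        # Case-insensitive comparison is a design choice of this sketch.
        return expected is not None and answer.strip().upper() == expected.upper()


def render_form(template, challenge_id, image_url):
    """Replace the SAPIENCE marker with the challenge image and form fields."""
    widget = (
        f'<img src="{image_url}" alt="verification code">\n'
        f'<input type="hidden" name="sapience_id" value="{challenge_id}">\n'
        '<input type="text" name="sapience_code">'
    )
    return template.replace(MARKER, widget)
```

In use, the comment page would call `issue()` with a freshly generated code, pass the returned ID to `render_form()`, and the comment handler would reject the post unless `verify()` returns true.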
I've set up a project home for Sapience on SourceForge.
Note that there are other methods for stopping spambots from posting to your Movable Type blog, such as MT-Blacklist, which does not require so much hacking of your MT source.
There are a few things that need to be done before it can go 1.0. I need to figure out how to crop the image more precisely, and I need to add a cleanup mechanism for the images on the server. I also need to tweak the random code string generation to ensure there is always a letter present, since Perl's XMLRPC::Lite will transport the code in <int></int> instead of <string></string> if the code is all numbers.
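The always-a-letter fix is easy to sketch. Rather than regenerating until a letter happens to appear, this version draws the code normally and, if it came out all digits, overwrites one random position with a letter. A minimal Python sketch, assuming an uppercase-plus-digits alphabet and a default length of 6; both are assumptions, not Sapience's actual settings.

```python
import random
import string

# Assumed alphabet and length; Sapience's real choices may differ.
LETTERS = string.ascii_uppercase
ALPHABET = LETTERS + string.digits
_rng = random.SystemRandom()  # OS-backed randomness, harder for bots to predict


def make_code(length=6):
    """Return a random code that always contains at least one letter."""
    if length < 1:
        raise ValueError("length must be at least 1")
    chars = [_rng.choice(ALPHABET) for _ in range(length)]
    if not any(c in LETTERS for c in chars):
        # All digits: overwrite one random position with a letter so an
        # XML-RPC library that guesses types can't send it as an <int>.
        chars[_rng.randrange(length)] = _rng.choice(LETTERS)
    return "".join(chars)
```

Overwriting a position skews the distribution slightly toward letters compared with rejection sampling, which is harmless for a verification code.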
There are also a few things I'd like to do, such as adding color and new validation methods to further thwart bots. There could be instructions such as 'Enter only the blue characters.' There could also be animated or audio validation methods, I suppose.
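The 'enter only the blue characters' idea splits naturally into generating (character, color) pairs for the renderer and keeping the expected answer on the server. A minimal Python sketch of that bookkeeping only; the palette, length, and function names are assumptions, and actually drawing the colored glyphs (the Java 2D part) is left out.

```python
import random
import string

COLORS = ["blue", "red", "green", "black"]  # assumed palette
_rng = random.SystemRandom()


def make_color_challenge(length=8, target="blue"):
    """Return (glyphs, answer): glyphs are (char, color) pairs to render,
    answer is the target-colored characters in display order."""
    if length < 1:
        raise ValueError("length must be at least 1")
    alphabet = string.ascii_uppercase + string.digits
    glyphs = [(_rng.choice(alphabet), _rng.choice(COLORS)) for _ in range(length)]
    # Guarantee at least one glyph carries the target color.
    if not any(color == target for _, color in glyphs):
        i = _rng.randrange(length)
        glyphs[i] = (glyphs[i][0], target)
    answer = "".join(ch for ch, color in glyphs if color == target)
    return glyphs, answer


def check_color_answer(answer, submitted):
    """Compare a submission against the expected target-colored characters."""
    return submitted.strip().upper() == answer
```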
I'm getting ready to leave on a 2.5-week business trip, though, so I may not be able to finish it quickly. At least in the meantime I can rest assured that the spambot will stop posting to my blog.
Last night I met Mie and her friend Mieko at the S.F. Eagle Tavern on 12th Street. I hadn't been there before. It's a nice place, with an outdoor patio with a fire pit and plenty of overhead space heaters.
Mieko was there because she promotes a few bands in Japan and the San Francisco area, and the roommate of someone in one of her bands was performing that night. The band, Sistersound, is a three-member "Avant-Rock" group playing excellent Sonic Youth-inspired pieces. I'm really glad I got to see them and look forward to the next time.
I took two photos with Mie's cell phone, but they didn't come out so well in the low light.
I wanted to try out some mapping systems, so I finished the scraper for the San Francisco Health Code Violation Database that I had been meaning to write. I first wanted to try it out on an installation of GeoServer, a Java servlet-based implementation of the OpenGIS spec, but I had problems getting it installed correctly. Impatient to see the data plotted, I tried out RDF Mapper, a project by Jason Harlan, whom I met last weekend. I stuck the data into a PostgreSQL database and wrote a little Perl script to put all of the businesses that had health code violations in the past inspection period into an RDF Map file. Here's the result: San Francisco Health Code Map (or click on the static map below to get to the interactive one).
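The actual export was a Perl script reading PostgreSQL. For anyone wanting to try something similar, here is a minimal Python sketch of just the export step, assuming the W3C Basic Geo (WGS84 lat/long) vocabulary for coordinates and Dublin Core for the name and violation count. Whether this is exactly the layout RDF Mapper expects is an assumption, so check it against RDF Mapper's own examples.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
DC = "http://purl.org/dc/elements/1.1/"

# Emit readable rdf:/geo:/dc: prefixes instead of ns0:, ns1:, ...
ET.register_namespace("rdf", RDF)
ET.register_namespace("geo", GEO)
ET.register_namespace("dc", DC)


def businesses_to_rdf(rows):
    """rows: iterable of (name, latitude, longitude, violation_count) tuples,
    e.g. fetched from a database query. Returns an RDF/XML string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for name, lat, lon, violations in rows:
        # One geo:Point typed node per business.
        point = ET.SubElement(root, f"{{{GEO}}}Point")
        ET.SubElement(point, f"{{{DC}}}title").text = name
        ET.SubElement(point, f"{{{GEO}}}lat").text = f"{lat:.6f}"
        ET.SubElement(point, f"{{{GEO}}}long").text = f"{lon:.6f}"
        ET.SubElement(point, f"{{{DC}}}description").text = f"{violations} violation(s)"
    return ET.tostring(root, encoding="unicode")
```

ElementTree handles XML escaping, so business names containing characters like `&` come out well-formed.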
Update: The city has changed its web site, so the restaurant-name links are temporarily broken until I can compensate for their changes. You can still look up restaurants by name using the city's search form.
I like RDF Mapper; it's super easy to use and very snazzy!
I'd like to try plotting all of the businesses, using a different color for the ones that had no violations. That way you could get an idea of where the seedier and cleaner parts of the city might be. I also need to scrape past violations so that changes over time can be plotted.
A pithy essay over at Danny O'Brien's Oblomovka examines how using the web, and blogs in particular, to mediate conversations has altered the available communication modes so that the private mode is disappearing.
The problem here is one (ironically) of register. In the real world, we have conversations in public, in private, and in secret. All three are quite separate. The public is what we say to a crowd; the private is what we chatter amongst ourselves, when free from the demands of the crowd; and the secret is what we keep from everyone but our confidant. Secrecy implies intrigue, implies you have something to hide. Being private doesn't. You can have a private gathering, but it isn't necessarily a secret. All these conversations have different implications, different tones.[.....]
This is why, incidentally, people hate blogs so much. My God, people say, how can Livejournallers be so self-obsessed? Oh, Christ, is Xeni talking about LA art again? Why won't they all shut up?
The answer why they won't shut up is - they're not talking to you. They're talking in the private register of blogs, that confidential style between secret-and-public. And you found them via Google. They're having a bad day. They're writing for friends who are interested in their hobbies and their life. Meanwhile, you're standing fifty yards away with a sneer, a telephoto lens and a directional microphone. Who's obsessed now?
I think another problem with blogs is that their 'register' is often murky, even to the blog owner. Occasionally I blog things that are mostly just notes to myself or close friends, and therefore private. But when I have something that I think is interesting to the general public, my blog is the natural place to post it.
I'm sure there are a bunch of online ways to convert a U.S. address to a lat/long, but for some reason I always have trouble finding one when I need it. Anselm told me there was an open source lat/long server that used the U.S. Census Bureau's TIGER data, so I tracked it down. It turns out it's the entry that won the 2002 Google Programming Contest, written by Dan Egnor, whom I met yesterday. Dan must be the uber alles alpha geek of geo-geeking, between that hack and his cool mobile Tron-GPS game (see my previous blog posting).
So I compiled his code on the headmap server and hacked in a few minor changes to turn it into a simple address->lat/long conversion server (his geocode server does much, much more). You can try it out here: lat/long lookup. Since it's GPL, I guess I'd be in violation if I didn't offer the altered source code, so email me if you want it, but the changes are so minor you could probably make them yourself just as quickly.
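TIGER data doesn't store a point for every address. The usual approach with TIGER-style data is to find the street segment whose address range contains the house number and interpolate linearly between the segment's endpoints. A minimal Python sketch of just that interpolation step, assuming a simplified segment record whose field names are made up for illustration. It ignores details a real geocoder handles, such as odd/even sides of the street and the intermediate shape points of curved segments, and it isn't a description of how Dan's server is implemented.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Simplified street segment in the spirit of a TIGER record
    (illustrative field names, not TIGER's actual columns)."""
    from_addr: int
    to_addr: int
    start: tuple  # (lat, lon) at the from_addr end
    end: tuple    # (lat, lon) at the to_addr end


def interpolate(segment, house_number):
    """Estimate (lat, lon) by linear interpolation along a straight segment."""
    lo, hi = sorted((segment.from_addr, segment.to_addr))
    if not lo <= house_number <= hi:
        raise ValueError("house number outside this segment's range")
    span = segment.to_addr - segment.from_addr  # negative for descending ranges
    t = 0.0 if span == 0 else (house_number - segment.from_addr) / span
    lat = segment.start[0] + t * (segment.end[0] - segment.start[0])
    lon = segment.start[1] + t * (segment.end[1] - segment.start[1])
    return lat, lon
```

For a segment numbered 100 to 198, house number 149 lands exactly halfway along it.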
What a day.
The schedule at FOO Camp was set by the participants: a large blank calendar listing the available rooms (along with their approximate sizes) was provided Friday night, and by Saturday morning it was filled in with a variety of handwritten ad-hoc geek topics.
I started with the Social Networking session, which quickly proved more popular than the medium-sized room could handle comfortably. Danah Boyd acted as our academic expert to get things rolling, and we spent a bit of time just arguing over what Social Software is. Then people introduced the social software they are working on. The creators of meetup.com, upcoming.org, and a few others described their experiences. A Microsoft researcher introduced one of the best M$ products I've seen: Netscan, which does interesting data mining of Usenet and produces pretty pictures.
[Image: Netscan Usenet visualization]
I felt like the session wandered around a lot, probably because the topic can be interpreted so broadly. I personally would have liked to hear more discussion about practical, flexible architectures for mapping and traversing social network topologies, but unfortunately I don't have the moxie to speak up in such a crowded room. I do think I identified people I'd like to discuss the topic with offline at some later date.
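As a tiny concrete starting point for traversing a social network topology, the most basic operation is finding the shortest chain of acquaintances between two people, which is a breadth-first search. A minimal Python sketch, assuming the network fits in memory as a dict mapping each person to a list of contacts; how to scale this to real social software is the open architecture question.

```python
from collections import deque


def degrees_of_separation(graph, start, goal):
    """Breadth-first search over an adjacency-list graph.
    Returns the shortest chain of contacts from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # back-pointers; also serves as the visited set
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend in previous:
                continue
            previous[friend] = person
            if friend == goal:
                # Walk the back-pointers to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(friend)
    return None
```

Because BFS explores people in order of distance from `start`, the first time it reaches `goal` is along a shortest chain.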
The next session was along the same lines, in my opinion, but apparently not to most of the other Social Software fans. It was Bram Cohen's session on Trust Metrics, and there were maybe six people in attendance. Trust Metrics are ways of measuring trust or reputation in social topologies. Bram gave a good overview of the most common algorithms, including an improved one he recently developed. Read Bram's mind here. Bram is probably the alpha-est alpha geek I know of, having created both BitTorrent and CodeCon. This was my favorite session, mostly because I enjoy algorithm discussions, and I learned about a voting system called Condorcet, which is better than the instant-runoff algorithm being pushed now in San Francisco.
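To make the Condorcet versus instant-runoff comparison concrete: a Condorcet winner is a candidate who beats every other candidate in head-to-head majority comparisons, while instant runoff repeatedly eliminates the candidate with the fewest first-choice votes. A minimal Python sketch of both, assuming every ballot ranks every candidate. It breaks instant-runoff elimination ties alphabetically, which is a simplification, and it doesn't cover Bram's trust-metric algorithms.

```python
from collections import Counter
from itertools import combinations


def pairwise_wins(ballots, candidates):
    """wins[(x, y)] counts ballots that rank x above y."""
    wins = Counter()
    for ballot in ballots:
        position = {c: i for i, c in enumerate(ballot)}
        for x, y in combinations(candidates, 2):
            if position[x] < position[y]:
                wins[(x, y)] += 1
            else:
                wins[(y, x)] += 1
    return wins


def condorcet_winner(ballots, candidates):
    """Return the candidate who beats every rival head-to-head, or None."""
    wins = pairwise_wins(ballots, candidates)
    for x in candidates:
        if all(wins[(x, y)] > wins[(y, x)] for y in candidates if y != x):
            return x
    return None  # preferences form a cycle (or a tie): no Condorcet winner


def instant_runoff(ballots, candidates):
    """Eliminate the weakest first-choice candidate until someone has a majority."""
    remaining = set(candidates)
    while True:
        # Each ballot counts for its highest-ranked candidate still in the race.
        tally = Counter(next(c for c in b if c in remaining) for b in ballots)
        leader, votes = tally.most_common(1)[0]
        if 2 * votes > len(ballots) or len(remaining) == 1:
            return leader
        # Fewest votes is eliminated; alphabetical order breaks ties.
        loser = min(sorted(remaining), key=lambda c: tally[c])
        remaining.remove(loser)
```

With four ballots ranking A>B>C, three ranking C>B>A, and two ranking B>A>C, B beats both A and C head-to-head and is the Condorcet winner, yet instant runoff eliminates B first and elects A.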
After that session I attended the geo-geeking session, as geo-geeking was the main reason I was attending the event in the first place. Mike Liebhold did a great job shining light on the tangled web of standards and implementations that are choking the geo-geeking space. I still haven't figured it all out, though; there's just so much going on, and it seems to be all over the map (pun intended). I'd like to create a list of topics in the geo-geeking sphere and have all of the geo-geekers rate the importance of each topic, so we can identify the densest clusters of interest and maybe I can get an idea of what needs to be done first.
The geo-geeking session spilled over past its two-hour slot (one of the nice things about an ad-hoc conference is that you can do that), but I skipped out on the extra discussion to attend the Hacking State of the Union session put on by the Shmoo guys, who went over the pros and cons of Intrusion Detection Systems. For people representing one organization, they held a wide range of viewpoints, from "they're mostly a waste of time and resources" to "they are an important part of a complete security process."
At various times throughout the day, someone was giving test rides on their Segway out back, but I never bothered to queue up for a turn. It was fun to watch others give it a roll, though, and I didn't see anyone fall off.
More fun discussion followed during the evening with various people on various topics. I even ran into someone with whom I could talk chemical informatics: Jesus Castagnetto from San Diego, who created the Metalloprotein Database and Browser.
I just set up a web cam for FOO Camp: Foo Camp Cam. It's updated about every two minutes, and I'll try to move it around throughout the day.
It's a Philips eggcam running on Linux. If anyone knows how to get rid of that grey border, please let me know. I'm using vgrabbj and pwc.o and haven't figured it out yet.
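One workaround, if the right driver setting can't be found, is to trim the border after capture. A minimal Python sketch, assuming the frame has already been decoded into rows of grayscale values: it finds the bounding box of pixels that differ noticeably from the border grey and crops to it. The tolerance of 8 is an arbitrary assumption, and this is post-processing, not a fix inside vgrabbj or pwc. If the border is always the same width, a fixed crop is simpler; with the Pillow library, `ImageChops.difference` followed by `Image.getbbox` does the same bounding-box job on real images.

```python
def content_bbox(pixels, border_value, tolerance=8):
    """Return (left, top, right, bottom) of the region whose pixels differ from
    border_value by more than tolerance; right and bottom are exclusive.
    pixels is a list of rows of grayscale values (0-255). Returns None if the
    whole frame matches the border."""
    rows = [y for y, row in enumerate(pixels)
            if any(abs(p - border_value) > tolerance for p in row)]
    if not rows:
        return None
    cols = [x for x in range(len(pixels[0]))
            if any(abs(row[x] - border_value) > tolerance for row in pixels)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1


def crop(pixels, box):
    """Cut the frame down to the given (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]
```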



