When I was at the Emerging Technology conference back in April, one of the sessions that got me all hot and bothered was Maciej Ceglowski's presentation on semantic mapping, specifically the Contextual Network Graph (CNG) system, which he described and released into the public domain. The CNG system can take a keyword search and bring up documents that are similar even though they don't contain the keyword. One example Maciej gave was that you could search on 'photosynthesis' and get back documents that contained only the phrase 'plants get their energy from sunlight.' It does this by noting which words are shared between documents that contain the keyword and documents that don't, and weighing their relative importance.
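To make the shared-words idea concrete, here is a toy sketch in Java. It is not Maciej's CNG algorithm and not the headmap code; it is just one simple way to let "energy" flow from a keyword to the documents that contain it, then through their other words to documents that don't. The class name `SharedWordSearch`, the method names, and the even-split weighting are all my own assumptions for illustration, and it is written for a modern JDK (8 or later) rather than Java 1.4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy example: NOT the CNG implementation, only the shared-words idea.
class SharedWordSearch {

    /** Splits text into a set of lowercase words (letters a-z only). */
    static Set<String> words(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.toLowerCase().split("[^a-z]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    /**
     * Returns one score per document. The keyword's energy (1.0) is split evenly
     * among the documents containing it; each of those documents splits its energy
     * evenly among its words; each word splits its share evenly among the documents
     * containing it. Rare shared words therefore pass on more energy than common ones.
     */
    static double[] score(List<String> docs, String keyword) {
        String key = keyword.toLowerCase();
        List<Set<String>> docWords = new ArrayList<>();
        Map<String, List<Integer>> wordDocs = new HashMap<>();
        for (int d = 0; d < docs.size(); d++) {
            Set<String> ws = words(docs.get(d));
            docWords.add(ws);
            for (String w : ws) {
                wordDocs.computeIfAbsent(w, k -> new ArrayList<>()).add(d);
            }
        }

        double[] scores = new double[docs.size()];
        List<Integer> direct = wordDocs.getOrDefault(key, Collections.<Integer>emptyList());
        for (int d : direct) {
            double energy = 1.0 / direct.size();
            scores[d] += energy;                       // direct keyword match
            Set<String> ws = docWords.get(d);
            for (String w : ws) {
                if (w.equals(key)) continue;           // only follow the *other* words
                double wordEnergy = energy / ws.size();
                List<Integer> holders = wordDocs.get(w);
                for (int other : holders) {
                    if (other != d) scores[other] += wordEnergy / holders.size();
                }
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Photosynthesis lets plants turn sunlight into sugar",
            "Plants get their energy from sunlight",
            "The stock market fell sharply today");
        System.out.println(Arrays.toString(score(docs, "photosynthesis")));
    }
}
```

In this sketch the second document never mentions 'photosynthesis' but still gets a nonzero score, because it shares 'plants' and 'sunlight' with a document that does. The third document shares nothing and scores zero.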
I could see many uses for a generalized version of this system, including using it in my day job for doing searches in the chemistry space. Unfortunately, Maciej didn't release any code for CNG at the time, and the code he did release (for a patented system called LSI) was written in C++, which I got over a long time ago. So I had been meaning to implement CNG in Java ever since, but just never got around to it.
Last night I discovered that Anselm, my colleague in the Headmap Collective, has been working with CNG for a couple of weeks now and just completed a working alpha-level implementation in C# (he also has a detailed explanation of CNG at that previous link). Since C# is so close to Java, it was trivial to port, so it is now available in the Headmap CVS tree.
After porting it over I wanted to be able to pull in some external documents to play around with it. I wrote a quick system that reads an export file from a Movable Type blog, stores it as a CNG where each blog posting is a separate document, and then lets you interactively specify a post and get back a list of the other posts sorted by similarity.
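For a rough picture of what a utility like that does, here is a minimal sketch, not the actual SemanticMT source. It assumes the export file separates entries with a line of eight hyphens, and it swaps in plain word-overlap (Jaccard) similarity where the real tool uses the CNG, so the rankings will differ. The class name `ExportPostRanker` and its methods are hypothetical, and the code targets a modern JDK (8 or later).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: word-overlap ranking as a stand-in for the CNG.
class ExportPostRanker {

    /** Splits export text into posts, assuming "--------" lines end each entry. */
    static List<String> splitPosts(String export) {
        List<String> posts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : export.split("\r?\n")) {
            if (line.trim().equals("--------")) {
                if (current.toString().trim().length() > 0) posts.add(current.toString());
                current = new StringBuilder();
            } else {
                current.append(line).append('\n');
            }
        }
        if (current.toString().trim().length() > 0) posts.add(current.toString());
        return posts;
    }

    /** Lowercase word set; header keywords such as TITLE are kept for simplicity. */
    static Set<String> words(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.toLowerCase().split("[^a-z]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    /** Jaccard similarity: shared words divided by total distinct words. */
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) return 0.0;
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        return (double) shared.size() / union.size();
    }

    /** Returns the indices of all other posts, most similar to the chosen post first. */
    static List<Integer> rank(List<String> posts, int chosen) {
        final List<Set<String>> sets = new ArrayList<>();
        for (String p : posts) sets.add(words(p));
        final Set<String> target = sets.get(chosen);
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < posts.size(); i++) {
            if (i != chosen) order.add(i);
        }
        order.sort((x, y) -> Double.compare(
            jaccard(target, sets.get(y)), jaccard(target, sets.get(x))));
        return order;
    }

    public static void main(String[] args) {
        String export =
            "TITLE: Plants\n-----\nBODY:\nPlants get energy from sunlight\n-----\n--------\n"
          + "TITLE: Leaves\n-----\nBODY:\nGreen leaves use sunlight for energy\n-----\n--------\n"
          + "TITLE: Markets\n-----\nBODY:\nStocks fell sharply today\n-----\n--------\n";
        List<String> posts = splitPosts(export);
        System.out.println("Posts most similar to post 0: " + rank(posts, 0));
    }
}
```

Unlike the CNG, word overlap can only find posts that literally share vocabulary with the one you pick, which is exactly the limitation the graph approach is meant to get around.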
If you want to play around with it yourself, download headmap-cng.jar and an export file (you can get one from your MT blog menu), then run the utility from the command line like this:
java -classpath headmap-cng.jar org.headmap.cng.util.SemanticMT myblog.txt
You'll need Java 1.4 installed on your system.
It's not very useful yet; the algorithm needs to be tweaked and a better utility needs to be written. But it has some promise and interesting applications. Download the source code (headmap-cng-src.zip) and explore it!