We got a new Continuous Integration box from Pivotal Labs yesterday, and I ran into some weird time-based MySQL issues when trying to get our tests to pass on it. It turned out that the CI box was running MySQL v5.0.38, but all of our dev boxen were running v5.0.45. Here are some differences between these two versions:
v.45 will happily take a clause of the form SELECT * FROM a_table WHERE end_date < '07-12-31', while v.38 isn't so happy with it. In this case I have to hand it to v.38, because that date was in YY-MM-DD format, which is about the worst date format I've seen in production code. I changed the Date#to_mysql method to use YYYY-MM-DD format and all was well.
The second issue was odder. In v.45, if you compare a date column value with a time, the date is treated as a time value set to 00:00:00 of that day for the comparison. In other words, if the value in the date column is 2006-07-05 and you compare it with '2006-07-05 00:00:00', they are equal. Not so in v.38:
mysql> create table delme ( a date );
Query OK, 0 rows affected (0.03 sec)
mysql> insert into delme (a) values ('2006-07-05');
Query OK, 1 row affected (0.01 sec)
mysql> select * from delme where a < '2006-07-05';
Empty set (0.00 sec)
mysql> select * from delme where a < '2006-07-05 00:00:00';
+------------+
| a          |
+------------+
| 2006-07-05 |
+------------+
1 row in set (0.00 sec)
I found out that the production box is running v5.0.51, so we've now upgraded all of our workstations and deployment/testing machines to that.
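A small Ruby sketch of both date points above. The `to_mysql` here is a hypothetical stand-in for the app's Date#to_mysql (its real body isn't shown); the key detail is `%Y` (four-digit year) rather than `%y`. The second half is only a guess at what v.38 was doing, not something confirmed: if the stored DATE were compared with the literal as a plain string, the shorter string sorts first, which reproduces both results in the session above.

```ruby
require 'date'

# Hypothetical stand-in for the app's Date#to_mysql helper:
# '%Y' gives a four-digit year; '%y' gives the two-digit '07-12-31' form.
class Date
  def to_mysql
    strftime('%Y-%m-%d')
  end
end

d = Date.new(2007, 12, 31)
puts d.strftime('%y-%m-%d')  # YY-MM-DD, the form v.38 choked on
puts d.to_mysql              # YYYY-MM-DD

# Guess (unconfirmed) at v.38's behavior: a plain string comparison.
stored = '2006-07-05'
puts stored < '2006-07-05'           # false, like the "Empty set"
puts stored < '2006-07-05 00:00:00'  # true, like the 1 row returned
```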
One of the first things I needed to do at my new job was to convert the application from Ruby on Rails 1.2.6 to the latest 2.x version. This wasn't the first time I'd upgraded a 1.x app to Rails 2, but it was the first time with a significantly complex application. Some issues we ran into:
The dreaded "Expected X to define Y" error message. This message can come from a few places in the Rails code, and it is usually useless because it obscures the real problem. In this case I added a rescue clause around an internal require statement in active_support/dependencies.rb and reported the true culprit: a model definition was bombing out when trying to use acts_as_list. Many of the acts_as components of ActiveRecord were pulled out into plugins in Rails 2, so the solution was to run script/plugin install acts_as_list.
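The rescue trick can be sketched like this. `require_reporting_errors` is a hypothetical helper for illustration, not the actual code in dependencies.rb: it prints the underlying exception's class and message before re-raising, so the real error isn't buried under a generic message. Note that LoadError is a ScriptError rather than a StandardError, so it has to be rescued explicitly.

```ruby
# Hypothetical debugging helper (not the actual dependencies.rb code).
# LoadError is a ScriptError, not a StandardError, so it is listed explicitly.
def require_reporting_errors(path)
  require path
rescue LoadError, StandardError => e
  warn "Error while loading #{path}: #{e.class}: #{e.message}"
  raise # re-raise the original exception unchanged
end

begin
  require_reporting_errors('models/game') # hypothetical path
rescue LoadError
  # the warning above already named the file and the real error
end
```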
The rake deprecated task is handy, but it just does a context-free grep of the source code, so it warns about code that isn't problematic, like complaining about @session_key with "@session is deprecated, use session instead". Still, it's a helpful place to start. It didn't catch everything, of course: it was fine at spotting the deprecated render_partial calls, but it didn't complain about render_text, for example.
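The @session_key false positive is what a bare substring match produces. The patterns below are hypothetical (I haven't checked the task's actual regexes); requiring a word boundary after the name is one way a grep could tell the two cases apart.

```ruby
# Hypothetical patterns, for illustration only.
NAIVE   = /@session/    # also matches the start of @session_key
BOUNDED = /@session\b/  # requires a non-word character (or end of string) after "session"

['@session[:user_id] = 1', '@session_key = params[:key]'].each do |line|
  puts "#{line.ljust(30)} naive=#{!!(line =~ NAIVE)} bounded=#{!!(line =~ BOUNDED)}"
end
```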
We had a number of problems in tests that depended on the fact that in Rails 1, fixtures loaded into models didn't set the timestamps. In other words, unless you specified created_at or updated_at in the YAML file, you got NULL for those columns in the database. In Rails 2 that's apparently no longer true: it sets the timestamps on all fixtures. A quick workaround that worked for us was to clear the fixtures before tests that depended on only certain records having timestamps. The tests and fixtures will need to be fixed eventually.
Behavior change in the radio_button_tag. Rails 2 added this:
def radio_button_tag(name, value, checked = false, options = {})
  [...]
  pretty_name = name.to_s.gsub(/\[/, "_").gsub(/\]/, "")
  [...]
end
This changed the ids on several of our input tags to something arbitrarily prettier, but unexpected by the application.
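To see what that line does to a nested field name, here is the gsub chain on its own. The field names are hypothetical examples; only the gsub chain comes from the Rails 2 code quoted above.

```ruby
# The gsub chain from radio_button_tag, applied to hypothetical field names.
def pretty_name(name)
  name.to_s.gsub(/\[/, "_").gsub(/\]/, "")
end

puts pretty_name('game[category_id]')    # game_category_id
puts pretty_name(:'user[prefs][theme]')  # user_prefs_theme
```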
Associations on models were screwy. From script/console:
>> g.game_category
NoMethodError: undefined method `game_category' for #
        from /Library/Ruby/Gems/1.8/gems/activerecord-2.0.1/lib/active_record/attribute_methods.rb:205:in `method_missing'
        from (irb):2
>> g.game_category_id
=> 1
>> class << g
>> belongs_to :game_category
>> end
=> nil
>> g.game_category
=> #
This was annoying. I dug back into active_support/dependencies.rb and started puts-ing around in the code that does the loading. I found that the Game constant was being defined when the Migrator class was being loaded. It turns out that a migration was defining Game in order to cheaply access some ActiveRecord methods, but without declaring the associations. I could have added the association in the migration, and I thought for a moment about changing the AutomaticMigration code to clear any constants it loaded when it was finished, but decided the safer and quicker approach was to just rename the class in the migration and use set_table_name to point it at the games table. This goes along with my general notion that using ActiveRecord models in migrations is a bad idea.
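Here's a toy model of why the migration's stub class hid the real one. It is a simplification, not Rails' actual dependencies.rb: the "autoloader" is a hypothetical `const_missing` hook, which only fires when a constant is undefined, and that is the part that matters here. Once something defines `Game` first, the real model (with its association) never loads; a differently named class (the hypothetical `MigrationGame`, with the set_table_name mapping omitted) leaves `Game` alone.

```ruby
# Build a fresh toy "app" namespace whose const_missing hook stands in for
# Rails autoloading app/models/game.rb (which declares the association).
def build_app
  Module.new do
    def self.const_missing(name)
      return super unless name == :Game
      const_set(:Game, Class.new { def game_category; :arcade; end })
    end
  end
end

# Normal case: the first reference to Game triggers the hook.
clean = build_app
puts clean.const_get(:Game).new.respond_to?(:game_category)

# The bug: a migration defined a bare Game first, so the hook never runs.
clobbered = build_app
clobbered.const_set(:Game, Class.new)
puts clobbered.const_get(:Game).new.respond_to?(:game_category)

# The fix: the migration uses its own class name instead.
renamed = build_app
renamed.const_set(:MigrationGame, Class.new)
puts renamed.const_get(:Game).new.respond_to?(:game_category)
```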
As I've mentioned before, the Coachella web site helpfully provides one or two tracks for most of the 100+ artists at the festival so you can hear samples of the bands you don't know to help determine if you want to go see them or not. They even have a scheduler page where you can select the bands you want to see and hide the ones you know you don't. The user experience is poor though. While listening to the provided tracks you can't easily make note of which ones you like or not, and there is no real link between the schedule page and the listening page. Plus in the end, how are they going to handle it when you select two artists scheduled at different stages at the same time?
My project for this week is a remix of the Coachella web site to make the exploration experience a little better: Coachella Explorer. Here you can listen to the tracks from the official website, and as you listen you can also rate each artist on a scale of 1-5. When the official schedule is released, the site will be able to take your ratings and calculate a schedule for you. If there is a conflict, it can perhaps be resolved by the relative ratings you gave the bands in the same time slot.
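One simple way to do that conflict resolution is a greedy pass: go from highest-rated artist to lowest and keep each set that doesn't overlap anything already picked. This is a hypothetical sketch (the `Performance` struct, artist names, and times are made up, since the official schedule isn't out yet). Greedy-by-rating isn't guaranteed to maximize your total rating across the day, but it matches the "higher rating wins the time slot" idea.

```ruby
# Hypothetical schedule entry; times are minutes since midnight.
Performance = Struct.new(:artist, :start, :finish, :rating)

# Two sets conflict if each one starts before the other finishes.
def overlaps?(a, b)
  a.start < b.finish && b.start < a.finish
end

# Greedy: highest-rated first, skip anything that clashes with a pick so far.
def pick_schedule(performances)
  chosen = []
  performances.sort_by { |p| -p.rating }.each do |p|
    chosen << p unless chosen.any? { |c| overlaps?(c, p) }
  end
  chosen.sort_by(&:start)
end

lineup = [
  Performance.new('Band A', 18 * 60, 19 * 60, 5),
  Performance.new('Band B', 18 * 60 + 30, 19 * 60 + 30, 3),
  Performance.new('Band C', 19 * 60 + 30, 20 * 60 + 30, 4)
]
pick_schedule(lineup).each { |p| puts "#{p.artist} (rated #{p.rating})" }
```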
This took a bit longer than I thought to complete. Partly because TextDrive sucks (I don't feel like going into details on this now - too annoyed), and partly because I never have time to work on these things anymore it seems. There are certainly features that would make it nicer, from little things like Netflix-like stars for rating instead of a select drop down to major features like sharing your picks with your friends and then producing an optimized schedule that keeps you at the same shows as much as possible. Maybe next year. As it is, the festival is less than a week away and probably no one is going to use this thing except me. Which is probably for the best since I was forced to host it on my decade+ old linux server that's slow as a turtle dragging a cinderblock. Any significant traffic would likely slaughter the server.
By the way, I call it a remix because all of the media (music and images) are still being served from the Coachella web site. I'm just sort of putting a new interface on top of it.
I need to hire some Rails developers at Lumos Labs, so I decided to write a spec for the position:
http://sfbay.craigslist.org/sfc/eng/646385102.html
require File.dirname(__FILE__) + '/../spec_helper'
describe Developer do
  before(:each) do
    @developer = Developer.new(ideal_developer_qualities_hash)
  end
  it "should test drive" do
    @developer.should be_test_driven
  end
  it "should know rails 2.x" do
    @developer.experience.should include(:rails2)
  end
  it "should know REST patterns" do
    @developer.experience.should include(:rest)
  end
  it "should use selenium" do
    @developer.experience.should include(:selenium)
  end
  it "should use capistrano" do
    @developer.experience.should include(:capistrano)
  end
  it "should receive competitive compensation" do
    @developer.should respond_to(:competitive_compensation)
  end
  it "should like brains" do
    @developer.should be_familiar_with('www.lumosity.com')
  end
end
I know the 'it' thing seems a bit odd; that's just how RSpec works. I could have created a new DSL, but then he_or_she would have looked dumb too. I was tempted to add an "it puts the lotion in the basket" spec but figured that would be just a tad too creepy for a job listing.
While on Chicken John's bus trip to Death Valley, I participated in a couple of exquisite corpses. From Wikipedia: "a method by which a collection of words or images are collectively assembled." Kind of fun. Can you guess which parts of these three-part doodles I did?
Coachella is three weeks away, and just like last year the lineup is mostly bands I've never heard of. Also just like last year, the Coachella website provides 1-2 tracks for most of the artists scheduled so you can get an idea of what they sound like. However, the website interface doesn't make the aural exploration easy: it only lets you listen to the songs, not annotate them with your opinions. Last year I resorted to scribbling down the artists I liked by hand. So I've been working on a couple of things to make it easier. The first of these is my project for this week; the second may be the project for next week if I get it finished.
Here is the first: a ruby script that saves all 210 of the Coachella tracks as normal MP3s that you can play anywhere. It even inserts the correct artist/title info and image file into the ID3 structure of the MP3. This makes it easier as you can use the iTunes (or equivalent) rating system as you listen to the songs and then sort by rating later to get a quick list of the artists you want to see at the festival.
This was written for OS X, but if you are Ruby-savvy on Windows you can easily get it to work there too. The main hiccup is that it depends on a third-party library to manage the ID3 song info in the MP3s. Here are the steps (sudo will sometimes ask for your password; just type it in when asked):
- Save the pullem.rb script to your Music folder.
- Open a Terminal window
- Type: cd ~/Music
- Type: sudo port install id3lib
- Type: sudo env ARCHFLAGS="-arch i386" gem install id3lib-ruby -- --with-opt-dir=/opt/local
- Type: sudo gem install xml-simple
- Type: ruby pullem.rb
This will start the download. You'll end up with a little over 200 shiny new MP3s.
The port install id3lib step is what installs the third-party library. For that to work you need DarwinPorts installed, and for DarwinPorts you need the OS X Developer's Toolkit that came on the OS X DVDs with your Mac. The toolkit is not installed by default.
If you are on Windows you can download a binary of the id3 library from sourceforge.
An interesting post titled More Data Usually Beats Better Algorithms shows how two teams using different approaches fared in the Netflix Challenge. Here is the gist, with a corroborating analysis of Google's success:
But the bigger point is, adding more, independent data usually beats out designing ever-better algorithms to analyze an existing data set. I'm often surprised that many people in the business, and even in academia, don't realize this.
Another fine illustration of this principle comes from Google. Most people think Google's success is due to their brilliant algorithms, especially PageRank. In reality, the two big innovations ... were:
1. The recognition that hyperlinks were an important measure of popularity -- a link to a webpage counts as a vote for it.
2. The use of anchortext (the text of hyperlinks) in the web index, giving it a weight close to the page title.
First generation search engines had used only the text of the web pages themselves. The addition of these two additional data sets -- hyperlinks and anchortext -- took Google's search to the next level. The PageRank algorithm itself is a minor detail -- any halfway decent algorithm that exploited this additional data would have produced roughly comparable results.
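The first of those two ideas, a link counts as a vote, fits in a few lines. This is a sketch on a made-up four-page link graph, and it is the crudest possible use of the link data (a raw count of distinct linking pages), not PageRank, which is rather the quote's point.

```ruby
# Made-up link graph: page => pages it links to.
LINKS = {
  'a.example' => ['c.example'],
  'b.example' => ['c.example', 'a.example'],
  'c.example' => ['a.example'],
  'd.example' => ['c.example', 'c.example'] # duplicate link counts once
}

# Each page casts at most one vote for each page it links to.
def inlink_votes(links)
  votes = Hash.new(0)
  links.each_value { |targets| targets.uniq.each { |t| votes[t] += 1 } }
  votes
end

inlink_votes(LINKS).sort_by { |page, n| [-n, page] }.each do |page, n|
  puts "#{page}: #{n} vote(s)"
end
```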
This is interesting to me, as I tend to get seduced by the desire to tweak algorithms.

