Soundex for Ruby

December 23rd, 2005

I’ve found two implementations of the enhanced Soundex algorithm for Ruby. The first is written by Mike Stok, and is a port of his Perl implementation:

:::ruby
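def soundex(string)
  # Work on an uppercase copy containing only the letters A-Z.
  copy = string.upcase.tr '^A-Z', ''
  return nil if copy.empty?
  first_letter = copy[0, 1]
  # Translate letters to digits, squeezing runs of the same digit.
  copy.tr_s! 'AEHIOUWYBFPVCGJKQSXZDTLMNR', '00000000111122222222334556'
  # Drop the first letter's digit, then the zeros (vowels, H, W, Y).
  copy.sub!(/^(.)\1*/, '')
  copy.delete!('0')
  # Pad or trim to three digits so the code is always four characters.
  "#{first_letter}#{copy.ljust(3, "0")[0, 3]}"
end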

The other is written by Michael Neumann and is available at the Ruby Application Archive.

I would recommend using Mike Stok’s implementation unless you need to be able to pass in an array. In my test it was about twice as fast as the one at the RAA site. Here are both implementations timed over 10,000 iterations:

                user        system    total    real
stok soundex:  0.290000   0.010000   0.300000 (  0.350133)
 raa soundex:  0.600000   0.010000   0.610000 (  0.708554)
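
For anyone who wants to run a similar comparison, here is a rough sketch of how such a timing could be set up with Ruby's standard Benchmark module. The list of names and the label width are my own assumptions, not the inputs used for the numbers above, and only the Stok version is timed because the RAA code isn't reproduced here. The soundex method is a copy of the one above, trimmed to three digits.

```ruby
require 'benchmark'

# Copy of the soundex method shown above, trimmed to three digits.
def soundex(string)
  copy = string.upcase.tr('^A-Z', '')
  return nil if copy.empty?
  first_letter = copy[0, 1]
  copy.tr_s!('AEHIOUWYBFPVCGJKQSXZDTLMNR', '00000000111122222222334556')
  copy.sub!(/^(.)\1*/, '')
  copy.delete!('0')
  first_letter + copy.ljust(3, '0')[0, 3]
end

# Hypothetical sample input; swap in whatever names matter to you.
names = %w[Robert Rupert Nemo Washington]

# 14 is the column width reserved for the labels.
Benchmark.bm(14) do |bm|
  bm.report('stok soundex:') do
    10_000.times { names.each { |name| soundex(name) } }
  end
end
```

To compare a second implementation, add another bm.report block that calls it over the same names.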

Finding Nemo

December 23rd, 2005

Ioan, Mika, and I went out for happy hour the other night, and somehow we got onto the topic of searching for names in a database when you aren’t sure of the spelling. This is pretty easy to do using soundex, which is a simple and fairly effective algorithm.

If you aren’t familiar with soundex, you might want to read the Wikipedia article on it before going any further.

There are a couple of variations of the soundex algorithm, so if you are going to use it you need to be aware of the differences. The original version discards vowels before removing duplicate letters, while the newer enhanced version removes duplicates before discarding vowels. As a result, some names get a different soundex code depending on which version of the algorithm is used.
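
To make the difference concrete, here is a small Ruby sketch of the two orderings. It is a simplified illustration, not the actual PHP or MySQL code: the method names are made up for this example, and special cases that real implementations may handle (such as letters separated by H or W) are ignored.

```ruby
# Map each letter to its soundex digit; "0" marks the letters that get
# discarded (the vowels plus H, W, and Y).
def letter_codes(name)
  letters = name.upcase.tr('^A-Z', '')
  letters.tr('AEHIOUWYBFPVCGJKQSXZDTLMNR', '00000000111122222222334556')
end

# Keep the first letter, then pad or trim the digits to exactly three.
def build_code(name, digits)
  first_letter = name.upcase.tr('^A-Z', '')[0, 1]
  first_letter + digits.ljust(3, '0')[0, 3]
end

# Original ordering: discard vowels first, then collapse duplicates.
def soundex_original(name)
  codes = letter_codes(name)
  return nil if codes.empty?
  digits = codes.delete('0').squeeze
  # The first letter is kept as a letter, so drop its digit if it has one.
  digits = digits[1..-1] unless codes[0, 1] == '0'
  build_code(name, digits)
end

# Enhanced ordering: collapse duplicates first, then discard vowels.
def soundex_enhanced(name)
  codes = letter_codes(name)
  return nil if codes.empty?
  # After squeezing, the first digit always belongs to the first letter.
  digits = codes.squeeze[1..-1].delete('0')
  build_code(name, digits)
end

puts soundex_original('nemo')   # N000
puts soundex_enhanced('nemo')   # N500
```

In "nemo" the N and M share a digit and are separated only by a vowel. Dropping the vowel first makes them adjacent duplicates, so the M disappears; collapsing duplicates first leaves the M in place.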

Let’s look at a couple of examples on the command line using PHP and MySQL (PHP uses the enhanced soundex algorithm and MySQL uses the original):

php -r "echo soundex('nemo');"
N500

mysql
mysql> select soundex("nemo");
+-----------------+
| soundex("nemo") |
+-----------------+
| N000            |
+-----------------+
1 row in set (0.02 sec)

As we can see, these return different results, so we can’t use them interchangeably. Since we need MySQL’s help here, we’re going to have to do the entire comparison in MySQL. MySQL supports a special SOUNDS LIKE syntax, which is the same as saying SOUNDEX(expression1) = SOUNDEX(expression2).
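
The same idea, comparing codes rather than spellings, can be sketched in plain Ruby. This is only an in-memory illustration: it uses the enhanced-style soundex method from the earlier post (trimmed to three digits), so its codes won't always agree with MySQL's, and the sounds_like? helper and the list of names are made up for the example.

```ruby
# Enhanced-style soundex, as in the earlier post, trimmed to three digits.
def soundex(string)
  copy = string.upcase.tr('^A-Z', '')
  return nil if copy.empty?
  first_letter = copy[0, 1]
  copy.tr_s!('AEHIOUWYBFPVCGJKQSXZDTLMNR', '00000000111122222222334556')
  copy.sub!(/^(.)\1*/, '')
  copy.delete!('0')
  first_letter + copy.ljust(3, '0')[0, 3]
end

# Hypothetical helper: two strings "sound alike" when their codes match.
def sounds_like?(a, b)
  code = soundex(a)
  !code.nil? && code == soundex(b)
end

# Hypothetical data standing in for the first_name column.
first_names = %w[Nemo Nimoh Naomi Norman Dory]

matches = first_names.select { |name| sounds_like?(name, 'nemo') }
puts matches.inspect   # ["Nemo", "Nimoh", "Naomi"]
```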

PHP and MySQL

:::php
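$name = "nemo";
// Escape the value before embedding it in the SQL string
// (needs an open MySQL connection).
$safe_name = mysql_real_escape_string($name);
$sql = "SELECT * FROM customers ";
$sql .= "WHERE first_name SOUNDS LIKE '{$safe_name}'";
$result = mysql_query($sql);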

Ruby on Rails

While we’re at it let’s look at how to do it in Rails with an ActiveRecord model. Assuming we have a Customer model:

:::ruby
name = "nemo"
customers = Customer.find :all, :conditions => ["first_name SOUNDS LIKE ?", name]

Really simple, but helpful stuff.

Published

December 8th, 2005

I just got confirmation that one of my photos is included in Texas State University’s latest Study Abroad brochure for the College of Liberal Arts. I was contacted by the University to get permission to use the photo a couple of months ago, but I had forgotten about it until yesterday when I received some copies of the brochure.

The photo is of the basilica in Guanajuato, Mexico and is located at the top center of page 2. You can see this photo better in my photo gallery.

Another one of my photos is due to be published in Splendid Pathways: A Tour Through the World’s Finest Botanical Gardens, although I haven’t got confirmation that it made it through the editing process.

It’s a small thing, but cool.

Has Many Through Association

December 6th, 2005

Yesterday, DHH posted a link to his Pursuit of Beauty slides from the Snakes and Rubies event this weekend.

I went through each of the slides looking for new stuff and found several great new things. If you look at slide 14, you’ll see a new :through parameter on one of the associations. There’s no documentation on this yet, but I did a little experimentation by checking out the latest edge_rails.

To see how this new type of association works, let’s look at the traditional way to handle many-to-many relationships when we want to store additional attributes about the join.

Let’s take a simple example to illustrate how we can use the new functionality. Let’s say that we publish several newsletters and we let users sign up for as many of these newsletters as they want. We also need to track several things about each subscription, such as the email format the user would like to receive that newsletter in. What we want to do in this case is make the join table a model. Let’s call it Subscription. We would have three tables: users, subscriptions, newsletters. The models would be set up like this:

:::ruby
class User < ActiveRecord::Base
  has_many :subscriptions
end

class Newsletter < ActiveRecord::Base
  has_many :subscriptions
end

class Subscription < ActiveRecord::Base
  belongs_to :user
  belongs_to :newsletter
end

Let’s say that we want to see a list of what newsletters John Doe has subscribed to:

:::ruby
@newsletters = []
User.find_by_name("John Doe").subscriptions.each do |s|
  @newsletters << s.newsletter
end

This works great, but it isn’t very elegant. It would be much nicer if we could just get all the newsletters without having to walk through the subscriptions.

Let’s add the :through associations to the models:

:::ruby
class User < ActiveRecord::Base
  has_many :subscriptions
  has_many :newsletters, :through => :subscriptions
end

class Newsletter < ActiveRecord::Base
  has_many :subscriptions
  has_many :users, :through => :subscriptions
end

class Subscription < ActiveRecord::Base
  belongs_to :user
  belongs_to :newsletter
end

Now we can just access the associated models directly:

:::ruby
@newsletters = User.find_by_name("John Doe").newsletters
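
If it helps to picture what the association is doing, here is a plain-Ruby sketch with no ActiveRecord involved. The classes are simple stand-ins I made up for illustration, and the newsletter titles are hypothetical. ActiveRecord presumably does the equivalent work in the database rather than walking objects in memory.

```ruby
# Plain-Ruby stand-ins for the models (not ActiveRecord classes).
Newsletter   = Struct.new(:title)
Subscription = Struct.new(:newsletter, :email_format)

class User
  attr_reader :name, :subscriptions

  def initialize(name)
    @name = name
    @subscriptions = []
  end

  # The join object carries the extra attribute (the email format).
  def subscribe(newsletter, email_format)
    @subscriptions << Subscription.new(newsletter, email_format)
  end

  # Roughly what "newsletters through subscriptions" means:
  # follow each subscription to its newsletter.
  def newsletters
    subscriptions.map { |s| s.newsletter }
  end
end

john = User.new('John Doe')
john.subscribe(Newsletter.new('Weekly Digest'), 'html')
john.subscribe(Newsletter.new('Release Notes'), 'text')

puts john.newsletters.map { |n| n.title }.inspect      # ["Weekly Digest", "Release Notes"]
puts john.subscriptions.map { |s| s.email_format }.inspect   # ["html", "text"]
```

The subscriptions are still there when we need the join attributes; the newsletters method just saves us from walking them by hand.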

There truly is beauty in simplicity.

Update: I’ve updated this article with some better examples based on user feedback.

Stamp of Approval

December 2nd, 2005

I reviewed and approved 147 comments in the genealogy section. Another 7 comments were reviewed and hidden, as they contained too much information on living people. I also updated the database based on info from the comments.

I’ve completely neglected these comments for well over a year now. The earliest comment was from May of 2004! I think I’m pretty well burned out on all the genealogy research/correspondence after 15-20 years, but I need to at least keep the comments up to date and respond to questions when I can.

Anyway, it seems like this GTD stuff might be working for me.