On Hoatzin
January 16th, 2011
Hoatzin is a classification gem that I’ve released as part of a drive to open source more of our code at Rattle. It’s in the vein of classifier or ankusa, but it uses a Support Vector Machine (SVM) to do the classification rather than another method such as Naive Bayes. Why is this a good thing? Well, in theory an SVM should outperform the Naive Bayes method for a start, plus there don’t seem to be any SVM-based classifier gems out there, so it seemed a good time to push ours out. The code was originally based on an excellent igvita.com article; however, it has been packaged up, switched to use the libsvm-ruby-swig gem, and has had some performance tweaks applied to improve training times.
A quick and dirty example is shown below:
require 'rubygems'
require 'hoatzin'
# Create a hoatzin classifier
c = Hoatzin::Classifier.new()
# Train the classifier with a classification and some text
c.train(:positive, "Thats nice")
# This will return the most likely classification (:positive)
c.classify("Thats nice")
The gem uses a slightly modified vector space model from Joseph Wilk’s rSemantic (a latent semantic indexing gem) to build the document vectors more quickly, so training should be much faster than with the sample code in the igvita article.
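For anyone unfamiliar with the idea, a vector space model turns each document into a vector of term counts that an SVM can consume. The snippet below is only a minimal sketch of that general idea, not the code used in Hoatzin or rSemantic; the helper names (`tokenize`, `build_vocabulary`, `vectorize`) are made up for illustration.

```ruby
# Illustrative only: a tiny bag-of-words vector space model.
# tokenize, build_vocabulary and vectorize are hypothetical helpers,
# not part of Hoatzin or rSemantic.

# Lowercase the text and split it into word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# Map every distinct term in the corpus to a fixed vector index,
# in order of first appearance.
def build_vocabulary(documents)
  vocabulary = {}
  documents.each do |doc|
    tokenize(doc).each do |term|
      vocabulary[term] ||= vocabulary.size
    end
  end
  vocabulary
end

# Turn a document into a term-frequency vector over the vocabulary.
# Terms that were not in the vocabulary are ignored.
def vectorize(text, vocabulary)
  vector = Array.new(vocabulary.size, 0)
  tokenize(text).each do |term|
    index = vocabulary[term]
    vector[index] += 1 if index
  end
  vector
end

docs    = ["Thats nice", "Thats not nice at all"]
vocab   = build_vocabulary(docs)
vectors = docs.map { |doc| vectorize(doc, vocab) }

p vectors  # one count vector per document, same length as the vocabulary
```

Every document ends up with the same dimensionality, which is what lets a single model be trained over the whole corpus.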
I’ve achieved 83% accuracy against the movie reviews sentiment analysis corpus (around 2000 docs) with no optimisation, so its performance is reasonable (humans only achieve around an 80% success rate classifying sentiment).
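If you want to run a similar check on your own data, accuracy is just the fraction of labelled documents the classifier gets right. Here is a rough sketch, assuming only that the classifier responds to `classify(text)` as in the example above. The `accuracy` helper and the `KeywordClassifier` stand-in are hypothetical, there so the snippet runs without the gem installed, and this is not necessarily how the 83% figure was measured.

```ruby
# Illustrative only: measure accuracy over labelled documents.
# `classifier` is any object that responds to classify(text).
# accuracy and KeywordClassifier are hypothetical, not part of Hoatzin.

# Fraction of [label, text] pairs that the classifier labels correctly.
def accuracy(classifier, labelled_docs)
  return 0.0 if labelled_docs.empty?
  correct = labelled_docs.count do |label, text|
    classifier.classify(text) == label
  end
  correct.to_f / labelled_docs.size
end

# Stand-in classifier: calls anything containing "nice" positive.
class KeywordClassifier
  def classify(text)
    text.downcase.include?("nice") ? :positive : :negative
  end
end

held_out = [
  [:positive, "Thats nice"],
  [:negative, "Thats awful"],
  [:positive, "Really nice film"],
  [:negative, "Not nice at all"] # the stand-in gets this one wrong
]

score = accuracy(KeywordClassifier.new, held_out)
puts format("accuracy: %.2f", score) # prints "accuracy: 0.75"
```

To get a fair number, the documents you score against should be ones the classifier was not trained on.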
It’s available from RubyGems and you can find the source on GitHub.