After playing around with several variations on the Marsyas MIREX 2007 feature extraction algorithm, I’ve come to a very subjective and unscientific conclusion: I will use the basic algorithm with two slight modifications. First, I’m downsampling to 11025 Hz to speed up analysis, since various studies indicate that this does not significantly affect accuracy. Second, I’m using a 1024-sample analysis window with 50% overlap (a hop size of 512 samples, roughly 46 ms at 11025 Hz), as opposed to Marsyas’ original 0% overlap.
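To make the window and hop numbers concrete, here is a minimal sketch of overlapping framing. It assumes a sample rate of 11025 Hz (a quarter of 44.1 kHz), a 1024-sample window and a 512-sample hop. The helper names are hypothetical and this is not Marsyas code, just the general technique of slicing a signal into overlapping windows.

```python
# Hypothetical helpers illustrating overlapping framing; not Marsyas code.
SAMPLE_RATE = 11025  # Hz after downsampling (assumed)
WINDOW = 1024        # samples per analysis window
HOP = 512            # samples between window starts (50% overlap)


def frame_count(num_samples: int, window: int = WINDOW, hop: int = HOP) -> int:
    """Number of full windows that fit in the signal (no zero-padding)."""
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // hop


def frames(samples, window: int = WINDOW, hop: int = HOP):
    """Yield successive windows; consecutive windows share window - hop samples."""
    for i in range(frame_count(len(samples), window, hop)):
        start = i * hop
        yield samples[start:start + window]


def ms(num_samples: int, rate: int = SAMPLE_RATE) -> float:
    """Convert a length in samples to milliseconds."""
    return 1000.0 * num_samples / rate


if __name__ == "__main__":
    print(f"window: {ms(WINDOW):.1f} ms, hop: {ms(HOP):.1f} ms")
    one_second = [0.0] * SAMPLE_RATE
    print("1024/512 (50% overlap):", frame_count(len(one_second)))
    print("512/512 (no overlap):", frame_count(len(one_second), 512, 512))
```

Under these assumptions, one second of audio yields 20 overlapping 1024-sample windows versus 21 non-overlapping 512-sample windows, so the number of frames to analyze stays about the same while each frame covers twice as much audio.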
These changes were made on the very unscientific basis that, on a set of 31 random songs from six different albums, the “most similar songs” results seemed more reasonable with the overlap. Also, for whatever reason, the overlap version ran two seconds faster. All in all, I’m very impressed with the speed: on my computer, extracting features for 31 songs took 16 seconds, or about half a second per song. On a slower computer this could probably rise to as much as 1.5 seconds per song, which is still quite competitive with other systems in the literature.
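As a back-of-the-envelope check, the sketch below turns the measured 16 seconds for 31 songs into a per-song figure and extrapolates to whole libraries. The library sizes are hypothetical, and the 1.5 s/song value is the pessimistic slower-computer estimate from the paragraph above, not a measurement.

```python
# Rough timing estimates from the figures above; library sizes are hypothetical.
MEASURED_SECONDS = 16.0
MEASURED_SONGS = 31
SLOW_SECONDS_PER_SONG = 1.5  # pessimistic estimate for a slower computer


def seconds_per_song(total_seconds: float, songs: int) -> float:
    """Average extraction time per song."""
    return total_seconds / songs


def library_hours(songs: int, per_song: float) -> float:
    """Total extraction time in hours for a library of the given size."""
    return songs * per_song / 3600.0


if __name__ == "__main__":
    fast = seconds_per_song(MEASURED_SECONDS, MEASURED_SONGS)
    print(f"measured: {fast:.2f} s/song")
    for n in (1000, 5000, 10000):
        print(f"{n} songs: {library_hours(n, fast):.1f} h here, "
              f"{library_hours(n, SLOW_SECONDS_PER_SONG):.1f} h on a slower machine")
```

Even at the pessimistic rate, a hypothetical 5000-song library would take a little over two hours, which suggests extraction is best done once in the background and cached.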
I still have a few things to iron out. Right now I store the extracted feature vectors in text files, one per song, but it would probably be more efficient to store them as binary data in a database. Also, I have yet to start on the actual Rhythmbox plugin: for a student working with GNOME, I have had very little to do with GNOME so far! I imagine that will change as soon as I start running into the intricacies of the actual plugin.
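One way the binary-storage idea could look is sketched below, using Python’s built-in `sqlite3` and `array` modules. The table layout and the `open_store`, `save_features` and `load_features` names are hypothetical; this is not how Rhythmbox stores anything, just a minimal illustration of packing a feature vector into a BLOB keyed by song URI.

```python
import sqlite3
from array import array


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical feature database with one row per song."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        " uri TEXT PRIMARY KEY,"  # song location, e.g. a file:// URI
        " vector BLOB NOT NULL)"  # feature vector packed as 64-bit floats
    )
    return conn


def save_features(conn: sqlite3.Connection, uri: str, vector) -> None:
    """Pack the vector into raw bytes and insert or overwrite its row."""
    blob = array("d", vector).tobytes()
    conn.execute(
        "INSERT OR REPLACE INTO features (uri, vector) VALUES (?, ?)",
        (uri, blob),
    )
    conn.commit()


def load_features(conn: sqlite3.Connection, uri: str):
    """Return the stored vector as a list of floats, or None if the song is missing."""
    row = conn.execute(
        "SELECT vector FROM features WHERE uri = ?", (uri,)
    ).fetchone()
    if row is None:
        return None
    vec = array("d")
    vec.frombytes(row[0])
    return vec.tolist()


if __name__ == "__main__":
    conn = open_store()
    save_features(conn, "file:///music/example.ogg", [0.25, -1.5, 3.0])
    print(load_features(conn, "file:///music/example.ogg"))
```

Packing with `array("d", ...)` uses the machine’s native byte order, so a database written this way is not guaranteed to be portable between big- and little-endian machines; for a local per-user cache that is usually acceptable.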