Working at the Know-Center in Graz, I’m trained to plow through scientific literature to find solutions to the problems I encounter. For beat detection I first went the usual route of asking Mr. Google what he thinks is the most relevant info on the topic. Looking at how Audiosurf performs on various tracks and following some discussions on the Audiosurf forums, I got the impression that Audiosurf does nothing too sophisticated. The blocks seem to be generated solely from beats in various frequency bands: below 500 Hz for the kick drum and higher frequencies for cymbals and similar percussion instruments. My search keywords thus consisted of the obvious "beat detection FFT spectrogram rhythm analysis", which turned up this pretty badly written article on gamedev.net. Several people on the net reported that the approaches described in the article do not work well. I verified this suspicion by trying out the BeatDetector class of Minim, an excellent audio library for Processing. The beat detector is a direct implementation of the approaches described in the gamedev.net article. I stripped all Processing-related stuff from the Minim source so I could use it for an initial analysis of the beat detection problem, which worked out exceptionally well. However, the included beat detector does not perform all that well (which I don’t blame on the author of Minim; it’s really an excellent library, check it out!). I tested it on a variety of genres and the results were mixed at best. So the approach described is not suitable for my needs.
So I turned to the scientific literature instead. The term beat detection is not used there; onset detection is the term to search for. Onset detection is a more general notion than beat detection: it tries to find the beginning of any note, be it from non-pitched instruments like drums or pitched instruments like flutes. I found a couple of interesting papers and just want to mention the most promising ones.
First off, there was a scientific competition a couple of years ago that focused on audio beat tracking, the MIREX Audio Beat Tracking Challenge 2006. A couple of key players from the field of onset detection and beat tracking took part in this challenge, which was run on a standardized test set that is available on the site. Most notable are the approaches by Simon Dixon, formerly employed at the Technical University of Vienna in the Artificial Intelligence Department. He basically uses a short-time Fourier transform (STFT) and computes a quantity he calls the spectral flux, which measures how much the magnitude spectrum changes from one analysis window to the next and thereby allows onsets to be detected. More information on this and the other approaches can be found in this paper. Follow the references in there for more interesting approaches.
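To make the spectral flux idea a bit more concrete, here is a minimal sketch in plain Python. It is not Dixon's code: the function names, the Hann window, the frame size, the hop size and the choice to count only increases in magnitude are all assumptions made for illustration, and the naive DFT is far too slow for real audio.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the first N/2+1 DFT bins of a Hann-windowed frame.

    Uses a naive O(N^2) DFT to stay dependency-free; a real
    implementation would use an FFT.
    """
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
        for i, x in enumerate(frame)
    ]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed)))
        for k in range(n // 2 + 1)
    ]


def spectral_flux(samples, frame_size=64, hop=32):
    """One value per frame: summed magnitude increase over the previous frame.

    frame_size and hop are illustrative assumptions, not values from the paper.
    """
    flux = []
    previous = [0.0] * (frame_size // 2 + 1)
    for start in range(0, len(samples) - frame_size + 1, hop):
        current = magnitude_spectrum(samples[start:start + frame_size])
        # Keep only bins whose magnitude grew; decays are ignored.
        flux.append(sum(max(c - p, 0.0) for c, p in zip(current, previous)))
        previous = current
    return flux
```

Fed with silence followed by a sudden tone, the flux stays at zero during the silence, jumps in the frames where the tone starts, and falls back to roughly zero while the tone is sustained. That jump is what a peak detector would then flag as an onset.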
Another very nice paper, or rather a survey, is “A Comparison of Sound Onset Detection Algorithms with Emphasis on Psychoacoustically Motivated Detection Functions” by Collins. It summarizes and evaluates 13 different detection functions along with a peak detector (which is applied to the output of the detection functions to detect onsets). The most interesting approach in there for me is the one by Klapuri, which again uses an STFT and does some magic on the result along with peak detection. The approach seems easy to implement and compares very well to the other, more sophisticated approaches evaluated in the survey. Follow the references in this survey to get a good grasp of the state of the art in onset detection.
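The peak detection step can be sketched roughly as follows. This is not the peak picker from the survey, just a common and simple variant: a frame counts as an onset if its detection value is a local maximum and exceeds a moving-average threshold. The name `pick_peaks` and the parameters `window` and `multiplier` are assumptions made up for this example.

```python
def pick_peaks(detection, window=3, multiplier=1.5):
    """Return indices that are local maxima above a moving-average threshold.

    detection  -- output of a detection function, one value per frame
    window     -- frames on each side used for the local mean (assumed value)
    multiplier -- how far above the local mean a peak must be (assumed value)
    """
    peaks = []
    for i, value in enumerate(detection):
        lo = max(0, i - window)
        hi = min(len(detection), i + window + 1)
        threshold = multiplier * sum(detection[lo:hi]) / (hi - lo)
        is_local_max = (
            (i == 0 or value > detection[i - 1])
            and (i == len(detection) - 1 or value >= detection[i + 1])
        )
        if value > threshold and is_local_max:
            peaks.append(i)
    return peaks
```

The adaptive threshold matters because loud passages raise the whole detection function; a fixed threshold would either miss onsets in quiet parts or fire constantly in loud ones.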
The last notable paper I’d like to mention uses a pretty neat approach based on self-similarity. Basically, an STFT is computed for overlapping sample windows and a feature vector is derived for each window. Next, a similarity matrix is constructed (using cosine similarity) by comparing every window with every other window, which gives insight into the structure of the analysed song. This reminds me a lot of a topic detection approach in natural language processing I once implemented at work. The beauty of such approaches is that you can use image processing algorithms to get the structure information out of the similarity matrix, which is treated as a floating-point greyscale image. You can find the original paper, called “The Beat Spectrum: A New Approach To Rhythm Analysis”, here. Of course, the approach is way too computationally complex for use on mobile devices, since the matrix grows quadratically with the number of windows, but it is nevertheless worth mentioning.
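As a rough illustration of the self-similarity idea, the sketch below builds the cosine similarity matrix from a list of per-window feature vectors and then averages it along its diagonals to see at which lags the song repeats. The function names are made up for this example, the feature vectors are assumed to come from an STFT, and averaging along diagonals is a simplified reading of the approach, not the paper's exact procedure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def similarity_matrix(frames):
    """Compare every window with every other window: n x n values for n windows."""
    return [[cosine_similarity(a, b) for b in frames] for a in frames]


def beat_spectrum(matrix):
    """Average similarity for each lag, taken along the matrix diagonals.

    Peaks at a given lag suggest the signal repeats with that period.
    """
    n = len(matrix)
    return [
        sum(matrix[i][i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(n)
    ]
```

For a toy input that alternates between two unrelated spectra, the resulting spectrum is high at every even lag and low at every odd lag, which reflects the period of two windows. The nested loops also show where the cost comes from: every extra window adds a full row and column to the matrix.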
One last thing I just discovered is BeatRoot, a Java application for beat tracking/onset detection that can be found on Dixon’s site (see link above). It comes with source code, which I will hack my way through tonight. The results I achieved with the GUI frontend seem very promising, and if the source contains the approach described in his paper (spectral flux), I have a nice template to base my own code on.
That’s it for now. There are a lot more papers, but the ones mentioned should get anyone interested started on the state of the art.