I know you guys probably won't reveal any proprietary information, but I'm so damn curious how this works that I'm going to go out on a limb and try to guess, and maybe extract some more specific information along the way.<p>1) Recognizing the musical notes. From here (<a href="http://www.owlnet.rice.edu/~elec301/Projects02/realTime/301Project.html" rel="nofollow">http://www.owlnet.rice.edu/~elec301/Projects02/realTime/301P...</a>): "Fourier analysis allows us to decompose any such pressure function into a sum of
sinusoids. Therefore, any sound can be represented as a sum of sinusoids. If the sound has a pressure function that is aperiodic with respect to time, decomposition into sinusoids is quite complicated. However, if the sound is periodic with respect to time, it can be easily decomposed and transferred to the frequency domain using a computer and the Fast Fourier Transform." And since each note in each octave has its own fundamental frequency, it could be easily identified via a frequency-to-note lookup table.<p>2) Recognizing the time signature of the song (e.g., whether the song is 4/4 or 3/4), since this seems to be required to do a straightforward Fourier transform and also perhaps to mark chord changes. I'm guessing this is done either by simple analysis of any periodic, consistent rises in the sound frequency of the song, or perhaps via the same Fourier transform analysis of the sound waves, mapping out where the peaks fall.<p>3) Recognizing the chords. Once you have figured out the notes and the time signature, the rest follows pretty easily: you have a chord database mapping note triads to chords, and you map out the chords accordingly. But the challenge there is: what if you have a rhythm guitar going at the same time as a solo? How do you map which notes to which guitar? Perhaps the instruments are recorded onto different channels and you group notes according to the degree to which they pan to the left, to the right, etc.<p>4) Separating out the instruments from one another. Maybe grouping notes via panning is not enough. Perhaps you need to do some timbre analysis to group the notes that sound like a guitar vs. notes that sound like a bass guitar, since each instrument has distinct harmonics and overtones. Maybe you guys have some type of classification algorithm that classifies what portion of the sound belongs to which instrument's timbre.<p>Any comments/responses are appreciated.
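<p>To make guess 1 a bit more concrete, here is a rough Python sketch of the FFT-then-lookup idea. It is purely my own illustration, not a claim about how your product works: the function names (`fft`, `dominant_frequency`, `frequency_to_note`, `synth_sine`), the 8000 Hz sample rate, the 2048-sample window, and the A4 = 440 Hz equal-temperament tuning are all assumptions, and it only handles a single clean tone rather than a real mix.

```python
import cmath
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT. len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest bin below Nyquist, skipping the DC bin."""
    spectrum = fft(samples)
    half = len(samples) // 2
    peak_bin = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate / len(samples)


def frequency_to_note(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note name (assumed tuning: A4 = 440 Hz)."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))  # MIDI 69 is A4
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def synth_sine(freq_hz, sample_rate=8000, n=2048):
    """Hypothetical test signal: n samples of a pure sine at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    tone = synth_sine(440.0)
    print(frequency_to_note(dominant_frequency(tone, 8000)))
```

With these assumed settings the bin spacing is 8000 / 2048, or about 3.9 Hz, which is plenty to tell notes apart around 440 Hz but gets marginal for low bass notes, where neighbouring semitones are only a few Hz apart. A real recording also has overtones and several notes at once, so picking the single loudest bin is only a starting point.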