Music is first separated into stems using a version of Demucs modified to support streaming. Each pitched stem is then processed by a machine learning model trained to detect its fundamental frequencies. Finally, the fundamental frequencies are post-processed and combined with FFT magnitudes to produce the visualization. In parallel, a tempo-detection model runs on the client to generate the beat grids.
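For illustration, here is a minimal sketch of the separation stage using the stock Demucs Python API. The streaming-modified version described above is not public, so the standard offline `apply_model` call stands in, and the grouping of sources into "pitched" stems is an assumption:

```python
import torch
from demucs.pretrained import get_model
from demucs.apply import apply_model

# Load a pretrained Demucs model (the stock offline model; the streaming
# fork described in the post is used in production instead).
model = get_model("htdemucs")
model.eval()

# `mix` is a (channels, samples) float tensor at model.samplerate (44.1 kHz).
# Random noise is a placeholder for real decoded audio.
mix = torch.randn(2, model.samplerate * 10)

with torch.no_grad():
    # apply_model expects a batch dimension: (batch, channels, samples).
    stems = apply_model(model, mix[None], device="cpu")[0]

# stems has shape (num_sources, channels, samples);
# model.sources is typically ["drums", "bass", "other", "vocals"].
separated = dict(zip(model.sources, stems))

# Assumption: drums are unpitched, so only the remaining stems move on
# to pitch detection.
pitched = {name: stem for name, stem in separated.items() if name != "drums"}
```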
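The pitch model itself is custom, so the sketch below substitutes CREPE, an off-the-shelf f0 tracker, purely to show the shape of this stage: estimate a fundamental per frame, drop low-confidence frames, then weight each estimate by the FFT magnitude near it so a note's visual weight tracks its loudness. The 0.5 confidence threshold and the 10 ms hop are illustrative assumptions, not values from the post:

```python
import numpy as np
import crepe  # stand-in f0 tracker; the post's model is custom

sr = 16000  # CREPE operates on 16 kHz mono audio
audio = np.random.randn(sr * 10).astype(np.float32)  # placeholder stem

# Frame-wise f0 estimates with Viterbi smoothing for a continuous track.
time, frequency, confidence, _ = crepe.predict(
    audio, sr, viterbi=True, step_size=10
)

# Post-process: keep only confidently voiced frames.
f0 = np.where(confidence > 0.5, frequency, np.nan)

# FFT magnitudes on a matching 10 ms hop, so every f0 estimate pairs
# with one spectral frame.
hop = int(sr * 0.010)
n_fft = 2048
frames = np.lib.stride_tricks.sliding_window_view(audio, n_fft)[::hop]
mags = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=-1))
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

# Read the magnitude at the bin nearest each voiced f0 estimate.
n = min(len(f0), len(mags))
weights = np.zeros(n)
for i in range(n):
    if not np.isnan(f0[i]):
        weights[i] = mags[i, np.argmin(np.abs(freqs - f0[i]))]
```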
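For the beat grid, the actual tempo model runs client-side; librosa's onset-based beat tracker is used below only as a stand-in to show the output this stage produces: a tempo estimate plus a grid of beat times the visualization can snap to. The filename `track.wav` is hypothetical:

```python
import numpy as np
import librosa

# Load audio at its native sample rate (hypothetical file path).
y, sr = librosa.load("track.wav", sr=None, mono=True)

# Estimate tempo and beat positions; beat_track returns frame indices,
# which are converted to seconds to form the grid.
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

print(f"tempo: {float(tempo):.1f} BPM")
print(f"beat grid (s): {np.round(beat_times[:8], 2)}")
```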