The key takeaway for me:<p>Whether the data is continuous or discrete, and whatever its modality (text, video, music, etc.), we now have an array of proven methods for representing it as a sequence of <i>discrete</i> tokens, which lets us reuse existing sequence-modeling architectures (Transformers, linear RNNs).<p>We live in interesting times!