Hi! Blog author here. This was an attempt, a couple of years ago, to understand and write about this paper in detail. Here is a video going through this topic as well: https://youtu.be/dKJEpOtVgXc?si=PDNO0B0qi6ARHaeb

Section 2 of the blog post is no longer very relevant. A lot of later advances (DSS, S4D) simplified that part of the process. Arguably this should all be updated for Mamba (same authors) as well.
There is a lot of intimidating math here that will make all self-attention tutorials seem like a walk in the park in comparison. Luckily, subsequent state space models building on S4 (DSS, S4D, and newer ones like Mamba) simplified the primitives and the math used.