How fast are your favorite LLMs? I recently saw a Reddit post where someone got a distilled version of DeepSeek R1 running on a Raspberry Pi. It could generate output at a whopping 1.97 tokens per second. That sounds slow. Is that even usable? I don’t know!

Meanwhile, Mistral announced that their Le Chat platform can output 1,100 tokens per second. That sounds pretty fast. But how fast? I don’t know!

So, that’s why I put together TokenFlow. It’s a (very!) simple webpage that lets you see the (theoretical) speed of different LLMs in action. You can select from a few preset models / services or enter a custom speed in tokens per second, then watch it spit out tokens in real time, showing you exactly how fast a given inference speed is and how it affects the user experience.

Check it out: https://dave.ly/tokenflow/

GitHub: https://github.com/daveschumaker/tokenflow
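
For anyone curious how a demo like this can work under the hood, the core idea is just revealing tokens on a timer. Here’s a minimal sketch in TypeScript; the function name, the word-level tokenization, and the "output" element are my own illustration, not the actual TokenFlow code:

    // Sketch: reveal one "token" (approximated here as a word) at a fixed interval.
    function streamTokens(
      text: string,
      tokensPerSecond: number,
      onToken: (token: string) => void,
    ): () => void {
      const tokens = text.split(/\s+/);          // crude word-level "tokens"
      const intervalMs = 1000 / tokensPerSecond; // e.g. 1.97 tok/s -> ~508 ms per token
      let i = 0;
      const timer = setInterval(() => {
        if (i >= tokens.length) {
          clearInterval(timer);
          return;
        }
        onToken(tokens[i++] + " ");
      }, intervalMs);
      return () => clearInterval(timer);         // caller can cancel the stream
    }

    // Usage: append each token to the page as it "arrives".
    streamTokens("The quick brown fox jumps over the lazy dog", 1.97, (tok) => {
      document.getElementById("output")!.textContent += tok;
    });

One caveat with this naive approach: at very high rates (like 1,100 tokens per second), browser timers won’t fire quickly enough, so a real implementation would need to batch multiple tokens per tick or per animation frame.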