While using LLMs, I noticed a few things:<p>- I often still prefer Google, because I feel I can get an answer more quickly
- I'd rather ask a smaller LLM a few questions than gpt-4 just one
- The latency of LLMs is often enough to make you lose momentum or abort the generation<p>So I asked myself: how could I build the fastest LLM prompt for the CLI?
My best guess is to use the fastest language (Rust) and the fastest LLM (Mixtral powered by <a href="https://groq.com" rel="nofollow">https://groq.com</a>)<p>And it's a game changer for me!
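For anyone curious, the core of such a tool is just one POST to Groq's OpenAI-compatible chat completions endpoint. A minimal sketch of building the request body with only the standard library (the model id `mixtral-8x7b-32768` and the endpoint URL are assumptions based on Groq's public API docs; a real client would use serde_json for proper escaping and reqwest for the HTTP call, both elided here):

```rust
// Hedged sketch: construct the JSON body for Groq's OpenAI-compatible
// chat completions API. Model name and endpoint are assumptions from
// Groq's public docs; the actual HTTP call is intentionally elided.
fn build_request_body(prompt: &str) -> String {
    // Minimal hand-rolled JSON; a real client would use serde_json
    // to escape the prompt correctly instead of this naive replace.
    format!(
        r#"{{"model":"mixtral-8x7b-32768","stream":true,"messages":[{{"role":"user","content":"{}"}}]}}"#,
        prompt.replace('"', "\\\"")
    )
}

fn main() {
    let body = build_request_body("How do I list open ports on Linux?");
    println!("{body}");
    // POST this body with an `Authorization: Bearer $GROQ_API_KEY` header to
    // https://api.groq.com/openai/v1/chat/completions and stream the reply.
}
```

Setting `"stream":true` matters for perceived latency: printing tokens as they arrive keeps the CLI feeling instant even before the full answer is done.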
At this speed it can replace most Googling, reading man pages, looking stuff up, …
I can't wait to extend it with more features! =)<p>Do you have any ideas on how to get it even faster?