TL;DR: I made a global hotkey on my laptop that records my voice while I hold it down, transcribes the result, and types out what it thinks I said.<p>Background:<p>My laptop was stolen recently, and the one I got to replace it has an Intel NPU [0] in it. The promise of the NPU is running small machine learning models efficiently on mobile hardware, and I thought a good application of this would be using Whisper to transcribe speech into text. There’s not really much [1] out there on Linux that can do this right now, which is kind of a bummer because being able to type with your voice is a big accessibility thing.<p>My Sway configuration [2] binds a press of the right Control key to launch a wrapper program [3], and binds the release of that key to send a SIGINT to the wrapper. The wrapper catches the SIGINT, ends transcription, and types the transcribed text into the focused application with the `enigo` crate.<p>Repo link: <a href="https://github.com/ellenhp/whisper-npu-server">https://github.com/ellenhp/whisper-npu-server</a><p>This is not one of my high-polish projects, but I did want to throw it out there into the world, especially because the OpenVINO project doesn't have any containerized NPU examples, even for LLMs.<p>[0] <a href="https://intel.github.io/intel-npu-acceleration-library/npu.html" rel="nofollow">https://intel.github.io/intel-npu-acceleration-library/npu.h...</a><p>[1] I found this, and based some of my code on it: <a href="https://github.com/oddlama/whisper-overlay">https://github.com/oddlama/whisper-overlay</a><p>[2] See end of post for example.<p>[3] <a href="https://github.com/ellenhp/whisper-transcription-wayland/">https://github.com/ellenhp/whisper-transcription-wayland/</a><p>Sample Sway config:<p>bindsym --no-repeat Control_R exec "whisper-transcription"<p>bindsym --release Control_R exec killall -2 whisper-transcription
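For readers who want to see the shape of the hold-to-talk flow from the post (start on key press, stop on SIGINT at key release, then type the text), here is a minimal sketch. It is not the actual wrapper, which is a separate program that types with the `enigo` crate. This stand-in is written in Python, and the `HoldToTalk` class and its `read_chunk`, `transcribe`, and `type_text` callbacks are hypothetical placeholders for audio capture, the Whisper call, and keyboard injection.

```python
import signal


class HoldToTalk:
    """Collect audio chunks until SIGINT arrives, then transcribe and type.

    All three callbacks are hypothetical stand-ins supplied by the caller:
      read_chunk()       -> one chunk of recorded audio
      transcribe(chunks) -> text for the recorded chunks
      type_text(text)    -> sends the text to the focused application
    """

    def __init__(self, read_chunk, transcribe, type_text):
        self.read_chunk = read_chunk
        self.transcribe = transcribe
        self.type_text = type_text
        self.stop_requested = False

    def _on_sigint(self, signum, frame):
        # Key release arrives as SIGINT (e.g. `killall -2 ...`).
        # Only set a flag here; the real work happens in run().
        self.stop_requested = True

    def run(self):
        # Install our handler and remember the previous one so it can be restored.
        previous = signal.signal(signal.SIGINT, self._on_sigint)
        chunks = []
        try:
            # Record for as long as the key is held down.
            while not self.stop_requested:
                chunks.append(self.read_chunk())
        finally:
            signal.signal(signal.SIGINT, previous)
        text = self.transcribe(chunks)
        self.type_text(text)
        return text
```

Setting a flag in the handler and doing the transcription afterwards keeps the signal handler trivial. Note that `signal.signal` must be called from the main thread, so `run()` is meant to be the program's main loop.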
This is really cool. I wish I could use it on my Windows 11 X1 Carbon, which also comes with an NPU.<p>I was quite disappointed that the dictation tool Lenovo praised on their website for this "CoPilot+ AI PC" turned out to be a shortcut to the Windows 11 transcription tool. My hope was that they were actually putting the NPU to use for that. Other than that, this is a great machine. In any case, Lenovo should fire whoever is responsible for that feature and hire you instead.<p>It would also be great to see a demo of how this works on your machine. Thank you for sharing it at all, though!