Whisper.cpp example running fully in the browser

199 points by lawrencechen over 2 years ago

17 comments

sheepscreek over 2 years ago
I’ve been following whisper.cpp’s incredible progress.

It is a high-quality piece of software that performs better on its intended hardware than any other implementation. It can easily be embedded anywhere. This is truly remarkable. A big shoutout to Georgi for this.

We need to remind ourselves that he is choosing to give this away by open-sourcing it. And he has gone through a lot of effort to make it easy to use and understand (just look at the documentation). Georgi, to me, personifies every open-source author who has put their sweat and toil into something that benefits our entire community.

Thank you, Georgi. Salut, my friend!
mcemilg over 2 years ago
I was recently laid off and am currently trying to build some apps that could bring in enough revenue to cover my costs for a few weeks. I built a transcription and dictation app for Mac [0] using whisper.cpp; the small model works really well on a 2019 MBP and an M1 for streaming (dictation). It was really straightforward to use, but the streaming algorithm isn't ready for production, so I implemented my own algorithm using VAD. I believe that at this pace this could also be fixed.

[0] https://apple.co/3j2k8E7
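For readers unfamiliar with VAD (voice activity detection), here is a minimal energy-threshold sketch of the idea: find the spans of audio that contain speech so that only those are handed to the transcriber. This is not the commenter's algorithm, which is not described in the thread. The function names, the 480-sample frame (30 ms at an assumed 16 kHz sample rate), the threshold, and the hangover count are all illustrative assumptions.

```python
import math


def frame_rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_segments(samples, frame_len=480, threshold=0.02, hangover=3):
    """Return (start, end) sample ranges whose frame energy exceeds threshold.

    A segment is closed only after `hangover` consecutive quiet frames,
    so short pauses inside a phrase do not split it.
    All default values are hypothetical and would need tuning.
    """
    segments = []
    start = None      # start index of the segment being built, if any
    last_end = 0      # end index of the most recent voiced frame
    silent = 0        # consecutive quiet frames seen inside a segment
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i
            last_end = i + len(frame)
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= hangover:
                segments.append((start, last_end))
                start = None
                silent = 0
    if start is not None:
        # Audio ended while a segment was still open.
        segments.append((start, last_end))
    return segments
```

Each returned range could then be passed to the model as one utterance. A production VAD would likely use something more robust than raw energy, such as a trained classifier.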
urbandw311er over 2 years ago
Seemed impressive enough to me, but I don't know what the current best-in-class looks like these days. Can anybody working in this area explain if this is a significant milestone and what opportunities it might unlock? The consumer value proposition of basic speech-to-text input seems to be well-handled by most major OSes, but I appreciate that's proprietary tech and only one use case.
gary_0 over 2 years ago
Wow, near-perfect transcription on desktop Firefox! Didn't seem to work on Android Chrome, though.

I wonder if this can be sped up using WebGPU...
7373737373 over 2 years ago
Does anyone know of a real-time version of this that can immediately transcribe individual words? Could be very useful for those who are hard of hearing.
zachlatta over 2 years ago
I highly recommend trying out https://whisper.ggerganov.com/talk/. It lets you talk to GPT-2 using your voice, all running locally in your browser. Holy cow.
cloudking over 2 years ago
Very cool, it works for videos too. Parsed a 1-minute video with ~95% transcription accuracy.
samanator over 2 years ago
This is incredible! Thank you for sharing! Did OpenAI release these pretrained models, or was the training done separately along with this project?

If OpenAI releases the pretrained models, why would we use their service?
edtechdev over 2 years ago
Would be interesting to see this connected to YouTube, to improve upon their auto-generated transcripts. There is this command-line version using youtube-dl and OpenAI's API: https://simonwillison.net/2022/Sep/30/action-transcription/
lsb over 2 years ago
Running in the latest Safari browser on iPhone, I get the error:

failed to asynchronously prepare wasm: CompileError: WebAssembly.Module doesn't parse at byte 5: can't get Function local's type in group 1, in function at index 9 Aborted(CompileError: WebAssembly.Module doesn't parse at byte 5: can't get Function local's type in group 1, in function at index 9)
FloatArtifact over 2 years ago
My understanding is that each inference run operates on a 30-second window. Therefore anything under 30 seconds is padded out with silence.

To my knowledge, nobody's been able to work around this, and it may not be possible without work upstream.
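The padding described above can be sketched as follows. This is only an illustration of the idea, not code from whisper.cpp: it assumes 16 kHz mono audio held as a plain list of float samples, and the helper name `pad_or_trim` is hypothetical.

```python
SAMPLE_RATE = 16_000                            # assumed: 16 kHz mono input
WINDOW_SECONDS = 30                             # window length from the comment
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS   # 480,000 samples


def pad_or_trim(samples, length=WINDOW_SAMPLES):
    """Return exactly `length` samples.

    Shorter clips are padded with zeros (digital silence) at the end;
    longer clips are cut to the first `length` samples.
    """
    if len(samples) >= length:
        return list(samples[:length])
    return list(samples) + [0.0] * (length - len(samples))
```

Under these assumptions a 5-second clip (80,000 samples) gains 400,000 zero samples, so the model does the same amount of work as it would for a full 30-second clip.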
raybb over 2 years ago
If someone wants to self-host, you can also try this decent web interface: https://codeberg.org/pluja/web-whisper

I'm not the creator, just a fan.
jonatron over 2 years ago
This might help out the timestamp guy for very long videos/podcasts.
sheerun over 2 years ago
I think we should make a standard browser API for transcription; otherwise each website wanting to implement private voice recognition will need to download 500MB of data.
Semaphor over 2 years ago
Firefox on Windows, with the small model: Uncaught DOMException: IDBObjectStore.put: The serialized value is too large (size=487614318 bytes, max=267386880 bytes).
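One possible workaround for a per-value size limit like this is to store the model as several smaller records and reassemble them on load. The sketch below only works out the chunk boundaries, using the two sizes from the error message. It is not what the demo does, and the helper name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(total_size, max_chunk):
    """Return (offset, length) pairs covering total_size bytes.

    Every length is at most max_chunk, so each piece would fit under
    a per-value storage limit of that size.
    """
    if total_size < 0 or max_chunk <= 0:
        raise ValueError("total_size must be >= 0 and max_chunk must be > 0")
    chunks = []
    offset = 0
    while offset < total_size:
        length = min(max_chunk, total_size - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


MODEL_SIZE = 487_614_318      # "size" reported in the error above
MAX_VALUE_SIZE = 267_386_880  # "max" reported in the error above

chunks = split_into_chunks(MODEL_SIZE, MAX_VALUE_SIZE)
```

With these numbers the model splits into two records. In practice a smaller chunk size might be safer, since the limit seen here is what one browser reported and other browsers may differ.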
danielovichdk over 2 years ago
The English model is really good. For my native language, Danish, not so much.
jahnu over 2 years ago
Three clicks to find out what it is:

1: “Minimal whisper.cpp example running fully in the browser”

2: “Port of OpenAI's Whisper model in C/C++”

3: “Whisper is a general-purpose speech recognition model.”