I’ve been following along with whisper.cpp’s incredible progress.<p>It is a high-quality piece of software which performs better on its intended hardware than any other implementation. It can easily be embedded anywhere. This is truly remarkable. A big shoutout to Georgi for this.<p>We should remind ourselves that he is choosing to give this away by open-sourcing it, and he has gone through a lot of effort to make it easy to use and understand (just look at the documentation). Georgi, to me, personifies every open-source author who puts in their sweat and toil towards something that benefits our entire community.<p>Thank you Georgi. Salut, my friend!
I was recently laid off and am currently trying to build some apps that could generate enough revenue to cover my costs for a few weeks. I built a transcription and dictation app for Mac [0] using whisper.cpp; the small model works really well on a 2019 MBP and an M1 for streaming (dictation). It was really straightforward to use. However, the built-in streaming algorithm isn't ready for production, so I implemented my own approach using VAD (sketched below). I believe that at this pace that will get fixed as well.<p>[0] <a href="https://apple.co/3j2k8E7" rel="nofollow">https://apple.co/3j2k8E7</a>
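The VAD idea boils down to: buffer audio while the frame energy stays above a threshold, then hand the whole utterance to whisper_full() once the speaker pauses. A rough sketch against the whisper.cpp C API follows; the energy threshold, frame sizes, and the read_frame() microphone stub are illustrative placeholders, not the app's actual code.

    // Rough sketch of VAD-gated dictation with whisper.cpp (illustrative only).
    // Assumes 16 kHz mono float PCM; read_frame() is a stand-in for real
    // microphone capture (Core Audio, SDL, etc.), not a whisper.cpp function.
    #include "whisper.h"

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // crude energy-based voice activity check over one short frame
    static bool is_speech(const std::vector<float> & frame, float threshold = 0.01f) {
        double energy = 0.0;
        for (float s : frame) energy += s * s;
        return !frame.empty() && std::sqrt(energy / frame.size()) > threshold;
    }

    // hypothetical audio source; replace with actual capture code
    static bool read_frame(std::vector<float> & frame) { (void) frame; return false; }

    // run one buffered utterance through the model and print the text
    static void transcribe(whisper_context * ctx, const std::vector<float> & pcm) {
        whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
        params.print_progress = false;
        if (whisper_full(ctx, params, pcm.data(), (int) pcm.size()) == 0) {
            for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
                printf("%s", whisper_full_get_segment_text(ctx, i));
            }
            printf("\n");
        }
    }

    int main() {
        whisper_context * ctx = whisper_init_from_file("ggml-small.bin");
        if (!ctx) return 1;

        std::vector<float> frame;     // one short frame of samples from the mic
        std::vector<float> utterance; // speech accumulated since the last pause
        int silent_frames = 0;

        while (read_frame(frame)) {
            if (is_speech(frame)) {
                utterance.insert(utterance.end(), frame.begin(), frame.end());
                silent_frames = 0;
            } else if (!utterance.empty() && ++silent_frames > 20) {
                transcribe(ctx, utterance); // a run of silent frames ends the utterance
                utterance.clear();
                silent_frames = 0;
            }
        }

        whisper_free(ctx);
        return 0;
    }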
Seemed impressive enough to me, but I don't know what the current best-in-class looks like these days. Can anybody working in this area explain if this is a significant milestone and what opportunities it might unlock? The consumer value proposition of basic speech-to-text input seems to be well-handled by most major OS's, but I appreciate that's proprietary tech and only one use case.
Wow, near-perfect transcription on desktop Firefox! Didn't seem to work on Android Chrome, though.<p>I wonder if this can be sped up using WebGPU...
I highly recommend trying out <a href="https://whisper.ggerganov.com/talk/" rel="nofollow">https://whisper.ggerganov.com/talk/</a>. It lets you talk to GPT-2 using your voice, all running locally in your browser. Holy cow.
This is incredible! Thank you for sharing! Did OpenAI release these pretrained models, or was the training done separately along with this project?<p>If OpenAI released the pretrained models, why would we use their service?
Would be interesting to see this connected to YouTube, to improve upon their auto-generated transcripts. There is this command-line version using youtube-dl and OpenAI's Whisper model <a href="https://simonwillison.net/2022/Sep/30/action-transcription/" rel="nofollow">https://simonwillison.net/2022/Sep/30/action-transcription/</a>
Running in the latest safari iPhone browser I get the error:<p>failed to asynchronously prepare wasm: CompileError: WebAssembly.Module doesn't parse at byte 5: can't get Function local's type in group 1, in function at index 9
Aborted(CompileError: WebAssembly.Module doesn't parse at byte 5: can't get Function local's type in group 1, in function at index 9)
My understanding is that each inference run processes a fixed 30-second window, so anything shorter than 30 seconds is padded out with silence.<p>To my knowledge, nobody's been able to work around this, and it may not be possible without work upstream.
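Concretely, Whisper's encoder operates on a fixed 30-second window, so a short clip gets zero-padded before inference. A minimal sketch of what that padding amounts to (illustrative, not the library's internal code):

    #include <vector>

    static const int kSampleRate    = 16000;             // Whisper expects 16 kHz mono
    static const int kWindowSamples = 30 * kSampleRate;  // fixed 30-second window

    // pad a short clip with silence so it fills the full inference window
    static void pad_to_window(std::vector<float> & pcm) {
        if ((int) pcm.size() < kWindowSamples) {
            pcm.resize(kWindowSamples, 0.0f);
        }
    }

Which is why, as far as I can tell, even a two-second clip costs roughly a full window of compute.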
If you want to self-host, you can also try this decent web interface: <a href="https://codeberg.org/pluja/web-whisper" rel="nofollow">https://codeberg.org/pluja/web-whisper</a><p>I'm not the creator, just a fan.
I think we should make a standard browser API for transcription; otherwise, every website wanting to implement private voice recognition will need to download 500 MB of data.
Three clicks to find out what it is:<p>1: “Minimal whisper.cpp example running fully in the browser”<p>2: “Port of OpenAI's Whisper model in C/C++”<p>3: “Whisper is a general-purpose speech recognition model.”