The only thing Whisper misses is speaker diarization. I'm currently working on a pipeline that uses Whisper + pyannote to transcribe interviews and also detect who is speaking. It's working, but damn, it takes so long.
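A minimal sketch of how such a pipeline can be wired together (the model sizes, the file name, and the overlap-based speaker assignment are illustrative assumptions, not necessarily the exact setup described above):

    # Transcribe with Whisper, diarize with pyannote, then label each Whisper
    # segment with the speaker whose diarization turn overlaps it the most.
    import whisper
    from pyannote.audio import Pipeline

    AUDIO = "interview.wav"  # placeholder input file

    asr_model = whisper.load_model("medium")
    asr_result = asr_model.transcribe(AUDIO)

    diarization_pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization",
        use_auth_token="HF_TOKEN",  # pyannote models require a Hugging Face token
    )
    diarization = diarization_pipeline(AUDIO)

    def dominant_speaker(start, end):
        # Return the speaker label with the largest total overlap against [start, end].
        overlaps = {}
        for turn, _, speaker in diarization.itertracks(yield_label=True):
            overlap = min(end, turn.end) - max(start, turn.start)
            if overlap > 0:
                overlaps[speaker] = overlaps.get(speaker, 0.0) + overlap
        return max(overlaps, key=overlaps.get) if overlaps else "UNKNOWN"

    for seg in asr_result["segments"]:
        print(f"[{dominant_speaker(seg['start'], seg['end'])}] {seg['text'].strip()}")

Most of the time goes into running both models over the full audio; the matching step itself is cheap.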
By the way, there is also another project called whisper.cpp:

https://github.com/ggerganov/whisper.cpp

It uses 8x less memory than the Python implementation for the tiny model. It's worth keeping an eye on, since Python bindings are planned on the roadmap:

https://github.com/ggerganov/whisper.cpp#bindings
I understand this is self-hosting the OpenAI Whisper model (which I see is fully MIT-licensed, weights and all), so it's not calling any OpenAI APIs the way other GPT-related tools do.

Am I correct about this? The README is not explicit.
People interested in this might also be interested in transcribe-anything [1]. It automates video fetching and uses Whisper to generate .srt, .vtt, and .txt files.

[1] https://github.com/zackees/transcribe-anything
Whisper-UI is also looking really nice lately, though I think it's still pretty early in development. The ability to click on the transcript and hear the audio from that particular moment is great.
<a href="https://github.com/hayabhay/whisper-ui">https://github.com/hayabhay/whisper-ui</a>
I run this locally for a few work-related tasks. One useful feature is being able to provide your own 'jargon' in the initial prompt, which improves recognition quality (--initial_prompt "jargon1 jargon2 ...").
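The same option is exposed in the Python API as well; a minimal sketch (the model size, file name, and jargon terms below are just placeholders):

    # Bias Whisper toward domain vocabulary by seeding the decoder with an initial prompt.
    import whisper

    model = whisper.load_model("base")  # placeholder model size
    result = model.transcribe(
        "meeting.wav",  # placeholder audio file
        initial_prompt="Kubernetes, Terraform, kubectl, CI/CD",  # your own jargon terms
    )
    print(result["text"])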
Is there an open source speech recognition model which can be restricted to a smaller, domain-specific dictionary?

Use case: I want to transcribe my poker hands while playing, e.g. "Flop was 2 of spades, 3 of diamonds and King of spades", "Button raised to $20", etc.

When I tried using Whisper and some other models, the recognition accuracy was atrocious, and it kept finding non-poker words that sounded similar to poker words. I want to restrict its search space to my own list of poker words, which should significantly increase the accuracy (theoretically).

Any suggestions on how to go about this?
This looks really good, thanks!
Really appreciate this and all the other Whisper implementations in this thread, as I am sorting out transcriptions for my 120+ podcast episodes.
That's very interesting. I've been using Whisper via pip as well, but I'm surprised you haven't sought to optimize it at all. I've been looking at using torch's compilation, but I haven't been successful yet; without it, it can take a while to run.
<a href="https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html" rel="nofollow">https://pytorch.org/tutorials/intermediate/torch_compile_tut...</a>
Related/off topic: is there a documented way to improve the model's accuracy for a particular language? Say we can put in the effort to collect thousands of verified/transcribed samples of a language that currently scores poorly (WER). What steps would I have to take to get those improvements into the system?
Very cool - I have a homegrown setup where a script scans my iCloud audio notes directory and generates transcriptions for any new notes. Works like a charm.
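A minimal sketch of that kind of watcher (the iCloud folder path, model size, and the skip-if-transcript-exists convention are illustrative assumptions, not the exact script described above):

    # Scan a folder of audio notes and transcribe anything that doesn't have a transcript yet.
    from pathlib import Path
    import whisper

    # Assumed location of an iCloud Drive folder; adjust to wherever your notes sync.
    NOTES_DIR = Path.home() / "Library/Mobile Documents/com~apple~CloudDocs/AudioNotes"
    model = whisper.load_model("base")

    for audio in sorted(NOTES_DIR.glob("*.m4a")):
        transcript = audio.with_suffix(".txt")
        if transcript.exists():
            continue  # already transcribed
        result = model.transcribe(str(audio))
        transcript.write_text(result["text"].strip() + "\n")
        print(f"Transcribed {audio.name}")

It could be run from cron or launchd so new notes get picked up automatically.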
Looks interesting. I noticed that the README says "containe" or "containes" several times, where I think you mean "container(s)".