Show HN: Self-host Whisper As a Service with GUI and queueing

267 points by olekenneth over 2 years ago
Schibsted created a transcription service for our journalists to transcribe audio interviews and podcasts really quickly.

16 comments

sebastianvoelkl over 2 years ago
The only thing Whisper misses is speaker diarization. I'm currently working on a model that uses Whisper + pyannote to transcribe interviews and also detect who is speaking. It's working, but damn, it takes so long.
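
One way to combine the two outputs, as a minimal sketch: label each transcribed segment with the speaker whose turn overlaps it the most. This is not the commenter's model. The `Turn` class and the `assign_speakers` helper are hypothetical names, and the speaker turns are assumed to have already been extracted from a diarization tool into plain start/end times. The segments are assumed to be dicts with `start`, `end` and `text` keys, as in Whisper's transcription result.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """A diarization turn: who spoke, and when (seconds). Hypothetical container."""
    speaker: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Attach a speaker label to each segment by maximum time overlap."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


segments = [
    {"start": 0.0, "end": 2.0, "text": "How did the project start?"},
    {"start": 2.0, "end": 5.0, "text": "It began as an internal tool."},
]
turns = [Turn("SPEAKER_00", 0.0, 2.2), Turn("SPEAKER_01", 2.2, 6.0)]

for seg in assign_speakers(segments, turns):
    print(f'{seg["speaker"]}: {seg["text"]}')
```

The merge step itself is cheap; the slow parts are the two model passes, which this sketch does not address.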
henry_viii over 2 years ago
By the way, there is also another project called Whisper.cpp:

https://github.com/ggerganov/whisper.cpp

It uses 8x less memory than the Python implementation for the tiny model. It would be a good idea to keep an eye on it, since Python bindings are planned on the roadmap:

https://github.com/ggerganov/whisper.cpp#bindings
adlpz over 2 years ago
I understand this is self-hosting the OpenAI Whisper model (which I see is fully MIT-licensed, weights and all). So it does not call any OpenAI APIs, as other GPT-related tools do.

Am I correct on this? The README is not explicit.
silviot over 2 years ago
People interested in this might also be interested in transcribe-anything [1].

It automates video fetching and uses whisper to generate .srt, .vtt and .txt files.

[1] https://github.com/zackees/transcribe-anything
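
For anyone curious what the .srt step involves, here is a rough sketch of turning timed segments into SubRip text. It is not taken from transcribe-anything. The helper names `srt_timestamp` and `to_srt` are made up for illustration, and the input is assumed to be a list of dicts with `start`, `end` and `text` keys, which is the shape of the segments in Whisper's transcription result.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render segments as SubRip cues: index, time range, text, blank line."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{time_range}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


print(to_srt([
    {"start": 0.0, "end": 1.25, "text": " Hello and welcome."},
    {"start": 1.25, "end": 3.0, "text": " Let's get started."},
]))
```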
raybb over 2 years ago
Whisper-UI is also looking really nice lately but I think it's still pretty early in development. The ability to click on the transcript and hear the sound of that particular moment is great. https://github.com/hayabhay/whisper-ui
magicseth over 2 years ago
Is it possible to create a streaming endpoint that returns real-time transcriptions?
monkeydust over 2 years ago
I run this locally for a few work-related tasks. One useful feature is being able to provide your own 'jargon' in the initial prompt, which improves recognition quality (--initial_prompt 'jargon1 jargon 2 ...').
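
A small sketch of how that flag could be wired into a script, assuming the `whisper` command-line tool is installed. The helper name `whisper_command`, the file name and the jargon terms are placeholders, not anything from the project.

```python
import shlex


def whisper_command(audio_path, jargon, model="small"):
    """Build the argument list for the whisper CLI with a jargon prompt."""
    # The jargon terms are joined into one string and passed as the initial prompt.
    prompt = ", ".join(jargon)
    return ["whisper", audio_path, "--model", model, "--initial_prompt", prompt]


cmd = whisper_command("interview.mp3", ["Schibsted", "pyannote", "diarization"])
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when a term contains spaces or quotes.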
tsycho over 2 years ago
Is there an open source speech recognition model which can be restricted to a smaller domain-specific dictionary?

Use case: I want to transcribe my poker hands while playing, e.g. "Flop was 2 of spades, 3 of diamonds and King of spades", "Button raised to $20", etc.

When I tried using Whisper and some other model, the recognition accuracy was atrocious, and it kept finding non-poker words that sounded similar to poker words. I want to restrict its search space to my own list of poker words, which should significantly increase the accuracy (theoretically).

Any suggestions on how to go about this?
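
One crude workaround, sketched below, is to leave the model alone and post-correct its output: snap each transcribed word to the closest entry in the word list when the spelling is similar enough. To be clear, this is a different technique from restricting the model's search space, and it matches on spelling rather than sound. The word list, the cutoff and the helper name `snap_to_vocabulary` are all illustrative assumptions.

```python
import difflib

# Illustrative word list; a real one would cover the full poker vocabulary.
POKER_WORDS = [
    "flop", "turn", "river", "button", "raised", "spades", "hearts",
    "diamonds", "clubs", "king", "queen", "jack", "ace",
]


def snap_to_vocabulary(text, vocabulary, cutoff=0.8):
    """Replace near-miss words with their closest vocabulary entry."""
    corrected = []
    for word in text.split():
        matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
        # Only rewrite genuine near-misses; exact matches keep their casing.
        if matches and matches[0] != word.lower():
            corrected.append(matches[0])
        else:
            corrected.append(word)
    return " ".join(corrected)


print(snap_to_vocabulary("2 of spaces", POKER_WORDS))
```

A cutoff that is too low will rewrite ordinary words into poker terms, so it needs tuning against real transcripts.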
elliotpage over 2 years ago
This looks really good, thanks! I really appreciate this and all the other Whisper implementations in this thread, as I am sorting out transcriptions for my 120+ podcast episodes.
jonititan over 2 years ago
That's very interesting. I've been using whisper via pip too, but I'm surprised you haven't sought to optimize whisper at all?

I've been looking at using compilation in torch, but without success so far; otherwise it can take a while to run. https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html
sgt over 2 years ago
Is the Whisper model better than, say, YouTube's auto transcribing? I hope it is, because the one on YT gets so much wrong it's almost comical.
INTPenis over 2 years ago
Has anyone looked at the code? Because as a Swedish citizen I must say that anything I use by Schibsted is a hot mess from a UX perspective.
deskamess over 2 years ago
Related/Off Topic: Is there a documented way to improve the accuracy of the model for a particular language? Say we can put in the effort to collect thousands of verified/transcribed samples of a language that currently scores poorly (WER). What steps do I have to take to get those improvements into the system?
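
For reference, WER (word error rate) is the word-level edit distance between the verified transcript and the model output, divided by the number of words in the verified transcript. The sketch below only measures it, which is useful for checking whether any retraining actually helped; it says nothing about how to fine-tune. The function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Levenshtein distance over words, keeping only the previous row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In practice both transcripts are usually normalized first (lowercased, punctuation stripped) so that formatting differences are not counted as errors.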
teucris over 2 years ago
Very cool - I have a homegrown setup where a script scans my iCloud audio notes directory and generates transcriptions for any new notes. Works like a charm.
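
A guess at what such a script might look like, not the commenter's actual code: treat any audio file without a sibling .txt file as new, and write the transcript next to it. The file suffixes, the directory layout and the helper names are assumptions, and the `transcribe` argument is a placeholder for whatever actually calls Whisper.

```python
from pathlib import Path

# Assumed audio formats; adjust to whatever the notes app produces.
AUDIO_SUFFIXES = {".m4a", ".mp3", ".wav"}


def pending_notes(directory):
    """Audio files in the directory that have no matching .txt transcript yet."""
    return sorted(
        path for path in Path(directory).iterdir()
        if path.suffix.lower() in AUDIO_SUFFIXES
        and not path.with_suffix(".txt").exists()
    )


def transcribe_new(directory, transcribe):
    """Transcribe every pending note and save the text beside the audio file."""
    finished = []
    for audio in pending_notes(directory):
        text = transcribe(audio)  # placeholder callable: Path -> str
        audio.with_suffix(".txt").write_text(text, encoding="utf-8")
        finished.append(audio)
    return finished
```

Run from cron or a similar scheduler, something like this is idempotent: a note that already has a transcript is skipped on the next pass.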
mkl over 2 years ago
Looks interesting. I noticed that the README says "containe" or "containes" several times, where I think you mean "container(s)".
MitPitt over 2 years ago
There's no GUI though