Tech Echo
A tech news platform built with Next.js, offering technology news and discussions from around the world.

© 2025 Tech Echo. All rights reserved.

Whisper.api: Open-source, self-hosted speech-to-text with fast transcription

216 points by innovatorved, over 1 year ago

15 comments

nchudleigh, over 1 year ago
This is awesome.

For anyone confused about the project, it uses whisper.cpp, a C/C++ runner and port of OpenAI's open Whisper model. It is built by the team behind GGML and llama.cpp: https://github.com/ggerganov

You can fork this code, run it on your own server, and call the API. The server uses FFmpeg to convert the audio file into the required format and then runs the C/C++ port of the Whisper model on the file.

By doing this you avoid paying the fee that OpenAI charges for its Whisper service, and you fully own your transcriptions. The models the author has supplied here are rather small but should run decently on a CPU. To use larger model sizes you would likely need to change the compilation options and use a server with a GPU.

Similar to this project, my product https://superwhisper.com uses these whisper.cpp models to provide really good dictation on macOS.

It runs really fast on the M-series chips. Most of this message was dictated using superwhisper.

Congrats to the author of this project. It seems like a useful application of the whisper.cpp project.

I wonder if they would accept it upstream as one of the examples.
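The FFmpeg conversion step described above can be sketched as follows. whisper.cpp reads 16 kHz, mono, 16-bit PCM WAV files, so any input has to be resampled and downmixed first. The helper names `build_ffmpeg_cmd` and `convert` are hypothetical and not taken from the Whisper.api code, which may call FFmpeg differently.

```python
import shutil
import subprocess


def build_ffmpeg_cmd(src: str, dst: str) -> list[str]:
    """Build an FFmpeg command that converts an audio file to the
    16 kHz, mono, 16-bit PCM WAV format that whisper.cpp reads."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # input in any format FFmpeg understands
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to a single (mono) channel
        "-c:a", "pcm_s16le",  # encode as 16-bit little-endian PCM
        dst,
    ]


def convert(src: str, dst: str) -> None:
    """Run the conversion, raising if FFmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True, capture_output=True)
```

For example, `convert("meeting.mp3", "meeting.wav")` would produce a file that can be passed straight to the whisper.cpp model.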
innovatorved, over 1 year ago
Many of you are asking whether the project is completely self-hosted and independent of third-party services. Yes: it is completely self-hosted and does not rely on any third-party services. The user account exists only for authentication, so no one can use the service without authenticating.
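A minimal sketch of the kind of server-side check the author describes, assuming the client sends a `Bearer` token in the `Authorization` header. The in-memory key set `_KEY_DIGESTS` and the function `is_authorized` are hypothetical; Whisper.api's actual token scheme and storage may differ. The point is that the whole check runs on your own server, with no third party involved.

```python
import hashlib
import hmac

# Hypothetical key store: SHA-256 digests of issued API keys,
# so the raw keys themselves are never kept on the server.
_KEY_DIGESTS = {hashlib.sha256(b"example-key-123").hexdigest()}


def is_authorized(headers: dict[str, str]) -> bool:
    """Return True if the request carries a known Bearer token."""
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    digest = hashlib.sha256(token.encode()).hexdigest()
    # compare_digest takes time independent of where the strings differ
    return any(hmac.compare_digest(digest, known) for known in _KEY_DIGESTS)
```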
innovatorved, over 1 year ago
Whisper API - Speech to Text Transcription

This open-source project provides a self-hostable API for speech-to-text transcription using a fine-tuned Whisper ASR model. The API lets you convert audio files to text through HTTP requests, which makes it easy to add speech recognition to your applications.

Key features:

- Uses a fine-tuned Whisper model for accurate speech recognition
- Simple HTTP API for audio file transcription
- User-level access with API keys for managing usage
- Self-hostable code for your own speech transcription service
- Quantized model optimization for fast and efficient inference
- Open-source implementation for customization and transparency
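A client for an HTTP transcription API like this one would typically POST the audio as `multipart/form-data` with the API key in a header. The sketch below uses only Python's standard library. The endpoint URL `API_URL`, the form field name `file`, and the `Bearer` header are assumptions for illustration, not the project's documented API; check its README for the real route and parameters.

```python
import urllib.request
import uuid

# Hypothetical endpoint; the real route may differ.
API_URL = "http://localhost:8000/transcribe"


def build_request(audio: bytes, filename: str, token: str,
                  url: str = API_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one audio file."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(audio: bytes, filename: str, token: str) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(audio, filename, token)) as resp:
        return resp.read().decode("utf-8")
```

Building the request separately from sending it keeps the encoding logic testable without a running server.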
ChrisArchitect, over 1 year ago
Not to be confused with:

"Whisper – open source speech recognition by OpenAI" https://news.ycombinator.com/item?id=34985848
edgarvaldes, over 1 year ago
Related to Whisper: whisperX is a godsend. I can finally watch old or uncommon TV series with subtitles.
pizzafeelsright, over 1 year ago
Isn't this middleware rather than something fully self-hosted?
geekodour, over 1 year ago
Nice! This will be very useful for me. I think I can run this locally and spin up a basic Telegram bot around it for personal use.

One issue I've faced with all the Whisper-based transcript generators is that there seems to be no good way to edit or correct the generated text while keeping word-level timestamps. I created a small web-based tool[0] for that.

If anyone is looking to edit transcripts generated with Whisper, you'd probably find it useful.

[0] https://github.com/geekodour/wscribe-editor
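The underlying problem is keeping word-level timestamps attached to words while the text is corrected. One simple model, sketched here as an assumption and not as how wscribe-editor is implemented, stores each word with its own start and end time, so fixing a word's spelling never disturbs the timing. The names `Word`, `correct_word`, and `to_srt_time` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the start of the audio
    end: float


def correct_word(words: list[Word], index: int, new_text: str) -> list[Word]:
    """Return a copy with one word's text replaced; its timestamps are kept."""
    fixed = list(words)
    fixed[index] = replace(words[index], text=new_text)
    return fixed


def to_line(words: list[Word]) -> tuple[float, float, str]:
    """Collapse a run of words into (start, end, text) for one caption."""
    return words[0].start, words[-1].end, " ".join(w.text for w in words)


def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
```

Because `Word` is immutable, each correction produces a new list, which also makes undo straightforward in an editor.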
LeoPanthera, over 1 year ago
So is "real-time" translation a thing yet? I've long wanted to be able to watch non-English television and have the audio translated into English subtitles. It's doable for pre-recorded content, but not for live broadcasts.

An iPhone app that could do this from the microphone would also be amazing. Google Translate and its various competitors from Microsoft/Apple are nearly there, but they all stop listening in between sentences. Something that listened constantly, printing translated text onto the screen, would be amazing.
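One common way to approximate "always listening" transcription is to cut the incoming audio stream into fixed-length windows that overlap slightly, so a word cut at one window's edge has a chance to appear whole in the next. The generator below is a generic sketch of that chunking step only; the window and overlap sizes are arbitrary assumptions, it performs no recognition itself, and each chunk would still have to be handed to a model such as Whisper.

```python
from collections.abc import Iterable, Iterator

SAMPLE_RATE = 16_000  # Whisper models operate on 16 kHz audio


def overlapping_windows(samples: Iterable[float],
                        window_s: float = 5.0,
                        overlap_s: float = 1.0,
                        rate: int = SAMPLE_RATE) -> Iterator[list[float]]:
    """Yield fixed-size windows from a (possibly endless) sample stream.

    Consecutive windows share `overlap_s` seconds of audio, which reduces
    the chance that a word is cut in half at every window boundary.
    """
    size = int(window_s * rate)
    step = size - int(overlap_s * rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    overlap = size - step
    buf: list[float] = []
    emitted = False
    for x in samples:
        buf.append(x)
        if len(buf) == size:
            yield list(buf)
            emitted = True
            del buf[:step]  # keep the tail as the start of the next window
    # Flush the final partial window if it holds samples not yet emitted.
    if buf and (not emitted or len(buf) > overlap):
        yield list(buf)
```

With a live microphone, `samples` would be an iterator over captured audio, so windows are produced continuously instead of waiting for a pause between sentences.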
distantsounds, over 1 year ago
How is this open source, or self-hosted, when it requires an API key and a login from a third party?
v7n, over 1 year ago
Many live streamers and platforms would love to have custom real-time transcription elements. I actually looked into building exactly this kind of project when I thought about creating such a thing.

Even if it meant delaying the broadcast by a second while transcribing, the accessibility value could be immense.
Dig1t, over 1 year ago
> Get Your token

If it's completely self-hosted, why do I need to get a token? Where does the actual model run?
pdntspa, over 1 year ago
So Whisper is all the rage with speech-to-text, but what about text-to-speech?
grzes, over 1 year ago
I don't understand the excitement here. It's just an HTTP wrapper around a CLI command. You can easily build it yourself with any decent RAD framework.
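To illustrate the point, the core of such a wrapper really is small: write the uploaded bytes to a temporary file, run a command on it, and return its output. The function `run_cli_on_upload` is hypothetical and keeps the command configurable. The whisper.cpp invocation in the docstring is an assumption, since binary names and paths vary between whisper.cpp versions and installs. A real service would still need an HTTP layer, authentication, input conversion, and error handling around it.

```python
import os
import subprocess
import tempfile


def run_cli_on_upload(audio: bytes, cmd_template: list[str],
                      timeout: float = 600.0) -> str:
    """Save uploaded bytes to a temp file, run a CLI on it, return stdout.

    `cmd_template` uses "{path}" as a placeholder for the temp file, e.g.
    (hypothetical paths) ["./main", "-m", "models/ggml-base.en.bin", "-f", "{path}"].
    """
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(audio)
        cmd = [arg.replace("{path}", path) for arg in cmd_template]
        # check=True raises CalledProcessError if the CLI exits non-zero
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout, check=True)
        return result.stdout
    finally:
        os.remove(path)  # always clean up the temp file
```

Passing the command as a list (not a shell string) avoids shell-injection issues with user-supplied filenames.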
1024core, over 1 year ago
Does Android OS come with ASR?
tnhoang088, over 1 year ago
Thank you. I love it.