Ask HN: Would you recommend OpenAI Whisper for Speech to text?

10 points by jerrygoyal, about 2 years ago
I'm building a product that requires speech-to-text. I'm thinking of going with Whisper, as it seems cheap ($0.006/min) and I've heard the transcription quality is good. Are there any better alternatives?

9 comments

drag0s, about 2 years ago
- AssemblyAI was the winner in the tests we ran some months ago: very reliable and accurate.

- Deepgram also looks interesting. They recently released a new model (Nova), and they also offer Whisper at a cheaper price ($0.0048/min). I've played with it briefly, but the DX looked a bit bad. They're also offering $200 in credits now.

- If you're on a really tight budget, most browsers [1] support the SpeechRecognition API [2], which lets you transcribe for free. How well it works depends on the browser. In Google Chrome, for example, it works excellently because the browser actually sends the audio to the cloud (it probably uses GCP's Google Cloud Speech to Text).

[1] https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition#browser_compatibility
[2] https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition
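
To put the two per-minute prices quoted so far side by side, here is a minimal sketch of the arithmetic. The prices are the ones mentioned in this thread and may have changed since; the monthly volume of 10,000 minutes is a made-up figure for illustration, and `transcription_cost` is a hypothetical helper, not part of any vendor SDK.

```python
from decimal import Decimal

# Per-minute prices (USD) as quoted in this thread; treat them as a snapshot.
PRICES_PER_MINUTE = {
    "OpenAI Whisper API": Decimal("0.006"),
    "Deepgram (Whisper model)": Decimal("0.0048"),
}


def transcription_cost(price_per_minute: Decimal, minutes: int) -> Decimal:
    """Return the cost in USD of transcribing `minutes` of audio."""
    return price_per_minute * minutes


if __name__ == "__main__":
    minutes = 10_000  # hypothetical monthly volume
    for name, price in PRICES_PER_MINUTE.items():
        print(f"{name}: ${transcription_cost(price, minutes):.2f}")
```

Decimal is used instead of float so that small per-minute rates multiply out without rounding noise. At this assumed volume the gap is $60.00 versus $48.00 per month, so accuracy and developer experience may matter more than price.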

FloatArtifact, about 2 years ago
I've experimented with Whisper. I don't know of a way to do commands without parsing dictation. Bottom line: to my knowledge, the model has to be passed 30 seconds of audio. So if your utterance is 5 seconds, you'll need 25 seconds of silence.

Depending on the platform you're targeting, https://github.com/dictation-toolbox/dragonfly might be interesting to you.
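
The 30-second window described above can be sketched in plain Python. This is an illustration of the padding idea only, assuming 16 kHz mono audio held as a list of floats; the open-source Whisper package ships its own helper for this (`whisper.pad_or_trim`, to my knowledge), and the function below is a stand-in written for this example, not that implementation.

```python
SAMPLE_RATE = 16_000   # Hz; assumed input rate for this sketch
CHUNK_SECONDS = 30     # fixed window length mentioned in the comment above
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples


def pad_or_trim(samples: list[float], length: int = CHUNK_SAMPLES) -> list[float]:
    """Return exactly `length` samples: cut extra audio, or append silence."""
    if len(samples) >= length:
        return samples[:length]
    # Zero-valued samples are digital silence.
    return samples + [0.0] * (length - len(samples))


if __name__ == "__main__":
    utterance = [0.1] * (5 * SAMPLE_RATE)  # a fake 5-second utterance
    window = pad_or_trim(utterance)
    silence_seconds = (len(window) - len(utterance)) / SAMPLE_RATE
    print(len(window), silence_seconds)
```

For a 5-second utterance this appends 25 seconds of zeros, which matches the numbers in the comment. In practice the padding is usually done for you, so you rarely need to record the silence yourself.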

tikkun, about 2 years ago
I've tried a few:

- Whisper is the cheapest.
- AssemblyAI and Google Cloud Speech to Text are more accurate.

Overall, I wouldn't recommend Whisper unless the transcription accuracy doesn't need to be high. I'm hoping they release the "GPT-4" equivalent of Whisper.

satvikpendem, about 2 years ago
You can self-host it too if you want. That's the good part about Whisper: it's open source.
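
For anyone wondering what self-hosting looks like, here is a minimal sketch using the open-source `openai-whisper` Python package. It assumes the package and `ffmpeg` are installed; the wrapper name `transcribe_locally`, the `"base"` model size, and the file name in the usage note are choices made for this example.

```python
def transcribe_locally(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file on this machine and return the text."""
    # Imported lazily so this module still loads where the package is missing.
    import whisper  # third-party: pip install openai-whisper (also needs ffmpeg)

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(audio_path)   # dict that includes a "text" key
    return result["text"].strip()
```

Calling `transcribe_locally("meeting.mp3")` (a hypothetical file) would run entirely on your own hardware, so there is no per-minute fee. Larger models are generally slower and need more memory, so the right size depends on your machine.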

qup, about 2 years ago
I've been using Whisper since it was released. It's also open source, and I know I can host my own. I'd say I get 95% accuracy with it, possibly more.

I'm feeding the output to GPT, so the mistakes usually don't matter; it normally interprets them as what they were supposed to be.
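
Accuracy figures like the one above are easier to compare across providers when they are measured the same way. One common metric is word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python version under simple assumptions (lowercasing and whitespace tokenization only); the thread does not say how the 95% figure was measured.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the quick brown fox", "the quick brown box"))
```

One wrong word out of four gives a WER of 0.25, or roughly 75% word accuracy. Running the same reference recordings through each candidate service and comparing WER is a more reliable basis for a decision than impressions.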

java_beyb, about 2 years ago
If your decision is cost-oriented, then the Whisper API is the cheapest, at least based on what other API companies advertise on their websites.

However, depending on what you're building, you may want to consider local speech-to-text running on the user's device, so you don't pay for the cloud at all.

You should also work out whether you'll need model adaptation, such as adding custom industry jargon. That might be challenging with Whisper.

ezedv, about 2 years ago
You can use TranscribeMe. It's for Telegram and WhatsApp, and it's totally free! https://transcribeme.app

muttantt, about 2 years ago
Use Deepgram; they recently added Whisper as a model too.

adyashakti, about 2 years ago
Free iOS app: https://apps.apple.com/us/app/aiko/id1672085276