9 comments

drag0s, about 2 years ago

- AssemblyAI was the winner in the tests we ran some months ago: very reliable and accurate.
- Deepgram also looks interesting. They recently released a new model (Nova), and they also offer Whisper at a cheaper price ($0.0048/min). I've played with it briefly, but the DX looked a bit bad. They're also offering $200 in credits now.
- If you're on a really tight budget, most browsers [1] support the SpeechRecognition API [2], which lets you transcribe for free. How well it works depends on the browser; in Google Chrome, for example, it works excellently because the browser actually sends the audio to the cloud (it probably uses GCP's Google Cloud Speech to Text).

[1] https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition#browser_compatibility
[2] https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition
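
For anyone comparing on price, here is a rough sketch of the arithmetic behind the two Deepgram figures quoted above ($0.0048/min and $200 in credits). The function names are made up for illustration, and the prices are only the ones mentioned in this comment, so check current pricing before relying on them.

```python
from decimal import Decimal

# Per-minute price quoted in the comment above for Whisper on Deepgram.
PRICE_PER_MINUTE = Decimal("0.0048")
# Sign-up credit quoted in the comment above.
FREE_CREDIT = Decimal("200")


def transcription_cost(minutes, price_per_minute=PRICE_PER_MINUTE):
    """Return the cost in dollars of transcribing `minutes` of audio."""
    return Decimal(str(minutes)) * price_per_minute


def minutes_covered(credit=FREE_CREDIT, price_per_minute=PRICE_PER_MINUTE):
    """Return how many whole minutes of audio a credit balance pays for."""
    return int(credit / price_per_minute)


if __name__ == "__main__":
    print("One hour of audio costs $", transcription_cost(60))
    print("Minutes covered by the credit:", minutes_covered())
```

At that rate an hour of audio comes to under 30 cents, and the credit covers several hundred hours, so for small projects the price difference between providers may matter less than accuracy.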

FloatArtifact, about 2 years ago

I've experimented with Whisper. I don't know of a way to do commands without parsing dictation. Bottom line: to my knowledge, the model has to be passed 30 seconds of audio, so if your utterance is 5 seconds long, you'll need 25 seconds of silence. Depending on the platform you're targeting, https://github.com/dictation-toolbox/dragonfly might be interesting to you.
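
To make the 30-second window concrete, here is a minimal sketch of zero-padding a short clip up to a full window. It assumes 16 kHz mono samples held in a plain Python list. The open-source Whisper package ships a helper with the same name, `pad_or_trim`, but the version below is a stand-alone illustration, not that implementation.

```python
SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz mono audio
WINDOW_SECONDS = 30    # each inference pass covers a 30-second window
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def pad_or_trim(samples, length=WINDOW_SAMPLES):
    """Return `samples` cut or zero-padded to exactly `length` items."""
    if len(samples) >= length:
        return list(samples[:length])
    # Zeros are digital silence, so padding appends silence to the clip.
    return list(samples) + [0.0] * (length - len(samples))


def padding_seconds(utterance_seconds, window_seconds=WINDOW_SECONDS):
    """Return how many seconds of silence fill the rest of the window."""
    return max(0, window_seconds - utterance_seconds)


if __name__ == "__main__":
    utterance = [0.1] * (5 * SAMPLE_RATE)  # stand-in for a 5-second clip
    window = pad_or_trim(utterance)
    print(len(window) / SAMPLE_RATE, "seconds passed to the model")
    print(padding_seconds(5), "seconds of that are silence")
```

In other words, the silence can be added in code rather than recorded, but the model still does a full window's worth of work for a 5-second command, which is the latency cost described above.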

tikkun, about 2 years ago

I've tried a few:

- Whisper is the cheapest.
- AssemblyAI and Google Cloud Speech to Text are more accurate.

Overall, I wouldn't recommend Whisper unless the transcription accuracy doesn't need to be high. I'm hoping they release the "GPT-4" equivalent of Whisper.

satvikpendem, about 2 years ago

You can self-host it too if you want; that's the good part about Whisper, since it's open source.

qup, about 2 years ago

I've been using Whisper since it came out; it's open source, and I know I can host my own. I'd say I get 95% accuracy, possibly more. I'm interacting with GPT, which usually doesn't care about the mistakes; it normally interprets them as what they were supposed to be.
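
If you want to put a number on accuracy rather than eyeball it, one common measure is word error rate (WER): the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is one plain-Python way to compute it and is not how the 95% figure above was obtained; the function name and the lowercase, whitespace-split normalization are assumptions for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Return word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    reference = "turn on the kitchen lights"
    hypothesis = "turn on the chicken lights"
    print("WER:", word_error_rate(reference, hypothesis))
```

Roughly, "95% accuracy" corresponds to a WER of about 0.05, that is, one wrong word in twenty.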

java_beyb, about 2 years ago

If your decision is cost-oriented, then the Whisper API is the cheapest, at least based on what other API companies advertise on their websites.

However, depending on what you're building, you may want to consider local speech-to-text: by running the model on users' devices, you don't pay for the cloud at all.

You should also work out whether you'll need model adaptation, such as adding custom industry jargon. That might be challenging with Whisper.

ezedv, about 2 years ago

You can use TranscribeMe, it's for Telegram and WhatsApp; it's totally free! https://transcribeme.app

muttantt, about 2 years ago

Use Deepgram; they recently added Whisper as a model too.

adyashakti, about 2 years ago

Free iOS app: https://apps.apple.com/us/app/aiko/id1672085276

Ask HN: Would you recommend OpenAI Whisper for Speech to text?
