
Ask HN: How to transcribe a couple thousand calls per day?

11 points by bojangleslover 8 months ago
We have tried Microsoft Speech Service and found it far too complicated.

The Azure OpenAI Whisper deployment has a pretty low quota.

Running ggerganov's whisper on a Mac works fairly well, but the Mac is not in our corporate network.

I really need to batch transcribe these calls. I am a few weeks behind.

I have access to a server with 2x RTX 4090. It is all up and ready to go with the Nvidia drivers.

By the way, these calls average 90 s each. Not long.
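To size the job, here is a rough back-of-the-envelope sketch. It assumes "a couple thousand" means 2000 calls per day and that a 2-hour nightly window is acceptable; both figures are illustrative guesses, and only the 90 s average and the two GPUs come from the post.

```python
def daily_audio_hours(calls_per_day: int, avg_seconds: float) -> float:
    """Total hours of audio recorded per day."""
    return calls_per_day * avg_seconds / 3600.0


def required_speedup(audio_hours: float, wall_clock_hours: float, num_gpus: int) -> float:
    """Speed relative to real time that each GPU must sustain so that
    `audio_hours` of audio finishes within `wall_clock_hours` on `num_gpus`."""
    return audio_hours / (wall_clock_hours * num_gpus)


# Assumed: 2000 calls/day and a 2-hour processing window (not stated in the post).
hours = daily_audio_hours(2000, 90)        # 50.0 hours of audio per day
speedup = required_speedup(hours, 2.0, 2)  # 12.5x real time per GPU
print(f"{hours:.1f} h/day, {speedup:.1f}x real time per GPU")
```

Whether a given model and GPU reach that speedup has to be measured on a sample of real calls.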

5 comments

dodysw 8 months ago
I transcribed 3000 to 4000 short videos of 10s-30s each, every day for almost 2 years, for fun. A cheap Linux desktop with second-hand ex-mining RTX 3060 and 3080 Ti cards, connected over the home network using basic Gradio and faster-whisper, so it can be exposed as a public API and called from a corporate network. Relatively easy and much cheaper than the commercial offerings at the time. These GPUs are overpowered for the task: each day they spent only 1 to 2 hours on actual encoding. It's that quick, and that's with the biggest Whisper model plus audio preprocessing and VAD to improve the success rate.
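A minimal sketch of what a faster-whisper batch job could look like on a two-GPU box such as the one in the original post. This is not dodysw's actual setup: it leaves out the Gradio front end and the audio preprocessing, and the `calls` and `transcripts` directory names, the `.wav` extension, and the helper names are all hypothetical.

```python
import multiprocessing as mp
from pathlib import Path


def shard(paths, num_workers):
    """Round-robin split so each worker gets a near-equal share of files."""
    return [paths[i::num_workers] for i in range(num_workers)]


def transcribe_shard(paths, gpu_index, out_dir, model_size="large-v3"):
    """Transcribe each file on one GPU and write <stem>.txt into out_dir."""
    # Imported lazily so shard() is usable without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cuda",
                         device_index=gpu_index, compute_type="float16")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in paths:
        # vad_filter=True drops non-speech audio before decoding.
        segments, _info = model.transcribe(str(path), vad_filter=True)
        text = " ".join(seg.text.strip() for seg in segments)
        (out / (Path(path).stem + ".txt")).write_text(text, encoding="utf-8")


def main(in_dir="calls", out_dir="transcripts", num_gpus=2):
    """Start one worker process per GPU; does nothing if in_dir has no .wav files."""
    files = sorted(Path(in_dir).glob("*.wav"))
    if not files:
        return
    ctx = mp.get_context("spawn")  # fresh interpreter per GPU worker
    procs = [ctx.Process(target=transcribe_shard, args=(part, gpu, out_dir))
             for gpu, part in enumerate(shard(files, num_gpus))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()


if __name__ == "__main__":
    main()
```

One process per GPU keeps the example simple; with 90 s calls the per-file overhead is small, so round-robin sharding should balance the two cards reasonably well.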
solardev 8 months ago
Does it have to use Whisper? If so, can't you just run it on that server instead of the Mac? https://github.com/openai/whisper/discussions/1463

If it doesn't, there are a bunch of other speech recognition APIs. Most of them use older tech but might be good enough: https://www.gladia.io/blog/openai-whisper-vs-google-speech-to-text-vs-amazon-transcribe

Personally I found Otter.ai works really well for the transcription part, but they don't have an API: https://otter.ai

You can also just upload them all to YouTube in a private playlist and it'll automatically transcribe them for you.
philipkiely 8 months ago
This is a completely shameless plug, but I just published some documentation on automatically building Whisper inference engines with TensorRT-LLM, which has the batch inference that you're looking for: https://docs.baseten.co/performance/examples/whisper-trt
arthurdelerue 8 months ago
We use Whisper Large on NLP Cloud (https://nlpcloud.com/home/playground/asr). It works very well and it's simple to set up, in my opinion. If you have a batch to process, you could simply subscribe to their pay-as-you-go plan for a couple of weeks/months, maybe?
eevmanu 8 months ago
Consider "Whisper Large V3" on console.groq.com; imo it is fast, reliable, and cheap ($0.03/hour transcribed).
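For scale, a rough cost sketch at the rate quoted above. It assumes 2000 calls per day (one reading of "a couple thousand") and a 3-week backlog (one reading of "a few weeks"); neither figure is stated exactly in the thread, and the hosted rate may have changed since the comment was written.

```python
def transcription_cost(calls: int, avg_seconds: float, usd_per_audio_hour: float) -> float:
    """Cost in USD to transcribe `calls` recordings billed per hour of audio."""
    return calls * avg_seconds / 3600.0 * usd_per_audio_hour


RATE = 0.03  # USD per audio hour, as quoted in the comment

daily = transcription_cost(2000, 90, RATE)         # assumed 2000 calls/day
backlog = transcription_cost(2000 * 21, 90, RATE)  # assumed 21-day backlog
print(f"${daily:.2f} per day, ${backlog:.2f} for the backlog")
```

Under these assumptions, price is unlikely to be the deciding factor; whether the call audio is allowed to leave the corporate network probably matters more.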