Tech Echo (科技回声)
A tech news platform built with Next.js, featuring technology news and discussions from around the world.

© 2025 Tech Echo (科技回声). All rights reserved.

Ask HN: What is your recommended speech-to-text/audio transcription tool?

3 points by elektor, almost 2 years ago
Currently, I use a GUI for Whisper AI (https://github.com/Const-me/Whisper) to upload MP3s of interviews and get text transcripts. However, I'm hoping to find another tool that can recognize the different speakers and split the transcript by speaker.

Does such a thing exist?
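For readers with the same question: Whisper itself produces timestamped segments but no speaker labels, so one do-it-yourself approach is to run a separate speaker-diarization tool and then match its speaker turns to Whisper's segments by time overlap. This is a minimal sketch assuming both outputs are already loaded as lists of intervals in seconds; the `Segment`/`Turn` types, the function names, the speaker names, and the sample data are all hypothetical, and no particular diarization library is implied.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcript segment (hypothetical shape): start/end in seconds plus text."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn (hypothetical shape): start/end in seconds plus a speaker label."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Give each segment the speaker whose turns overlap it the most in total."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        # Segments that overlap no turn at all are marked UNKNOWN.
        speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
        labeled.append((speaker, seg.text))
    return labeled


def merge_consecutive(labeled):
    """Join adjacent (speaker, text) pairs from the same speaker into one paragraph."""
    merged = []
    for speaker, text in labeled:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return merged


if __name__ == "__main__":
    # Made-up example data, not real tool output.
    segments = [
        Segment(0.0, 3.2, "Thanks for joining us today."),
        Segment(3.2, 6.0, "Happy to be here."),
        Segment(6.0, 9.5, "Let's start with your background."),
    ]
    turns = [
        Turn(0.0, 3.4, "SPEAKER_00"),
        Turn(3.4, 6.1, "SPEAKER_01"),
        Turn(6.1, 9.5, "SPEAKER_00"),
    ]
    for speaker, text in merge_consecutive(label_segments(segments, turns)):
        print(f"{speaker}: {text}")
```

Assigning by maximum overlap is a simple heuristic: a segment that straddles a change of speaker still gets a single label, so finer splits would need word-level timestamps, if your transcription tool provides them.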

3 comments

tikkun, almost 2 years ago
For an end-user application, Otter.ai is the best I've seen. I wish there were a better, faster one built on top of Whisper, but there isn't a good one that I've seen.

If you're looking for an API, check out AssemblyAI, Google Cloud transcription, or Deepgram. I have a list here: https://llm-utils.org/List+of+AI+APIs

solardev, almost 2 years ago
Descript.com was pretty good at it when I tried it, but it's pretty expensive: https://www.descript.com/transcription

We ended up using Otter.ai, which, if I remember correctly, didn't have as good a speaker-separation model, but it was good enough for the price: https://otter.ai/

There's also the much more expensive, human-powered Rev: https://www.rev.com/

tmaly, almost 2 years ago
Microsoft has a tool that accepts WAV or MP3 files and transcribes them.

But I don't think it can distinguish between speakers.

How accurate is Whisper for a single speaker?
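On the accuracy question: a common way to quantify transcription correctness is word error rate (WER), the word-level edit distance between a hand-checked reference transcript and the tool's output, divided by the number of words in the reference. This is a minimal sketch assuming whitespace tokenization and case-insensitive comparison with no punctuation normalization; `word_error_rate` is a hypothetical helper name, and the example sentences are made up, not measured Whisper output.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat"  # one deleted word
    print(f"WER = {word_error_rate(ref, hyp):.3f}")  # 1 error / 6 words -> WER = 0.167
```

WER can exceed 1.0 when the output contains many inserted words, and for multi-speaker audio it only scores the words, not whether they were attributed to the right speaker.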