OpenAI releases Whisper v3, new generation open source ASR model

117 points by crakenzak, over 1 year ago

12 comments

nshm, over 1 year ago
Good improvements for many languages; numbers here:
https://github.com/openai/whisper/blob/main/language-breakdown.svg
dang, over 1 year ago
Related ongoing threads:
New models and developer products - https://news.ycombinator.com/item?id=38166420
OpenAI DevDay, Opening Keynote Livestream [video] - https://news.ycombinator.com/item?id=38165090
Nitrolo, over 1 year ago
Does anyone know of a nice UI wrapper for something like whisper.cpp?
I need to write a lot of long texts for work, and some good dictation software would be great. I know there's Dragon, but I have not been able to find anything that fits my needs and is free.
jsight, over 1 year ago
This seems like the best free voice recognition in general.
Is there a model that is the best at wake word detection? Last time I looked, this seemed fairly lacking.
alex_young, over 1 year ago
Still doesn't look like it can do real-time, unfortunately.
Edit: I understand that you can use small samples and approximate something like streaming, but the limitation is that you wind up without context for the samples, which increases WER. It would be nice if there were a streaming option.
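One common way to soften the lost-context problem described above is to let consecutive chunks overlap, so each chunk repeats a little of the audio that preceded it. The following is a minimal sketch of that windowing step only, not anything Whisper itself provides: the function name `overlapping_windows` and the 5-second window and 1-second overlap in the usage lines are assumptions chosen for illustration. The 16 kHz sample rate matches the rate Whisper expects for its input audio.

```python
from typing import Iterator, Tuple


def overlapping_windows(
    total_samples: int, window: int, overlap: int
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices for windows that share `overlap` samples.

    Hypothetical helper for illustration; it only computes boundaries and does
    not run any speech model.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap  # how far each new window advances
    start = 0
    while start < total_samples:
        end = min(start + window, total_samples)
        yield start, end
        if end == total_samples:
            break  # the last window reached the end of the audio
        start += step


SAMPLE_RATE = 16_000  # samples per second
# Assumed settings: 5-second windows with 1 second of shared context.
for start, end in overlapping_windows(12 * SAMPLE_RATE, 5 * SAMPLE_RATE, 1 * SAMPLE_RATE):
    print(f"{start / SAMPLE_RATE:.0f}s to {end / SAMPLE_RATE:.0f}s")
```

Each window would then be transcribed separately and the duplicated text in the overlap reconciled afterwards. Overlap reduces boundary errors but does not give the model long-range context, so the WER concern raised above still applies.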
GaggiX, over 1 year ago
This is great, but I hope that in the future there will be a speech-to-text model with a focus on low-resource languages, probably by balancing the dataset in a way similar to No Language Left Behind (NLLB), released by Meta. NLLB is a translation model that works really well even with low-resource languages; something similar for speech transcription would be really cool.
ComputerGuru, over 1 year ago
They say whisper-3 will be available via the API soon. Does anyone know why only whisper-1 was ever made available via the API (no whisper-2)?
csjh, over 1 year ago
Only 3GB, interesting to see how small SOTA models in other domains are compared to LLMs like Falcon-180B.
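The size gap can be estimated with back-of-the-envelope arithmetic. This sketch assumes 16-bit weights (2 bytes per parameter) and uses the published parameter counts of roughly 1550M for Whisper large and 180B for Falcon-180B. The helper name `weights_size_gb` is made up for the example, and real checkpoint files can differ from the estimate because of metadata and storage format.

```python
def weights_size_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


WHISPER_LARGE_PARAMS = 1_550_000_000   # about 1550M parameters
FALCON_180B_PARAMS = 180_000_000_000   # about 180B parameters

whisper_gb = weights_size_gb(WHISPER_LARGE_PARAMS)
falcon_gb = weights_size_gb(FALCON_180B_PARAMS)

print(f"Whisper large: ~{whisper_gb:.1f} GB")
print(f"Falcon-180B:   ~{falcon_gb:.0f} GB")
print(f"Ratio:         ~{falcon_gb / whisper_gb:.0f}x")
```

Under these assumptions the estimate for Whisper large comes out close to the 3GB figure mentioned above.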
singularity2001, over 1 year ago
Did they break the API?
from openai import OpenAI
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'OpenAI' from 'openai'
If so, where is the current documentation?
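A likely cause of this ImportError is an installed `openai` package older than 1.0, the release in which the `OpenAI` client class was introduced. The sketch below checks the installed version with the standard library. The helper names `has_client_class` and `installed_openai_version` are made up for this example, and the version check is a simple heuristic on the major version number, not an official compatibility test.

```python
from importlib import metadata
from typing import Optional


def has_client_class(version: str) -> bool:
    """Heuristic: True if `version` (e.g. "1.2.3") has a major version of 1 or higher."""
    major = int(version.split(".")[0])
    return major >= 1


def installed_openai_version() -> Optional[str]:
    """Return the installed `openai` package version, or None if it is not installed."""
    try:
        return metadata.version("openai")
    except metadata.PackageNotFoundError:
        return None


version = installed_openai_version()
if version is None:
    print("openai is not installed")
elif has_client_class(version):
    print(f"openai {version}: `from openai import OpenAI` should work")
else:
    print(f"openai {version}: predates 1.0; upgrading with "
          "`pip install --upgrade openai` is the likely fix")
```

If the check reports a pre-1.0 version, upgrading the package and retrying the import is the first thing to try.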
joshspankit, over 1 year ago
Does anyone know if it’s able to do diarization with 3?
spandextwins, over 1 year ago
With comments, GitHub looks like HN, except with one less click.
tomrod, over 1 year ago
Word from my GenAI contact is that this (or similar announcement) replaces the need for RAG.