TechEcho

OpenAI releases Whisper v3, new generation open source ASR model

117 points by crakenzak over 1 year ago

12 comments

nshm over 1 year ago
Good improvements for many languages; numbers here:

https://github.com/openai/whisper/blob/main/language-breakdown.svg
dang over 1 year ago
Related ongoing threads:

New models and developer products - https://news.ycombinator.com/item?id=38166420

OpenAI DevDay, Opening Keynote Livestream [video] - https://news.ycombinator.com/item?id=38165090
Nitrolo over 1 year ago
Does anyone know of a nice UI wrapper for something like whisper.cpp?

I need to write a lot of long texts for work, and some good dictation software would be great. I know there's Dragon, but so far I have not been able to find anything that fits my needs and is free.
jsight over 1 year ago
This seems like the best free voice recognition in general.

Is there a model that is the best at wake word detection? The last time I looked, this seemed to be fairly lacking.
alex_young over 1 year ago
Still doesn't look like it can do real-time, unfortunately.

Edit: I understand that you can use small samples and approximate something like streaming, but the limitation is that you wind up without context for the samples, which increases WER. It would be nice if there was some streaming option.
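The "small samples" approach mentioned above can be sketched as follows. This is a minimal illustration, not a streaming implementation: it only splits an audio buffer into overlapping windows so that each chunk carries a little context from the previous one. The function name `overlapping_chunks` and the 30-second / 5-second defaults are assumptions made for this example, and the transcription call itself is left out.

```python
from typing import Iterator, Sequence, Tuple


def overlapping_chunks(
    samples: Sequence[float],
    sample_rate: int,
    chunk_s: float = 30.0,    # assumed window length, in seconds
    overlap_s: float = 5.0,   # assumed overlap between neighbouring windows
) -> Iterator[Tuple[int, Sequence[float]]]:
    """Yield (start_index, chunk) pairs covering `samples` with overlap."""
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk length")

    start = 0
    while start < len(samples):
        yield start, samples[start:start + chunk]
        if start + chunk >= len(samples):
            break  # the last window already reached the end of the audio
        start += step


if __name__ == "__main__":
    # Toy input: 100 "samples" at 1 Hz stand in for real audio.
    audio = list(range(100))
    for start, piece in overlapping_chunks(audio, sample_rate=1):
        print(start, len(piece))
```

Each window would then be transcribed separately and the overlapping text reconciled afterwards. The overlap softens the missing-context problem at chunk boundaries but does not remove it, which is the limitation the comment describes.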
GaggiX over 1 year ago
This is great, but I hope that in the future there will be a speech-to-text model with a focus on low-resource languages, probably by balancing the dataset in a way similar to No Language Left Behind (NLLB), released by Meta. It's a translation model that works really well even with low-resource languages; something similar for speech transcription would be really cool.
ComputerGuru over 1 year ago
They say whisper-3 will be available via the API soon. Does anyone know why only whisper-1 was ever made available via the API (no whisper-2)?
csjh over 1 year ago
Only 3GB; interesting to see how small SOTA models in other domains are compared to LLMs like Falcon-180B.
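The size gap can be checked with back-of-the-envelope arithmetic: weight size is roughly parameter count times bytes per parameter. The sketch below assumes 16-bit weights (2 bytes each) and about 1.55 billion parameters for the large Whisper model, the figure listed in the Whisper README; the helper name `weights_gb` is made up for this example.

```python
def weights_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight size in GB (10^9 bytes), assuming fp16 by default."""
    return num_params * bytes_per_param / 1e9


# Assumed parameter counts: ~1.55B for Whisper large, 180B for Falcon-180B.
whisper_large = weights_gb(1.55e9)
falcon_180b = weights_gb(180e9)

if __name__ == "__main__":
    print(f"Whisper large: ~{whisper_large:.1f} GB")
    print(f"Falcon-180B:   ~{falcon_180b:.0f} GB")
    print(f"Ratio:         ~{falcon_180b / whisper_large:.0f}x")
```

Under these assumptions the Whisper checkpoint comes out at about 3 GB, in line with the figure in the comment, while Falcon-180B at the same precision is two orders of magnitude larger.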
singularity2001 over 1 year ago
Did they break the API?

from openai import OpenAI

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'OpenAI' from 'openai'

If so, where is the current documentation?
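A likely cause of this error is an older installed copy of the `openai` Python package: the `OpenAI` client class was introduced in version 1.0 of the library, so `from openai import OpenAI` fails on 0.x releases. The sketch below checks the installed version using only the standard library; the helper names `major_version` and `supports_openai_client` are hypothetical and exist only for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '0.28.1'."""
    return int(version_string.split(".")[0])


def supports_openai_client(package: str = "openai") -> bool:
    """True if the installed package is at least 1.x, False if older or missing."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    return major_version(installed) >= 1


if __name__ == "__main__":
    if supports_openai_client():
        print("openai >= 1.0 detected: 'from openai import OpenAI' should work")
    else:
        print("openai is missing or older than 1.0: try 'pip install --upgrade openai'")
```

If the check reports an old version, upgrading the package is the usual fix; code written against the 0.x interface will then need to be migrated to the client-based style.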
joshspankit over 1 year ago
Does anyone know if it’s able to do diarization with v3?
spandextwins over 1 year ago
With comments, GitHub looks like HN, except with one less click.
tomrod over 1 year ago
Word from my GenAI contact is that this (or a similar announcement) replaces the need for RAG.