Jargonic: Industry-Tunable ASR Model

56 points by agold97, about 1 month ago

5 comments

gronky_, about 1 month ago
I just tried the demo on the homepage and I don’t know what kind of sorcery this is, but it’s blowing my mind.

I input a bunch of completely made-up words (Quastral Syncing, Zarnix Meshing, HIBAX, Bilxer), used them in a sentence, and the model zero-shotted perfect speech recognition!

It’s so counterintuitive to me that this would work. I would have bet that you have to provide at least one audio sample for the model to recognize a word it was never trained on.

Being given a word only as text and then recognizing it in audio must be an emergent property.
suchire, about 1 month ago
Is their WER graph just completely made up? It’s comically bad.
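
For readers unfamiliar with the metric: WER (word error rate) is usually computed as the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. A minimal sketch, assuming simple lowercase whitespace tokenization (the function name and the example sentences are illustrative, not taken from the Jargonic benchmark):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn the reference prefix into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Two of four reference words are misrecognized.
score = word_error_rate("start quastral syncing now", "start castrol sinking now")
```

Because the denominator is the reference length, a handful of missed jargon terms in a short utterance moves the number a lot, which is one reason WER comparisons depend heavily on the test set.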
four_fifths, about 1 month ago
so if i understand this correctly: you want the speech recognition model to identify a vocabulary of specific terms that it wasn't trained on. instead of fine-tuning with training data that includes the new vocabulary, you input the full vocabulary at test time as a list of words, and the model is able to generate transcripts that include words from the vocabulary.

seems like it could be very useful, but it really comes down to the specifics.

you can prompt whisper with context. how does this compare?

how large a vocabulary can it work with? if it's a few dozen words, it's only gonna help for niche use cases. if it can handle hundreds to thousands of words with good performance, that could completely replace fine-tuning for many uses.
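
For comparison, the Whisper route mentioned above can be sketched roughly like this. This is not how Jargonic works (the thread does not say); it only illustrates the "prompt with context" baseline. The helper `build_vocab_prompt` and its `max_words` budget are hypothetical. The budget is a crude stand-in for Whisper's short prompt window, which is also why this approach is unlikely to scale to thousands of terms.

```python
def build_vocab_prompt(terms, max_words=150):
    """Join domain terms into one prompt string within a rough word budget.

    Hypothetical helper: `max_words` approximates a token limit by counting
    whitespace-separated words, which is only a loose proxy.
    """
    seen = set()
    kept = []
    used = 0
    for term in terms:
        term = term.strip()
        key = term.lower()
        if not term or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        n_words = len(term.split())
        if used + n_words > max_words:
            break  # stop once the budget would be exceeded
        seen.add(key)
        kept.append(term)
        used += n_words
    return "Glossary: " + ", ".join(kept) + "."


prompt = build_vocab_prompt(["Quastral Syncing", "Zarnix Meshing", "HIBAX", "Bilxer"])

# Assumed usage with the open-source `whisper` package (not executed here):
#   import whisper
#   model = whisper.load_model("base")
#   result = model.transcribe("meeting.wav", initial_prompt=prompt)
```

The prompt only nudges decoding toward the listed spellings; it does not guarantee they appear, so a fair comparison would measure accuracy on the jargon terms themselves.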
FloatArtifact, about 1 month ago
How does this keyword spotting compare with a grammar- or intent-based approach to speech recognition for commands mixed with dictation?

How does keyword spotting handle complex phrases as commands?
htrp, about 1 month ago
Perhaps it's using OpenAI Advanced Voice or another TTS to create waveforms for comparison?