WhisperNER: Unified Open Named Entity and Speech Recognition

133 points by timbilt, 6 months ago

6 comments

vessenes, 6 months ago
The title is dense and the paper is short. But the demo is outstanding: (https://huggingface.co/spaces/aiola/whisper-ner-v1). The sample audio is submitted with "entity labels" set to "football-club, football-player, referee" and WhisperNER returns tags Arsenal and Juventus for the football-club tag. They suggest "personal information" as a tag to try on audio.

Impressive, very impressive. I wonder if it could listen for credit cards or passwords.
timbilt, 6 months ago
GitHub repo: https://github.com/aiola-lab/whisper-ner

Hugging Face Demo: https://huggingface.co/spaces/aiola/whisper-ner-v1

Pretty good article that focuses on the privacy/security aspect of this, having a single model that does ASR and NER:

https://venturebeat.com/ai/aiola-unveils-open-source-ai-audio-transcription-model-that-obscures-sensitive-info-in-realtime/
will-burner, 6 months ago
Is there any reason why this would work better, or is needed, compared to taking audio and 1. doing ASR with Whisper, for instance, and 2. applying an NER model to the transcribed text?

There are open source NER models that can identify any specified entity type (https://universal-ner.github.io/, https://github.com/urchade/GLiNER). I don't see why this WhisperNER approach would be any better than doing ASR with Whisper and then applying one of these NER models.
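The two-step alternative described in this comment (ASR first, then NER over the transcribed text) can be sketched roughly as follows. This is a minimal illustration only: `fake_transcribe` and `fake_tag` are hypothetical stubs, not the Whisper, UniversalNER, or GLiNER APIs, and the `Entity` shape is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Entity:
    label: str  # entity type requested by the caller, e.g. "football-club"
    start: int  # character offset into the transcript (inclusive)
    end: int    # character offset into the transcript (exclusive)
    text: str   # surface form found in the transcript


def cascade(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    tag: Callable[[str, List[str]], List[Entity]],
    labels: List[str],
) -> Tuple[str, List[Entity]]:
    """Run ASR, then run NER on the resulting text."""
    transcript = transcribe(audio)      # step 1: a plain-text transcript exists here
    entities = tag(transcript, labels)  # step 2: NER sees only text, not audio
    return transcript, entities


# Hypothetical stand-ins so the sketch runs without any model.
def fake_transcribe(audio: bytes) -> str:
    return "Arsenal beat Juventus last night"


def fake_tag(text: str, labels: List[str]) -> List[Entity]:
    known = {"Arsenal": "football-club", "Juventus": "football-club"}
    found = []
    for name, label in known.items():
        pos = text.find(name)
        if pos != -1 and label in labels:
            found.append(Entity(label, pos, pos + len(name), name))
    return sorted(found, key=lambda e: e.start)


if __name__ == "__main__":
    transcript, entities = cascade(b"", fake_transcribe, fake_tag, ["football-club"])
    print(transcript)
    for e in entities:
        print(e.label, e.text, e.start, e.end)
```

In a real cascade the stubs would be replaced by an actual ASR model and an open-type NER model. The point of the sketch is only the shape of the pipeline: the second step works on the transcript alone, so any recognition error made in the first step is passed on to it.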
alienallys, 6 months ago
On a similar note, I have a request for the HN community. Can anyone recommend a low-latency NER model/service?

I'm building an assistant that gives information on local medical providers that match your criteria. I'm struggling with query expansion and entity recognition. For any incoming query, I would want to run NER for medical terms (which are limited in scope and pre-determined), and subsequently do query rewriting and expansion.
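Because the medical terms here are described as limited in scope and pre-determined, one low-latency baseline is a plain dictionary lookup rather than a model. The sketch below is a minimal illustration under that assumption; the `TERMS` table and its labels are hypothetical, and it handles no misspellings or synonyms.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical closed vocabulary: lowercase term -> entity label.
TERMS: Dict[str, str] = {
    "physical therapy": "specialty",
    "cardiologist": "specialty",
    "mri": "procedure",
}


def build_pattern(terms: Dict[str, str]) -> "re.Pattern[str]":
    # Longest terms first, so multi-word terms win over shorter overlaps.
    alternatives = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(t) for t in alternatives)
    return re.compile(r"\b(" + body + r")\b", re.IGNORECASE)


PATTERN = build_pattern(TERMS)


def tag_query(query: str) -> List[Tuple[str, str]]:
    """Return (surface form, label) for each known term found in the query."""
    return [(m.group(1), TERMS[m.group(1).lower()]) for m in PATTERN.finditer(query)]


if __name__ == "__main__":
    print(tag_query("Need an MRI and a cardiologist near me"))
```

The tagged terms could then feed the query rewriting and expansion step. Whether this is sufficient depends on how varied real queries are, which the comment does not say.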
uniqueuid, 6 months ago
It's so great to see that we are finally moving away from the thirty-year-old triple categorization of people, organizations and locations.

This of course means that we now have to think about all the irreconcilable problems of taxonomy, but I'll take that any day over the old version :)
clueless, 6 months ago
"The model processes audio files and simultaneously applies NER to tag or mask specific types of sensitive information directly within the transcription pipeline. Unlike traditional multi-step systems, which leave data exposed during intermediary processing stages, Whisper-NER eliminates the need for separate ASR and NER tools, reducing vulnerability to breaches."
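To make "tag or mask" concrete, here is a minimal sketch of the masking step alone, applied to a finished transcript with known entity spans. It is not how WhisperNER works internally (the quote says the model does this inside the transcription pipeline); the `(start, end, label)` span format and the `[LABEL]` placeholder style are assumptions made for the example.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label); assumed format


def mask_spans(transcript: str, spans: List[Span], labels_to_mask: Set[str]) -> str:
    """Replace every span whose label is in labels_to_mask with a [LABEL] placeholder."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        # Skip labels we were not asked to mask, and spans that overlap
        # text already replaced by an earlier span.
        if label not in labels_to_mask or start < cursor:
            continue
        pieces.append(transcript[cursor:start])
        pieces.append("[" + label.upper() + "]")
        cursor = end
    pieces.append(transcript[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    text = "My card number is 4111 1111 1111 1111, thanks"
    number = "4111 1111 1111 1111"
    begin = text.index(number)
    spans = [(begin, begin + len(number), "credit-card")]
    print(mask_spans(text, spans, {"credit-card"}))
```

Note that this post-hoc version still needs the unmasked transcript as input, which is the intermediate exposure the quoted passage says a unified model avoids.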