Weak supervision to isolate sign language communicators in crowded news videos

58 points by matroid, 9 months ago

6 comments

akira2501, 9 months ago
> I believe that we can solve continuous sign language translation convincingly

American Sign Language is not English; in fact, it's not even particularly close to English. Much of the language is conveyed with body movements outside of the hands and fingers, particularly with facial expressions and "named placeholders."

> All this is to say, that we need to build a 5000 hour scale dataset for Sign Language Translation and we are good to go. But where can we find this data? Luckily news broadcasters often include special news segments for the hearing-impaired.

You need _way_ more than just 5000 hours of video. People who are deaf or hard of hearing, in my experience, dislike the interpreters in news broadcasts. It's very difficult, as an interpreter, to provide _worthwhile_ translations of what is being spoken _as_ it is being spoken.

It's more of a bad and broken transliteration that you can parse out and understand only if you struggle to think about it.

The other issue is that most interpreters are hearing and so use the language slightly differently from actual deaf persons, and training on this, on news topics, will make it very weak when it comes to understanding and interpreting anything outside of this context. ASL has "dialects" and "slang."

Hearing people always presume this will be simple. They should really just take an ASL class and work with deaf and hearing-impaired people first.

egberts1, 9 months ago
Using news broadcasts as the training source to populate an LLM is a poor precedent.

Repetition of a sign usually indicates additional emphasis.

The dialects all need to be covered, and each variant mapped to its word.

Furthermore, YouTube has an excellent collection of really bad or fake ASL interpreters in many news broadcasts, so bad, really really bad, worse than the Al Gore hanging-chad news broadcasts or the "hard-of-hearing" inset box during the Saturday Night Live news broadcast.

You still need an RID-certified or CDI-certified ASL interpreter to vet the source.

https://m.youtube.com/watch?v=GwSh0dAaqIA

https://rid.org/certification/available-certifications/

zie, 9 months ago
First: I sign ASL, not ISL, which is what the OP is talking about.

In the ASL world, most news translations into ASL are delayed or sped up relative to the person talking and/or the captions, if those happen to be available as well.

You are going to have sync problems.

Secondly, it's not just moving the hands: body movements, facial expressions, etc. all count in ASL, and I'm betting they count in ISL as well.

Thirdly, the quality of interpretation can be really bad. Horrendous. It's not so common these days, but it was fairly common that speakers would hire an interpreter and mistakenly hire someone willing to just move their arms randomly. I had it happen once at a doctor's office. The "interpreter" was just lost in space. The doctor and I started writing things down, and the interpreter seemed a little embarrassed at least.

Sometimes they hire sign language students. You can imagine hiring a first-year French student to interpret for you; it's no different really. Sometimes they mean well, sometimes they are just there for the paycheck.

I bet it's a lot worse with ISL, because it's still very new, most students are not taught in ISL, and there are only about 300 registered interpreters for millions of deaf people in India. https://islrtc.nic.in/history-0

We are still very much struggling with voice-to-English transcription using AI, despite loads of work from lots of companies and researchers. The systems are getting better, and in ideal scenarios are actually quite useful. Unfortunately the world is far from ideal.

The other day, in a meeting with two people using the same phone, the AI transcription was highly confused and it went very, very wrong.

I'm not trying to discourage you, and it's great to see people trying. I wish you lots of success. Just know it's not an easy thing, and I imagine lots of lifetimes of work will be needed to generate useful signed-language-to-written-language services that are on par with the best of the voice-to-text systems we have today.

hi-v-rocknroll, 9 months ago
I'm wondering how long it will take for LLMs to be able to generate a complete sign language (any one of the many) on the fly and put the translators for the various sign languages out of a job. The crux seems to be that sign language differs significantly from spoken language and includes facial movements and nonverbal emotional tonality.

agarsev, 9 months ago
Sign language researcher here! I would recommend you look a bit at the scientific literature on the topic. I know it can be a bit overwhelming and hard to know how to separate the actual info from the garbage, so I can try to select a few hand-picked papers for you. IMO, trying to understand sign language oneself, or at least getting basic notions, is fundamental to understanding where the real problems lie.

Unfortunately there's no getting away from that. While the scarcity of data is indeed an issue, and your idea is nice (congratulations!), the actual problem is the scarcity of useful data. Since sign language doesn't correspond to the oral language, there are many problems with alignment and with just *what* to translate to. Glosses (oral language words used as representations for signs) are not enough at all, since they don't capture the morphology and grammar of the language, which among other things relies heavily on space and movement. Video + audio/audio captions is nearly useless.

Good luck with your efforts. This is a fascinating area where we get to combine the best of CS, AI, linguistics... but it's hard! As I said, let me know if you want some literature, by PM/email if you want, and I'll get back to you later.

jallmann, 9 months ago
Sign languages have such enormous variability that I have always thought having fluent sign language recognition / translation probably means we have solved AGI.

Detecting the presence of sign language in a video is an interesting subset of the problem and is important for building out more diverse corpora. I would also try to find more conversational sources of data, since news broadcasts can be clinical, as others have mentioned. Good luck.