Lip Reading as a Service (Read Their Lips by Symphonic Labs)

49 points by draugadrotten 8 months ago

6 comments

luma 8 months ago
Thinking through some potentially interesting sources for videos where two people are talking but we don't know what was said and well, I think this is a decent starting point: https://www.youtube.com/watch?v=KLcfpU2cubo

Sadly, doesn't work too great in this situation:

> That they didnt go through but i would tell you theyre just a chill look at here lets do it chills with all of our great men and they look at every chance they go oh do you want to the black man well thats my gosh thats my gosh thats my gosh thats my gosh thats my gosh thats my gosh thats
mtVessel 8 months ago
So far, it's no HAL 9000.

Uploaded video dialog:

Bowman: You know, of course, though, he's right about the 9000 series having a perfect operational record. They do.

Poole: Unfortunately that sounds a little like famous last words.

Bowman: Yeah, still it was his idea to carry out the failure mode analysis, wasn't it?

Poole: mmm

Bowman: Should certainly indicate... (away from camera): his integrity and self-confidence

Bowman: If he were wrong, it'd be the surest way of proving it.

Poole: It would be if he knew he was wrong.

Results:

"Of course there is recommended getting necessary to have a perfect operational rank i know youre going to be the first to do that youre going to get the best youre going to get the best youre going to get the best youre going to get the best youre going to get the best of yours if you want to rock better sure its well perfect."
pogue 8 months ago
Has anyone tried this with some video where they know what the person is saying?

I'd be interested to know how accurate it is, and from what angles it will read lips (front facing, side, etc).

Sounds promising if it works well. Imagine all the historical videos without sound where you could finally find out what was being said.
echelon 8 months ago
Great way to build labeled training data.

User-submitted videos (with audio for STT), user-crafted bounding boxes (we might not need these soon), and user-guided RLHF.

The submitted videos are likely diverse, challenging (otherwise the human might just do it), and representative of solving actual customer problems.
shrubble 8 months ago
Wondering how well this will perform on the viral video “Benny Lava” and whether it will be part of a group of videos used to create a synthetic benchmark.
tchock23 8 months ago
Has anyone tried this out on Radiohead's "Just" video yet?