
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization

75 points, by jasondavies, 6 months ago

6 comments

kmeisthax, 6 months ago
I'm wondering if it would make sense to use an H.264/5/6/AV1 encoder as the tokenizer, and then find some set of embeddings that correspond to the data in the resulting bitstream. The tokenization they're doing is morally equivalent to what video codecs already do.
Comment #42163858 not loaded
pavlov, 6 months ago
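The run-length idea the thread is discussing can be illustrated with a minimal sketch (my own illustration, not the paper's implementation): compare each spatial patch against the last kept patch at that location and, when it is unchanged within a tolerance, extend a run-length counter instead of emitting a new token. The names `run_length_tokenize`, `patch`, and `tol` are hypothetical.

```python
import numpy as np

def run_length_tokenize(frames, patch=4, tol=1e-3):
    """Drop patches that are (near-)identical to the same patch in the
    previous frame, recording a run length instead.

    frames: (T, H, W) grayscale video; H and W divisible by `patch`.
    Returns a list of [t, row, col, patch_array, run_length] entries.
    """
    T, H, W = frames.shape
    tokens = []
    last = {}  # last kept token per spatial patch location
    for t in range(T):
        for r in range(0, H, patch):
            for c in range(0, W, patch):
                p = frames[t, r:r + patch, c:c + patch]
                key = (r, c)
                if key in last and np.max(np.abs(p - last[key][3])) < tol:
                    # Static patch: extend the run instead of emitting a token.
                    last[key][4] += 1
                else:
                    tok = [t, r, c, p, 1]
                    tokens.append(tok)
                    last[key] = tok
    return tokens
```

On a fully static 3-frame clip this emits only the first frame's patches, each carrying a run length of 3, which is the token-count saving the paper's title alludes to; a real video codec does something analogous with motion-compensated residuals rather than exact patch repeats.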
Would event camera input data be useful here?

https://en.wikipedia.org/wiki/Event_camera

"Event cameras do not capture images using a shutter as conventional (frame) cameras do. Instead, each pixel inside an event camera operates independently and asynchronously, reporting changes in brightness as they occur, and staying silent otherwise."
cyberax, 6 months ago
Interestingly, biological vision for reptiles (and probably other species) works largely on the same principle. It tends to filter out static background.
Comment #42164588 not loaded
smusamashah, 6 months ago
Isn't this like Differential Transformers, which worked based on differences?
Comment #42159517 not loaded
Comment #42159461 not loaded
robbiemitchell, 6 months ago
For training, would it be useful to stabilize the footage first?
Comment #42159541 not loaded
Comment #42157247 not loaded
trash_cat, 6 months ago
What applications would this have that differ from regular transformers? Perhaps a stupid question.