TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Don't Look Twice: Faster Video Transformers with Run-Length Tokenization

75 points by jasondavies, 6 months ago

6 comments

kmeisthax, 6 months ago
I'm wondering if it would make sense to use an H.264/5/6/AV1 encoder as the tokenizer, and then find some set of embeddings that correspond to the data in the resulting bitstream. The tokenization they're doing is morally equivalent to what video codecs already do.
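The run-length idea the paper's title refers to can be sketched in a few lines: patch tokens that stay (near-)identical across consecutive frames are collapsed into a single token annotated with a run length, so static regions cost one token instead of one per frame. This is a minimal illustrative sketch, not the paper's implementation; the patch representation (one scalar per patch) and the similarity threshold are assumptions for the example.

```python
# Minimal sketch of run-length tokenization for video patches.
# Each frame is a tuple of patch values; a patch whose value repeats
# across frames is emitted once, with a run length, instead of once
# per frame. The scalar patches and threshold are illustrative.

def run_length_tokenize(frames, threshold=0.0):
    """frames: list of equal-length tuples of patch values (one per frame).
    Returns a list of (frame_idx, patch_idx, value, run_length) tokens."""
    tokens = []
    num_patches = len(frames[0])
    for p in range(num_patches):
        t = 0
        while t < len(frames):
            v = frames[t][p]
            run = 1
            # extend the run while the patch stays (near-)identical
            while t + run < len(frames) and abs(frames[t + run][p] - v) <= threshold:
                run += 1
            tokens.append((t, p, v, run))
            t += run
    return tokens

# A static background patch (patch 0) collapses to one token with run
# length 3; a changing patch (patch 1) keeps one token per frame.
frames = [(1, 10), (1, 20), (1, 30)]
print(run_length_tokenize(frames))
# → [(0, 0, 1, 3), (0, 1, 10, 1), (1, 1, 20, 1), (2, 1, 30, 1)]
```

The analogy to video codecs in the comment above is visible here: like inter-frame prediction, the scheme spends bits (tokens) only where the signal changes.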
pavlov, 6 months ago
Would event camera input data be useful here?

https://en.wikipedia.org/wiki/Event_camera

“Event cameras do not capture images using a shutter as conventional (frame) cameras do. Instead, each pixel inside an event camera operates independently and asynchronously, reporting changes in brightness as they occur, and staying silent otherwise.”
cyberax, 6 months ago
Interestingly, biological vision for reptiles (and probably other species) works largely on the same principle. It tends to filter out static background.
smusamashah, 6 months ago
Isn't this like Differential Transformers, which worked based on differences?
robbiemitchell, 6 months ago
For training, would it be useful to stabilize the footage first?
trash_cat, 6 months ago
What applications would this have that differ from regular transformers? Perhaps a stupid question.