Have not read the paper in much depth yet, but this looks like great work, super interesting. Thanks for sharing.

Question: in the example of prediction on untrained tasks, what exactly hasn't been trained? The paper lists video as one of the trained tasks. Did you simply retrain the model without video examples and then test performance?
Can you explain in layman's terms what exactly has been done here? What I understood is that there is a single NN trained for multiple tasks, but what is the benefit?
HN prefers the original title when submitting links: "Show HN: OmniNet - A unified architecture for multi-modal multi-task learning"

HN is a bit strict about this. I'd also say "X is all you need" gets less attention from users than a very technical headline. The most popular submission recently had MITM in the title (https://news.ycombinator.com/best)