
Meta AI: First high-performance self-supervised algorithm for multiple modalities

89 points by leirbagarc over 3 years ago

11 comments

earth2mars over 3 years ago
Can someone explain this like I'm 5? What are the use cases when it says it works on images, text, etc.? Why is this a big deal? What's the human input here? And what output should we expect?

From what I understand, human validation (supervision) is not happening while the algorithm is training on data. Is that right? Will this be open to the public via standard ML frameworks, or proprietary?
silence48 over 3 years ago
I don't believe it's the actual first, but this is pretty awesome. Too bad it's Facebook :/
WithinReason over 3 years ago
Basically they cut out a part of the input and make the network predict the missing part (edit: they actually predict the average of all features). This works for images, audio, and text, and it produces high-quality feature representations that specialised networks can then be built on. The two main tricks are:

1. Do the cutout in feature space, not the original input space. (edit: the cutout is actually in input space)

2. The above would likely just collapse the features to 0, so they use the same network that does the reconstruction to produce the features (!). In their own words:

"We first encode a masked version of the training sample (model in student mode) and then construct training targets by encoding the unmasked version of the input sample with the same model but when parameterized as an exponentially moving average of the model weights (model in teacher mode)"
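A minimal PyTorch-style sketch may make the student/teacher scheme described above concrete. This is illustrative only, not Meta's data2vec code: every module name and hyperparameter here is invented, masking is done by crudely zeroing inputs, and the real method regresses onto targets averaged over the teacher's top layers rather than its final output.

```python
# Illustrative sketch (invented names/sizes), not Meta's data2vec code.
import copy
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
student = nn.TransformerEncoder(layer, num_layers=4)
teacher = copy.deepcopy(student)            # same architecture as the student
for p in teacher.parameters():
    p.requires_grad_(False)                 # the teacher gets no gradients

def ema_update(student, teacher, tau=0.999):
    # Teacher weights track an exponential moving average of the student's.
    with torch.no_grad():
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.mul_(tau).add_(ps, alpha=1 - tau)

opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

x = torch.randn(8, 32, 64)                  # (batch, timesteps, features)
mask = torch.rand(8, 32) < 0.5              # mask roughly half the timesteps

with torch.no_grad():
    target = teacher(x)                     # teacher encodes the unmasked input

x_masked = x.clone()
x_masked[mask] = 0.0                        # crude input-space masking
pred = student(x_masked)                    # student encodes the masked input

# Regress the student's outputs onto the teacher's representations,
# but only at the masked positions.
loss = nn.functional.mse_loss(pred[mask], target[mask])
loss.backward()
opt.step()
opt.zero_grad()
ema_update(student, teacher)                # pull the teacher toward the student
```

The key design choice is that the teacher is never trained by gradient descent; it only tracks the student, which is what avoids the trivial collapse mentioned in point 2.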
drenei over 3 years ago
dang/mods: The title here has a small typo: it misspells algorithm.
macilacilove over 3 years ago
It seems that they pass everything through an autoencoder first, and a different network tries to predict, from a partially masked input, the "correct" autoencoder latent-space representation of the unmasked input. If it works, the decoder of the autoencoder can generate (guess) the unmasked data from the latent space.
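A toy version of that reading, with invented names, might look like the following. Note this sketches the commenter's mental model, not the paper's actual architecture; the other summaries in this thread suggest the targets come from an EMA teacher rather than a separately trained autoencoder.

```python
# Toy sketch of the comment's reading (invented names), not the paper's code.
import torch
import torch.nn as nn

# Assume this encoder/decoder pair was already trained as an autoencoder.
ae_encoder = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 16))
ae_decoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 128))
predictor = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 16))

x = torch.randn(4, 128)
keep = (torch.rand(4, 128) > 0.5).float()
x_masked = x * keep                      # zero out roughly half the entries

with torch.no_grad():
    z_target = ae_encoder(x)             # latent code of the unmasked input

z_pred = predictor(x_masked)             # predict that code from the masked copy
loss = nn.functional.mse_loss(z_pred, z_target)
loss.backward()

# If the prediction is good, the frozen decoder can "guess" the clean input:
x_guess = ae_decoder(z_pred.detach())
```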
algo_trader over 3 years ago
Is there progress on general structured/relational/graph data modalities?

In practice, you spend time and expertise reshaping the data into a previously-known-to-work form.

FWIW, our datasets are huge, with a dense data/noise ratio.
Bombthecat over 3 years ago
Crazy, and people think AI isn't moving forward anymore...
alarak over 3 years ago
I don't understand how this is different from BYOL. I'd appreciate it if someone could give a short explanation.
asix66 over 3 years ago
https://archive.is/Cm81W
macleginn over 3 years ago
Do they need a pre-trained modality-specific model for each modality to train this?
EZ-Cheeze over 3 years ago
Now do it with matrices, GPT style.

Sheeeeeeeeeeeeeeeeeeit