Really interesting work there, and I particularly liked the gif-based storytelling: https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAndq_MjAVBs4j3lmxEX71nMrCLpAasklndZyE8F7yj3slyafRsNauzW4yRxI_Ncg7Sp5jllAXpItsjA-BOmdB2O1jP3Awu09-DVRHBE_Urf58yzm5tDBBpM-aibZxmgA9O6CySCCRdSMMqG7vj-OU07jHa0OU0YixCxRB0Q3APMQbn8Vz5rEBp70ZNogH/s900/image3.gif
So the main idea is that teachers can use examples containing PII/user data from Gmail etc., but distill task-specific capabilities into students that don't contain that PII/user data. Seems kinda useful for companies that want to utilize private information safely.

Edit: I misunderstood the paper on first skim; they don't actually want smaller students, they might actually want bigger ones. So this is more of a privacy thing.
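Roughly the flow I have in mind, as a minimal sketch (the function names, prompts, and stubbed model calls below are all mine, not from the paper):

```python
# Sketch of the privacy angle: the teacher sees private examples (e.g. real
# emails), but only synthetic, task-specific examples ever reach the student.
# Model calls are stubbed out; `teacher_generate_synthetic_examples` and
# `student_answer` are hypothetical names, not from the paper.

from typing import Callable, List

ModelFn = Callable[[str], str]  # prompt in, completion out (stand-in for a real LLM API)


def teacher_generate_synthetic_examples(
    teacher: ModelFn, private_examples: List[str], n: int = 3
) -> List[str]:
    """Teacher looks at private data and writes NEW examples of the same task,
    without reusing any of the private content verbatim."""
    prompt = (
        "You are a teacher model. Based on the following private examples "
        "(do NOT reuse names, addresses, or any other PII), write "
        f"{n} fresh, synthetic examples of the same task:\n\n"
        + "\n---\n".join(private_examples)
    )
    return [line for line in teacher(prompt).splitlines() if line.strip()][:n]


def student_answer(student: ModelFn, synthetic_examples: List[str], query: str) -> str:
    """Student is prompted only with the synthetic examples, never the private data."""
    prompt = (
        "Here are a few examples of the task:\n"
        + "\n".join(synthetic_examples)
        + f"\n\nNow solve this one:\n{query}"
    )
    return student(prompt)


if __name__ == "__main__":
    # Dummy model functions so the sketch runs end to end.
    dummy_teacher: ModelFn = lambda p: "Example 1: ...\nExample 2: ...\nExample 3: ..."
    dummy_student: ModelFn = lambda p: "(student's answer)"

    private = [
        "Hi Alice, your package to 12 Foo St ships Tuesday ...",
        "Dear Bob, your refund of $42 was processed ...",
    ]
    synthetic = teacher_generate_synthetic_examples(dummy_teacher, private)
    print(student_answer(dummy_student, synthetic, "Is this email about a delivery?"))
```

The point being that the student's prompt never contains the raw Gmail text, only whatever synthetic examples (or instructions) the teacher chose to emit.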
Sounds like a stretch of the concept of social learning, and more like vanilla model distillation.

Social learning exists to transcend the limited processing power and limited training-data exposure of individual agents through multimodal transfer of their individual models (distilled down from an individual's entire worldview, sense of self, perspectives, skills, semantic memory, etc.).

LLMs already exploit the propositional transfer of human models over language, and they abuse their massive compute capacity to compress them all into a giant model that simulates them all. For sure it internally has some notion of distribution (it at least has to distribute the compute at train time), but this is not an agent-level distribution (not to be confused with the weaker metaphor of an "agent" used in model architectures), and the end product presents itself as a singular "agent" with *all* of the processing power and *all* of the training data, one that is infinitely copyable.

> "A teacher model provides instructions or few-shot examples to a student model without sharing its private data."

So the real concern is not using social learning to transcend compute and training-data limitations; it is about creating inferior models that can be distributed back into the world without giving up all of the secret sauce.

For sure this could work: one could create inferior "agents" from stronger "agents". But we cannot create an even stronger "agent" through the dialogue of two strong "agents", because everything to be shared is already perfectly encoded in the model and architecture and is perfectly copyable. Therefore this is not social learning at all.

To abuse their anthropomorphization right back: they are trying to create a deliberately stupid kid to send out into the world, so that the kid doesn't tell all the things mommy and daddy already know and could have taught perfectly. Because one can make more money from selling/renting a bundle of differently stupid agents than from a single state-of-the-art one, I guess?
Think about the following scenario: I write a calculus book, and the agents of this model just modify every example and every definition and slightly change the ordering of the material to teach students. Now they are using my book, but it looks like they are not using my book. Are they trying to copy without copying?