This is one of those announcements that seems unremarkable on first read-through but could be industry-changing in a decade. The driving force behind consolidation and monopoly in the tech industry is that bigger firms with more data have an advantage over smaller firms: they can deliver features (often built on machine learning) that users want and that small startups or individuals simply cannot implement. Federated learning, in theory, provides a way for users to maintain control of their data while granting permission for machine-learning algorithms to inspect it and "phone home" with an improved model, *without revealing the individual data*. Couple it with a P2P protocol and a good on-device UI platform and you could in theory construct something similar to the WWW, with data stored locally, but with all the convenience features of centralized cloud-based servers.
Their papers mentioned in the article:

Federated Learning: Strategies for Improving Communication Efficiency (2016)
https://arxiv.org/abs/1610.05492

Federated Optimization: Distributed Machine Learning for On-Device Intelligence (2016)
https://arxiv.org/abs/1610.02527

Communication-Efficient Learning of Deep Networks from Decentralized Data (2017)
https://arxiv.org/abs/1602.05629

Practical Secure Aggregation for Privacy-Preserving Machine Learning (2017)
http://eprint.iacr.org/2017/281
Reminds me of a talk I saw by Stephen Boyd from Stanford a few years ago: https://www.youtube.com/watch?v=wqy-og_7SLs

(Slides only here: https://www.slideshare.net/0xdata/h2o-world-consensus-optimization-and-machine-learning-stephen-boyd)

At that time I was working at a healthcare startup, and the ramifications of consensus algorithms blew my mind, especially given the constraints of HIPAA. This could be massive in the medical space: being able to train an algorithm with data from everyone while still preserving privacy.
The paper: https://arxiv.org/pdf/1602.05629.pdf

The key algorithmic detail: each device performs multiple batch updates to the model, and the server then averages all the multi-batch updates. "That is, each client locally takes one step of gradient descent on the current model using its local data, and the server then takes a weighted average of the resulting models. Once the algorithm is written this way, we can add more computation to each client by iterating the local update."

They do some sensible things with model initialization to make sure weight-update averaging works, and show that in practice this way of doing things requires less communication and reaches the goal faster than a more naive approach. It seems like a fairly straightforward idea starting from baseline SGD, so the contribution is mostly in actually doing it.
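Here's a minimal sketch of that averaging step (hypothetical NumPy code, not the paper's implementation; logistic regression stands in for whatever model is actually trained, and the hyperparameters are made up):

    import numpy as np

    def local_sgd(weights, X, y, epochs=5, batch_size=10, lr=0.1):
        """Run a few epochs of plain SGD on one client's local data.
        Logistic regression here is just a stand-in for any model."""
        w = weights.copy()
        for i0 in range(0, len(X) * epochs, batch_size):
            i = i0 % len(X)
            xb, yb = X[i:i + batch_size], y[i:i + batch_size]
            preds = 1.0 / (1.0 + np.exp(-xb @ w))    # sigmoid
            w -= lr * xb.T @ (preds - yb) / len(xb)  # gradient step
        return w

    def federated_averaging(global_w, clients):
        """One round of FederatedAveraging: every client trains locally
        from the same initial weights, then the server takes an average
        weighted by each client's number of examples."""
        total = sum(len(X) for X, _ in clients)
        new_w = np.zeros_like(global_w)
        for X, y in clients:
            new_w += (len(X) / total) * local_sgd(global_w, X, y)
        return new_w

Each round, only the new weights (or their delta from `global_w`) cross the network; the raw `(X, y)` pairs never leave the client.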
"Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud."<p>So I assume this would help with privacy in a sense that you can train model on user data without transmitting it to the server. Is this in any way similar to something Apple calls 'Differential Privacy' [0] ?<p>"The key idea is to use the powerful processors in modern mobile devices to compute higher quality updates than simple gradient steps."<p>"Careful scheduling ensures training happens only when the device is idle, plugged in, and on a free wireless connection, so there is no impact on the phone's performance."<p>It's crazy what the phones of near future will be doing while 'idle'.<p>------------------------<p>[0] <a href="https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/" rel="nofollow">https://www.wired.com/2016/06/apples-differential-privacy-co...</a>
This is fascinating, and makes a lot of sense. There aren't many companies in the world that could pull something like this off. Amazing work.

Counterpoint: perhaps they don't need your data if they already have the model that describes you!

If the data is like oil, but the algorithm is like gold, then they still extract the gold without extracting the oil. You're still giving it away in exchange for the use of their service.

For that matter, run the model in reverse, and while you might not get the exact data, we've seen that machine learning has the ability to generate something that simulates the original input...
This is quite amazing. Beyond the privacy implications of secure aggregation being executed at scale in production, they're also finding a way to harness billions of phones to do training on all kinds of data. They don't need to pay for huge data centers when they can get users to do the training for them. They can also learn from data that might otherwise never have left the phone, given encryption trends.
This is speculative, but it seems like the privacy aspect is oversold, as it may be possible to reverse-engineer the input data from the model updates. The concern is that the model updates themselves are still specific to each user.
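As a toy illustration of why per-user updates can leak: for a linear model and a single example, the gradient is a scaled copy of the input, so whoever sees the raw update recovers the example up to scale (hypothetical code, squared-error loss assumed):

    import numpy as np

    x = np.array([0.2, -1.3, 0.7, 2.1])  # private input features
    y = 1.0                              # private label
    w = np.random.randn(4)

    # Loss L = 0.5 * (w.x - y)^2, so grad_w L = (w.x - y) * x:
    # the update is the input vector times an unknown scalar.
    grad = (w @ x - y) * x

    recovered = grad / np.linalg.norm(grad)
    print(np.allclose(np.abs(recovered),
                      np.abs(x / np.linalg.norm(x))))  # True

Averaging over many local batches (and the secure aggregation protocol) blunts this, but the caution seems fair.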
This is an amazing development. Google is in a unique position to run this at truly massive scale.

Reading this, I couldn't shake the feeling that I had heard all of this somewhere before in a work of fiction.

Then I remembered. Here's the relevant clip from "Ex Machina": https://youtu.be/39MdwJhp4Xc
While a neat architectural improvement, the cynic in me thinks this is a fig leaf for the voracious inhalation of your digital life they're already doing.
Even if this only enabled on-device training and offered no privacy advantages, it would be exciting purely as a form of compression. Rather than eating device upload bandwidth, you keep the data local and send only the tiny model-weight delta!
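A hedged sketch of what that could look like (the communication-efficiency paper above proposes structured and quantized updates; this toy version just keeps the top-k entries of a flattened delta in float16):

    import numpy as np

    def compress_delta(w_local, w_global, keep_frac=0.01):
        """Upload only the largest-magnitude entries of the 1-D weight
        delta, quantized: a crude stand-in for arXiv:1610.05492."""
        delta = w_local - w_global
        k = max(1, int(keep_frac * delta.size))
        idx = np.argsort(np.abs(delta))[-k:]  # top-k coordinates
        return idx, delta[idx].astype(np.float16)

    def apply_delta(w_global, idx, values):
        """Server side: apply the sparse, quantized update."""
        w = w_global.copy()
        w[idx] += values.astype(w.dtype)
        return w

For a million-parameter model that's tens of kilobytes per round instead of uploading the underlying data.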
Tangentially related: Numerai is a crowdsourced hedge fund that uses structure-preserving encryption to distribute its data while still ensuring it can be mined.

https://medium.com/numerai/encrypted-data-for-efficient-markets-fffbe9743ba8

Why did they not build something like this? I'm kind of concerned that my private keyboard data is being distributed without security. At first glance the secure aggregation protocol doesn't seem to be doing anything like this.
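Edit: reading the secure aggregation paper more closely, it does hide individual updates from the server: pairs of users share random masks that cancel when everything is summed, so the server only learns the aggregate. A toy sketch (the real protocol adds key agreement and dropout handling):

    import numpy as np

    def masked_updates(updates, seed=0):
        """Each pair (i, j) shares a random mask; i adds it, j subtracts
        it. Individually masked vectors look random, but the masks
        cancel in the server's sum."""
        rng = np.random.default_rng(seed)
        masked = [u.astype(np.float64).copy() for u in updates]
        for i in range(len(updates)):
            for j in range(i + 1, len(updates)):
                mask = rng.normal(size=updates[0].shape)
                masked[i] += mask
                masked[j] -= mask
        return masked

    updates = [np.ones(3), 2 * np.ones(3), 3 * np.ones(3)]
    masked = masked_updates(updates)
    print(sum(masked))  # ~[6. 6. 6.]: the true sum survives
    print(masked[0])    # looks nothing like that user's update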
This is literally just gradient descent without the "stochastic" part: each batch update comes from a single node and a correlated set of examples rather than a random sample. Nothing mind-blowing about it.
To be honest, I have thought about this for a long time in the context of distributed computing. If a problem takes a lot of time to compute, but can be computed in small pieces and then combined, why can't we pay users to subscribe their devices for the computation? This is a major step toward that big goal.
I don't work with ML for my day job but find it exhilaratingly interesting. (True story!)

When I first read this, I was thinking: surely we can already do distributed learning; isn't that what, for example, SparkML does?

Is the benefit of this in outsourcing the training of a large model to a bunch of weak devices?
I think the implications go even beyond privacy and efficiency. One could estimate each user's contribution to the fidelity gains of the model, at least as an average within a batch. I could imagine such an attribution being rewarded with money or credibility in the future.
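Purely speculative, but one cheap way to estimate that contribution would be a leave-one-out comparison against a held-out evaluation set (nothing in the announcement describes this; `eval_loss` is a placeholder and `client_updates` are NumPy weight vectors):

    def client_contributions(client_updates, weights, eval_loss):
        """For each client, measure how much worse the weighted-average
        model scores on held-out data when that client is excluded."""
        def average(excluded=None):
            keep = [i for i in range(len(client_updates)) if i != excluded]
            total = sum(weights[i] for i in keep)
            return sum(weights[i] / total * client_updates[i] for i in keep)

        baseline = eval_loss(average())
        return [eval_loss(average(excluded=i)) - baseline
                for i in range(len(client_updates))]

A positive score means the model got worse without that client, i.e. they contributed.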
What is the difference between this and distributed computing? Apart from the specific ML use case, I don't see many differences. SETI@home was an actual revolution made of actual volunteers (I don't know how many Google users will even be aware of this).
I had exactly this idea about a year ago!

I know ideas without execution aren't worth anything, but I'm just happy to see that my vision is pointed in the right direction.
I would argue there is no such thing. After the update, the model will incorporate your training data as a seen example; clever use of optimization could enable you to partly reconstruct that example.