Federated Learning

167 points, by dedalus, almost 6 years ago

15 comments

grantlmiller, almost 6 years ago
First, I've loved that Google open-sourced TensorFlow Federated as a way to encourage the rest of the world to adopt this method of decentralized machine learning.
Second, I was a bit disheartened that this concept had to be explained with a comic strip to make it accessible, because I had hoped the benefits were clear to everyone.
Third, I read the comic strip, learned new things (secure aggregation protocol, wtf, amazing!), kicked myself for being smug, and appreciated the huge amount of effort that someone invested to communicate this.
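For readers who want the mechanism in code rather than comic form, here is a minimal sketch of federated averaging, the scheme behind TensorFlow Federated, written in plain NumPy rather than the TFF API; the linear model, learning rate, and simulated clients are illustrative assumptions, not anything from Google's implementation.

```python
# Minimal, illustrative federated averaging (FedAvg): each client trains
# locally on its own data, and only the resulting weight vectors are
# averaged on the server -- raw data never leaves the device.
import numpy as np

def local_update(weights, x, y, lr=0.1, epochs=5):
    """One client's local gradient steps on a least-squares model (illustrative)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, client_datasets):
    """Server averages locally trained weights, weighted by each client's data size."""
    updates, sizes = [], []
    for x, y in client_datasets:
        updates.append(local_update(global_w, x, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

# Example: three simulated clients, ten federated rounds.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])
clients = []
for _ in range(3):
    x = rng.normal(size=(50, 2))
    clients.append((x, x @ true_w + rng.normal(scale=0.1, size=50)))

w = np.zeros(2)
for _ in range(10):
    w = federated_round(w, clients)
print(w)  # approaches [2, -3] without pooling any raw data on the server
```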
walterbell, almost 6 years ago
Google mentioned at I/O that speech recognition will soon (this summer?) be performed locally on Android devices, with no voice data being sent to Google, because they have been able to reduce the size of the model dramatically. Is that related to federated learning?
Paper: https://arxiv.org/abs/1811.06621
ahelwer, almost 6 years ago
All right, I'm cynical as all heck about ad companies and privacy, but this has me optimistic. Somebody disillusion me: why shouldn't I be optimistic?
archgoon, almost 6 years ago
So, correct me if I'm wrong, but this basically only works when you've already done your data exploration phase, you've committed to a particular topology, and now you just want to optimize your weights?
It seems that this won't work so well if you don't have any initial data to bootstrap yourself with. So perhaps the idea is that you bootstrap with a few people, do your exploration, and then scale it out with federation?
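One way to read that workflow in code, as a rough sketch reusing the local_update and federated_round helpers from the snippet above; the proxy dataset and round count are assumptions for illustration only:

```python
# Illustrative two-phase workflow: explore and bootstrap centrally on a
# small proxy dataset you are allowed to hold, then freeze the topology
# and let federation optimize only the weights.
def bootstrap_then_federate(proxy_x, proxy_y, client_datasets, rounds=50):
    # Phase 1: architecture and data exploration on centrally held proxy data.
    w = local_update(np.zeros(proxy_x.shape[1]), proxy_x, proxy_y, epochs=50)
    # Phase 2: the topology is fixed; federated rounds refine the weights.
    for _ in range(rounds):
        w = federated_round(w, client_datasets)
    return w
```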
ivan_ah, almost 6 years ago
This is very interesting for many reasons. First we have the privacy stance, which is a tremendous step for big G. Whoever managed to push this through the "machine" of internal office politics deserves applause. The very fact of acknowledging that users might want to control their data locally rather than rsync everything all the time is a big step: it takes us off the "give me all your data" train that we have been on for some time.
Talking about specific applications of your users' data makes a lot more sense: "If you share X with us, you're helping to build a better model Y that helps you with Z." Then the prompt "Do you want to share X?" makes a lot more sense than the current generic prompts like "App V wants to access all your data W", which don't tell you anything.
The anonymisation-by-aggregation aspect is interesting on its own, since it provides a practical approach we can use today without having to wait for homomorphic encryption. There will probably still be "data leakage", but I can see how aggregation can be fundamentally better than trying to share anonymized data by fuzzing identifiers, randomization, and binning, which are notoriously hard to pull off and suffer from de-anonymisation attacks by cross-linking with other datasets.
Research-wise this could be a whole new field. Let's revisit all the ML algorithms and look at the ones that lend themselves to federated updates. Perhaps certain ML algorithms have been overlooked historically because they are not "cutting edge" but lend themselves better to distributed model updates? (I bet this is already a thing...)
The communication-complexity aspects are also very interesting, since they force us to think about the bandwidth needed to communicate model updates and about training batching. For high-bandwidth settings we could consider training a model from scratch; for medium bandwidth you can send model updates regularly; but what would be particularly interesting to see is async, VERY low-bandwidth updates: just a few MB, exchanged once in a while when connectivity is available.
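To put rough numbers on the bandwidth point, a back-of-the-envelope calculation; the parameter counts below are made-up examples, not figures from any real deployment:

```python
# Back-of-the-envelope per-round update sizes (illustrative parameter counts).
params = {"small keyboard LM": 1.4e6, "small on-device ASR model": 2.0e7}
for name, n in params.items():
    full_fp32 = n * 4 / 1e6   # full update as 32-bit floats, in MB
    quant_8bit = n * 1 / 1e6  # naive 8-bit quantized update, in MB
    print(f"{name}: {full_fp32:.1f} MB per round (fp32), "
          f"~{quant_8bit:.1f} MB with 8-bit quantization")
```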
ximeng, almost 6 years ago
The linked paper on using this for Google Keyboard (https://arxiv.org/pdf/1903.10635.pdf) highlights that there are nevertheless still privacy issues with this approach:
"While Federated Learning removes the need to upload raw user material — here OOV words — to the server, the privacy risk of unintended memorization still exists (as demonstrated in (Carlini et al., 2018)). Such risk can be mitigated, usually with some accuracy cost, using techniques including differential privacy (McMahan et al., 2018). Exploring these trade-offs is beyond the scope of this paper."
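For context, the differential-privacy mitigation the excerpt refers to is usually implemented by clipping each client's update and adding calibrated noise to the aggregate. Below is a minimal sketch in the spirit of DP-FedAvg (McMahan et al., 2018); the clip norm and noise multiplier are arbitrary placeholder values, not recommended settings.

```python
import numpy as np

def dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip each client update to a maximum L2 norm, sum, and add Gaussian
    noise scaled to the clip norm (illustrative DP-FedAvg aggregation step)."""
    if rng is None:
        rng = np.random.default_rng()
    clipped = []
    for u in updates:
        scale = min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12))
        clipped.append(u * scale)
    total = np.sum(clipped, axis=0)
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(updates)
```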
wybiral, almost 6 years ago
How do they assure you that the training algorithm isn't just exfiltrating your data?
Edit: By that I mean... what's stopping the model from being as simple as "learn my personal information"?
pas, almost 6 years ago
What happens with the zero-sum cancelling out phase if one device disappears during the process?
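For readers unfamiliar with the term: the "zero-sum cancelling" refers to pairwise random masks that vanish only when every participant's masked update is summed. The toy sketch below shows both the cancellation and what breaks when a device drops out; real secure-aggregation protocols (e.g. Bonawitz et al., 2017) additionally secret-share the masks so the server can recover from dropouts, which this sketch omits.

```python
import numpy as np

# Toy secure-aggregation masking: each pair of clients (i, j) agrees on a
# random mask; client i adds it, client j subtracts it. Summing over all
# clients cancels the masks, but only if every masked update arrives.
rng = np.random.default_rng(1)
n_clients, dim = 4, 3
updates = [rng.normal(size=dim) for _ in range(n_clients)]

pair_masks = {(i, j): rng.normal(size=dim)
              for i in range(n_clients) for j in range(i + 1, n_clients)}

def masked(i):
    m = updates[i].copy()
    for (a, b), mask in pair_masks.items():
        if a == i:
            m += mask
        if b == i:
            m -= mask
    return m

all_masked = [masked(i) for i in range(n_clients)]
print(np.allclose(sum(all_masked), sum(updates)))            # True: masks cancel
print(np.allclose(sum(all_masked[:-1]), sum(updates[:-1])))  # False: a dropout leaves masks uncancelled
```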
gok, almost 6 years ago
Federated learning is a potentially really great idea, but it's important to be upfront about its limitations. Just because I can't prove that a piece of data came from your device doesn't mean that a machine-learned model trained on that data isn't violating your privacy.
For example, say we deployed federated learning to train a predictive language model and allowed it to learn from emails, say, inside Google. Looking at what the model predicts when you type "Here at Google our next secret project is..." could very likely reveal something they wouldn't want widely revealed.
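The kind of probe the comment describes can be run against any next-token interface; in this sketch `complete` is a hypothetical function standing in for the deployed model's API, not a real library call.

```python
# Illustrative memorization probe: feed a sensitive-looking prefix to the
# trained model and inspect its highest-likelihood continuations.
def probe_for_leakage(complete,
                      prefix="Here at Google our next secret project is",
                      num_candidates=5):
    # `complete(prefix, n)` is assumed to return [(continuation, logprob), ...].
    for text, logprob in complete(prefix, num_candidates):
        print(f"{logprob:8.2f}  {prefix} {text}")
    # Unusually confident, specific completions suggest the model has
    # memorized (and can replay) content from some participant's data.
```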
jonathanhd, almost 6 years ago
I'm genuinely still unsure if this is a parody or not. The first half of the comic just describes Google's business model, and the second seems to be trying to outsource the cost of G/TPUs to the end user. Then at the end they go bankrupt and (presumably) sell their control over the data to a vulture fund.
None of this addresses the fundamental problem of advertising companies: once people learn what they're doing, they just want them to feck off and leave them alone, without any regard for future promises.
im3w1l, almost 6 years ago
My gut feeling tells me not to believe their promises that it's impossible to deduce the data from the model updates; there should be attacks.
My stylistic criticism is that they portray white men in a demeaning way that they would never dare do to any other group.
(Edited to make a weaker claim.)
arthurcolle, almost 6 years ago
How can the data be sent in an encrypted manner that can then be useful without the server having a copy of the private keys used to encrypt the data itself?
satokema, almost 6 years ago
Renting my phone out to process data gives me a bad feeling. The airplane mode guy is now just straight up turning off the phone and battery.
arkades, almost 6 years ago
Call me jaded, but: if you're paying PR firms to produce cartoons about how good you are for privacy, you're probably terrible for privacy.
This feels like Google's Joe Camel moment.
unreal37, almost 6 years ago
So instead of sending the data to Google encrypted for them to analyze, it analyzes the data on your device and sends that result to Google encrypted for them to combine the results.
But your data still gets sent to Google. I don't see the difference. It's just another layer on top.