Prior to this, Google also successfully patented dropout layers, is that right? This being the case, what's the implication for hobbyists (like myself), businesses, researchers, etc.? It's hard to imagine not being able to use such an abstract concept in your code, for fear of litigation.

For a layman like myself, it almost sounds like someone patenting the idea of a chair, or a table.
Well, they attempted to patent batch normalization back in 2015, and it looks like the application status is still pending. Which in some ways is worse, since that's closer to when it was first becoming popular, and from the description it sounds like they tried to patent the general computational method, not any specific implementation.

Can people patent computing F(x) when F is just some function with such a low descriptive complexity? Where's the cutoff?
Apparently, the patent has already been granted in Europe according to this page: https://piip.co.kr/en-us/news/batch-normalization-layers-google
I see two possible explanations for this patent:

- Google is rather open about deep learning development and wants to protect the ecosystem from patent trolls. It is a defensive patent, held so they can eventually punish unfair players who don't want to play the collaborative game.

- Google noticed OpenAI, which (legally) built stuff on some of Google's findings in the field. Now OpenAI is aiming to become a successful multi-billion-dollar "non-profit" company, and Google wants its share of the money if that happens.
Isn't this outdated retro technology compared to layer normalization?

> Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to that neuron on each training case. This significantly reduces the training time in feed-forward neural networks. However, the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent neural networks. In this paper, we transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. Like batch normalization, we also give each neuron its own adaptive bias and gain which are applied after the normalization but before the non-linearity. Unlike batch normalization, layer normalization performs exactly the same computation at training and test times. It is also straightforward to apply to recurrent neural networks by computing the normalization statistics separately at each time step. Layer normalization is very effective at stabilizing the hidden state dynamics in recurrent networks. Empirically, we show that layer normalization can substantially reduce the training time compared with previously published techniques.

https://arxiv.org/abs/1607.06450

Just forget batch normalization.

Instead of computing the mean and variance over a mini-batch, compute them over the summed inputs to all the neurons in a layer, separately for each training case. Result: you use the same function at train and test time, which puts it on a more rigorous mathematical footing, and it carries over naturally to RNNs. A rough sketch of the difference is below.
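Roughly, in numpy terms (a simplified sketch of the two normalizations, leaving out the learned gain and bias parameters that both papers describe):

    import numpy as np

    def batch_norm(x, eps=1e-5):
        # x: (batch, features). Statistics are taken over the batch axis,
        # so each feature is normalized using the whole mini-batch.
        mean = x.mean(axis=0, keepdims=True)
        var = x.var(axis=0, keepdims=True)
        return (x - mean) / np.sqrt(var + eps)

    def layer_norm(x, eps=1e-5):
        # x: (batch, features). Statistics are taken over the feature axis,
        # so each training case is normalized independently of the batch.
        mean = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return (x - mean) / np.sqrt(var + eps)

    x = np.random.randn(4, 8)  # a mini-batch of 4 examples with 8 features each
    print(np.allclose(layer_norm(x)[0], layer_norm(x[:1])[0]))  # True: independent of the batch
    print(np.allclose(batch_norm(x)[0], batch_norm(x[:1])[0]))  # False: depends on the batch

The layer norm output for a single example is the same whether it's normalized alone or inside a batch, which is exactly why the train/test mismatch disappears.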