That's interesting. Is it still the case that the Rectified Linear Unit (ReLU) is the prevailing activation function in deep neural networks, because of the vanishing gradients you get with activation functions like tanh? If so, the conclusions from the paper would apply to a very wide range of deep neural networks.
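
For what it's worth, here's a rough numpy sketch of the vanishing-gradient intuition I have in mind (toy numbers, random per-layer pre-activations, not a real network):

```python
import numpy as np

# Backprop through a deep stack multiplies one activation-derivative
# factor per layer, and the size of that product is the crux of the
# vanishing-gradient argument. tanh'(z) = 1 - tanh(z)^2 is below 1
# whenever a unit sits away from zero, so the product decays with depth;
# ReLU's derivative is exactly 1 on any unit that stays active, so depth
# alone doesn't shrink it. The pre-activations below are just random
# draws chosen for illustration.

rng = np.random.default_rng(0)
depth = 50
z = rng.standard_normal(depth)                 # one pre-activation per layer

tanh_product = np.prod(1.0 - np.tanh(z) ** 2)  # product of tanh derivatives
relu_product = 1.0 ** depth                    # ReLU derivative is 1 on active units

print(f"tanh derivative product over {depth} layers: {tanh_product:.1e}")
print(f"ReLU derivative product over {depth} layers (active path): {relu_product:.1f}")
```

The tanh product comes out many orders of magnitude below 1, while an active ReLU path passes the gradient through at full strength, which is the usual argument for why ReLU-style activations scale to deeper networks.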