The posted article isn't particularly fascinating, but for a bit of fun, there's an OpenAI project demonstrating that, because float32 arithmetic rounds non-linearly near zero (values underflow), you can actually train "non-linear" linear networks: https://openai.com/blog/nonlinear-computation-in-linear-networks/
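A quick NumPy sketch of the underlying effect (not the OpenAI code, just an illustration with made-up values): near the bottom of float32's range, multiplying by a constant stops being linear because results round down to zero.

```python
import numpy as np

a = np.float32(1e-45)      # rounds to the smallest subnormal float32 (~1.4e-45)
scale = np.float32(0.4)

print(a * scale)           # 0.0       -- underflows to zero
print((a + a) * scale)     # ~1.4e-45  -- nonzero
# f(x) = x * scale gives f(2a) != 2 * f(a), so multiplication is
# not linear at this scale; that's the effect the blog post exploits.
```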
You don't need advanced math to answer this question. Without activation functions, each layer is just a matrix multiplication, so the weight matrices of successive layers can be multiplied into a single matrix and the whole network collapses to one linear map, i.e. a linear classifier.
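A minimal NumPy sketch of that collapse (the layer shapes here are made up for illustration): applying two weight matrices with no activation in between is exactly the same as applying their product once.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1 = rng.standard_normal((8, 4))   # first "layer"
W2 = rng.standard_normal((3, 8))   # second "layer"

two_layers = W2 @ (W1 @ x)         # forward pass through both layers
collapsed = (W2 @ W1) @ x          # one layer with the product matrix

print(np.allclose(two_layers, collapsed))  # True (up to float rounding)
```

Amusingly, the two results agree only up to float rounding, which is the very loophole the OpenAI post linked above exploits.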