> We propose the nested network architecture since it shares the parameters via repetitions of sub-network activation functions. In other words, a NestNet can provide a special parameter-sharing scheme. This is the key reason why the NestNet has much better approximation power than the standard network.

It would be interesting to see an experiment that compares their CNN2 model with other parameter-sharing schemes, such as networks using hyper-convolutions [0][1][2], where kernel weights are generated by a small shared network rather than stored directly (a sketch of this alternative scheme is given after the references).

[0] Ma, T., Wang, A. Q., Dalca, A. V., & Sabuncu, M. R. (2022). Hyper-Convolutions via Implicit Kernels for Medical Imaging. arXiv preprint arXiv:2202.02701.

[1] Chang, O., Flokas, L., & Lipson, H. (2019). Principled Weight Initialization for Hypernetworks. In International Conference on Learning Representations.

[2] Ukai, K., Matsubara, T., & Uehara, K. (2018). Hypernetwork-Based Implicit Posterior Estimation and Model Averaging of CNN. In Asian Conference on Machine Learning (pp. 176-191). PMLR.
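
For concreteness, below is a minimal sketch of what such an implicit-kernel hyper-convolution baseline could look like, assuming a PyTorch-style implementation in the spirit of [0]; the class name `HyperConv2d`, the coordinate MLP, and all hyperparameters are illustrative assumptions, not details taken from the cited works or from the paper under review.

```python
# Hypothetical sketch: a 2-D convolution whose kernel weights are produced by a
# small shared MLP from the kernel's spatial coordinates (implicit kernel),
# as an alternative parameter-sharing scheme to compare against CNN2.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, hidden_dim=16):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size

        # Normalized (x, y) coordinates of each kernel tap in [-1, 1]^2.
        coords = torch.linspace(-1.0, 1.0, kernel_size)
        grid_y, grid_x = torch.meshgrid(coords, coords, indexing="ij")
        self.register_buffer("grid", torch.stack([grid_x, grid_y], dim=-1).reshape(-1, 2))

        # Shared kernel-generating MLP: 2 coordinates -> (out_channels * in_channels)
        # weights per tap. Its parameter count is independent of kernel_size,
        # which is the parameter-sharing property the comparison would probe.
        self.kernel_net = nn.Sequential(
            nn.Linear(2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_channels * in_channels),
        )

    def forward(self, x):
        k = self.kernel_size
        # (k*k, out*in) -> (out, in, k, k), then apply as an ordinary convolution.
        weight = self.kernel_net(self.grid)
        weight = weight.view(k, k, self.out_channels, self.in_channels)
        weight = weight.permute(2, 3, 0, 1).contiguous()
        return F.conv2d(x, weight, padding=k // 2)


# Quick shape check.
if __name__ == "__main__":
    layer = HyperConv2d(in_channels=3, out_channels=8, kernel_size=5)
    out = layer(torch.randn(1, 3, 32, 32))
    print(out.shape)  # torch.Size([1, 8, 32, 32])
```

Such a baseline would help isolate whether the reported approximation gains come from the nested parameter-sharing scheme specifically, or from parameter sharing in general.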