Besides the title being a bit exaggerated, it's been long acknowledged in the GNN community that graph rewiring often helps learning; see [1, 2] for example. Also, at first skim, I'm surprised there's no discussion of oversmoothing/oversquashing in this paper. It seems like what they're calling "overfitting" on the graph structure could be interpreted as the GNN's node representations saturating/starving because of poor graph topology (quick toy sketch of that effect after the links).<p>[1] - <a href="https://arxiv.org/abs/2111.14522" rel="nofollow noreferrer">https://arxiv.org/abs/2111.14522</a>
[2] - <a href="https://arxiv.org/abs/2210.02997" rel="nofollow noreferrer">https://arxiv.org/abs/2210.02997</a>
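To illustrate what I mean by "saturating": a toy numpy sketch (mine, not from the paper) where repeated normalized mean aggregation alone, with no learned weights or nonlinearities, drives all node features toward a common point. The graph, feature size, and layer count here are arbitrary.

```python
# Toy illustration of oversmoothing: applying only the GCN-style propagation
# operator repeatedly collapses node features toward each other.
import numpy as np

rng = np.random.default_rng(0)

# Small ring graph with one chord; adjacency with self-loops.
n = 8
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
A[0, 4] = A[4, 0] = 1.0
A += np.eye(n)

# Symmetric normalization, as in a GCN propagation step.
d = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(d, d))

X = rng.normal(size=(n, 16))  # random node features

for layer in range(1, 21):
    X = A_hat @ X  # propagation only, to isolate the topology's effect
    spread = np.linalg.norm(X - X.mean(axis=0), axis=1).mean()
    if layer % 5 == 0:
        print(f"layer {layer:2d}: mean distance from centroid = {spread:.4f}")
```

Real GNN layers add learned weights and nonlinearities, which slows the collapse, but the propagation operator (i.e. the topology) is still what drives it.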
Isn't the natural solution then to just use attention layers instead of graph convolution layers? Wouldn't the attention mechanism end up learning the useful graph structure via the attention weights?
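A minimal sketch of that idea (not from the paper): transformer-style attention over all node pairs with no adjacency mask, so the learned attention matrix plays the role of the "useful" edges. Class name and shapes are illustrative.

```python
# Attention over nodes with no fixed edge set: which nodes influence which
# is learned entirely through the attention weights.
import torch
import torch.nn as nn

class NodeSelfAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: [batch, n_nodes, dim]; every node attends to every other node.
        out, attn_weights = self.attn(x, x, x, need_weights=True)
        return out, attn_weights  # attn_weights: [batch, n_nodes, n_nodes]

x = torch.randn(1, 6, 32)      # one graph, 6 nodes, 32-dim features
layer = NodeSelfAttention(32)
out, alpha = layer(x)
print(out.shape, alpha.shape)  # torch.Size([1, 6, 32]) torch.Size([1, 6, 6])
```

The tradeoff is the O(n²) cost over node pairs and giving up the inductive bias the given edges provide, which is partly why GAT-style layers restrict the attention to existing neighbors instead.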
I'm curious how this relates to MDL-style regularization. I don't see any explicit reference to the L2 norm here, and I'm not too smart about graph neural networks.<p>But it seems like overfitting is almost necessary unless some kind of obscure variational technique is used? They did have one paper where they used a progressively compressed GNN to extract an approximation of a cosmological equation that was both smaller and more accurate than had previously been achieved. I'll update here if I find it.<p>Edit: Ah, here we are: <a href="https://arxiv.org/abs/2006.11287" rel="nofollow noreferrer">https://arxiv.org/abs/2006.11287</a>. I believe there was some very good follow-up work on it this year; I saw some of it during the 3-5 months I was really active on Twitter (brief as that time was). It seems like an extremely promising area, and one I should learn if I truly care about the Kolmogorov complexity of a problem.
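For what it's worth, my (simplified) reading of that paper's trick is less about L2 on the weights and more about penalizing the per-edge messages themselves (L1 in one of their variants), so that only a few message components stay active and the learned interaction is compressible enough to fit symbolically afterwards. A rough sketch of that flavor of regularization; everything here (names, dimensions, the sparsity weight) is illustrative, not their code.

```python
# Sparse-message GNN sketch: an L1 penalty on the edge messages pushes most
# message components toward zero, which makes the learned pairwise interaction
# easier to compress / approximate symbolically.
import torch
import torch.nn as nn

class EdgeMessageGNN(nn.Module):
    def __init__(self, node_dim, msg_dim, hidden=64):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim, hidden), nn.ReLU(), nn.Linear(hidden, msg_dim)
        )
        self.node_mlp = nn.Sequential(
            nn.Linear(node_dim + msg_dim, hidden), nn.ReLU(), nn.Linear(hidden, node_dim)
        )

    def forward(self, x, edge_index):
        src, dst = edge_index                        # edge_index: [2, n_edges]
        msgs = self.edge_mlp(torch.cat([x[src], x[dst]], dim=-1))
        agg = torch.zeros(x.size(0), msgs.size(-1))
        agg.index_add_(0, dst, msgs)                 # sum messages into receiving nodes
        return self.node_mlp(torch.cat([x, agg], dim=-1)), msgs

model = EdgeMessageGNN(node_dim=4, msg_dim=16)
x = torch.randn(10, 4)
edge_index = torch.randint(0, 10, (2, 30))
pred, msgs = model(x, edge_index)

target = torch.randn_like(pred)
sparsity_weight = 1e-2                               # hyperparameter, made up here
loss = nn.functional.mse_loss(pred, target) + sparsity_weight * msgs.abs().mean()
loss.backward()
```

The MDL connection, as I understand it, is that the sparsity penalty acts as a crude proxy for the description length of the learned message function.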