They show that decoder-only transformers (which GPTs are) are RNNs with infinite hidden state size. Infinite hidden state size is a pretty strong property! Sounds interesting to me.
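The way I read the "infinite hidden state" framing: autoregressive decoding with a KV cache is literally a recurrent step whose state grows by one entry per token, so the state is unbounded rather than fixed-size like a classic RNN. Here's a rough sketch of that view (my own toy code, not the paper's; the names AttnRNNCell/step are made up):

```python
import numpy as np

class AttnRNNCell:
    """One causal self-attention head, phrased as an RNN cell.

    The recurrent state is the (keys, values) cache; it grows by one row
    per token, which is the sense in which the hidden state is unbounded.
    """

    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        self.Wq = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
        self.Wk = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
        self.Wv = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)

    def init_state(self):
        # Empty KV cache: shape (0, d_model) for both keys and values.
        d = self.Wq.shape[0]
        return np.zeros((0, d)), np.zeros((0, d))

    def step(self, x_t, state):
        """Consume one token embedding x_t, return (output, new_state)."""
        keys, values = state
        q = x_t @ self.Wq
        k = x_t @ self.Wk
        v = x_t @ self.Wv
        keys = np.vstack([keys, k])       # state grows: one row per token
        values = np.vstack([values, v])
        scores = keys @ q / np.sqrt(len(q))
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out = weights @ values            # attend over everything seen so far
        return out, (keys, values)


# Run it token by token, exactly like an RNN.
cell = AttnRNNCell(d_model=8)
state = cell.init_state()
for x_t in np.random.default_rng(1).normal(size=(5, 8)):  # 5 dummy token embeddings
    out, state = cell.step(x_t, state)
print(state[0].shape)  # (5, 8): the "hidden state" grew with the sequence
```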
I've seen at least 6 such papers, all along the lines of "<popular architecture> is actually <a somewhat older concept>". Neural networks are generic enough that you can make them equivalent to almost anything.