They show that decoder-only transformers (which GPTs are) are RNNs with infinite hidden state size. Infinite hidden state size is a pretty strong property! Sounds interesting to me.
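The way I read the "infinite hidden state" framing: autoregressive decoding with a KV cache is literally a recurrent step whose state grows by one entry per token, so the state is unbounded rather than fixed-size like a classic RNN. Here's a rough sketch of that view (my own toy code, not the paper's; the names AttnRNNCell/step are made up):

```python
import numpy as np

class AttnRNNCell:
    """One causal self-attention head, phrased as an RNN cell.

    The recurrent state is the (keys, values) cache; it grows by one row
    per token, which is the sense in which the hidden state is unbounded.
    """

    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        self.Wq = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
        self.Wk = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
        self.Wv = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)

    def init_state(self):
        # Empty KV cache: shape (0, d_model) for both keys and values.
        d = self.Wq.shape[0]
        return np.zeros((0, d)), np.zeros((0, d))

    def step(self, x_t, state):
        """Consume one token embedding x_t, return (output, new_state)."""
        keys, values = state
        q = x_t @ self.Wq
        k = x_t @ self.Wk
        v = x_t @ self.Wv
        keys = np.vstack([keys, k])       # state grows: one row per token
        values = np.vstack([values, v])
        scores = keys @ q / np.sqrt(len(q))
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out = weights @ values            # attend over everything seen so far
        return out, (keys, values)


# Run it token by token, exactly like an RNN.
cell = AttnRNNCell(d_model=8)
state = cell.init_state()
for x_t in np.random.default_rng(1).normal(size=(5, 8)):  # 5 dummy token embeddings
    out, state = cell.step(x_t, state)
print(state[0].shape)  # (5, 8): the "hidden state" grew with the sequence
```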
I've seen at least 6 such papers, all along the lines of "<popular architecture> is actually <a somewhat older concept>". Neural networks are generic enough that you can make them equivalent to almost anything.