科技回声

Note: GNU AGPLv3. Industry labs won’t touch this with a hundred-foot pole. Given that they’re the only ones with access to serious resources, it could be a while before we see a large model of this architecture.

This is exciting because it is an architecture that had so much promise, but we could never solve the gradient/parallelization problems better than transformers.

This code will allow people to experiment and see if it is a viable architecture at foundation/frontier model scale.

Recent and related:

xLSTM: Extended Long Short-Term Memory - https://news.ycombinator.com/item?id=40294650 - May 2024 (73 comments)

Could someone provide a quick summary of where they stand compared to transformer architectures? Do they have competitive results at real-world scale?

I'm not clear on what advantage this architecture has over Mamba/Griffin. They also have linear scaling and better sequence parallelism, and are competitive in performance with transformers.

Are there any studies on predicting neural architecture scaling? E.g. a small training dataset which indicates performance on a large training dataset?

Congrats to the x.AI team!

xLSTM code release by NX-AI

7 comments
