Does anyone know if this is using the Mamba architecture[1] instead of transformers? It looks like it uses a state space model (SSM) layer.

[1]: https://arxiv.org/abs/2312.00752
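For anyone unfamiliar with what an SSM layer is doing: at its core it's a linear recurrence over a hidden state, rather than attention over all previous tokens. A minimal sketch of that recurrence below (names and shapes are illustrative, not the actual StripedHyena or Mamba implementation; Mamba additionally makes B, C, and the discretization step input-dependent, i.e. "selective," which this time-invariant version omits):

```python
# Minimal sketch of the core SSM recurrence that layers like those in
# Mamba build on: h_t = A*h_{t-1} + B*x_t, y_t = C*h_t.
# Illustrative only -- not the StripedHyena code.
import numpy as np

def ssm_scan(x, A, B, C):
    """x: (seq_len,) single-channel input sequence.
    A: (state_dim,) diagonal state transition (|A| < 1 for stability).
    B, C: (state_dim,) input and output projections."""
    h = np.zeros_like(A)
    ys = []
    for x_t in x:
        h = A * h + B * x_t   # state update: O(state_dim) per token
        ys.append(C @ h)      # readout to a scalar output
    return np.array(ys)

x = np.random.randn(16)
A = np.full(8, 0.9)           # stable diagonal dynamics
B = np.random.randn(8) * 0.1
C = np.random.randn(8) * 0.1
print(ssm_scan(x, A, B, C).shape)  # (16,)
```

The appeal vs. transformers is that this runs in time linear in sequence length with constant memory per step, instead of attention's quadratic cost.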
A piece with less detail than the source linked from the article: https://www.together.ai/blog/stripedhyena-7b