xLSTM: Extended Long Short-Term Memory

197 points by mauricesvp about 1 year ago

9 comments

albertzeyer about 1 year ago
It seems Sepp Hochreiter has already been talking about this model since Oct 2023: https://github.com/huggingface/transformers/issues/27011

In the scaling-law comparison, I wonder if it is reasonable to compare the number of parameters between Llama, Mamba, RWKV, and xLSTM. Isn't compute time more relevant? E.g. in the figure about scaling laws, replace the number of params with compute time.

Specifically, the sLSTM still has recurrence (memory mixing) in it, i.e. you cannot fully parallelize the computation. So scaling up a Transformer could still look better when you look at compute time.

It seems neither the code nor the model params are released. I wonder if that will follow.
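A minimal sketch of the parallelization point, assuming nothing about the paper's actual implementation: in a recurrent cell the state at step t depends on the state at step t-1, so the loop over time is inherently sequential, whereas self-attention computes all positions of a sequence at once. The functions and weights below are illustrative toys, not the xLSTM code (which, as the comment notes, is not released).

    import numpy as np

    def slstm_like_scan(x, W, R, h0):
        # Toy recurrent cell: h_t depends on h_{t-1} through the recurrent
        # matrix R ("memory mixing"), so the loop over time steps has to
        # run sequentially. Real sLSTM gating is omitted; this only shows
        # the sequential dependency.
        h = h0
        outputs = []
        for t in range(x.shape[0]):
            h = np.tanh(x[t] @ W + h @ R)
            outputs.append(h)
        return np.stack(outputs)

    def attention_like(x, Wq, Wk, Wv):
        # Toy single-head self-attention: every output position is computed
        # from the whole sequence at once, so all time steps can be
        # processed in parallel on suitable hardware.
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = (q @ k.T) / np.sqrt(k.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    # Tiny usage example with random weights.
    T, d = 8, 4
    rng = np.random.default_rng(0)
    x = rng.normal(size=(T, d))
    print(slstm_like_scan(x, rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)).shape)
    print(attention_like(x, *(rng.normal(size=(d, d)) for _ in range(3))).shape)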
KhoomeiK about 1 year ago
For those who don't know, the senior author on this paper (Sepp Hochreiter) was the first author on the original 1997 paper with Schmidhuber that introduced LSTMs.
WithinReason about 1 year ago
I like the color-coded equations; I wish they would become a thing. We have syntax highlighting for programming languages, it's time we had it for math too.
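For what it's worth, a minimal LaTeX sketch of the idea, using the xcolor package to tint each term of the classic LSTM state update; the equation and color assignments are illustrative choices, not the scheme used in the paper.

    % Color-coding terms of an equation with xcolor.
    \documentclass{article}
    \usepackage{amsmath}
    \usepackage{xcolor}
    \begin{document}
    \begin{align}
      {\color{teal} c_t} &=
        {\color{orange} f_t} \odot {\color{teal} c_{t-1}}
        + {\color{purple} i_t} \odot {\color{gray} z_t} \\
      {\color{blue} h_t} &=
        {\color{red} o_t} \odot \tanh({\color{teal} c_t})
    \end{align}
    \end{document}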
GistNoesis about 1 year ago
Can someone explain the economics behind this?

The claim is something that will replace the transformer, a technology powering a good chunk of AI companies.

The paper's authors seem to be either from a public university or from Sepp Hochreiter's private company/lab, nx-ai.com: https://www.nx-ai.com/en/xlstm

Where is the code? What is the license? How are they earning money? Why publish their secret recipe? Will they not be replicated? How will the rewards be commensurate with the value their algorithm brings? Who will get money from this new technology?
smusamashah about 1 year ago
Can someone ELI5 this? Reading the comments, it sounds like it's going to replace transformers, which LLMs are based on? Is it something exponentially better than current tech at scale?
jasonjmcghee about 1 year ago
They reference "a GPT-3 model with 356M parameters".

So GPT-3 Medium (from the GPT-3 paper) - it feels pretty disingenuous to list that, as no one is referring to that model when they say "GPT-3", but to the 175B model.

I wasn't aware that size of the model (356M) was released - what am I missing here?

I also think it's relatively well understood that (with our current methods) transformers have a tipping point with parameter count, and I don't know of any models smaller than ~3B that are useful - arguably 7B.

Compare these benchmarks to, say, the RWKV 5/6 paper: https://arxiv.org/abs/2404.05892
elygre about 1 year ago
I have no idea what this is, so going off topic:

The name XLSTM reminds me of the time in the late eighties when my university professor got accepted to hold a presentation on WOM: write-only memory.
sigmoid10 about 1 year ago
Another week, another paper that thinks it can revive recurrent networks. Although this time the father of the LSTM is a co-author, so this paper should not come as a surprise. Sadly, the results seem to indicate that even by employing literally all the tricks of the trade, their architecture can't beat the throughput of flash-attention (not by a long shot, but that is not surprising for recurrent designs) and, on top of that, it is even slower than Mamba, which offers similar accuracy at lower cost. So my money is on this being another DOA architecture, like all the others we've seen this year already.
beAbU about 1 year ago
I thought this was some extension or enhancement to XSLT.