
Jamba: Production-grade Mamba-based AI model

346 points by bubblehack3r, about 1 year ago

22 comments

smusamashah, about 1 year ago
There was a recent thread explaining Mamba: https://news.ycombinator.com/item?id=39501982 (https://www.kolaayonrinde.com/blog/2024/02/11/mamba.html)

There was another one on the same topic, probably better: https://news.ycombinator.com/item?id=39482428 (https://jackcook.com/2024/02/23/mamba.html)
a_wild_dandan, about 1 year ago
To those curious about the tradeoffs between transformer and state-space model layers, I highly recommend Sasha Rush's video on it: https://www.youtube.com/watch?v=dKJEpOtVgXc
eigenvalue, about 1 year ago
Has anyone gotten this to work on Linux using 1 or 2 4090s? I get stuck on "Loading checkpoint shards: 71%" and then it bails. But weirdly, nvidia-smi shows plenty of VRAM available. My machine has 256 GB of RAM, so I don't think that's the problem either. Really excited to try this one.
Reubend, about 1 year ago
It's great to see a full production-level model using Mamba. But when it comes to long-context-window benchmarks, I'd love to see accuracy as well as throughput. I was under the impression that Mamba gets huge increases in throughput at the cost of modest losses in accuracy when using long contexts.
skybrian, about 1 year ago
> Jamba boasts an extensive context window of 256K tokens, equivalent to around 210 pages of text, while fitting up to 140K tokens on a single 80GB GPU.

I realize this is a big improvement, but it's striking how inefficient LLMs are: you need 80GB of GPU memory to analyze less than 1 megabyte of data. That's a lot of bloat! Hopefully there's a lot of room for algorithmic improvements.
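
A back-of-envelope sketch of why attention-based context is so memory-hungry (the layer count, hidden size, and dtype below are illustrative assumptions, not Jamba's published configuration):

    # Rough KV-cache arithmetic for a pure-attention model; all figures assumed.
    layers = 32          # transformer blocks (assumption)
    hidden = 4096        # model dimension (assumption)
    bytes_per = 2        # fp16/bf16 element size
    tokens = 140_000     # context length from the quote above

    # Attention caches a key and a value vector per token, per layer.
    kv_bytes = tokens * layers * 2 * hidden * bytes_per
    print(f"KV cache: {kv_bytes / 1e9:.1f} GB")  # ~73 GB at these settings

Mamba layers replace that per-token cache with a fixed-size state, which is presumably how Jamba fits 140K tokens on a single 80GB card alongside the weights.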
gautamcgoel, about 1 year ago
Why include self-attention layers at all? In other words, why not just alternate SSM and MLP layers?
google234123, about 1 year ago
I'm pretty sure computational chemists have been combining NNs with Kalman filters for a while now... I recall the issue was that it was slow due to the N^2 size of the covariance matrix.
unraveller, about 1 year ago
Jamba-v0.1-hybrid-MoE (16x6B?) is like giving a big NOS boost to a Mixtral-8x7B-tier LLM. If it truly has a 256K context that is 3x longer, faster & cheaper than anything else, it should mean an end to the One Model To Rule Them All mindset for now. The big boys will have to offer some version of it as a separate but close sidekick integration to their hero offering.
ninjahatori, about 1 year ago
On a side note: working over longer contexts also reminds me of MemGPT (https://github.com/cpacker/MemGPT). I think a similar concept can be applied to Mamba-architecture models too.
zelphirkalt, about 1 year ago
Is there a Sparabo too?

It is always funny to see old names associated with totally different new things!
toddmorey, about 1 year ago
Released with open weights!
CGamesPlay, about 1 year ago
Does this mean that I can continue a chat without needing to send a full transcript? This feels like it could make inference a lot cheaper for multi-step dialogs.
haddr, about 1 year ago
Will it be possible to run this model family in Ollama?
kjkjadksj, about 1 year ago
People need to pick better names. Mamba is already a popular Python package, and internet search tools are on their knees already.
moneycantbuy, about 1 year ago
Would a 192GB RAM Mac Studio, or even a 7950X with 192GB of RAM, be practical for running this model for inference and possibly fine-tuning? Especially if I don't need very low latency, e.g. 1 token per second is fine for inference. I also have two 3090s.
kelseyfrog, about 1 year ago
I'm glad we're seeing exploration into scaling post-transformer LLM architectures, but I'm disappointed that it *has* a context window. That was kind of the selling point of Mamba (and SSM models in general), right? Linear scaling, because state + input = next_state + output?
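
For readers unfamiliar with the recurrence being referenced, a minimal discrete state-space step looks roughly like this (dimensions and matrices are illustrative; this is not Mamba's actual selective-scan implementation):

    import numpy as np

    # Minimal linear state-space recurrence: the state has a fixed size,
    # so memory is constant per step and compute is linear in sequence length.
    d_state, d_in = 16, 1
    A = np.eye(d_state) * 0.9            # state transition (assumed values)
    B = np.random.randn(d_state, d_in)   # input projection
    C = np.random.randn(d_in, d_state)   # output projection

    state = np.zeros((d_state, 1))
    for x_t in np.random.randn(100, d_in, 1):  # a stream of 100 inputs
        state = A @ state + B @ x_t            # state + input -> next state
        y_t = C @ state                        # -> output

Nothing here grows with sequence length, which is why a hard context window on an SSM can feel surprising.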
zzzzzzzzzz10, about 1 year ago
Where can I download and use it?
cs702, about 1 year ago
Please link to the original post:

https://www.ai21.com/blog/announcing-jamba

Jamba looks *fabulous*. Good performance for its size *and* much more efficient than the available open alternatives.

The key idea: one out of every eight transformer blocks in Jamba applies dot-product attention with quadratic cost, while the other seven apply a Mamba layer with linear cost. And the entire model is a mixture of experts (MoE), so only ~12B parameters are used at once for inference.

Thank you to the folks at AI21 for making Jamba available!
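
The one-in-eight interleaving described above can be pictured as a simple stacking pattern (a schematic sketch; the 1:7 attention-to-Mamba ratio and the MoE detail come from the comment, while the function and names are purely illustrative):

    # Schematic of the reported Jamba block layout; not AI21's actual code.
    def build_layers(n_blocks=32, attn_every=8):
        layers = []
        for i in range(n_blocks):
            if i % attn_every == 0:
                layers.append("attention")  # quadratic in context length
            else:
                layers.append("mamba")      # linear in context length
        return layers

    # ['attention', 'mamba', 'mamba', 'mamba', 'mamba', 'mamba', 'mamba',
    #  'mamba', 'attention', ...]
    print(build_layers())

With MoE feed-forward layers on top, only a ~12B-parameter subset of the full model is active for any given token.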
ipsum2, about 1 year ago
@dang this is blogspam for the official post: https://www.ai21.com/blog/announcing-jamba
krasin, about 1 year ago
The license is a proper open-source one: Apache 2.0. Thanks, AI21 Labs.
sleepingreset, about 1 year ago
god damn
htrp, about 1 year ago
compute still has cost?