Jamba: Production-grade Mamba-based AI model

346 points by bubblehack3r about 1 year ago

22 comments

smusamashah about 1 year ago
There was a recent thread on explaining Mamba: https://news.ycombinator.com/item?id=39501982 (https://www.kolaayonrinde.com/blog/2024/02/11/mamba.html)

There was another one on the same thing, probably better: https://news.ycombinator.com/item?id=39482428 (https://jackcook.com/2024/02/23/mamba.html)
a_wild_dandan about 1 year ago
To those curious about the tradeoffs between transformer and state space model layers, I highly recommend Sasha Rush's video on it: https://www.youtube.com/watch?v=dKJEpOtVgXc
eigenvalue about 1 year ago
Has anyone gotten this to work in Linux using 1 or 2 4090s? I get stuck on "Loading checkpoint shards: 71%" and then it bails. But weirdly, nvidia-smi shows plenty of VRAM available. My machine has 256GB of RAM, so I don't think that's the problem either. Really excited to try this one.
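For anyone hitting the same wall, here is a minimal loading sketch (assuming the ai21labs/Jamba-v0.1 checkpoint and the transformers + accelerate stack; the memory caps are illustrative, not tested values). A common culprit is the default fp32 load blowing past two 24GB cards, so cap per-device memory and let the remainder spill to CPU RAM:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ai21labs/Jamba-v0.1"  # assumed checkpoint name

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,   # fp32 default needs ~4 bytes per parameter
        device_map="auto",            # shard layers across cuda:0, cuda:1, then CPU
        max_memory={0: "22GiB", 1: "22GiB", "cpu": "200GiB"},  # illustrative caps
        # trust_remote_code=True,     # may be needed on older transformers versions
    )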
Reubend about 1 year ago
It's great to see a full production-level model using Mamba. But when it comes to long-context benchmarks, I'd love to see accuracy as well as throughput. I was under the impression that Mamba has huge increases in throughput at the cost of modest losses in accuracy when using long contexts.
skybrian about 1 year ago
> Jamba boasts an extensive context window of 256K tokens, equivalent to around 210 pages of text, while fitting up to 140K tokens on a single 80GB GPU.

I realize this is a big improvement, but it's striking how inefficient LLMs are, that you need 80GB of GPU memory to analyze less than 1 megabyte of data. That's a lot of bloat! Hopefully there's a lot of room for algorithmic improvements.
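A back-of-envelope sketch of where that memory goes (every number here is illustrative, not AI21's published figures): the weights themselves dominate, and in a pure transformer the KV cache would grow linearly with context on top of that.

    # Rough memory arithmetic; all numbers illustrative.
    bytes_per_param = 2                       # bf16 weights
    total_params = 52e9                       # announced total size (12B active via MoE)

    print(f"weights: ~{total_params * bytes_per_param / 2**30:.0f} GiB")   # ~97 GiB

    # Pure-transformer KV cache: 2 (K and V) * layers * kv_heads * head_dim
    # * seq_len * bytes_per_param
    layers, kv_heads, head_dim, seq_len = 32, 8, 128, 140_000   # assumed shapes
    kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_param
    print(f"KV cache at 140K tokens, all-attention: ~{kv_bytes / 2**30:.0f} GiB")

    # With attention in only 1 of every 8 layers (Jamba's ratio), the cache is
    # ~8x smaller; the Mamba layers carry a small fixed-size state instead.
    print(f"with 1-in-8 attention layers: ~{kv_bytes / 8 / 2**30:.0f} GiB")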
gautamcgoel about 1 year ago
Why include self-attention layers at all? In other words, why not just alternate SSM and MLP layers?
google234123 about 1 year ago
I'm pretty sure computational chemists have been combining NNs with Kalman filters for a while now… I recall the issue was that it was slow due to the N^2 size of the covariance matrix.
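For context, the step that dominates there (shapes illustrative): the Kalman predict step propagates an N x N covariance, which costs O(N^2) memory and O(N^3) compute per update.

    import numpy as np

    # Kalman predict step; the N x N covariance P is the bottleneck.
    N = 1_000
    F = np.eye(N)               # state-transition matrix (illustrative)
    Q = 1e-3 * np.eye(N)        # process-noise covariance
    P = np.eye(N)               # state covariance: O(N^2) memory
    P = F @ P @ F.T + Q         # two N x N matmuls: O(N^3) per step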
unraveller about 1 year ago
Jamba-v0.1-hybrid-MoE (16x6B?) is like giving a big NOS boost to a Mixtral-8x7B-tier LLM. If it truly delivers 256K context, 3x longer, faster & cheaper than anything else, it should mean an end to the One Model To Rule Them All mindset for now. The big boys will have to offer some version of it as a separate but close sidekick integration to their hero offering.
ninjahatori about 1 year ago
On a side note: working over longer contexts also reminds me of MemGPT (https://github.com/cpacker/MemGPT). I think a similar concept can be applied to Mamba-architecture models too.
zelphirkalt about 1 year ago
Is there a Sparabo too?

It is always funny to see old names associated with totally different new things!
toddmorey about 1 year ago
Released with open weights!
CGamesPlay about 1 year ago
Does this mean that I can continue a chat without needing to send a full transcript? This feels like it could make inference a lot cheaper for multi-step dialogs.
haddr about 1 year ago
Will it be possible to run this model family in Ollama?
kjkjadksj about 1 year ago
People need to pick better names. Mamba is already a popular Python package, and internet search tools are on their knees already.
moneycantbuy about 1 year ago
Would a 192GB RAM Mac Studio, or even a 7950X with 192GB of RAM, be practical for running this model for inference and possibly fine-tuning? Especially if I don't need very low latency, e.g. 1 token per second is fine for inference. I also have two 3090s.
kelseyfrog about 1 year ago
I'm glad we're seeing exploration into scaling post-transformer LLM architectures, but I'm disappointed that it *has* a context window. That was kind of the selling point of Mamba (and SSM models in general), right? Linear scaling, because state + input = next state + output?
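That recurrence in minimal form, as a toy sketch (shapes made up, just to show why per-token cost and state size stay constant however long the sequence gets):

    import numpy as np

    # Linear state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
    d_state, d_in = 16, 4                 # illustrative sizes
    rng = np.random.default_rng(0)
    A = 0.1 * rng.normal(size=(d_state, d_state))
    B = rng.normal(size=(d_state, d_in))
    C = rng.normal(size=(d_in, d_state))

    h = np.zeros(d_state)                 # fixed-size state, unlike a growing KV cache
    for x in rng.normal(size=(100_000, d_in)):   # 100K "tokens"
        h = A @ h + B @ x                 # state + input -> next state
        y = C @ h                         # -> output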
zzzzzzzzzz10 about 1 year ago
Where can I download and use it?
cs702 about 1 year ago
Please link to the original post:

https://www.ai21.com/blog/announcing-jamba

Jamba looks *fabulous*. Good performance for its size *and* much more efficient than the available open alternatives.

The key idea: one out of every eight transformer blocks in Jamba applies dot-product attention with quadratic cost, while the other seven apply a Mamba layer with linear cost. And the entire model is a mixture of experts (MoE), so only ~12B parameters are used at once for inference.

Thank you to the folks at AI21 for making Jamba available!
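A sketch of that layer pattern as hypothetical pseudocode (not AI21's actual code; the MoE-every-other-layer spacing is an assumption):

    # Hypothetical sketch of the interleaving described above -- not AI21's code.
    NUM_LAYERS = 32
    ATTN_EVERY = 8   # 1 attention layer per 8; the rest are Mamba (linear cost)
    MOE_EVERY = 2    # assumption: MoE replaces the dense MLP in every other layer

    def build_layer(i: int) -> dict:
        return {
            "mixer": "attention" if i % ATTN_EVERY == 0 else "mamba",
            "mlp": "moe" if i % MOE_EVERY == 0 else "dense",
        }

    layers = [build_layer(i) for i in range(NUM_LAYERS)]
    assert sum(l["mixer"] == "attention" for l in layers) == 4  # 4 of 32 quadratic
    # At inference only the routed experts fire, so only ~12B of the total
    # parameters are active per token.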
ipsum2 about 1 year ago
@dang this is blogspam for the official post: https://www.ai21.com/blog/announcing-jamba
krasin about 1 year ago
The license is a proper open-source one: Apache 2.0. Thanks, AI21 Labs.
sleepingreset about 1 year ago
god damn
htrp about 1 year ago
compute still has cost?