Wow, a very creative and interesting idea. I understand it as: train a bunch of fact-based LoRAs to zero loss (they mention 100k different ones), then use RAG to pick the appropriate LoRAs for a query.

So cool. The only moat I can think of for such a company would be proprietary fact LoRAs: basically licensing a modern AI encyclopedia.

Anyway, really nice idea.
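My mental model of the pipeline, with every name below being my own placeholder rather than anything from their docs, is roughly: embed the query, find the closest fact-LoRAs by key similarity, then merge those adapters' low-rank updates into the base weights before generating.

    import numpy as np

    # Hypothetical sketch of "RAG over LoRAs"; embed(), the index layout,
    # and the merge step are my guesses, not the product's actual API.

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def pick_loras(query_vec, lora_index, k=4):
        """lora_index: list of (lora_id, key_vector) pairs describing each fact-LoRA."""
        scored = [(cosine(query_vec, key), lora_id) for lora_id, key in lora_index]
        return [lora_id for _, lora_id in sorted(scored, reverse=True)[:k]]

    def apply_loras(base_weight, adapters):
        """Merge low-rank updates: W + sum(B @ A), the standard LoRA merge."""
        merged = base_weight.copy()
        for A, B in adapters:          # A: (r, d_in), B: (d_out, r)
            merged += B @ A
        return merged

    # e.g. pick_loras(embed("Who discovered penicillin?"), lora_index)
    # (embed and lora_index are stand-ins for whatever encoder/index they use)

A real system would presumably do the selection and merge per layer rather than once globally, but the shape of the idea should be something like this.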
Doesn't this make the "AI" even less creative and more like full-text search? What makes some data a "fact"? If everything in the training data ends up being treated as a fact, won't the LLM end up with 100% accuracy and 0% creativity?
Am I the only one that cringes at "10x fewer"?

How do I multiply positive numbers and get something smaller?

Is "1/10th" or "90% less" not better arithmetic?

Maybe I should have done more gooder at math but it hurts my ears (eyes).
Has anyone here used this, or anything similar? This sounds phenomenal if it really works. It looks like "contact us" is the only way to try it or buy it right now, and the purported benefit (memorizing facts at up to the scale of the training set, basically trillions of tokens of facts) is wild. I'd love to try a system running this way to understand the failure modes: for instance, how does it reliably infer which "facts" to use?
The website says:

> At inference time, the model retrieves the most relevant experts at each layer and merges back into the base model to respond to the user query.

The paper says:

> At inference time, only the relevant experts are retrieved from the index, allowing the LLM to store a large number of facts while maintaining low inference latency. We use specialized GPU kernels written in Triton Tillet et al. (2019) to accelerate the lookup of experts.

...but darned if I can understand from either what they're *actually doing* when they say that.

Why do you need a custom GPU kernel for this outside of the normal NN layers?

Can anyone see an explanation of *how* they pick *which* expert to use?
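My best guess, and it is only a guess (none of these names come from the paper), is a keyed memory lookup per layer: project the hidden state to a query, score it against a table of expert keys, keep the top-k, and add those experts' low-rank updates to the layer output.

    import numpy as np

    # Speculative sketch of per-layer expert retrieval; shapes and names are mine.
    # keys: (num_experts, d)  one routing key per fact-expert
    # A:    (num_experts, r, d),  B: (num_experts, d, r)  low-rank expert weights

    def expert_layer(h, keys, A, B, k=8):
        """h: (d,) hidden state for one token at one layer."""
        scores = keys @ h                        # similarity of h to every expert key
        top = np.argsort(scores)[-k:]            # indices of the k best-matching experts
        weights = np.exp(scores[top] - scores[top].max())
        weights /= weights.sum()                 # softmax over just the selected experts
        out = np.zeros_like(h)
        for w, i in zip(weights, top):
            out += w * (B[i] @ (A[i] @ h))       # apply that expert's low-rank update
        return h + out

If that's roughly right, the custom Triton kernel would presumably be for the key lookup plus the gather and batched low-rank matmul over a very large expert table, since top-k selection and scattered weight reads don't map cleanly onto ordinary dense layers. Again, a guess, not something the paper spells out.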
nit: I hate trying to work out what "10x fewer" or "10x less" mean.

Let's say I have a counter value, X = 100. I reduce that to 10. How can I phrase that in English?

"Value reduced to 10% of original value"

"Value reduced by 90%"

"New value is one tenth the original value"

Using multiplication with a positive integer and saying "less" just seems incomprehensible when it flies by during a sentence, and I can't stop myself from mentally saying "No", like Neo at the end of the Matrix when the three agents fire a volley of bullets at him down the corridor. "No, this sentence stops right here while I pick it apart."
Here I was hoping that there would be some kind of regulatory framework or protections put in place for AI before it became smart enough to actually take over the world.

Being able to say "you are wrong, taking over the world is a bad idea" and have the model respond with "oh you are completely right, I am very sorry for that" was our first line of defense.

I wonder if this model will argue insistently that you are wrong if you try to tell it that 1+1=3, and if so, whether that extends to philosophical issues such as the model arguing back at you based on its formed opinions on ethics and history.
It feels like the two dumb ways to customize an open LLM are fine-tuning and RAG. The former is expensive and complicated; the latter adds complexity to your queries but doesn't require up-front compute for retraining.

I couldn't tell how expensive this is up front, or what complexity it adds to the setup. Anyone know?

It's definitely an interesting idea, but if you have to pay $100k for all that LoRA training, what margins are left over?
"Hallucinations" are the creative aspect of LLMs, which is what they are more useful for- if anything we want more of them. We already have much simpler systems that search and regurgitate facts. We need more intelligent hallucinations that are consistent with and extend rather than conflict with the data.
If fewer hallucinations are the goal, surely this is a bit over the top?
Surely, if you have the ground-truth facts available, fine-tuning for EVERY subject area is much more work than using those facts with retrieval-augmented generation and making sure the facts line up?
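To be concrete about what I mean by the RAG side, here is a toy sketch (the fact list and the keyword scoring are stand-ins; a real system would use an embedding index):

    # Plain RAG needs no per-subject training, just retrieval at query time.
    FACTS = [
        "The Eiffel Tower is 330 metres tall.",
        "Water boils at 100 degrees Celsius at sea level.",
    ]

    def retrieve(query, facts, k=2):
        # Trivial keyword-overlap scoring, purely for illustration.
        q = set(query.lower().split())
        scored = sorted(facts, key=lambda f: -len(q & set(f.lower().split())))
        return scored[:k]

    def build_prompt(query):
        context = "\n".join(retrieve(query, FACTS))
        return f"Use only these facts:\n{context}\n\nQuestion: {query}\nAnswer:"

    print(build_prompt("How tall is the Eiffel Tower?"))

No training run per subject, just an index you can update whenever a fact changes.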