Lamini Memory Tuning: 10x Fewer Hallucinations

128 points by galeos, 11 months ago

15 comments

vessenes, 11 months ago
Wow, very creative and interesting idea. I understand it as: train a bunch of fact-based LoRAs to zero loss (they mention 100k different ones), then use RAG to pick the appropriate LoRAs for a query.

So cool. The only moat I can think of for such a company would be proprietary fact LoRAs: basically licensing a modern AI encyclopedia.

Anyway, really nice idea.
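The retrieval step described here could look something like the toy sketch below: an index of fact-LoRA adapters keyed by embeddings of the facts each was trained on, with RAG-style top-k selection. Everything in it (names, dimensions, the stand-in embedder) is made up for illustration; Lamini's actual pipeline isn't public.

```python
import numpy as np

EMB_DIM = 8
rng = np.random.default_rng(0)
# Hypothetical index: adapter id -> embedding of the facts it memorizes.
adapter_index = {f"lora_{i}": rng.normal(size=EMB_DIM) for i in range(100)}

def embed(text: str) -> np.ndarray:
    """Deterministic stand-in for a real sentence embedder."""
    seed = sum(ord(c) for c in text) % (2**32)
    return np.random.default_rng(seed).normal(size=EMB_DIM)

def retrieve_adapters(query: str, k: int = 3) -> list[str]:
    """Pick the k fact-LoRAs whose embeddings are most similar to the query."""
    q = embed(query)
    q /= np.linalg.norm(q)
    scores = {name: float(v @ q / np.linalg.norm(v))
              for name, v in adapter_index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

selected = retrieve_adapters("Who founded the company?")
print(selected)
```

The selected adapters would then be applied (or merged) before answering, which is where this differs from plain RAG: the retrieved artifact is weights, not text.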
XCSme, 11 months ago
Doesn't this make the "AI" even less creative and more like full-text search instead? What makes some data a "fact"? If everything written in the training data ends up treated like a fact, won't the LLM have 100% accuracy and 0% creativity?
thisisauserid, 11 months ago
Am I the only one who cringes at "10x fewer"?

How do I multiply positive numbers and get something smaller?

Is "1/10th" or "90% less" not better arithmetic?

Maybe I should have done more gooder at math, but it hurts my ears (eyes).
peter_l_downs, 11 months ago
Has anyone here used this, or anything similar? This sounds phenomenal if it really works. Looks like “contact us” is the only way to try it or buy it right now, and the purported benefit (memorize facts up to the model training size, basically trillions of tokens of facts) is wild. I’d love to try a system running this way to understand the failure modes, like for instance how does it reliably infer which “facts” to use?
KeyBoardG, 11 months ago
"10x less" is a weird way of saying 90% less, or better yet, reduced to 10% of what it was before.
wokwokwok, 11 months ago
The website says:

> At inference time, the model retrieves the most relevant experts at each layer and merges back into the base model to respond to the user query.

The paper says:

> At inference time, only the relevant experts are retrieved from the index, allowing the LLM to store a large number of facts while maintaining low inference latency. We use specialized GPU kernels written in Triton (Tillet et al., 2019) to accelerate the lookup of experts.

...but darned if I can understand from either what they're *actually doing* when they say that.

Why do you need a custom GPU kernel for this outside of the normal NN layers?

Can anyone see an explanation of *how* they pick *which* expert to use?
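Mechanically, "merges back into the base model" could mean folding selected low-rank adapters into a layer's weight matrix before the forward pass, as in standard LoRA merging. A minimal sketch, with made-up shapes and a hard-coded selection standing in for whatever retrieval they actually do:

```python
import numpy as np

D, R = 16, 2                                  # hidden size and LoRA rank (made up)
rng = np.random.default_rng(1)
base_weight = rng.normal(size=(D, D))
# Each "expert" is a low-rank (A, B) pair, as in standard LoRA.
experts = [(rng.normal(size=(R, D)), rng.normal(size=(D, R)))
           for _ in range(8)]

def merge_experts(weight: np.ndarray, chosen: list[int]) -> np.ndarray:
    """Fold each selected expert's low-rank update B @ A into the weight."""
    merged = weight.copy()
    for idx in chosen:
        A, B = experts[idx]
        merged += B @ A                        # (D,R) @ (R,D) -> (D,D)
    return merged

merged = merge_experts(base_weight, chosen=[2, 5])
print(merged.shape)  # (16, 16)
```

Under that reading, the custom kernel would presumably be for looking up and applying many such experts per layer without materializing merged weights, but that is a guess.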
raffraffraff, 11 months ago
nit: I hate trying to work out what "10x fewer" or "10x less" means.

Let's say I have a counter value, X = 100. I reduce that to 10. How can I phrase that in English?

"Value reduced to 10% of the original value"

"Value reduced by 90%"

"New value is one tenth the original value"

Using multiplication with a positive integer and saying "less" just seems incomprehensible when it flies by mid-sentence, and I can't stop myself from mentally saying "No", like Neo at the end of The Matrix when the three agents fire a volley of bullets at him down the corridor. "No, this sentence stops right here while I pick it apart."
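For what it's worth, the three phrasings in that comment all describe the same arithmetic:

```python
original, reduced = 100, 10

factor = original / reduced            # "10x fewer": divide by 10
fraction = reduced / original          # "one tenth the original value"
pct_less = (1 - fraction) * 100        # "reduced by 90%"

print(factor, fraction, pct_less)      # 10.0 0.1 90.0
```

So "10x fewer" is shorthand for division by 10, which is exactly why it reads oddly next to "fewer".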
aetherspawn, 11 months ago
Here I was hoping there would be some kind of regulatory framework or protections put in place for AI before it became smart enough to actually take over the world.

Being able to say "you are wrong, taking over the world is a bad idea" and have the model respond with "oh, you are completely right, I am very sorry for that" was our first line of defense.

I wonder if this model will argue insistently that you are wrong if you try to tell it that 1+1=3, and if so, whether that extends to philosophical issues such as the model arguing back at you based on its formed opinions on ethics and history.
xrd, 11 months ago
It feels like the two dumb ways to customize an open LLM are fine-tuning and RAG. The former is expensive and complicated; the latter adds complexity to your queries but doesn't require up-front compute for retraining.

I couldn't tell how expensive this is up front, or what complexity it adds to the setup. Anyone know?

It's definitely an interesting idea, but if you have to pay $100k for all those LoRAs, what margins are left over?
UniverseHacker, 11 months ago
"Hallucinations" are the creative aspect of LLMs, which is what they are most useful for; if anything we want more of them. We already have much simpler systems that search and regurgitate facts. We need more intelligent hallucinations that are consistent with and extend, rather than conflict with, the data.
luke-stanley, 11 months ago
If fewer hallucinations are the goal, surely this is a bit over the top? If you have the ground-truth facts available, then fine-tuning for EVERY subject area seems like much more work than using the facts with retrieval-augmented generation and making sure they line up.
ssheng, 11 months ago
Creative idea. Any data on how long it takes to load the LoRAs and how much latency it adds to generation?
youssefabdelm, 11 months ago
Hopefully someone reproduces the results with code... I can't find any code they shared.
ziptron, 11 months ago
What is the hallucination rate of, for example, Llama 3 or GPT-4?
29athrowaway, 11 months ago
Can it win at Jeopardy?