Researchers upend AI status quo by eliminating matrix multiplication in LLMs

81 points · by disillusioned1 · 11 months ago

9 comments

tomohelix · 11 months ago

The relevant paper: https://arxiv.org/abs/2406.02528

In summary, they forced the model to process data in a ternary system and then built a custom FPGA chip to process the data more efficiently. Tested to be "comparable" to small models (3B), theoretically scales to 70B, unknown for SOTAs (>100B params).

We have always known custom chips are more efficient, especially for tasks like these where it is basically approximating an analog process (i.e. the brain). What is impressive is how fast it is progressing. These 3B-param models would demolish GPT-2, which was, what, 4-5 years old? And they would be pure sci-fi tech 10 years ago.

Now they can run on your phone.

A machine, running locally on your phone, that can listen and respond to anything a human may say. Who could have confidently claimed this 10 years ago?
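A minimal sketch of the ternary idea the comment describes: weights are quantized to {-1, 0, +1}, after which a matrix-vector product needs no multiplications, only selective addition and subtraction. The function names and the simple thresholding scheme here are illustrative, not taken from the paper.

```python
def quantize_ternary(weights, threshold=0.5):
    """Map each float weight to -1, 0, or +1 by naive thresholding.
    (The paper uses a more careful scheme; this is a toy stand-in.)"""
    return [1 if w > threshold else -1 if w < -threshold else 0 for w in weights]

def ternary_matvec(W_t, x):
    """Matrix-vector product with ternary weights: no multiplies,
    just adding or subtracting input entries as the sign dictates."""
    out = []
    for row in W_t:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
        out.append(acc)
    return out

W = [[0.9, -0.8, 0.1], [0.2, 0.7, -0.95]]
W_t = [quantize_ternary(row) for row in W]   # [[1, -1, 0], [0, 1, -1]]
print(ternary_matvec(W_t, [1.0, 2.0, 3.0]))  # [-1.0, -1.0]
```

This is the core of why a custom FPGA path pays off: the inner loop is adders and sign checks rather than float multipliers.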
anon291 · 11 months ago

Note that the architecture does use matmuls. They just defined ternary matmuls to not be 'real' matrix multiplication. I mean... it is certainly a good thing for power consumption to be wrangling fewer bits, but from a semantic standpoint, it is matrix multiplication.
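anon291's point can be checked in a few lines: with weights restricted to {-1, 0, +1}, the add/subtract shortcut and an ordinary multiply-accumulate agree entry for entry, so the operation is still semantically a matrix multiplication. A hedged sketch; the helper names are invented.

```python
def matvec_standard(W, x):
    """Ordinary multiply-accumulate matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matvec_addsub(W, x):
    """Same product for ternary W, using only additions/subtractions."""
    out = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
        out.append(acc)
    return out

W_ternary = [[1, 0, -1], [-1, 1, 1]]
x = [0.5, -2.0, 3.0]
assert matvec_standard(W_ternary, x) == matvec_addsub(W_ternary, x)
```

Mathematically nothing changed; only the hardware cost of each "multiply" did.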
JKCalhoun · 11 months ago

"Call my broker, tell him to sell all my NVDA!"

Combined with the earlier paper this year that claimed LLMs work fine (and faster) with trinary numbers (rather than floats? or long ints?) — the idea of running a quick LLM locally is looking better and better.
ChrisArchitect · 11 months ago

[dupe]

Some more discussion a few weeks ago: https://news.ycombinator.com/item?id=40620955
bee_rider · 11 months ago

Noooooooo

The whole point of AI was to sell premium GEMMs and come up with funky low-precision accelerators.
mysteria · 11 months ago

There's additional discussion on the same research in an earlier thread [1].

[1] https://news.ycombinator.com/item?id=40787349
MiguelX413 · 11 months ago

The pre-print is https://doi.org/10.48550/arXiv.2406.02528
aixpert · 11 months ago

These quantizations are throwing away an advantage of analog computers: the ability to handle imprecise "floats".
skeledrew · 11 months ago
Heh, Nvidia may want to take steps to bury this. Will likely be a humongous loss for them if it pans out.