
Matrix-vector multiplication implemented in off-the-shelf DRAM for Low-Bit LLMs

230 points | by cpldcpu | 9 days ago

12 comments

cpldcpu · 9 days ago

Some more background information:

One of the original proposals for in-DRAM compute: https://users.ece.cmu.edu/~omutlu/pub/in-DRAM-bulk-AND-OR-ieee_cal15.pdf

First demonstration with off-the-shelf parts: https://parallel.princeton.edu/papers/micro19-gao.pdf

DRAM Bender, the tool they are using to implement this: https://github.com/CMU-SAFARI/DRAM-Bender

Memory-Centric Computing: Recent Advances in Processing-in-DRAM: https://arxiv.org/abs/2412.19275
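For readers unfamiliar with the first link above: the Ambit-style proposal computes bulk AND/OR by activating three DRAM rows simultaneously, so each bitline charge-shares toward the majority of the three stored bits; presetting a control row to all-0s yields AND, all-1s yields OR. A toy NumPy model of that idea (my sketch, not code from the cited papers):

```python
import numpy as np

def triple_row_activate(row_a, row_b, row_ctrl):
    """Model of charge sharing under triple-row activation: each bitline
    settles to the majority value of the three activated cells."""
    return (row_a.astype(int) + row_b + row_ctrl) >= 2

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 8)
b = rng.integers(0, 2, 8)

# Control row preset to all-0s -> majority reduces to AND;
# preset to all-1s -> majority reduces to OR.
and_result = triple_row_activate(a, b, np.zeros(8, dtype=int))
or_result = triple_row_activate(a, b, np.ones(8, dtype=int))

assert np.array_equal(and_result, (a & b).astype(bool))
assert np.array_equal(or_result, (a | b).astype(bool))
```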
userbinator · 9 days ago

Did anyone else notice the absolutely insane author lists of references 1 and 3?

I was expecting to find this 2016 article in there: https://news.ycombinator.com/item?id=12469270

This 2019 one does show up: https://news.ycombinator.com/item?id=22712811

Of course, this "out of spec" behaviour of DRAM, more specifically the ability to do copying, is also implicated in this infamous bug: https://news.ycombinator.com/item?id=5314959

It seems more than one person independently observed such a thing and thought "this might be a useful behaviour".
walterbell · 9 days ago

> By intentionally issuing DRAM commands that violate manufacturer-specified timing parameters.. [gaining] massive parallelism up to 65,536 bitwise operations in parallel.

Take that, binary blobs for DRAM training!
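The 65,536-way parallelism in that quote comes from every bitline of an open row being operated on at once; multi-bit arithmetic is then built bit-serially, one row-wide logic step per bit position. A rough NumPy sketch of that execution model (my illustration of the general processing-using-DRAM style, not the paper's code):

```python
import numpy as np

ROW_BITS = 65_536  # one row activation touches this many bitlines at once

def bit_serial_add(a_planes, b_planes):
    """Ripple-carry addition over bit-planes (one 'DRAM row' per bit,
    LSB first). Every line in the loop body stands for one row-wide
    bitwise operation applied to all 65,536 columns simultaneously."""
    carry = np.zeros(ROW_BITS, dtype=np.uint8)
    out = []
    for a, b in zip(a_planes, b_planes):
        s = a ^ b ^ carry                    # row-wide sum bit
        carry = (a & b) | (carry & (a ^ b))  # row-wide carry propagate
        out.append(s)
    out.append(carry)
    return out

rng = np.random.default_rng(1)
a = rng.integers(0, 16, ROW_BITS, dtype=np.uint8)
b = rng.integers(0, 16, ROW_BITS, dtype=np.uint8)
a_planes = [(a >> i) & 1 for i in range(4)]
b_planes = [(b >> i) & 1 for i in range(4)]

planes = bit_serial_add(a_planes, b_planes)
total = sum(p.astype(np.uint16) << i for i, p in enumerate(planes))
assert np.array_equal(total, a.astype(np.uint16) + b.astype(np.uint16))
```

Nine row-wide steps here add 65,536 pairs of 4-bit numbers, which is where the headline throughput comes from.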
robwwilliams · 9 days ago
This is just mind-bendingly weird and wonderfully creative. It can pay to work in the weeds! Bravo.
Bolwin · 9 days ago

They're doing matrix operations in the DRAM itself? That sounds insane, and also fascinating.
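The reason matrix work fits in DRAM at all: with low-bit weights and activations, each output element of a GeMV decomposes into row-wide bitwise ops plus a popcount. A hypothetical NumPy sketch with binary {0,1} values (not the paper's actual kernel):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.integers(0, 2, (4, 1024), dtype=np.uint8)  # binary weight matrix
x = rng.integers(0, 2, 1024, dtype=np.uint8)       # binary input vector

# In-DRAM style: one row-wide AND per output element (all 1024 columns
# in parallel), then a popcount to reduce the row to a dot product.
y_bitwise = np.array([np.count_nonzero(w & x) for w in W])

# Matches the ordinary arithmetic matrix-vector product.
assert np.array_equal(y_bitwise, W @ x.astype(np.int64))
```

Multi-bit quantizations extend this by doing one such pass per bit-plane and summing the shifted popcounts.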
chasd00 · 8 days ago

In the hardware world, are there risks to taking advantage of a bug, knowing that the manufacturer may someday fix it? I know in the software world it's a bad idea to leverage a bug in a platform to enable a feature (or fix another bug). The bug you're counting on being present may get fixed 15 years in the future, and then your system explodes and no one knows why.

Edit: seems like there was a recent discussion about something similar... undefined behavior in some C function, IIRC.
protocolture · 8 days ago

> General matrix-vector multiplication (GeMV)

Ok, so my math isn't great.

When I was studying quaternions during my 3D math class (that I failed the first time; like I said, not a math guy), they briefly covered the history of matrix calculation in graphics development.

My understanding is that quaternions became popular because they are *almost* as accurate as matrices but much less complex computationally.

Has anyone tried building an LLM using quats instead of matrices?

Or are the optimisations with quaternions more useful in realtime?
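As background on the question above: quaternions are a compact way to represent and compose 3-D rotations (4 numbers instead of 9, cheap renormalization), but applying one to a vector is equivalent to a small matrix-vector product, and an LLM's weight matrices are arbitrary high-dimensional linear maps rather than 3-D rotations, so the trick doesn't carry over. A sketch of the equivalence, using the standard formulas (my illustration, not from this thread):

```python
import numpy as np

def quat_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def quat_rotate(q, v):
    """Rotate v by unit quaternion q (equivalent to q * v * conj(q))."""
    w, x, y, z = q
    u = np.array([x, y, z])
    return v + 2 * np.cross(u, np.cross(u, v) + w * v)

theta = np.pi / 3
q = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])  # about z
v = np.array([1.0, 2.0, 3.0])

# Both paths produce the same rotated vector.
assert np.allclose(quat_rotate(q, v), quat_to_matrix(q) @ v)
```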
morphle · 9 days ago

A bit unscientific that they don't cite the original Intelligent RAM (IRAM) sources from 1997: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=iram+patterson&btnG=
willvarfar · 9 days ago

Can we expect to see matrix multiplication, and perhaps other ops, move out of classic CPUs and into the DRAM, perhaps with deliberate hardware support?

And does such a processing shift give an advantage to Samsung etc.? Where does this leave NVIDIA etc.?
lolc · 7 days ago

Funny hack. Without having read the paper, I'd assume the operations to be thermally unstable. So LLM inference results will vary based on environmental temperature :-)
xiphias2 · 9 days ago

This would be a cool way to make a cheap inferencing device for the largest LLMs.
swimwiththebeat · 9 days ago

So is this a new technique for doing computations within existing DRAM to overcome the memory-wall issue of modern computing?