Some more background information:<p>One of the original proposals for in-DRAM compute:
<a href="https://users.ece.cmu.edu/~omutlu/pub/in-DRAM-bulk-AND-OR-ieee_cal15.pdf" rel="nofollow">https://users.ece.cmu.edu/~omutlu/pub/in-DRAM-bulk-AND-OR-ie...</a><p>First demonstration with off-the-shelf parts:
<a href="https://parallel.princeton.edu/papers/micro19-gao.pdf" rel="nofollow">https://parallel.princeton.edu/papers/micro19-gao.pdf</a><p>DRAM Bender, the tool they are using to implement this:
<a href="https://github.com/CMU-SAFARI/DRAM-Bender">https://github.com/CMU-SAFARI/DRAM-Bender</a><p>Memory-Centric Computing: Recent Advances in Processing-in-DRAM<a href="https://arxiv.org/abs/2412.19275" rel="nofollow">https://arxiv.org/abs/2412.19275</a>
Did anyone else notice the absolutely insane author lists of references 1 and 3?<p>I was expecting to find this 2016 article in there: <a href="https://news.ycombinator.com/item?id=12469270">https://news.ycombinator.com/item?id=12469270</a><p>This 2019 one does show up: <a href="https://news.ycombinator.com/item?id=22712811">https://news.ycombinator.com/item?id=22712811</a><p>Of course, this "out of spec" behaviour of DRAM, more specifically the ability to do copying, is also implicated in this infamous bug: <a href="https://news.ycombinator.com/item?id=5314959">https://news.ycombinator.com/item?id=5314959</a><p>It seems more than one person independently observed such a thing, and thought "this might be a useful behaviour".
<i>> By intentionally issuing DRAM commands that violate
manufacturer-specified timing parameters... [gaining] massive parallelism, up to 65,536 bitwise operations in parallel.</i><p>Take that, binary blobs for DRAM training!
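For anyone wondering how one command can yield 65,536 results: below is a rough software analogy (plain NumPy, not the actual charge-sharing mechanism) of the triple-row-activation trick from the first link, where activating rows A, B, and a control row C leaves the bitwise majority MAJ(A, B, C) on the bitlines, so C=0 gives AND and C=1 gives OR across the whole row at once. The 65,536-bit row width is just an illustrative assumption.

```python
# Software analogy only: model one DRAM row as a 65,536-bit vector and show
# how a single "bulk" command applies a Boolean op across every column at once.
import numpy as np

ROW_BITS = 65_536  # example row width; real geometry varies by device

rng = np.random.default_rng(0)
row_a = rng.integers(0, 2, ROW_BITS).astype(bool)
row_b = rng.integers(0, 2, ROW_BITS).astype(bool)

# Triple-row activation computes the bitwise majority of the three rows;
# a control row of all 0s gives AND, all 1s gives OR.
def bulk_maj(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    return (a & b) | (a & c) | (b & c)

bulk_and = bulk_maj(row_a, row_b, np.zeros(ROW_BITS, dtype=bool))  # C = 0
bulk_or  = bulk_maj(row_a, row_b, np.ones(ROW_BITS, dtype=bool))   # C = 1

assert np.array_equal(bulk_and, row_a & row_b)
assert np.array_equal(bulk_or,  row_a | row_b)
print("one 'command' produced", ROW_BITS, "bitwise results in parallel")
```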
In the hardware world, are there risks in taking advantage of a bug, knowing that the manufacturer may someday fix it? I know in the software world it's a bad idea to leverage a bug in a platform to enable a feature (or fix another bug). The bug you're counting on being present may get fixed 15 years in the future, and then your system explodes and no one knows why.<p>edit: seems like there was a recent discussion about something similar... undefined behavior in some C function iirc
>General matrix-vector multiplication (GeMV)<p>Ok, so my math isn't great.<p>When I was studying quaternions during my 3D math class (which I failed the first time; like I said, not a math guy), they briefly covered the history of matrix computation in graphics development.<p>My understanding is that quaternions became popular because they are <i>almost</i> as accurate as matrices but much less complex computationally.<p>Has anyone tried building an LLM using quats instead of matrices?<p>Or are the optimisations with quaternions more useful in real time?
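Not an answer about LLMs, but here is a rough sketch (plain Python, naive formulas) of the cost comparison the quaternion claim usually refers to: composing two 3-D rotations takes 16 multiplies as quaternions versus 27 as 3x3 matrices, and storage is 4 floats versus 9. LLM weight matrices are large general matrices rather than rotations, so it is not obvious the trick carries over.

```python
# Sketch of the quaternion-vs-matrix cost comparison for composing rotations.
# Counts are for the naive formulas; this says nothing about general matrices.

def quat_mul(q, r):
    # Hamilton product: 16 multiplies, 12 adds
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

def mat3_mul(a, b):
    # 3x3 matrix product: 27 multiplies, 18 adds
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Storage: 4 floats per quaternion vs. 9 per rotation matrix.
if __name__ == "__main__":
    identity_q = (1.0, 0.0, 0.0, 0.0)
    ninety_z = (0.7071, 0.0, 0.0, 0.7071)  # ~90 degrees about z
    print(quat_mul(identity_q, ninety_z))  # composing with identity is a no-op
```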
A bit unscientific that they don't cite the original Intelligent RAM (IRAM) sources from 1997:<p><a href="https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=iram+patterson&btnG=" rel="nofollow">https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=iram...</a>
Can we expect to see matrix multiplication, and perhaps other ops, move from classic CPUs out into DRAM, perhaps with deliberate hardware support?<p>And does such a processing shift give an advantage to Samsung et al.? Where does this leave NVIDIA et al.?
Funny hack. Without having read the paper, I'd assume the operations are thermally unstable, so LLM inference results will vary with ambient temperature :-)