Some more background information:<p>One of the original proposals for in-DRAM compute:
<a href="https://users.ece.cmu.edu/~omutlu/pub/in-DRAM-bulk-AND-OR-ieee_cal15.pdf" rel="nofollow">https://users.ece.cmu.edu/~omutlu/pub/in-DRAM-bulk-AND-OR-ie...</a><p>First demonstration with off-the-shelf parts:
<a href="https://parallel.princeton.edu/papers/micro19-gao.pdf" rel="nofollow">https://parallel.princeton.edu/papers/micro19-gao.pdf</a><p>DRAM Bender, the tool they are using to implement this:
<a href="https://github.com/CMU-SAFARI/DRAM-Bender">https://github.com/CMU-SAFARI/DRAM-Bender</a><p>Memory-Centric Computing: Recent Advances in Processing-in-DRAM<a href="https://arxiv.org/abs/2412.19275" rel="nofollow">https://arxiv.org/abs/2412.19275</a>
Did anyone else notice the absolutely insane author lists of references 1 and 3?<p>I was expecting to find this 2016 article in there: <a href="https://news.ycombinator.com/item?id=12469270">https://news.ycombinator.com/item?id=12469270</a><p>This 2019 one does show up: <a href="https://news.ycombinator.com/item?id=22712811">https://news.ycombinator.com/item?id=22712811</a><p>Of course, this "out of spec" behaviour of DRAM, more specifically the ability to do copying, is also implicated in this infamous bug: <a href="https://news.ycombinator.com/item?id=5314959">https://news.ycombinator.com/item?id=5314959</a><p>It seems more than one person independently observed such a thing, and thought "this might be a useful behaviour".
<i>> By intentionally issuing DRAM commands that violate
manufacturer-specified timing parameters... [gaining] massive parallelism, up to 65,536 bitwise operations in parallel.</i><p>Take that, binary blobs for DRAM training!
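For anyone wondering how one command can yield 65,536 results: below is a rough software analogy (plain NumPy, not the actual charge-sharing mechanism) of the triple-row-activation trick from the first link, where activating rows A, B, and a control row C leaves the bitwise majority MAJ(A, B, C) on the bitlines, so C=0 gives AND and C=1 gives OR across the whole row at once. The 65,536-bit row width is just an illustrative assumption.

```python
# Software analogy only: model one DRAM row as a 65,536-bit vector and show
# how a single "bulk" command applies a Boolean op across every column at once.
import numpy as np

ROW_BITS = 65_536  # example row width; real geometry varies by device

rng = np.random.default_rng(0)
row_a = rng.integers(0, 2, ROW_BITS).astype(bool)
row_b = rng.integers(0, 2, ROW_BITS).astype(bool)

# Triple-row activation computes the bitwise majority of the three rows;
# a control row of all 0s gives AND, all 1s gives OR.
def bulk_maj(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    return (a & b) | (a & c) | (b & c)

bulk_and = bulk_maj(row_a, row_b, np.zeros(ROW_BITS, dtype=bool))  # C = 0
bulk_or  = bulk_maj(row_a, row_b, np.ones(ROW_BITS, dtype=bool))   # C = 1

assert np.array_equal(bulk_and, row_a & row_b)
assert np.array_equal(bulk_or,  row_a | row_b)
print("one 'command' produced", ROW_BITS, "bitwise results in parallel")
```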
In the hardware world, are there risks in taking advantage of a bug, knowing that the manufacturer may someday fix it? I know in the software world it's a bad idea to leverage a bug in a platform to enable a feature (or fix another bug). The bug you're counting on being present may get fixed 15 years in the future, and then your system explodes and no one knows why.<p>edit: seems like there was a recent discussion about something similar... undefined behavior in some C function iirc
>General matrix-vector multiplication (GeMV)<p>Ok, so my math isn't great.<p>When I was studying quaternions during my 3D math class (which I failed the first time; like I said, not a math guy), they briefly covered the history of matrix computation in graphics development.<p>My understanding is that quaternions became popular because they are <i>almost</i> as accurate as matrices but much less complex computationally.<p>Has anyone tried building an LLM using quats instead of matrices?<p>Or are the optimisations with quaternions more useful in real time?
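Not an answer about LLMs, but here is a rough sketch (plain Python, naive formulas) of the cost comparison the quaternion claim usually refers to: composing two 3-D rotations takes 16 multiplies as quaternions versus 27 as 3x3 matrices, and storage is 4 floats versus 9. LLM weight matrices are large general matrices rather than rotations, so it is not obvious the trick carries over.

```python
# Sketch of the quaternion-vs-matrix cost comparison for composing rotations.
# Counts are for the naive formulas; this says nothing about general matrices.

def quat_mul(q, r):
    # Hamilton product: 16 multiplies, 12 adds
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

def mat3_mul(a, b):
    # 3x3 matrix product: 27 multiplies, 18 adds
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Storage: 4 floats per quaternion vs. 9 per rotation matrix.
if __name__ == "__main__":
    identity_q = (1.0, 0.0, 0.0, 0.0)
    ninety_z = (0.7071, 0.0, 0.0, 0.7071)  # ~90 degrees about z
    print(quat_mul(identity_q, ninety_z))  # composing with identity is a no-op
```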
A bit unscientific that they don't cite the original Intelligent RAM (IRAM) sources from 1997:<p><a href="https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=iram+patterson&btnG=" rel="nofollow">https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=iram...</a>
Can we expect to see matrix multiplication, and perhaps other ops, move from classic CPUs out into DRAM, perhaps with deliberate hardware support?<p>And does such a processing shift give an advantage to Samsung et al.? Where does this leave NVIDIA et al.?
Funny hack. Without having read the paper, I'd assume the operations are thermally unstable, so LLM inference results will vary with ambient temperature :-)