
Time Between The Lines: how memory access affects performance (2015)

11 points by signa11 9 days ago

3 comments

mikewarot 7 days ago
I imagine coprocessors that don't have separate memory or instructions... they are effectively huge arrays of look-up tables, so that the instructions have the data flow through them. We're at the stage where this is possible for all but the biggest of LLMs.

A side effect of doing this mapping, even without the hardware, is that the mapping makes a given task inherently parallel, and much, much easier to spread across low-cost CPUs. I think of it as a universal solvent for computation.
GarvielLoken 7 days ago
This is actually what DOTS (Unity's Data-Oriented Technology Stack) does in Unity, so it's very apt to use a game engine as an example! It reportedly yields a performance gain just as enormous as the one you show in the article.

https://unity.com/dots
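As an illustration of the memory layout data-oriented designs like DOTS favor, here is a minimal C sketch contrasting array-of-structs with struct-of-arrays; the type and field names are hypothetical, not Unity's API:

```c
#include <stddef.h>

/* Array-of-structs: each entity's fields are interleaved, so a loop
 * that only updates positions still drags velocity and health bytes
 * through the cache with every entity it touches. */
typedef struct { float x, y; float vx, vy; int health; } EntityAoS;

/* Struct-of-arrays: each field is contiguous in memory, so updating
 * positions streams sequentially through a few tightly packed arrays,
 * which the hardware prefetcher handles well. */
typedef struct {
    float *x, *y;    /* positions */
    float *vx, *vy;  /* velocities */
    size_t n;        /* entity count */
} EntitiesSoA;

/* Integrate positions one timestep: purely sequential reads/writes. */
void update_positions(EntitiesSoA *e, float dt) {
    for (size_t i = 0; i < e->n; i++) {
        e->x[i] += e->vx[i] * dt;
        e->y[i] += e->vy[i] * dt;
    }
}
```

The behavior is identical either way; the win is that the struct-of-arrays loop's working set per iteration is only the fields it actually uses.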
LorenPechtel 7 days ago
Yup, memory access dominates an awful lot of things. We keep obsessing over more and faster cores, but if they're just waiting on memory it doesn't really do that much. A while back I did an experiment with the Sieve of Eratosthenes and found that on modern systems the scattered memory access dominates. Finding all primes up to a value X was much faster by brute force than with the Sieve. The brute-force approach runs entirely from the L1 cache; the only operations outside it are writes. The Sieve ensures the only cache hits are from prefetching.

While this is obviously an extreme case, the reality is that you must consider the cost of precalculated data: you can do quite a few operations for the cost of reading one answer from a table that doesn't fit in cache. And there can be substantial benefits from iterating multi-dimensional arrays in the right order. It's a total flip from when I started out, when you weighed the memory cost of precalculating (is it worth the memory to build this table of square roots?), to now, when you weigh the time cost of looking up the answer (is it worth the memory fetch to look up that square root? Nope.)
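The experiment described above can be sketched roughly as follows; this is a hedged reconstruction, not the commenter's actual code, and the cache effect only appears once `limit` pushes the sieve's array well past cache size:

```c
#include <stdlib.h>

/* Classic Sieve of Eratosthenes: strides through one large byte array.
 * Once the array exceeds the cache, each composite-marking write at a
 * stride of p is a likely cache miss. */
int count_primes_sieve(int limit) {
    char *is_composite = calloc((size_t)limit + 1, 1);
    for (int p = 2; (long)p * p <= limit; p++)
        if (!is_composite[p])
            for (int m = p * p; m <= limit; m += p)
                is_composite[m] = 1;
    int count = 0;
    for (int i = 2; i <= limit; i++)
        if (!is_composite[i]) count++;
    free(is_composite);
    return count;
}

/* "Brute force": trial division by the primes found so far. Only
 * primes up to sqrt(n) are ever read, so the hot part of the table
 * stays resident in L1; appends are the only writes. */
int count_primes_trial(int limit) {
    int *primes = malloc((size_t)(limit > 0 ? limit : 1) * sizeof(int)); /* generous bound */
    int count = 0;
    for (int n = 2; n <= limit; n++) {
        int is_prime = 1;
        for (int i = 0; i < count && primes[i] * primes[i] <= n; i++)
            if (n % primes[i] == 0) { is_prime = 0; break; }
        if (is_prime) primes[count++] = n;
    }
    free(primes);
    return count;
}
```

Both functions return the same counts; timing them against each other at large limits is what exposes how much the sieve's scattered writes cost.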