Ask HN: Quintessential readings for software optimization

5 points by kansai over 6 years ago
What are some of the best resources for getting more familiar with things like playing nicely with branch prediction and instruction pipelines, and laying out data so that it fits well in the CPU cache?

1 comment

Paul_Diraq over 6 years ago
The best resource in my opinion is still "What every programmer should know about memory." It is now over ten years old but still the most comprehensive and comprehensible write-up I know on the topic.

https://lwn.net/Articles/250967/

AFAIK there have been no major breakthroughs in memory technology since then (transactional memory could be one). Modern processors may be built a bit differently (e.g. with an added shared L3 cache), but you will have no problem understanding them.

There is also this GitHub repository:

https://github.com/Kobzol/hardware-effects

with programs that are meant to demonstrate such effects. (I haven't used them yet, but they may be interesting for you to play with.)

IT Hare has published a nice article on the cost of common operations in CPU clock cycles, including rule-of-thumb figures.

http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/

Then there are Agner Fog's instruction tables, which give you latencies and reciprocal throughputs for your instructions.

https://www.agner.org/optimize/instruction_tables.pdf

Once you are through these, read the Intel and AMD optimization guides. (ARM may or may not have something similar.)

The paper by Kazushige Goto ("Anatomy of High-Performance Matrix Multiplication") works through cache and TLB considerations in a non-trivial example.

http://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/gotoPaper.pdf
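
To give a rough feel for the kind of cache consideration that paper deals with, here is a minimal sketch of plain loop tiling for a square, row-major matrix multiply. It is not the algorithm from the Goto paper, which goes much further (packing, TLB-aware blocking, hand-tuned kernels). The names `matmul_naive`, `matmul_tiled` and `TILE`, and the tile size of 32, are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge. A reasonable value depends on the cache sizes
   of the target CPU and should be found by measurement. */
#define TILE 32

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Reference version: C = A * B for row-major n x n matrices.
   The inner loop reads B with stride n, so for large n nearly every
   access to B touches a different cache line. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Tiled version: same arithmetic, but the loops are split into blocks so
   that a TILE x TILE piece of each matrix is reused while it is still
   likely to be in cache. */
void matmul_tiled(size_t n, const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t kk = 0; kk < n; kk += TILE) {
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t i_end = min_size(ii + TILE, n);
                size_t k_end = min_size(kk + TILE, n);
                size_t j_end = min_size(jj + TILE, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        /* Innermost loop is unit-stride over both B and C. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions compute the same product; any speed difference comes only from the order in which memory is touched. Whether tiling helps, and by how much, depends on the matrix size, the tile size, the compiler flags and the machine, so it is worth timing both versions on your own hardware rather than assuming a result.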