What every programmer should know about SSDs

452 points by sprachspiel, almost 4 years ago

25 comments

bob1029, almost 4 years ago
Things I have learned about SSDs:

If you want to go fast & save NAND lifetime, use append-only log structures.

If you want to go even faster & save even more NAND lifetime, batch your writes in software (i.e. some ring buffer with a natural back-pressure mechanism) and then serialize them with a single writer into an append-only log structure. Many newer devices have something like this at the hardware level, but your block size is still a constraint when working in hardware. If you batch in software, you can hypothetically write multiple logical business transactions *per* block I/O. When your physical block size is 4 KB and your logical transactions average 512 bytes of data, writing one transaction per block would leave a lot of throughput on the table.

Going down 1 level of abstraction seems important if you want to extract the most performance from an SSD. Unsurprisingly, the above ideas also make ordinary magnetic disk drives more performant & potentially last longer.
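
A minimal Python sketch of the batching idea described above, under stated assumptions: a 4096-byte block, length-prefixed records, and a single writer. `BatchedLogWriter`, `BLOCK_SIZE`, and `demo` are hypothetical names for illustration and do not come from the comment. The ring buffer and its back-pressure are left out, so callers append directly to the writer.

```python
import os
import struct
import tempfile

BLOCK_SIZE = 4096  # assumed physical block size, matching the 4k example above


class BatchedLogWriter:
    """Single writer that packs length-prefixed records into whole blocks."""

    def __init__(self, path, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.buffer = bytearray()
        self.blocks_written = 0
        # O_APPEND makes every write land at the current end of the file.
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def append(self, payload: bytes) -> None:
        if not payload:
            # A zero length prefix would be indistinguishable from padding.
            raise ValueError("empty payloads are not supported")
        record = struct.pack("<I", len(payload)) + payload
        if len(record) > self.block_size:
            raise ValueError("record larger than one block")
        # Flush first if this record would not fit in the current block.
        if len(self.buffer) + len(record) > self.block_size:
            self.flush()
        self.buffer += record

    def flush(self) -> None:
        if not self.buffer:
            return
        # Pad with zero bytes so each write is exactly one block long.
        padded = bytes(self.buffer).ljust(self.block_size, b"\0")
        os.write(self.fd, padded)
        os.fsync(self.fd)
        self.blocks_written += 1
        self.buffer.clear()

    def close(self) -> None:
        self.flush()
        os.close(self.fd)


def demo():
    """Write 16 records of 512 bytes each and report the I/O that resulted."""
    path = os.path.join(tempfile.mkdtemp(), "tx.log")
    writer = BatchedLogWriter(path)
    for _ in range(16):
        writer.append(bytes(508))  # 508 payload bytes + 4-byte prefix = 512
    writer.close()
    return writer.blocks_written, os.path.getsize(path)
```

With 512-byte records, eight transactions share each 4096-byte write, so `demo()` issues 2 block writes for 16 records where a one-transaction-per-block writer would issue 16.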

jedberg, almost 4 years ago
This page tells me a lot about SSDs, but it doesn't tell me why I need to know these things. It doesn't really give me any indication about how I should change my behavior if I know that I'll be running on SSD vs spinning disk.

I've always been told, "just treat SSDs like slow, permanent memory".

klodolph, almost 4 years ago
If you care about SSDs, one paper you *should* read is “Don’t Stack Your Log on My Log” by Yang et al. 2014

https://www.usenix.org/system/files/conference/inflow14/inflow14-yang.pdf

> Log-structured applications and file systems have been used to achieve high write throughput by sequentializing writes. Flash-based storage systems, due to flash memory’s out-of-place update characteristic, have also relied on log-structured approaches. Our work investigates the impacts to performance and endurance in flash when multiple layers of log-structured applications and file systems are layered on top of a log-structured flash device. We show that multiple log layers affects sequentiality and increases write pressure to flash devices through randomization of workloads, unaligned segment sizes, and uncoordinated multi-log garbage collection. All of these effects can combine to negate the intended positive affects of using a log. In this paper we characterize the interactions between multiple levels of independent logs, identify issues that must be considered, and describe design choices to mitigate negative behaviors in multi-log configurations.

andrewmcwatters, almost 4 years ago
My opinion is probably... not technically correct... until you have to deal with drive reliability and write guarantees, but I don't think programmers actually have to know anything about SSDs in the same way that developers had to know particular things about HDDs.

This is pure speculation, but there had to be a period of time during the mass transition to SSDs when engineers said, OK, how do we get the hardware to be compatible with software that is, for the most part, expecting hard disk drives, so that SSDs just behave like really fast HDDs.

So, there's almost certainly some non-zero amount of code out there in the wild that is or was doing some very specific write-optimized routine that one day was just performing 10 to 100 times faster, and maybe, just because of the nature of software, is still out there today doing that same routine.

I don't know what that would look like, but my guess would be that it would have something to do with average-sized write caches, and those caches look entirely different today or something.

And today, there's probably some SSD-specific code doing something out there now, too.

rossdavidh, almost 4 years ago
Interesting, and fun to read and think about! And, as a professional programmer for 17 years now, not once have I done anything where this would have been important for me to know (even if I had been running my code on a system with SSDs). So, I'm not convinced the title is at all accurate.

But, fun to read and think about.

dang, almost 4 years ago
What someone else said about that in 2014:

*What every programmer should know about solid-state drives* - https://news.ycombinator.com/item?id=9049630 - Feb 2015 (31 comments)

FpUser, almost 4 years ago
It is really puzzling why "every programmer" should burden their already overloaded brains with this. If they're reading/writing some config/data files, this knowledge would not help one bit. If they're using a database, then it falls to the database vendor to optimize for this scenario.

So I think that unless this "every programmer" is a database storage engine developer (not too many of them, I guess), their only concern would mostly be: how close is my SSD to that magical point where it has to be cloned and replaced before shit hits the fan.

rabuse, almost 4 years ago
A little off topic, but I bought a new MacBook Pro with the M1 chip and 8GB of RAM, and I'm worried about the swap usage of this machine wearing out the SSD too quickly. Is this an actual concern, as my swap has been in the multiple GB range with my use?

kortilla, almost 4 years ago
The title should be “why SSDs mean programmers no longer have to think about hard drives”.

These are all reasons SSDs are much more pleasant to work with than old platter disks.

teddyh, almost 4 years ago
What *everyone* should know is that flash drives can lose their data when left unpowered for as little as three months.

dataflow, almost 4 years ago
What's the flash translation layer made of? Is the flash technology used for that more durable than the rest of the SSD itself? (like say MLC vs. QLC?)

riobard, almost 4 years ago
One thing I'm still puzzled about is SSD over-provisioning, which is also mentioned by the tutorial (https://codecapsule.com/2014/02/12/coding-for-ssds-part-4-advanced-functionalities-and-internal-parallelism/) recommended by the article:

> A drive can be over-provisioned simply by formatting it to a logical partition capacity smaller than the maximum physical capacity. The remaining space, invisible to the user, will still be visible and used by the SSD controller.

Does the controller read the partition table to decide that the space beyond the logical partition is safe to use as scrap?

dan-robertson, almost 4 years ago
See this paper from 2017, *The unwritten contract of solid state drives*: https://dl.acm.org/doi/10.1145/3064176.3064187

Agentlien, almost 4 years ago
This reminds me of a recent interview[0] by Digital Foundry with the Core Technology Director of Ratchet and Clank: Rift Apart.

Near the beginning they talk about how targeting the PlayStation 5, which has an SSD, drastically changed how they went about making the game.

In short, the quick data transfer meant they were CPU bound rather than disk bound and could afford to have a lot of uncompressed data streamed directly into memory with no extra processing before use.

[0] https://youtu.be/-YpCQrPRpE0

1_player, almost 4 years ago
A lot of talk about pages, but no mention of how big these pages are. From a quick look on Google, most SSDs have 4kB pages, with some reaching 8kB or even 16kB.

2OEH8eoCRo0, almost 4 years ago
> Drives not Disks

And where did the word "drive" come from? I thought it referred to motors that spin the media, which SSDs also do not have.

DrNuke, almost 4 years ago
A number of high-level techniques help rationalize data management and transfer, but the mileage of practical implementations may vary a lot. Generally speaking, only a small number of applications really need to take care and add a further layer of abstraction, because the best practices already codified into any widespread language do an acceptable job.

personjerry, almost 4 years ago
How big is the write cache usually and how does it work? Typically I've seen the write caches be something like 32MB in size, but the "top speed" seems to be sustained for files much bigger than 32MB, which doesn't make sense to me if that top speed is supposedly from writing to the cache. How does that work?

mikewarot, almost 4 years ago
If you leave un-partitioned space on the SSD, how the heck does the SSD know it is ok to erase it? Wouldn't it be safer to partition it as an extra drive letter, format it, and then leave that drive alone? That would allow the OS to *trim* all the "empty" blocks.

ropeladder, almost 4 years ago
If sequential and random reads are mostly the same on SSDs, does that make the distinction between columnar and row-based databases/data storage less important?

rectang, almost 4 years ago
I wince at the amount of wear the `git clean -dxf; npm ci` cycle must be putting on my SSD.

CoolGuySteve, almost 4 years ago
The claim about parallelism isn't true. Most benchmarks and my own experience show that sequential reads are still significantly faster than random reads on most NVMe drives.

However, random reads are somewhere between a third and half as fast as sequential reads, compared to a magnetic disk, where they are often 1/10th as fast.
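
One rough way to check a claim like this on your own drive is to time the same set of 4096-byte reads in sequential and in shuffled order. The Python sketch below is only a harness under stated assumptions: `timed_read`, `compare`, `make_test_file`, and the 4096-byte chunk size are hypothetical choices, not anything from the comment. It also reads through the operating system's page cache, so on a small or recently written file it will mostly measure memory rather than the SSD; a meaningful run needs a file much larger than RAM or a cold cache.

```python
import os
import random
import tempfile
import time

CHUNK = 4096  # assumed read size in bytes


def timed_read(path, offsets, chunk=CHUNK):
    """Read `chunk` bytes at each offset; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level read-ahead; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        for offset in offsets:
            f.seek(offset)
            total += len(f.read(chunk))
    return total, time.perf_counter() - start


def compare(path, chunk=CHUNK, seed=0):
    """Time identical reads in sequential order and in shuffled order."""
    size = os.path.getsize(path)
    sequential = list(range(0, size - chunk + 1, chunk))
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)
    seq_bytes, seq_seconds = timed_read(path, sequential, chunk)
    rnd_bytes, rnd_seconds = timed_read(path, shuffled, chunk)
    return {
        "seq_bytes": seq_bytes,
        "rnd_bytes": rnd_bytes,
        "seq_seconds": seq_seconds,
        "rnd_seconds": rnd_seconds,
    }


def make_test_file(size=1 << 20):
    """Create a temporary file of `size` random bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size))
    return path
```

Both passes touch exactly the same bytes, so any difference between `seq_seconds` and `rnd_seconds` comes from access order alone.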

wly_cdgr, almost 4 years ago
There's nothing whatsoever I should need to know about SSDs as a JavaScript programmer, and if there is, then the programmers on the lower levels haven't done their jobs right and are wasting my time.

BatteryMountain, almost 4 years ago
So.. interesting topic. Last year I experimented with some C# + Samsung 970 Evo Plus NVMe + MessagePack (with compression) + ZFS .. to benchmark how fast I could dump objects from .NET memory to disk.

The numbers involved were insane and I played with various scenarios: with/without compression (MessagePack feature), with/without the typeless serializer (MessagePack feature), with/without async, and then the difference between using sync vs async and forcing disk flushes. I also weighed the difference between writing 1 fat file (append only) or millions of small files. I also checked the difference between using .NET streams versus using File.WriteAllBytes (C# feature, an all-in-memory operation, good for small writes, bad for bigger files or async serialization + writing). I also played with the number of objects involved (100K, 1M, 10M, 50M).

I cannot remember all the numbers involved, but I still have the code for all of it somewhere, so maybe I can write a blogpost about it. But I do remember being utterly stunned about how fast it actually was to freeze my application state to disk and to thaw it again (the class name was Freezer :p).

The whole reason was, I started using ZFS and read up a bit about how it works. I also have some idea about how SSDs work. I also have some idea how serialization works and how writing to disk works (streams etc).. I also have a rough idea how MySQL, Postgres, and SQL Server save their datafiles to disk and what kind of compromises they make. So one day I was just sitting being frustrated with my data access layers and it dawned on me to try and build my own storage engine for fun, so I started by generating millions of objects that sit in memory, which I then serialized with MessagePack using a Parallel.ForEach (C# feature) to a Samsung 970 Evo Plus to see how fast it would be. It blew my mind and I still don't trust that code enough to use it in production, but it does work.

Another reason why I tried it out was because at work we have some Postgres tables with 60m+ rows that are getting slow, and I'm convinced we have a bad data model + too many indexes and that 60m rows are not too much (since then we've partitioned the hell out of it in multiple ways, but that is a nightmare on its own since I still think we sliced the data the wrong way, according to my intuition and where the data has natural boundaries; time will tell who was right).

So I do believe there is a space in the industry where SSDs, paired with certain file systems, using certain file sizes and chunking, will completely leave SQL databases in the dust, purely by the mechanism of how each of those things work together. I haven't put my code out in public yet and only told one other dev about it, mostly because it is basically sacrilege to go against the grain in our community, and to say "I'm going to write my own database engine" sounds nuts even to me.
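
The "1 fat file (append only)" variant of this freeze and thaw experiment can be sketched in a few lines. This is not the commenter's code, which is C# with MessagePack and was never published: the Python version below uses the standard library's `json` as a stand-in serializer, and `freeze`, `thaw`, `round_trip`, and the 4-byte length prefix are all assumptions made for illustration.

```python
import json
import os
import struct
import tempfile


def freeze(objects, path):
    """Append each object to `path` as a 4-byte length prefix plus JSON bytes."""
    with open(path, "ab") as f:
        for obj in objects:
            blob = json.dumps(obj, separators=(",", ":")).encode("utf-8")
            f.write(struct.pack("<I", len(blob)))
            f.write(blob)
        f.flush()
        os.fsync(f.fileno())  # force the data to the device before returning


def thaw(path):
    """Yield objects back in the order they were frozen."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                return  # clean end of file
            if len(header) < 4:
                raise ValueError("truncated length prefix")
            (length,) = struct.unpack("<I", header)
            blob = f.read(length)
            if len(blob) < length:
                raise ValueError("truncated record")
            yield json.loads(blob)


def round_trip(count=1000):
    """Freeze `count` small objects, thaw them, and check they match."""
    path = os.path.join(tempfile.mkdtemp(), "state.bin")
    objects = [{"id": i, "name": f"item-{i}"} for i in range(count)]
    freeze(objects, path)
    return list(thaw(path)) == objects
```

Because every record is written through one buffered file handle and flushed once at the end, the file system sees a few large sequential writes rather than one small write per object, which is the access pattern the comment contrasts with millions of small files.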

BrissyCoder, almost 4 years ago
Why on earth do 99.5% of programmers even need to know what SSD stands for?