Why use ECC?

188 points by benkuhn, over 9 years ago

14 comments

morelikeborelax, over 9 years ago
I've had ECC on my workstations since 2006, when Intel forced it on DP systems due to FB-DIMMs. Since I have had a good number of correctable memory errors over the past decade, I don't feel I can go back to not having it; it wouldn't make any sense when the cost is so low. I also couldn't get non-ECC 8GB and 16GB DIMMs when I built some of my systems, so I simply had to use it.

Do people need it? Nah, probably not for systems that can handle crashing, but you'd be nuts to not use it in servers or systems running long jobs. It's just a single insurance payment on your system that gives a remote chance of protection.

Sadly, there are very few studies into it that show how modern DIMMs still get errors that are correctable. Manufacturing processes are much better, but they haven't eliminated the need for it.

I do enjoy the "why would I need it, I've never had memory errors" attitude, though, when people likely have no idea why their application or OS crashed. And the accounts of people who eventually diagnose memory errors after weeks of random crashes, which would have been reported immediately if they had ECC.
codinghorror, over 9 years ago
I just want to be clear that in the original referenced article, I am not anti-ECC per se. I just found myself caught in the massive cognitive dissonance between "you must have ECC in all your computers otherwise they will constantly and silently corrupt your data + crash" and "statistically speaking, most computers in the world do not use ECC". How can both of these things be true?

The argument for ECC is credible (I personally think rowhammer is the best example of this actually mattering, but ironically a) you can rowhammer ECC memory just fine and b) DDR4 has hardware features to mitigate rowhammer -- which shows how quickly things are changing), but it also seems to hinge on whether you have hundreds to thousands of computers all working together, i.e. the positive effects of ECC only seem to matter enough statistically at a _very_ large scale.
orf, over 9 years ago
> Alternately, it might be a plan to create literal cloud computing.

Thanks, I just snorted tea onto my keyboard after reading that. Seems like if you have the money and are maintaining a pretty critical system it would be silly not to get ECC RAM (and if you're building your own iron the price difference isn't that much as far as I can tell).

On the EC2 site they say "In our experience, ECC memory is necessary for server infrastructure, and all the hardware underlying Amazon EC2 uses ECC memory"[1]. Amazon maintain a _lot_ of servers and if they think it's necessary I'm inclined to believe them.

1. https://aws.amazon.com/ec2/faqs/
melted, over 9 years ago
I think ECC is inevitable. With 128GB DIMMs being produced now and NV-DIMMs (DDR4 flash-on-DIMM) just around the corner, some kind of hardware error detection is necessary.

This is similar to high-capacity spinning drives. With smaller ones you could just go with RAID5 and not worry about anything, but when drives are 3-4TB and up, you have to use RAID6, because the spec error rate becomes too high to rely on a single parity drive.

Here, too, when your machine has 1-2TB of hybrid RAM/NVM you HAVE TO have some way to detect failures, even if it's not particularly good. Performance characteristics of RAM preclude the more robust algorithms such as wide (32-bit) CRC from being used (narrower CRC could still be doable in hardware, though), but parity is a complete no-brainer as the first step.
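
To illustrate the "parity as a first step" point in the comment above, here is a minimal software sketch of a single even-parity bit per word. It is only a model of the idea: real memory parity is computed in hardware, and the function names (`parity_bit`, `store`, `check`) are hypothetical, chosen for this example.

```python
def parity_bit(word: int) -> int:
    """Return the even-parity bit of a non-negative integer word
    (1 if the word has an odd number of set bits, else 0)."""
    return bin(word).count("1") % 2


def store(word: int) -> tuple[int, int]:
    """Model a write: keep the word together with its parity bit."""
    return word, parity_bit(word)


def check(word: int, stored_parity: int) -> bool:
    """Model a read: True if the word still matches its stored parity."""
    return parity_bit(word) == stored_parity


word, p = store(0b1011_0010)

single_flip = word ^ (1 << 3)  # one bit flipped after the write
double_flip = word ^ 0b11      # two bits flipped after the write

print(check(word, p))          # unchanged word passes
print(check(single_flip, p))   # a single flip is detected
print(check(double_flip, p))   # two flips cancel out and go unnoticed
```

A single parity bit detects any odd number of flipped bits but cannot locate or correct them, and it misses even numbers of flips, which is why it is only "not particularly good" detection compared with a correcting code.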
mmagin, over 9 years ago
"If ECC were actually important, it would be used everywhere and not just servers."

Ha. I wish I could get laptops with ECC RAM.
Spooky23, over 9 years ago
Atwood's original article was puzzling to me, and the conclusions just didn't compute.

I can recall at least a half dozen times when I was a DBA in olden times that ECC either corrected or was essential in the isolation of faults on my Informix and later Oracle boxes, running mostly on Sun and RS/6000 at the time.

Sun had a nice habit of shipping defective CPUs and memory in the late 90s. The details are foggy, but I remember correlating ECC faults to long transactions that would fail, and getting a bunch of stuff out of Sun.

Then again, that was 15+ years ago, so maybe the newfangled memory we have these days is more reliable.
rando289, over 9 years ago
I looked at the Atwood article, which really didn't give useful numbers, except the one citation. I opened that, found the FIT number, converted it to ~0.5 errors per year and thought, eh, skipping ECC is fine.

I didn't notice this: "From the graph above, we can see that a fault can easily cause hundreds or thousands of errors per month." Now I want ECC again.
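
The FIT conversion described in the comment above can be sketched as follows. FIT counts failures per billion (10^9) device-hours. The input values below are hypothetical placeholders, not the figures from the cited study, and `errors_per_year` is an illustrative name.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def errors_per_year(fit: float, units: int = 1) -> float:
    """Expected errors per year for `units` devices, each rated at
    `fit` failures per 10^9 device-hours."""
    return fit * units * HOURS_PER_YEAR / 1e9


# Hypothetical inputs, for illustration only.
rate = errors_per_year(fit=1000.0, units=8)
print(f"{rate:.3f} expected errors per year")
```

An average like this spreads errors evenly over time, which is what makes the number look harmless. It says nothing about the case in the quoted sentence, where one faulty part produces errors repeatedly.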
hrez, over 9 years ago
Speaking of Google's shipping containers, Sun had that too: https://en.wikipedia.org/wiki/Sun_Modular_Datacenter

"A data center of up to 280 servers can be rapidly deployed by shipping the container"
aidenn0, over 9 years ago
The last time I built a system with ECC it was impossible to tell which CPU/motherboard combos supported ECC; the time before that was when AMD had super-affordable systems with ECC support, but I gave up trying to figure out if AMD even supported ECC on their workstation parts.
caycep, over 9 years ago
The problem is, there are also additional hidden costs to using ECC, mostly due to what is available in the ecosystem. Namely: I want ECC. Great, then I need to get an X99 board with Xeon chips running at higher TDPs. Depending on the case I use, I would then need to upgrade the PSU and fans/coolers.

Or (especially in the mini-ITX world), I can choose one of the server boards or "workstation" C236 boards. Then I would lose official desktop Windows support, or lose an M.2 SSD slot, or onboard sound, or other "desktop workstation" features.

It is still not easy to do ECC in this day and age...
brandon272, over 9 years ago
From my point of view it boils down to how critical stability and uptime are within any given environment. I'm not going to wring my hands over whether or not the hardware powering my todo list startup includes ECC RAM. I would wring my hands over it if I were building a system for military, healthcare, or critical infrastructure applications, to use a few examples.
santaclaus, over 9 years ago
Is it possible to get some of the benefits of ECC without ECC by serializing out the entire state of a program at some set rate? For example, one could read in the n'th serialized checkpoint, run the program to the n+1'th checkpoint, and compare the original n+1'th checkpoint to the new n+1'th checkpoint. If these differ, cosmic rays flipped a bit in the interim. This would, of course, break if the code itself doesn't guarantee bit-compatible results over multiple runs (due to the use of certain parallel algorithms, etc). I suppose this would double the runtime, however...
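
The checkpoint-and-replay idea in the comment above could look roughly like this. It is a sketch under stated assumptions: the program state is a small JSON-serializable dict, the work between checkpoints is a deterministic function, and all names (`step`, `digest`, `run_with_verification`) are hypothetical.

```python
import hashlib
import json


def step(state: dict) -> dict:
    """Hypothetical deterministic unit of work between two checkpoints."""
    return {
        "i": state["i"] + 1,
        "acc": (state["acc"] * 31 + state["i"]) % 1_000_003,
    }


def digest(state: dict) -> str:
    """Hash a canonical serialization of the state."""
    blob = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def run_with_verification(initial: dict, n_steps: int, step_fn=step) -> dict:
    """Advance the state n_steps times. After each step, reload checkpoint n,
    recompute checkpoint n+1, and compare it with the original n+1."""
    checkpoints = [json.dumps(initial, sort_keys=True)]
    state = initial
    for n in range(n_steps):
        state = step_fn(state)                          # original run
        replay = step_fn(json.loads(checkpoints[n]))    # replay from checkpoint n
        if digest(replay) != digest(state):
            raise RuntimeError(f"checkpoint {n + 1} differs between runs")
        checkpoints.append(json.dumps(state, sort_keys=True))
    return state


final = run_with_verification({"i": 0, "acc": 1}, n_steps=5)
print(final["i"])
```

Every step runs twice, which matches the doubled runtime the comment expects. A mismatch only says that the two runs disagree, not which one was corrupted, and a flip inside a stored checkpoint would feed both the replay and later steps, so this detects less than hardware ECC would.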
shin_lao, over 9 years ago
ECC is a requirement for servers when we validate installations of our database product.

Basically the takeaway is that without ECC you can expect a memory error every two days on machines with a lot of RAM.
Confiks, over 9 years ago
Is there someone else who consistently reads 'Elliptic Curve Cryptography' and is disappointed by the subject of the article?