[mdlug] ECC RAM failure data - jre

Aaron Kulkis akulkis00 at gmail.com
Thu Feb 26 06:53:13 EST 2009


john_re wrote:
> Do you use ECC RAM? Do you have any data about failure rates?
> 
> I'm evaluating this for a system with 8GB DRAM, &
> http://en.wikipedia.org/wiki/Dynamic_random_access_memory#Errors_and_error_correction
> says
> "Tests [ecc] give widely varying error rates, but about 10^-12 upset/bit-hr
> is typical, roughly one bit error, per month, per gigabyte of memory.
> 

Wow... that seems extremely high to me... for circuitry that's
supposed to be driving transistors back and forth between full
cut-off and such high saturation that the transistor behaves
like a diode... I find it amazing that an error rate this high
is tolerated within any segment of the industry.

> In most computers used for serious scientific or financial computing and
> as servers, ECC is the rule rather than the exception, as can be seen by
> examining manufacturers' specifications."

Absolutely.  A single bit error can (and eventually will) corrupt
data or results, and any such corruption is going to cost somebody $$$$.

> 
> 
> So, by that data, 8GB of DRAM sees about 8 errors per month, i.e. about
> one every 3-4 days.
> 
> What rates do you have?
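
For anyone who wants to redo the arithmetic in the quoted text, here is a
rough sketch. It takes the two quoted figures at face value (10^-12
upset/bit-hr, and "one bit error per month per gigabyte") and scales each
to 8GB. The function names, the 730-hour and 30-day month, and decimal
gigabytes are assumptions made for illustration, not anything from the
Wikipedia article. Worked this way, the per-bit figure comes out several
times higher than the one-per-month-per-gigabyte rule of thumb, so the two
agree only to within an order of magnitude.

```python
# Back-of-the-envelope DRAM soft-error estimate from the quoted figures.
# Assumptions (not from the source): 1 GB = 10**9 bytes, a month is
# 730 hours (365 * 24 / 12) or 30 days, and errors scale linearly with size.

HOURS_PER_MONTH = 730.0
DAYS_PER_MONTH = 30.0


def errors_per_month_rule_of_thumb(gigabytes, per_gb_per_month=1.0):
    """Scale the quoted 'one bit error per month per gigabyte' figure."""
    return gigabytes * per_gb_per_month


def errors_per_month_from_bit_rate(gigabytes, upsets_per_bit_hour=1e-12):
    """Scale the quoted per-bit, per-hour upset rate to a whole memory."""
    bits = gigabytes * 1e9 * 8
    return bits * upsets_per_bit_hour * HOURS_PER_MONTH


def days_between_errors(errors_per_month):
    """Average spacing between errors, in days."""
    return DAYS_PER_MONTH / errors_per_month


if __name__ == "__main__":
    for label, rate in (
        ("rule of thumb", errors_per_month_rule_of_thumb(8)),
        ("per-bit rate", errors_per_month_from_bit_rate(8)),
    ):
        print(f"{label}: {rate:.1f} errors/month, "
              f"one every {days_between_errors(rate):.2f} days")
```

The first line of output corresponds to the "about 8 errors per month,
one per 3-4 days" estimate in the quoted message.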


