[mdlug] ECC RAM failure data - jre

Dan Pritts danno at umich.edu
Thu Feb 26 12:04:30 EST 2009


On Thu, Feb 26, 2009 at 03:25:48AM -0800, john_re wrote:
> Do you use ECC RAM? Do you have any data about failure rates?

Most servers use ECC memory, and to me it seems like a wise
investment there.  I don't use it on desktops, nor do I use it on my
home server (just a desktop motherboard in a tower case).

Unfortunately it's a lot more expensive.  Most people's PCs don't
crash for lack of it, so manufacturers aren't motivated to spend the
extra money.  If everybody used it, it would be a minor investment,
but without the economies of scale it adds a lot to the cost of a
motherboard and the memory.

To be honest, I've never seen any reporting from the OS about ECC
failure/correction rates, but I've never gone looking either.

It appears that support is available as of Linux 2.6.16:

  http://bluesmoke.sourceforge.net/

I just played with it a little on a test server, but it wasn't
loaded in my kernel at boot time, so it hasn't found any errors
(yet).  I'll try to remember to look in a few days to see if it
finds anything.
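
For anyone who wants to check their own box, here is a rough sketch of
reading the counters once the EDAC driver is loaded.  It assumes the
driver exposes per-controller ce_count (corrected errors) and ue_count
(uncorrected errors) files under /sys/devices/system/edac/mc/; the
function name read_edac_counts is just made up for illustration, and
the exact layout may differ by kernel version and chipset driver.

```python
from pathlib import Path

# Assumed sysfs location of the EDAC memory-controller counters.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {controller: {"ce": n, "ue": n}} for each mc* directory under root."""
    counts = {}
    root = Path(root)
    if not root.is_dir():
        # EDAC driver not loaded, or no supported memory controller.
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for key, name in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("no EDAC memory controllers found (is the driver loaded?)")
    for mc, c in counts.items():
        print(f"{mc}: corrected={c['ce']} uncorrected={c['ue']}")
```

An empty result only means no controller directory was found; it says
nothing about whether the RAM is healthy.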

I've had plenty of problems with desktop & laptop PCs over the years
that turned out to be related to bad RAM, but these were not single
bit-flips; they were widespread problems.  I've learned to always
memtest a system when I add new RAM (although the last time I
actually *found* bad RAM was by trial and error on a crashy PowerPC
Mac).

danno

PS - note that there are two features typically used by server
memory, ECC and buffering.  Generally you see RAM with both buffering
and ECC, but I am pretty sure I've seen non-buffered ECC out there.
Make sure that you buy the right stuff for whatever mobo you buy.
 
   http://en.wikipedia.org/wiki/Fully_Buffered_DIMM



