[mdlug] ECC RAM failure data - jre
Dan Pritts
danno at umich.edu
Thu Feb 26 12:04:30 EST 2009
On Thu, Feb 26, 2009 at 03:25:48AM -0800, john_re wrote:
> Do you use ECC RAM? Do you have any data about failure rates?
Most servers use ECC memory. To me it seems like a wise investment.
I don't use it on desktops, nor do I use it on my home server (just
a desktop motherboard in a tower case).
It's unfortunately a lot more expensive. Most people's PCs don't
crash for lack of it, so manufacturers aren't motivated to spend the
extra money. If everybody used it, it would be a minor investment, but
without the economies of scale it adds a lot to the cost of a
motherboard and the memory.
To be honest, I've never seen any reporting from the OS about ECC
failure/correction rates, but I've never gone looking either.
It appears that support is available in Linux 2.6.16:
http://bluesmoke.sourceforge.net/
I just played with it a little on a test server but it wasn't
installed in my kernel at boot time, so it hasn't found any errors
(yet). I'll try to remember to look in a few days to see if it
finds anything.
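A minimal sketch of how one might check for those counts, assuming the
EDAC driver is loaded and exposes its per-controller counters under
/sys/devices/system/edac/mc (ce_count for corrected errors, ue_count
for uncorrected ones). The function name read_edac_counts and its base
parameter are made up for illustration; on a machine without ECC or
without the driver it simply finds nothing.

```python
from pathlib import Path

# Assumed location of the EDAC memory-controller counters in sysfs.
EDAC_MC_DIR = Path("/sys/devices/system/edac/mc")


def read_edac_counts(base=EDAC_MC_DIR):
    """Return {"mcN": {"ce": int|None, "ue": int|None}} for each controller.

    An empty dict means no EDAC memory controllers were found under base
    (driver not loaded, or no ECC support on this machine).
    """
    base = Path(base)
    counts = {}
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        if not mc.is_dir():
            continue
        entry = {}
        for key, name in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                # Counter missing or unreadable: report it as unknown.
                entry[key] = None
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("no EDAC memory controllers found")
    for mc, c in found.items():
        print(f"{mc}: corrected={c['ce']} uncorrected={c['ue']}")
```

Run from cron every so often, this would give a rough picture of the
correction rate over time; the counters normally start over at zero
when the driver is reloaded or the machine reboots.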
I've had plenty of problems with desktop & laptop PCs over the years
that turned out to be related to bad RAM, but these issues were not
a single bit-flip; they were widespread problems. I've learned to
always memtest a system when I add new RAM (although the last time I
actually *found* bad RAM was by trial and error on a crashy PowerPC
Mac).
danno
PS - Note that there are two features typically used by server
memory, ECC and buffering. Generally you see RAM with both buffering
and ECC, but I am pretty sure I've seen non-buffered ECC out there.
Make sure that you buy the right stuff for whatever mobo you buy.
http://en.wikipedia.org/wiki/Fully_Buffered_DIMM