Once in a while a new study comes along that no one has done before, for whatever reason (cost, availability, data, etc.). While perusing blogs today I saw a ZDNet story from
Robin Harris that jumped off my screen and smashed into pieces on my keyboard. I think every sysadmin should read it.
DRAM error rates were studied at Google over a period of 30 months (yes, that’s 2 1/2 years). The study included tens of thousands of Google servers.
The results: “error rates are hundreds to thousands of times higher than thought – a mean of 3,751 correctable errors per DIMM per year.”
Rather than regurgitate the article, I highly recommend that you take some time to read it directly on ZDNet. It’s amazing that this study was able to take place and that the results were made public.