星期日, 12月 17, 2006

Linux server memory check

Posted by nixcraft in Linux, Troubleshooting, Sys admin, Tips

If your server crashes regularly it could be a buggy kernel, a driver, power supply or any other hardware part. Memory (RAM) is one of the critical server parts. Bad memory can cause various problems such as random Linux server restart or program segfaults.

Generally, I recommend using memtester command. It is an effective userspace tester for stress-testing the memory subsystem. It is very effective at finding intermittent and non deterministic faults under Linux.

Recently Rahul shah email me another interesting method for testing memory. His idea is based upon md5 checksum and dd command.

First find out memory site using free command.
$ free
Output:

 total       used       free     shared    buffers     cached
Mem: 768304 555616 212688 0 22012 270996
-/+ buffers/cache: 262608 505696
Swap: 979956 0 979956

In above example my server has 768304K memory. Now use dd command as follows:
$ dd if=/dev/urandom bs=768304 of=/tmp/memtest count=1050
$ md5sum /tmp/memtest; md5sum /tmp/memtest; md5sum /tmp/memtest

According to him if the checksums do not match, you have faulty memory guaranteed. Read dd command man page to understand all options. dd will create /tmp/memtest file. It will cache data in memory by filling up all memory during read operation. Using md5sum command you are reading same data from memory (as it was cached).

Look like a good hack to me. However I still recommend using memtester userland program. Another option is to use memtest86 program ISO. Download ISO, burn the same on a CD, reboot your system with it test it (it may take more time). From project home page:
Memtest86 is thorough, stand alone memory test for x86 architecture computers. BIOS based memory tests are a quick, cursory check and often miss many of the failures that are detected by Memtest86.