Benchmarking Linux RAID

My project last weekend was to build a Linux storage server for my network. Sunday, I discussed benchmarking SATA controllers under Linux. Yesterday, I discussed some considerations for using Linux software RAID. I explained why a mirrored disk drive array (RAID level 1) might best suit my needs, but I had concerns about the performance.

So, I thought I'd run some benchmarks to determine whether the mirrored configuration would be a good choice. Today, I'll discuss the results.

Here are my test conditions:

Here is the procedure I used to run the tests:

  • Bring the system to single user mode
  • Use mdadm to construct a multidevice (more in a sec)
  • Create a fresh ext3 filesystem on the full multidevice
  • Mount the filesystem
  • Run the benchmark
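
Strung together, the steps look roughly like this. The device names and mount point are illustrative, and the benchmark invocation is an assumption on my part (the column layout in the results below matches bonnie++'s output):

init 1                                  # drop to single user mode
mdadm --create ...                      # build the array (exact commands shown below)
mkfs.ext3 /dev/md0                      # fresh ext3 filesystem on the full multidevice
mount /dev/md0 /mnt/bench               # illustrative mount point
bonnie++ -d /mnt/bench -s 6g -u root    # assumed: bonnie++ with a 6 GB file set, run as root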

The mdadm command manages the Linux software RAID functions. It lets you build a RAID configuration, and it provides a device, called a multidevice, through which the system accesses the array. To the rest of the system, the multidevice looks like an ordinary disk drive.

For instance, here is the command I used to build a mirrored (RAID level 1) configuration:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
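
Before creating the filesystem, it's worth confirming that the array assembled (and, for the mirror, letting the initial sync finish); something like:

cat /proc/mdstat
mdadm --detail /dev/md0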

I ran three trials, each with a different array configuration.

The first trial was my control case. I built a linear array (--level=linear), which really isn't a RAID at all. It just joins the disks together so they appear as one big drive. I used this to test the overhead of the software RAID system.

The remaining trials were striped (--level=0) and mirrored (--level=1) configurations.
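
For reference, those arrays were built with the same command, just a different --level:

mdadm --create /dev/md0 --level=linear --raid-devices=2 /dev/sda /dev/sdb
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda /dev/sdb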

Here are the results:

                    ------Sequential Output------ --Sequential Input-- --Random-
                   -Per Chr-  --Block-- -Rewrite- -Per Chr-  --Block-- --Seeks--
              Size K/sec %CP  K/sec %CP K/sec %CP K/sec %CP  K/sec %CP  /sec %CP
base hd         6G 41310  96  64334  23 34373  10 46819  96  79255   7 235.4   1
linear          6G 42172  97  63289  42 33291  18 46701  97  78033  17 236.4   1
striped         6G 42018  98 109092  67 48674  30 45481  98 116921  26 339.5   1
mirrored        6G 41313  96  50917  38 29154  17 45735  95  76402  17 331.5   1

The columns of most interest to me are the block output (write) and block input (read) results. These are reported as data throughput in kilobytes per second, with processor overhead reported as a percentage of CPU usage.

The base hd trial shows the results I reported the other day in my SATA controller tests. This is the performance of a drive connected to the controller, with an ext3 filesystem built on a 10GB partition. The block I/O test results show 63 Mbytes/sec write and 77 Mbytes/sec read performance.

The linear trial, as mentioned above, was my control case to measure the overhead of the RAID system without any actual RAID functions performed. It shows a very modest reduction (1.5%) in block data throughput, which is promising. The processor overhead, however, is noticeable. Writing uses 42% of the processor, reading 17%.

The striped trial shows that indeed a RAID 0 striped configuration greatly increased the disk performance. Somewhat surprisingly, CPU overhead goes up quite a bit too. Now writing uses 67% of the processor, reading 26%.

The mirrored trial shows that write performance did decrease fairly significantly, as expected. Block output dropped from 63 Mbytes/sec (the base hd trial) to 50 Mbytes/sec, a loss of about 20%. Read performance decreased only slightly, from 77 Mbytes/sec to 75 Mbytes/sec, which is good. Somewhat surprisingly, the CPU overhead hit that the striped configuration took didn't happen here; the CPU load was close to what was seen in the baseline linear test.

The benchmarks told me that if I used a mirrored configuration for my server, I could expect to see a sustained 50 Mbytes/sec block output performance (best case), with 38% processor load.

The results indicate that a simple mirrored storage array would be suitable for my needs.

A 100Mbps Ethernet network tops out at 12.5 Mbytes/sec of raw bandwidth, which translates to file transfer rates in the 5-10 Mbytes/sec range in practice, so the mirrored array should be plenty fast enough to keep the network pipes filled.

The biggest impact is the CPU load: it will be like losing a third of a core on my dual-core system. That doesn't make me happy, but I probably can live with it. I'll just be sure not to run any compiles while my wife is doing backups.

Comments


redundancy

For reliability, you should really put the drives on 2 different controllers when available.

Since you have 1 SATA port available on your mobo, it would be interesting to see how the performance fared with 1 drive on the cheap card and 1 on the motherboard.

That's a really good idea.

That's a really good idea. Thanks.

I think this should be easy. I ought to be able to disconnect a drive data cable from the add-on controller and just re-connect it to an on-board port. The system should scan all the disk chains at boot and find the drives to assemble the RAID.
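
If the boot-time scan doesn't pick it up for some reason, assembling the array by hand should work; something like:

mdadm --assemble /dev/md0 /dev/sda /dev/sdb
cat /proc/mdstat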

Raid5/6

I personally run two machines with 4 drives on raid 5. I never benchmarked the drives outside of raid, and I was wondering what kind of performance you would get with raid 5, if you had another drive, compared to the other raid levels. Although your drives would have to be on two different controllers, and mine are on one, I still think the information would be interesting to note.

Thanks, Kaji

try to use raid10,f2

There is a mirrored raid type that would give you double the read performance of raid1 - about the same as raid0. It is called raid10,f2 - you could try it out:

mdadm --create /dev/md0 --level=10 -p n2 --raid-devices=2 /dev/sda /dev/sdb

It gives you the advantages of both the redundancy of raid1 and the speed of raid0.

If you are suggesting RAID10

If you are suggesting RAID10 Far2, the command line should be:
mdadm --create /dev/md0 --level=10 -p f2 --raid-devices=2 /dev/sda /dev/sdb

Right? The -p n2 would make it a Near2 - which is still good, but doesn't have quite the read speed advantage of Far2.
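
Either way, after creating the array you can check which layout you actually got; mdadm reports it (near=2 or far=2) in the detail output:

mdadm --detail /dev/md0 | grep -i layout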