You need to consider a few factors when determining the “right” configuration:
– What performance do you need? You really should do some math and figure out what data rate you’re going to need to support. Look at the data rate of the video streams and the number of streams you’re going to work with simultaneously.
– How much data protection (if any) do you want? Depending on the number of disks and the RAID level chosen, you can survive a different number of drive failures before you lose data.
– How much space do you need? Different RAID levels result in different amounts of usable space.
– How much money do you want to spend?
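To make the "do some math" point concrete, here is a rough sketch of the throughput calculation. The stream bitrate, stream count and 1.5x headroom factor are made-up numbers for illustration, not recommendations; plug in the figures for your own codec and workflow.

```python
def required_throughput_mb_s(stream_mbit_s, streams, headroom=1.5):
    """Aggregate MB/s the array must sustain for simultaneous streams.

    stream_mbit_s: data rate of one video stream in megabits per second
    streams:       number of streams read/written at the same time
    headroom:      assumed safety factor for seeks, other users, etc.
    """
    total_mbit_s = stream_mbit_s * streams
    return total_mbit_s / 8 * headroom  # 8 bits per byte

# Hypothetical workload: 4 simultaneous streams at 220 Mbit/s each
print(required_throughput_mb_s(220, 4))  # prints 165.0
```

Compare the result against what your candidate array can actually sustain, not the peak number on the spec sheet.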
Now for a bit more theory and general comments:
So what does RAID do for us? You really get 3 things from RAID:
– Better performance. This comes from striping data across multiple drives. A given disk can handle only so much throughput, both in terms of data rate and in the number of IOPS. By spreading your data across multiple drives, you get increased performance since each drive only has to service a fraction of the total IO work.
– Increased reliability. This is done by either mirroring or parity. Mirroring keeps a second copy of your data on the array; parity stores enough extra information to recreate your data. Of course you lose “usable” disk space when doing this.
– Larger volume size. You may need a volume larger than a disk, so you need to combine several disks into one to give you a big enough volume.
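If it helps to see how striping spreads the work, here is a minimal sketch of the address mapping for a plain stripe. It assumes a stripe unit of exactly one block and a simple round-robin layout; real controllers use larger, configurable stripe units, so treat the function as illustrative only.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes a round-robin stripe with a stripe unit of one block.
    """
    disk = logical_block % num_disks
    offset = logical_block // num_disks
    return disk, offset

# Eight consecutive logical blocks on a hypothetical 4-disk stripe
for lb in range(8):
    disk, offset = locate_block(lb, 4)
    print(f"logical block {lb} -> disk {disk}, offset {offset}")
```

Consecutive blocks land on different disks, which is why a large sequential read keeps every drive busy at once.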
What are the different RAID levels and what do they mean in practical terms? (See the Wikipedia RAID article for more details.)
– RAID 0: This is a simple stripe across all of your disks. Like David said, for a given number of disks, this gives you the most performance since all disks are contributing to the IO work load. Of course there is no redundancy here, so the loss of a single disk kills the entire RAID set.
– RAID 1: This is simple mirroring of disks. Two disks are (essentially) exact copies of each other. When you write to the volume, each disk writes the same thing. You only get half of the raw space as usable for storage with mirroring.
– RAID 5: Striped parity (see Wikipedia for details). With RAID5, you lose essentially 1 disk worth of storage for parity. A big performance caveat of RAID5 is the parity calculation. Any write you make requires parity to be calculated in addition to performing the actual IO operation. Modifying data also incurs additional performance overhead in that you have to read existing data in the stripe being modified (the old data and old parity, or the rest of the stripe) to perform a new parity calculation. Then you have to write the new data and new parity. You need to make sure your RAID controller (for hardware RAID) can keep up with this.
– RAID 6: Similar to RAID 5, but with two different parity blocks. This is a common solution to recovery issues with RAID5 arrays (see below).
– RAID 10: This combines RAID 1 and 0 (also called RAID 1+0). In this scenario you take pairs of disks and mirror (RAID1) them and then create a stripe (RAID0) across the mirrored pairs. This gets you the best performance and redundancy at the cost of space. Depending on how nice Mr Murphy is, you could lose half of the disks (just not two in a mirrored pair) and have no data loss.
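The space and failure trade-offs in the list above boil down to simple arithmetic. Here is a sketch, assuming identical disks, a two-disk mirror for RAID 1, and ignoring the small amount of space real arrays reserve for metadata. The function name and return shape are my own for illustration.

```python
def raid_summary(level, disks, disk_tb):
    """Return (usable_tb, failures_survived_in_the_worst_case).

    Textbook arithmetic only; assumes all disks are the same size.
    """
    if level == "0":
        data_disks, survives = disks, 0
    elif level == "1":
        data_disks, survives = disks / 2, 1   # assumes mirrored pairs
    elif level == "5":
        data_disks, survives = disks - 1, 1   # one disk worth of parity
    elif level == "6":
        data_disks, survives = disks - 2, 2   # two disks worth of parity
    elif level == "10":
        # Guaranteed to survive 1 failure; more only if no mirrored
        # pair loses both of its disks.
        data_disks, survives = disks / 2, 1
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    return data_disks * disk_tb, survives

# Hypothetical shelf of eight 4 TB disks
for level in ("0", "5", "6", "10"):
    usable, survives = raid_summary(level, 8, 4.0)
    print(f"RAID {level}: {usable} TB usable, survives {survives} failure(s)")
```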
In practical terms, you really only see RAID5, 6 and RAID10 for significant data storage with RAID1 hanging around for boot/OS/System drives.
What happens when a disk dies?
– RAID 5: You keep running, but with degraded performance. Whenever you have to read a block of data that was on the failed disk, you have to recreate that data from parity, resulting in some performance degradation. When you replace the failed disk and rebuild the volume, all of the data on all of the disks must be read to populate the new disk (either recalculating the parity or recreating the data). This can take a long time and has a performance hit across the entire array. The heavy work placed on the remaining drives in the array can help Mr Murphy appear and give you a second failure during the recovery, resulting in loss of data.
– RAID 6: Similar to RAID5, but you have a second parity block and can suffer 2 disk failures before data loss vs the 1 disk in RAID5.
– RAID 10: When you lose a disk, you keep running without any degradation in performance. The rebuild only requires reading from 1 other disk and no parity or other calculations, resulting in a faster recovery. The performance hit during the rebuild is limited to data on the mirror set being rebuilt; you may have IO contention on that one disk since it’s being read for the rebuild.
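For anyone curious what "recreate that data from parity" means in the RAID 5 case, single parity is just XOR across the blocks in a stripe. The toy example below uses three tiny made-up data blocks; it shows why every surviving disk has to be read to rebuild the missing one. (RAID 6's second parity block uses different math and isn't covered here.)

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a toy 4-disk RAID 5: three data blocks plus parity
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The disk holding data[1] dies; rebuild it from every survivor plus parity
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])  # prints True
```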
My preferences (take as you like):
– I don’t like RAID5 with disks bigger than 1 TB. Too many vendors and storage experts I work with consider the double failure too likely. I also don’t like the performance degradation during recovery and the long recovery times. Of course I see a lot of people on these forums happily using RAID5 and I see a lot of video-specific storage vendors proposing it, so I may just be overly conservative.
– My preference is RAID 10. I get the best performance and recovery and disks are relatively cheap, so total volume isn’t so much of an issue.
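To put rough numbers on the recovery-time concern, here is a back-of-the-envelope sketch. The 100 MB/s sustained rebuild rate is purely an assumption for illustration; real rebuilds depend on the controller, the drives and how busy the array is, and are often slower when the array is serving production IO at the same time.

```python
def rebuild_hours(disk_tb, rebuild_mb_s):
    """Best-case hours to rewrite one replacement disk end to end.

    disk_tb:      capacity of the replaced disk in TB (1 TB = 1,000,000 MB)
    rebuild_mb_s: assumed sustained rebuild rate in MB/s
    """
    seconds = disk_tb * 1_000_000 / rebuild_mb_s
    return seconds / 3600

# Hypothetical disk sizes at an assumed 100 MB/s rebuild rate
for size_tb in (1, 4, 8):
    print(f"{size_tb} TB disk: about {rebuild_hours(size_tb, 100):.1f} hours")
```

The window of exposure grows linearly with disk size, which is the reason bigger disks make me more nervous about single parity.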