Vadim Carter
Forum Replies Created
-
Alex, here are a couple of papers I saved as references a while ago that seem to support your assertion that RAID 6 is more reliable than RAID 10:
Microsoft Exchange Server 2007 and IBM System Storage N series with RAID-DP Best practices
Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?
From a purely practical perspective, however, ever-increasing drive sizes result in ever-increasing rebuild times. On a moderately loaded array of 16 x 4 TB spindles in a RAID 6 configuration, a drive rebuild can take over 24 hours. With 48 x 4 TB drives, rebuilding one drive can take days. The same drive configurations in a RAID 10 arrangement take only a few hours to rebuild, with far less I/O load on the remaining array members. If the MTTR of a degraded RAID set is measured in hours rather than days, it seems that the risk of data loss is lower with RAID 10, IMHO.
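To put rough numbers on the rebuild-time argument, here is a back-of-the-envelope sketch in Python. The throughput figures are illustrative assumptions, not measurements from any particular controller: real rebuild rates depend heavily on array load, controller, and rebuild priority settings.

```python
def rebuild_hours(drive_tb: float, effective_mb_per_s: float) -> float:
    """Hours needed to rewrite one replacement drive at a sustained rate."""
    drive_bytes = drive_tb * 1e12                     # decimal terabytes, as drives are sold
    seconds = drive_bytes / (effective_mb_per_s * 1e6)
    return seconds / 3600


# Hypothetical sustained rebuild rates in MB/s (assumptions for illustration only).
scenarios = [
    ("RAID 10 mirror copy", 150.0),
    ("RAID 6 parity rebuild under load", 40.0),
]

for label, rate in scenarios:
    print(f"{label}: {rebuild_hours(4, rate):.1f} h for a 4 TB drive")
```

With these assumed rates, the 4 TB mirror copy comes out at about 7.4 hours and the parity rebuild at about 27.8 hours, which is the same order of magnitude as the figures quoted above.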
[Alex Gerulaitis] “Perhaps you’re talking about “silent rot”, not all incidents of data corruption? A URE is data corruption, and redundant RAIDs do protect against them, to a degree?”
Unrecoverable Read Errors “could” cause data corruption. The idea is to detect them, flag them, and rewrite the data to a different block before the damage is done (using checksums with RAIDZ, or scheduled parity scrubs in a RAID 5 / RAID 6 array). What happens most often is that a drive fails and a rebuild starts. Under the heavy I/O of the rebuild, a few drives start throwing UREs. RAID 6 can still reconstruct a stripe that is missing up to two blocks, counting the block that lived on the failed drive, but as soon as a stripe has more unreadable blocks than that, there is data corruption. I have seen it occur a few times. A disproportionately large share of data corruption cases, however, is due to some sort of filesystem corruption: an OS-level problem that RAID cannot protect against.
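As a rough illustration of why UREs tend to show up during rebuilds, the sketch below estimates how likely at least one URE is while reading every surviving drive end to end. The rate of one error per 1e14 bits is a commonly quoted datasheet figure used here as an assumption, and treating bit errors as independent is a simplification.

```python
import math


def expected_ures(tb_read: float, ure_per_bit: float = 1e-14) -> float:
    """Expected number of UREs when reading tb_read decimal terabytes."""
    bits = tb_read * 1e12 * 8
    return bits * ure_per_bit


def prob_at_least_one_ure(tb_read: float, ure_per_bit: float = 1e-14) -> float:
    """P(at least one URE), assuming independent errors per bit."""
    bits = tb_read * 1e12 * 8
    # 1 - (1 - p)^bits, computed via log1p/expm1 to avoid rounding to zero.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


# 16 x 4 TB RAID 6 with one failed drive: the rebuild reads the 15 survivors.
tb = 15 * 4
print(f"expected UREs: {expected_ures(tb):.1f}")
print(f"P(at least one URE): {prob_at_least_one_ure(tb):.3f}")
```

Under these assumptions a 60 TB rebuild read expects roughly 4.8 UREs. A single URE is not fatal in RAID 6 with one drive down; the danger is two of them landing in the same stripe.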
[Alex Gerulaitis] “Are checksums a function of RAIDZ or ZFS?”
They are a ZFS feature. So, just to clarify:
RAIDZ = RAID 5 (single parity) as implemented in ZFS
RAIDZ2 = RAID 6 (double parity) as implemented in ZFS
RAIDZ3 = triple parity as implemented in ZFS; no traditional RAID level corresponds to it
ZFS also supports striping (RAID 0), mirroring (RAID 1), as well as double and triple mirrors. Checksums are supported regardless of the RAID level being used.
[Alex Gerulaitis] “Agreed, parity calculations are expensive – yet computing power grows much, much faster than disk speeds. Perhaps at some point parity calculations on beefy systems with software RAID will get cheap enough to have a negligible effect on performance if they haven’t already? I’d be more concerned with I/O than parity overhead.”
That’s why I said “all other things being equal”. Most modern hardware already has more than enough spare CPU capacity to do parity calculations, and can have far more RAM and cache than an embedded hardware RAID controller.
Lucid Technology, Inc. / 801 West Bay Dr. Suite 465 / Largo, FL 33770
“Enterprise Data Storage for Everyone!”
Ph.: 727-487-2430
https://www.lucidti.com
-
It is true that RAID 6 can sustain the loss of any two drives in a set. RAID 10, however, can lose more than two drives and still remain operational. This is because RAID 10 stripes across multiple mirrored pairs of disks, and as long as there is at least one good disk in each mirrored pair, RAID 10 will stay operational.
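The "at least one good disk in each mirrored pair" rule is easy to check by brute force. The Python sketch below enumerates every combination of k failed drives in a RAID 10 set of two-way mirrors and reports the fraction that leaves the array operational. The drive numbering and function names are made up for illustration.

```python
from itertools import combinations


def raid10_survives(failed: set[int], pairs: list[tuple[int, int]]) -> bool:
    """True if no mirrored pair has lost both of its disks."""
    return all(not (a in failed and b in failed) for a, b in pairs)


def survivable_fraction(n_pairs: int, k: int) -> float:
    """Fraction of all k-drive failure combinations that RAID 10 survives."""
    pairs = [(2 * i, 2 * i + 1) for i in range(n_pairs)]  # drives 0-1, 2-3, ...
    combos = list(combinations(range(2 * n_pairs), k))
    ok = sum(raid10_survives(set(c), pairs) for c in combos)
    return ok / len(combos)


# 16 drives arranged as 8 mirrored pairs.
for k in (1, 2, 3, 4):
    print(f"{k} failed drives: {survivable_fraction(8, k):.1%} of cases survive")
```

For 16 drives, 112 of the 120 possible two-drive failures are survivable (about 93%), and 80% of three-drive failures are. RAID 6 over the same drives survives every two-drive failure and no three-drive failure, which is why the comparison is not clear-cut.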
I do have all the nitty-gritty details and formulas lying around somewhere that explain mathematically which RAID level is more reliable. If my memory is correct, RAID 10 is deemed slightly more reliable than RAID 6.
Earlier threads had a statement about minimizing the chances of data corruption. To anyone reading this, make no mistake about it: RAID does not protect against data corruption. Furthermore, as disk drives get larger in capacity, the probability of silent data corruption increases no matter what RAID level is used. This is why RAIDZ is an excellent choice where data integrity is paramount. Each block of data is checksummed, and the checksum is then written to a separate area of the disk with a pointer to the original data block. The pointer itself is also checksummed. When a data block is accessed, its checksum is verified, so corruption is detected as soon as the block is read.
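The checksum-on-read idea can be shown with a toy model in Python. This is only a sketch of the principle (checksum kept apart from the data, verified on every read); it is not ZFS's on-disk format, and the class name and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


class ChecksummedStore:
    """Toy block store that keeps each checksum separate from its data."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}      # block id -> data
        self.checksums: dict[int, bytes] = {}   # block id -> digest, stored apart

    def write(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data
        self.checksums[block_id] = hashlib.sha256(data).digest()

    def read(self, block_id: int) -> bytes:
        data = self.blocks[block_id]
        # Verify on every read so silent corruption is caught immediately.
        if hashlib.sha256(data).digest() != self.checksums[block_id]:
            raise IOError(f"checksum mismatch on block {block_id}")
        return data


store = ChecksummedStore()
store.write(0, b"important data")
store.blocks[0] = b"important dat\x00"   # simulate silent bit rot on disk
try:
    store.read(0)
except IOError as err:
    print("detected:", err)
```

With redundancy underneath (a mirror or parity), a filesystem that detects the mismatch can go on to fetch a good copy and repair the bad block. A plain RAID layer has no checksum to tell it which copy is the bad one.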
One last thing regarding performance: all other things being equal, RAID 10 will always outperform RAID 6. The reason is simple: parity calculation is an “expensive” operation. Calculating it twice is even more “expensive”.
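For a feel of what a parity calculation involves, here is a minimal Python sketch of single XOR parity, the kind used for RAID 5 and for the first of RAID 6's two parity blocks. The second RAID 6 parity is typically a Galois-field (Reed-Solomon style) computation, which costs more and is not shown. The function names are illustrative only.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR across equally sized blocks of one stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block from the survivors plus parity."""
    return xor_parity(surviving + [parity])


stripe = [b"abcd", b"efgh", b"ijkl"]   # three data blocks in one stripe
parity = xor_parity(stripe)

# Lose the middle block, then rebuild it from the other two and the parity.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
print(recovered == stripe[1])
```

Every write has to update this parity, and a rebuild has to read every surviving block in the stripe, whereas a mirror simply copies the data.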