Creative Communities of the World Forums

The peer-to-peer support community for media production professionals.

Storage & Archiving: RAID level reliability

  • Vadim Carter

    May 24, 2013 at 5:12 am

    Alex, here are a couple of papers that I had saved as a reference a while ago that seem to support your assertion that RAID 6 is more reliable than RAID 10:

    Microsoft Exchange Server 2007 and IBM System Storage N series with RAID-DP Best practices

    Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?

    From a purely practical perspective, however, ever-increasing drive sizes mean ever-increasing rebuild times. On a moderately loaded array of 16x4TB spindles in a RAID 6 configuration, a drive rebuild can take over 24 hours. With 48x4TB drives, a rebuild of one drive can take days. The same drive configuration in a RAID 10 arrangement takes only a few hours to rebuild, with far less I/O load on the remaining array members. If the MTTR of a degraded RAID set is measured in hours rather than days, it seems that the risk of data loss is lower with RAID 10, IMHO.
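
    For a rough sense of the scale involved, here is a back-of-the-envelope sketch; the sustained rebuild rates below are assumptions for illustration, not measurements from any particular array:

        # Back-of-the-envelope rebuild-time estimate (illustrative assumptions only).
        # A parity rebuild must read every surviving member to recompute the lost drive,
        # while a RAID 10 rebuild simply copies the failed drive's mirror partner.

        DRIVE_BYTES = 4e12   # one 4 TB drive (decimal TB, as drives are marketed)

        def rebuild_hours(drive_bytes, effective_mb_per_s):
            """Hours to rewrite one drive at a sustained effective rate (MB/s)."""
            return drive_bytes / (effective_mb_per_s * 1e6) / 3600

        # Assumed sustained rates under production load (hypothetical figures):
        raid6_rate = 40    # MB/s - reconstruction competes with user I/O on all surviving members
        raid10_rate = 150  # MB/s - straight copy from a single mirror partner

        print(f"RAID 6  rebuild of one 4 TB drive: ~{rebuild_hours(DRIVE_BYTES, raid6_rate):.0f} h")
        print(f"RAID 10 rebuild of one 4 TB drive: ~{rebuild_hours(DRIVE_BYTES, raid10_rate):.0f} h")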

    [Alex Gerulaitis] “Perhaps you’re talking about “silent rot”, not all incidents of data corruption? A URE is data corruption, and redundant RAIDs do protect against them, to a degree?”

    Unrecoverable Read Errors “could” cause data corruption. The idea is to detect them, flag them, and re-write the data to a different block before the damage is done (using checksums with RAIDZ, or scheduled parity scrubs in a RAID 5 / RAID 6 array). What happens most often is that a drive fails and a rebuild starts; while the rebuild is taking place, the heavy I/O causes a few drives to start throwing UREs. If there are just one or two UREs within the same stripe (assuming RAID 6), the rebuild goes okay, but as soon as there are three UREs within the same stripe, there is data corruption. I have seen it occur a few times. A disproportionately large number of data corruption cases, however, is due to some sort of filesystem corruption: an OS-level problem that a RAID cannot protect against.
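
    To put a rough number on how likely it is that a rebuild trips over a URE at all, here is a small sketch using the usual datasheet rates (one error per 10^14 bits for desktop-class drives, 10^15 for enterprise); the 16x4TB layout matches the example above, and treating bit errors as independent is a simplification:

        # P(at least one URE) while reading back the surviving members of a degraded array.
        from math import expm1, log1p

        def p_at_least_one_ure(bytes_read, ure_per_bit):
            """Probability of one or more UREs in bytes_read bytes, assuming independent bit errors."""
            return -expm1(bytes_read * 8 * log1p(-ure_per_bit))

        drive_bytes = 4e12             # one 4 TB drive
        bytes_read = 15 * drive_bytes  # 16-drive RAID 6, one member failed: the rebuild reads 15 survivors

        for label, rate in [("1 per 1e14 bits (desktop)", 1e-14),
                            ("1 per 1e15 bits (enterprise)", 1e-15)]:
            print(f"{label}: P(>=1 URE during rebuild) = {p_at_least_one_ure(bytes_read, rate):.1%}")

    During a single-drive rebuild, at least one URE per stripe can still be recovered from the remaining parity, which is exactly what the second parity buys you over RAID 5 or a plain two-way mirror.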

    [Alex Gerulaitis] “Are checksums a function of RAIDZ or ZFS?”

    They are a ZFS feature. So, just to clarify:

    RAIDZ = RAID 5 (single parity) as implemented in ZFS
    RAIDZ2 = RAID 6 (double parity) as implemented in ZFS
    RAIDZ3 = no traditional RAID level equivalent; triple parity as implemented in ZFS

    ZFS also supports striping (RAID 0), mirroring (RAID 1), as well as double and triple mirrors. Checksums are supported regardless of the RAID level being used.
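
    To make the parity counts concrete, here is a quick sketch of raw usable space per vdev layout; the 12x4TB drive count is arbitrary, and ZFS metadata, slop space, and allocation padding are ignored:

        # Raw usable capacity of a single 12-drive vdev of 4 TB drives, by layout.
        # Illustrative only; ignores ZFS metadata, slop space and allocation padding.

        def parity_usable_tb(n_drives, drive_tb, parity_drives):
            return (n_drives - parity_drives) * drive_tb

        n, size = 12, 4
        print("raidz  (single parity, ~RAID 5):", parity_usable_tb(n, size, 1), "TB")
        print("raidz2 (double parity, ~RAID 6):", parity_usable_tb(n, size, 2), "TB")
        print("raidz3 (triple parity)         :", parity_usable_tb(n, size, 3), "TB")
        print("2-way mirrors (~RAID 10)       :", n * size // 2, "TB")
        print("3-way mirrors                  :", n * size // 3, "TB")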

    [Alex Gerulaitis] “Agreed, parity calculations are expensive – yet computing power grows much, much faster than disk speeds. Perhaps at some point parity calculations on beefy systems with software RAID will get cheap enough to have a negligible effect on performance if they haven’t already? I’d be more concerned with I/O than parity overhead.”

    That’s why I said “all other things being equal”. Most modern hardware already has more than enough spare CPU capacity to do parity calculations, and can have far more RAM and cache than an embedded hardware RAID controller.
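
    For anyone who hasn't seen what the calculation actually is: single parity is just an XOR across the data blocks of a stripe (RAID 6 / RAIDZ2 adds a second, Galois-field syndrome on top of that), which is why a lost block can be rebuilt from the survivors. A toy sketch:

        # Toy RAID 5-style parity: XOR across the data blocks of one stripe.
        import os

        data = [os.urandom(16) for _ in range(4)]                   # four data blocks in a stripe
        parity = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*data))  # the parity block

        # "Lose" block 2 and rebuild it from the surviving blocks plus parity.
        survivors = [data[0], data[1], data[3], parity]
        rebuilt = bytes(w ^ x ^ y ^ z for w, x, y, z in zip(*survivors))
        assert rebuilt == data[2]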

    Lucid Technology, Inc. / 801 West Bay Dr. Suite 465 / Largo, FL 33770
    “Enterprise Data Storage for Everyone!”
    Ph.: 727-487-2430
    https://www.lucidti.com

  • Alex Gerulaitis

    May 24, 2013 at 7:59 pm

    [Vadim Carter] “Alex, here are a couple of papers that I had saved as a reference a while ago that seem to support your assertion that RAID 6 is more reliable than RAID 10”

    Thanks Vadim. Couldn’t quite grasp the math on the first attempt, but I’ll try again, perhaps with a double espresso this time.

    [Vadim Carter] “With 48x4TB drives, a rebuild of one drive can take days.”

    I am hoping nobody does those really wide parity groups without understanding the implications. Personally, I’d do RAID60 on a 48-wide group.

    [Vadim Carter] “Same drive configurations in a RAID 10 arrangement only take a few hours to rebuild with a lot less I/O load on the remaining array members.”

    No contest, your honor, but there’s still that pesky URE probability during rebuilds, which can only be addressed by using triple mirroring in RAID10. So where short rebuild times and/or stable high performance are required (as opposed to higher reliability), RAID10 is a viable option.

    [Vadim Carter] “as soon as there are three UREs within the same stripe – there is data corruption. I have seen it occur a few times.”

    That’s pretty amazing (that you’ve seen it), given how low the chances of that are based on published URE rates.
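
    Out of curiosity, here is the rough math I am going by; it assumes a 128 KiB per-drive chunk, the published rate of one error per 10^14 bits, and independent errors (real UREs tend to cluster, which may explain the difference from what you have seen):

        # Expected number of stripes with k coinciding UREs during one full rebuild pass
        # over a degraded 16-drive RAID 6 set of 4 TB drives. Chunk size and URE rate are assumptions.
        from math import comb

        chunk_bytes = 128 * 1024
        ure_per_bit = 1e-14
        p_chunk = chunk_bytes * 8 * ure_per_bit   # P(a single chunk read hits a URE)
        survivors = 15                            # chunks read per stripe during the rebuild
        stripes = int(4e12 // chunk_bytes)        # stripes in one 4 TB drive

        for k in (2, 3):
            p_stripe = comb(survivors, k) * p_chunk**k   # small-p approximation
            print(f"{k} UREs in the same stripe: ~{p_stripe * stripes:.1e} occurrences per rebuild")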

    Thanks again Vadim.

  • Vadim Carter

    May 25, 2013 at 1:02 am

    [Alex Gerulaitis] “I am hoping nobody does those really wide parity groups without understanding the implications. Personally, I’d do RAID60 on a 48-wide group.”

    You’d be surprised, Alex, how many installations do have those very large RAID sets… I concur – RAID 60 is the right way to do it.

    I have briefly looked at the spreadsheet you found and it looks intriguing. I am not an Excel ninja and I cannot attest to the accuracy of the formulas being used. I’ll play with the numbers and try to trace the logic behind all the variables being used. It is certainly a good find and we can build on that 🙂 Thanks, Alex.

    Lucid Technology, Inc. / 801 West Bay Dr. Suite 465 / Largo, FL 33770
    “Enterprise Data Storage for Everyone!”
    Ph.: 727-487-2430
    https://www.lucidti.com

  • Eric Hansen

    May 27, 2013 at 6:06 pm

    David said “But to quote someone I know, “I’ve never had drives fail faster than I can replace them one at a time.” If you have RAID5, and a drive fails, and you replace it same day, you will not lose data.”

    I HAVE lost a drive while another drive was rebuilding in a RAID5 set. After that near heart attack (ATTO tech support helped me recover the RAID), I only do RAID6. When I do have a drive failure and a rebuild begins, I try to get the editors not to use that volume if at all possible. It speeds up the rebuild and reduces the chance of corruption.

    I hadn’t thought of RAID60 for larger sets. None of my current volumes goes over 16 drives. But I’m gonna have to look into that very soon for some new 10GbE buildouts. I need more speed!

    I am also very much looking forward to ZFS. The ability to expand a volume without reformatting is another awesome feature.

    I agree with Bob that RAID level is only one variable in creating a reliable system. You have to look at all single points of failure. This is the main thing I hate about Ethernet-based SANs vs. FC-based SANs. The Xsans I built had multiple MDCs, multiple switches, multi-path FC controllers, two FC connections to every client, multiple power supplies on everything, etc. But the scope of this thread is limited to RAID level, so I digress.

    e

    Eric Hansen
    Production Workflow Designer / Consultant / Colorist / DIT
    https://www.erichansen.tv

  • Alex Gerulaitis

    May 27, 2013 at 8:01 pm

    [Eric Hansen] “the ability to expand a volume without reformatting is another awesome feature.”

    I’ve done this with RAID5 and RAID6 volumes. Am I missing something? (I thought ZFS’s advantages were centered on resiliency, performance, and integration with RAID-Z.)

    [Eric Hansen] “this is the main thing I hate about ethernet-based SANs vs FC-based SANs”

    Eric, would you elaborate on this?

    My understanding was that layer 3 Ethernet switches can do auto-failover, something that’s not part of FC, where failover is handled primarily at the driver or even ASIC level rather than by the “intelligence” of the protocol itself.

  • Eric Hansen

    May 27, 2013 at 8:47 pm

    [Alex Gerulaitis] “[Eric Hansen] “this is the main thing I hate about ethernet-based SANs vs FC-based SANs”

    Eric, would you elaborate on this?

    My understanding was that layer 3 Ethernet switches can do auto-failover, something that’s not part of FC, where failover is handled primarily at the driver or even ASIC level rather than by the “intelligence” of the protocol itself.”

    Alex, you definitely trump me here, and I don’t want to hijack the thread. From what I understand, FC can fail over because OS X can understand multi-pathing over FC (I might be getting the terminology way wrong here). If you have a RAID with redundant FC controllers going into multiple FC switches (for redundancy), the client computer can understand that. It won’t mount the volume twice even though it can “see” two paths to the same volume. If one path disappears, the client system won’t freak out.

    I’ve actually never had an Ethernet switch fail over its useful life, so I’m not too worried about that. AFP-based Ethernet systems use a single server (no MDC), Ethernet cards, and direct-attached storage (through a single card). If anything in that chain fails (the server, the SAS card in the server, the Ethernet card, the SAS expander on the SAS RAID), the client loses its connection to the storage. With FC, all of these things can be redundant if you set them up that way.

    If I need to work on the AFP server, I can’t just fail it over to a backup and then take it offline like I could with Xsan.

    So this is more NAS vs. a true FC SAN setup. I prefer the latter for 24/7 critical systems, and it’s too bad that the growth of shared storage systems has been primarily on the NAS side rather than the SAN side.

    Eric Hansen
    Production Workflow Designer / Consultant / Colorist / DIT
    https://www.erichansen.tv

  • Alex Gerulaitis

    May 27, 2013 at 9:50 pm

    [Eric Hansen] “Alex, you definitely trump me here, and I don’t want to hijack the thread. From what I understand, FC can fail over because OS X can understand multi-pathing over FC”

    I am a n00b at FC and Ethernet failover; all I did was a little reading and a couple of training sessions on a high-end iSCSI-based SAN. So this is a very useful discussion for me.

    No worries about thread hijacking. With Vadim’s help, RAID6 emerged as the winner in terms of reliability, so the thread is more or less done with. Appreciate your concern though; perhaps fork this into a new thread? Something like “Ethernet vs. FC: which is more resilient?”

  • Vadim Carter

    May 30, 2013 at 2:29 am

    [Eric Hansen] “the ability to expand a volume without reformatting is another awesome feature.”

    [Alex Gerulaitis] “I’ve done this with RAID5 and RAID6 volumes. Am I missing something? (I thought ZFS’s advantages were centered on resiliency, performance, and integration with RAID-Z.)”

    I’ll try to explain and put my 2c in.

    A traditional hardware-controller-based RAID array will typically allow RAID5 or RAID6 expansion by restriping the RAID set to include the newly added drive(s), an inherently dangerous operation. The end result is a larger RAID set; however, the filesystem residing on the newly expanded RAID set is still the same size. There are three options at this point: 1. “Stretch” the existing filesystem over the newly added space, if your OS supports this; 2. Create a second partition on the newly added space, format it, and mount it; or 3. Delete everything, reformat, create a new filesystem, and mount it.
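
    As a concrete (and purely hypothetical) example of option 1 on Linux software RAID with ext4; a hardware controller would use its own management tool for the first two steps, but the sequence is the same:

        # Sketch of "grow the RAID set, then stretch the filesystem" on Linux md RAID + ext4.
        # Device names are hypothetical; do not run this blindly against real devices.
        import subprocess

        def run(cmd):
            print("+", " ".join(cmd))
            subprocess.run(cmd, check=True)

        run(["mdadm", "/dev/md0", "--add", "/dev/sde"])           # add the new drive as a spare
        run(["mdadm", "--grow", "/dev/md0", "--raid-devices=5"])  # restripe the set onto it (slow and risky)
        run(["resize2fs", "/dev/md0"])                            # stretch the ext4 filesystem over the new space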

    ZFS is a filesystem and a volume manager “all in one”, so to speak. Adding drives is a very simple operation. There is no need to wait for a ZFS RAID set to restripe. There is no reformatting involved. There is no risk to the data. It just works. Like magic.

    Lucid Technology, Inc. / 801 West Bay Dr. Suite 465 / Largo, FL 33770
    “Enterprise Data Storage for Everyone!”
    Ph.: 727-487-2430
    https://www.lucidti.com

  • Alex Gerulaitis

    May 31, 2013 at 12:45 am

    [Vadim Carter] “There is no need to wait for ZFS RAID Set to restripe. There is no reformatting involved. There is no risk to the data. It just works. Like magic.”

    I can see how “no reformatting” works (legacy RAID expansions also don’t require reformatting), but no re-striping? Say you added another eight drives to an existing set of eight in RAID-Z: how would the performance (transfer rates) of existing files improve without re-striping?

  • Vadim Carter

    May 31, 2013 at 2:15 am

    [Alex Gerulaitis] “I can see how “no reformatting” works (legacy RAID expansions also don’t require reformatting), but no re-striping? Say you added another eight drives to an existing set of eight in RAID-Z: how would the performance (transfer rates) of existing files improve without re-striping?”

    This is another cool thing about ZFS: the whole concept of storage pools. ZFS storage pools can span multiple vdevs (virtual devices), and vdevs themselves consist of block devices, e.g. hard drives or partitions on those hard drives. So, in the example you gave above, a second eight-drive RAID-Z vdev would be created, and the original RAID-Z vdev and the new RAID-Z vdev would be pooled together.
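
    A quick sketch of what that looks like in practice (pool and device names are made up):

        # The expansion described above: the original eight-drive raidz vdev is left
        # untouched and a second eight-drive raidz vdev is added to the same pool.
        import subprocess

        def run(cmd):
            print("+", " ".join(cmd))
            subprocess.run(cmd, check=True)

        run(["zpool", "create", "tank", "raidz"] + [f"sd{c}" for c in "abcdefgh"])  # original pool
        run(["zpool", "add", "tank", "raidz"] + [f"sd{c}" for c in "ijklmnop"])     # append a second vdev
        run(["zpool", "status", "tank"])                                            # both vdevs now back one pool

    As far as I know, ZFS does not rebalance existing data onto the new vdev, so existing files keep their original layout and only new writes are spread across both vdevs.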

    Lucid Technology, Inc. / 801 West Bay Dr. Suite 465 / Largo, FL 33770
    “Enterprise Data Storage for Everyone!”
    Ph.: 727-487-2430
    https://www.lucidti.com

