WD Green drives in an Areca RAID

Steve Wang replied 12 years, 8 months ago 10 Members · 29 Replies

Murray North
June 19, 2013 at 2:36 am

Oh and the switch is
HP PROCURVE 2910AL-48G SWITCH 48-port 10/100/1000 basic L3 fixed port switch
Frank Gothmann
June 19, 2013 at 8:49 pm

Please explain the setup a bit more. Where do the dropped frames occur? On your machine (which, I assume, has the storage attached, i.e. is the host) or on the machines accessing the storage via 1Gb Ethernet?
While green drives are very likely to drop out of a raid sooner or later, their performance should be good enough for a couple of ProRes streams, especially in larger raid setups.
What are you connecting to with the 10GbE card? Have you checked your client’s network bandwidth? Are you running Qmaster on the same network? That’s where I would start digging. Jumbo frames on?

——
“You also agree that you will not use these products for… the development, design, manufacture or production of nuclear, missiles, or chemical or biological weapons.”
iTunes End User Licence Agreement
Bob Zelin
June 19, 2013 at 9:58 pm

Frank writes –
Please explain the setup a bit more.

REPLY – I have asked Murray what switch he is using, but he has not replied. Does he have an HP 2910al ProCurve like EditShare uses, or is it some piece of crap? Who knows.

Frank writes –
What are you connecting to with the 10GbE card?
REPLY – he has already stated that he has a Small Tree 10GbE card in his server computer. Now, is it configured correctly? Does he have the correct driver? Should we assume that it is going to the SFP+ port on the HP ProCurve (is it a ProCurve, and does he even have an SFP+? Maybe it’s an SFP!). Has he checked the CLI to even see if he is getting the proper communication?

Frank writes –
Have you checked your client’s network bandwidth?
REPLY – does he know how to do this?
Frank writes – Are you running Qmaster on the same network? That’s where I would start digging.
REPLY – very good point.
Frank writes – Jumbo frames on?
REPLY – if he doesn’t know how to configure the switch, and it’s at MTU 1500 by default, then even if Jumbo is on, it’s doing nothing.

He bought a Small Tree 10GbE card – why didn’t he rely on Small Tree to provide a solution for him? Frank, if you’ve been doing this for a while, you know the answer. He got a DEAL on the HP switch. It’s probably used. In his research he probably spoke with Small Tree and passed out when they told him what they wanted to help him (and you would have been too expensive too, Frank). So of course, there are lots of variables, and from this brief description, the WD Green drives are an obvious first possible issue, but there could be countless issues, as you have just pointed out in your post.

This is what happens when you try to do it yourself. Close – but no cigar.

Bob Zelin
Rescue 1, Inc.
maxavid@cfl.rr.com
Murray North
June 20, 2013 at 12:37 am

Thanks for the responses.

I didn’t want to get too bogged down in the network setup, as my hunch is that it is a problem with the drives. Once I have eliminated the drives as a problem, if any of you have the patience I will outline my setup and see if I can find any problems there, but at this stage I don’t want to waste anyone’s time.

Thanks for those suggestions Frank, after reading extensively through these forums however, I feel as though I have ticked all those boxes unfortunately.

For your interest the switch is a HP PROCURVE 2910AL-48G SWITCH 48-port 10/100/1000 basic L3 fixed port switch.

Currently, however, I am just keen on working out whether the drives are providing an insurmountable hurdle to my shared storage system here, and whether maybe wdidle3 or something else can magically save the day.

Will get back soon, thanks all 🙂
Bob Zelin
June 20, 2013 at 12:24 pm

you have an excellent switch. This is the same switch that EditShare often uses for their systems (they also use the Fujitsu XG0224). So, it’s unlikely that you have an issue with your switch. And of course, the Small Tree 10GbE card is excellent as well.

Bob Zelin
Rescue 1, Inc.
maxavid@cfl.rr.com
Chris Murphy
August 19, 2013 at 1:24 am

The symptoms you describe could be network or drive related. While you have good network hardware, the cabling quality is unknown, and cabling is a top source of network problems even at much lower speeds than 10GigE, which is even more finicky about the IEEE rules on cable lengths, bend radii, proximity to ballasts and other electrical equipment, pinching, and pressure (think stapling cables to a wall, or a table leg squishing one; that’s BAD). You should be able to isolate this by putting a single USB drive on the server and just doing some basic (large) file copies with curl or rsync, while checking the performance in iotop or equivalent to see if you’re getting source read stalls from the array, or if it only happens over the network. If it’s sourced at the array then you’ll need to figure out why that’s happening, and it wouldn’t surprise me if it’s bad sectors on new green drives.
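
If iotop isn't handy, the same "is the stall at the source?" check can be roughed out in a few lines of Python. This is only a sketch under my own assumptions: the function name, the 8 MiB chunk size and the 1 second stall threshold are arbitrary choices, not anything from a vendor tool. It reads a large file sequentially and records every chunk that took suspiciously long to come back.

```python
import time


def find_read_stalls(path, chunk_size=8 * 1024 * 1024, stall_seconds=1.0):
    """Read a file sequentially and time each chunk.

    Returns (total_bytes, elapsed_seconds, stalls), where stalls is a list
    of (byte_offset, seconds) for every chunk read slower than stall_seconds.
    chunk_size and stall_seconds are illustrative defaults, not tuned values.
    """
    stalls = []
    total = 0
    start = time.monotonic()
    # buffering=0 skips Python's own buffer; the OS page cache still applies,
    # so use a file larger than RAM or one that has not been read recently.
    with open(path, "rb", buffering=0) as f:
        while True:
            t0 = time.monotonic()
            chunk = f.read(chunk_size)
            dt = time.monotonic() - t0
            if not chunk:
                break
            if dt > stall_seconds:
                stalls.append((total, dt))
            total += len(chunk)
    elapsed = time.monotonic() - start
    return total, elapsed, stalls
```

Run it against a big media file directly on the server, then against the same file through the network mount from a client. Stalls that show up in both runs point at the array; stalls that only show up on the client point at the network.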

WDC explicitly proscribes the use of Green, Blue, and Black drives in anything other than raid1 or raid0. The Reds are proscribed in arrays of more than 5 disks. The REs are recommended for larger arrays.
Murray North
August 19, 2013 at 1:52 am

Thanks for the input Chris. We addressed any cabling issues very early in the piece and had high grade cat6 installed to eliminate the chance of this being a problem.
Just to sign off on this whole saga, we discovered that the problem wasn’t just random dropped frames, it was specifically FCP having problems as soon as it had to access media from a different raid array from the one it was looking at currently. That is, if you were playing back media from only raid A or only raid B, it would be fine, but if you had a sequence with media on raid A, and the playhead approached media from raid B, Final Cut would drop frames and potentially not recover for 5 or 10 seconds at worst.
We’ve installed a new raid box, media managed all the important media onto it, and now the editors work exclusively from it, and it works capably. To anyone who says green drives can’t work in a raid, and work reliably and well, respectfully, you are wrong. That isn’t to say that we aren’t buying red drives from here on in, but whatever the problem is, and it may still be a green drive thing, it can still work in certain circumstances.
Thanks everyone for their help though, and I hope that this can be of help to someone else. Feel free to message me if you are having similar problems.
Chris Murphy
August 19, 2013 at 2:24 am

Yeah, about the green drives and raid thing. I don’t know that anyone said you can’t do it. Just that use in raid isn’t recommended, including by WDC. In fact WDC says these drives are for secondary usage, implying they don’t recommend them for boot drives either. I don’t see the point in arguing with a manufacturer who is basically saying in a marketing data sheet “we really don’t want your money for your intended use case.”

Further, it’s just a matter of time before there will be problems with these drives. Forums everywhere are full of stories of raid5s collapsing when green drives are used. The common sequence is: one drive dies or takes too long in error recovery for the controller, the controller kicks out the drive or resets the bus, the user replaces the bad drive (which may or may not be bad), and then all it takes is a single bad sector to cause either another kicked drive, a bus reset, or an actual sector read error. In all of those scenarios the raid5 rebuild halts, and the array is no longer merely degraded; it has collapsed. So then people freak out because only one drive died and this isn’t supposed to happen, blah blah blah.

The real problem with the drive is that the ERC is too long, and it can’t be configured with any of the recent Greens. If you want to play with fire, set the controller error time out so that it’s at least 121 seconds to give the drive enough time to actually report a read error, so that the bad sector is repaired by the raid controller. And also do regular scrubs.
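
To spell out the timing argument, here is a toy Python model, not a description of any real controller firmware. The function name is made up, the 120 second recovery figure is just the "2 minute" worst case discussed in this thread, and the 8 second controller timeout in the usage note is a hypothetical short setting for contrast.

```python
def read_outcome(drive_recovery_s, controller_timeout_s):
    """Toy model of a read that lands on a hard-to-read sector.

    drive_recovery_s: how long the drive keeps retrying before it gives up
        and reports a read error.
    controller_timeout_s: how long the controller waits before it decides
        the drive is unresponsive.
    """
    if drive_recovery_s >= controller_timeout_s:
        # The controller gives up first: it never sees a read error, only
        # a drive that stopped answering.
        return "drive dropped or bus reset"
    # The drive reports the error in time, so the controller knows which
    # sector is bad and can rebuild it from parity and rewrite it.
    return "read error reported; sector can be rebuilt from parity"
```

With a drive that may take 120 seconds, `read_outcome(120, 8)` lands in the "dropped" branch, while `read_outcome(120, 121)` lets the read error come back so the repair can happen. That is the whole reason for the 121 second figure.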

Of course, in the meantime, your application must be able to gracefully contend with up to 2 minute hangs while the drive sorts out whether or not the data on that sector can be read or recovered. Many applications get pretty pissy (let alone the user) when there’s an IO delay of 30 seconds, let alone 2 minutes.

And it’s not like it’s a whole lot better on the Seagate consumer side, where they now have in their marketing spec sheet, under Reliability, a 2400 hour power-on spec. That’s 100 days at 24×7. A Google or Amazon, if they were even to use such a drive, would bust through that spec on day 101, and exceed it by a factor of 7 before the warranty was up.
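
The arithmetic behind those numbers, as a quick Python check. The 2400 hour figure is the one quoted above; the 2 year warranty length is my assumption, chosen because it is what makes the "factor of 7" work out, and is not taken from a spec sheet.

```python
POWER_ON_HOURS_SPEC = 2400      # figure quoted above
HOURS_PER_DAY = 24
ASSUMED_WARRANTY_YEARS = 2      # assumption, inferred from the "factor of 7"


def spec_days_at_24x7(power_on_hours=POWER_ON_HOURS_SPEC):
    """Days until the power-on spec is used up when running around the clock."""
    return power_on_hours / HOURS_PER_DAY


def overrun_factor(warranty_years=ASSUMED_WARRANTY_YEARS):
    """How many times over the spec a 24x7 drive is by the end of the warranty."""
    return warranty_years * 365 / spec_days_at_24x7()


print(spec_days_at_24x7())   # 100 days of 24x7 use
print(overrun_factor())      # a bit over 7 with the assumed 2 year warranty
```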

There’s no good reason for these companies to honor warranties at all when drives are used in situations that are plainly proscribed.
Murray North
August 19, 2013 at 3:38 am

Sorry Chris, my post wasn’t a stab at you. I think somewhere earlier in the thread someone had said that it was impossible and the worst idea ever, so it was more a nod to that. At any rate, you clearly have more drive and RAID knowledge than me, and I’m sure everything you say is true. But all I can say is that for better or worse, we have a few 100TBs of green drive raids, and we follow every precaution we can, regular verifies and the rest of it, and to date we have lost no data. I haven’t had to deal with random hangs, and rebuilds go without issue. I might just be very lucky, but I am just reporting what has happened to me.
Thanks for the tips though 🙂
Chris Murphy
August 19, 2013 at 4:22 am

I didn’t take it as a stab, and even if it was I’m fairly immune. I think the issue you’re likely to see is marginally bad sectors creeping in that aren’t detected during normal or scrub operations because the drive firmware is designed to mask such problems. The point at which they’re unrecoverable is when the drive times out and finally reports a read error. And only on a read error can the controller rebuild the affected chunk from parity and cause the bad sector to be overwritten, at which point the firmware will determine if the bad sector is transient (it just needed to be rewritten) or persistent, and if it is persistent then the firmware will remap the LBA(s) to a reserve sector and write the data. The ability of parity raid to “self-heal” in normal read operations and in scrubs in the described manner is thwarted with Green drives. The use case and design goal are incongruent, and that is what technically nullifies the warranty.
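
For anyone wondering what "rebuild the affected chunk from parity" means mechanically, here is a stripped-down Python sketch of the single-parity (raid5-style) XOR idea. It ignores striping layout, parity rotation and the rewrite back to disk that a real controller does; the function names and the tiny example chunks in the usage note are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_chunk(surviving_chunks):
    """Rebuild the one unreadable chunk of a stripe.

    surviving_chunks must hold every other chunk in the stripe, data and
    parity alike. With single parity, XOR of all the others is the missing one.
    """
    return xor_blocks(surviving_chunks)
```

If a stripe holds data chunks `d0`, `d1`, `d2` and `parity = xor_blocks([d0, d1, d2])`, then after a read error on `d1` the call `rebuild_chunk([d0, d2, parity])` returns the original `d1`. The catch described above is that the controller only does this once the drive actually reports the read error.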

With ~300TB of Green drives I think you’ll see untimely collapse of an array rather than the normal degrade and rebuild. With this many drives the proper drive from WDC is the Se. Even the Red is limited to 5 drives per array, so if you’re over that, technically they could deny warranty because the use case and design goals aren’t compatible. They say this rather plainly on the marketing spec sheet.

So that’s the extra long version of what “impossible” and “worst idea ever” probably translate into.

Also note that the WDC Blue and Black also are not meant for anything other than raid0 or 1. The first applicable drive is the Red, but that’s for 5 or fewer disk arrays. More than that and it’s the Se.
