Chris Murphy
Forum Replies Created
-
Jan 23 10:06:54 edits-mac-pro kernel[0]: disk4: I/O error.
Is disk4 AppleRAID, or is it a member disk? Try this command if you don’t know:
diskutil list
-
Go to the Console application, click system.log, click Clear Display, reproduce the problem, note and post what’s newly added in Console.
Go to Disk Utility, click on the drive icon with model number etc. (not the name of the volume), and click the Info button in the toolbar. Scroll down to the SMART section and copy-paste what’s there.
-
Chris Murphy
January 11, 2014 at 11:39 pm, in reply to: Sharing a thunderbolt raid drive between two new Mac Pros
Right. Without IP involved, I don’t see how this works with two computers taking control of a single HFS+ file system.
-
Media Patrol requires translation; it seems like a marketing term to me. I’ll guess it’s either a selective or extended self-test via SMART. Default On means it’s happening, but how often and at what time of day? If it’s a SMART test, it slightly reduces drive performance, so it might be something you’d rather run on demand if the schedule can’t be specified.
Redundancy check sounds like a scrub/verify. I’m not sure what the options are; I’d rather just get a mismatch count than have it either stop or fix things. I also don’t know what autofix means exactly: whether it ignores disk read errors or fixes them, whether it overwrites parities in the event of a mismatch, or what. There are multiple potential problems, each with different fixes possible.
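To illustrate the distinction I’m drawing between a check and an autofix, here’s a toy single-parity scrub in Python. It’s only a sketch; the stripe layout and names are made up, not anything the vendor actually does.

# Toy single-parity scrub: read every stripe, recompute parity from the data
# chunks, and compare with what's on disk. A check-only pass just counts
# mismatches; "autofix" overwrites the parity, which silently assumes the data
# chunks are the good copies.

from functools import reduce

def xor_chunks(chunks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

def scrub(stripes, autofix=False):
    mismatches = 0
    for stripe in stripes:                    # stripe = {"data": [...], "parity": ...}
        expected = xor_chunks(stripe["data"])
        if expected != stripe["parity"]:
            mismatches += 1
            if autofix:
                stripe["parity"] = expected
    return mismatches

good  = {"data": [b"\x01\x02", b"\x04\x08"], "parity": b"\x05\x0a"}
stale = {"data": [b"\x01\x02", b"\x04\x08"], "parity": b"\x00\x00"}
print(scrub([good, stale]))                   # 1: mismatch reported, nothing changed
print(scrub([good, stale], autofix=True))     # 1: same count, but the stale parity gets rewritten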
Synchronization sounds like it’ll read data chunks, recompute parities, and overwrite existing parities. You wouldn’t normally do this without a reason. For example if a redundancy check shows mismatches have spiked since the last time you ran it, you’d want to power down, check all connections, maybe even reseat the drives if they’re in a backplane. Power back up. And do a file system check/repair – of the full variety[1]. Now you can rebuild parities, which will also reset any mismatch counts. If you were to immediately do another redundancy check there should be no errors/mismatches at all. If there are, then there’s almost certainly a hardware problem and it needs to be found.
[1]
DiskWarrior if you want GUI only, or check the btree rebuild options under the -R option in “man fsck_hfs”. Disk Utility does not rebuild any of the btrees. I’d use the 2nd form: -fy first, then separately rebuild each btree with -Rc, -Re, -Ra. This could take a while. And it only fixes file system metadata. Actual data files are untouched, and the file system itself knows nothing of the underlying raid, so that’s not fixed either. However, if the raid is in bad shape, file system checks will be in bad shape too and likely not repairable. If the file system doesn’t need repairing or is readily fixable, then at least the data chunks for the file system are likely consistent.
-
Alex Gerulaitis: If we’re not looking at a possibility of simultaneous and independent corruption of two data blocks within one stripe…
Right, this would be a problem, because two corruptions in the same stripe can’t be treated the same as two missing chunks (read errors, or failed drives). For failures, the exact affected chunks are known. For corruptions, the location has to be deduced, and with two corruptions in one stripe that’s difficult at best, and in the category of specialized data recovery.
Alex Gerulaitis: If one parity chunk is corrupted, you could always re-create it from the other parity chunk, and the data?
It could just recompute P+Q and overwrite – no need for a reconstruct. But before that, how was the corruption determined and isolated? Standard in parity raid implementations is a parity verification (scrub check, or read-only scrub), but all this does is report mismatches. That is, the newly computed parities don’t match the existing ones. So that just says there’s a problem somewhere in the stripe.
There might be proprietary implementations that go to the effort of deducing whether it’s D or P or Q that’s corrupt, but for a one-off corruption rather than a whole disk? Seems doubtful, given that raid6 corruptions still occur in sufficient quantity that the industry has developed alternative solutions to avoid or mitigate them: T10 DIF/DIX (now called PI), ZFS, Btrfs, and ReFS.
Alex Gerulaitis: Isn’t 6 equivalent to a 3-way RAID1 for the purpose of data recovery? One copy is corrupted, you’re not sure which one is good out of three – just check which ones match, assume those ones are good, discard the mismatching one?
It’s maybe worse with raid1 because the vote is an arbitrary decision. It seems like it makes sense to go with the majority, 2 vs 1. But in reality you’ll get sufficiently arbitrary results that it doesn’t really fix the problem.
Alex Gerulaitis: Applying that to 6: calculate P and Q again from the data chunks, compare them to existing P and Q; whichever one mismatches – re-write it, and Bob’s your uncle? If the data chunks were corrupted, then P and Q would still be healthy and you could re-create data from them, supposedly?
Sure, it’s possible. I don’t know of any implementations that do this, but that proves nothing. It seems like a lot of additional code, plus testing and maintenance of that code, with an ensuing greater risk of bugs and additional corruption, for a small use case that should only rarely be a problem with the kind of hardware we’re talking about. Again, raid6 is about enabling recovery from specific, known missing chunks, not deducing what’s still present but maybe wrong. For the use cases where some corruption is a big problem, there are better solutions than raid6.
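To make the missing-versus-present-but-wrong distinction concrete, here’s a toy P+Q sketch in Python (GF(2^8) with the usual 0x11d polynomial). It’s a simplification for illustration, not any vendor’s firmware.

# Toy raid6: P is the XOR of the data chunks, Q is a Galois-field weighted sum
# (GF(2^8), polynomial 0x11d, generator 2). A sketch for illustration only.

def gf_mul(a, b):
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D              # reduce modulo x^8 + x^4 + x^3 + x^2 + 1
        b >>= 1
    return p

def compute_pq(data):              # data = list of equal-length chunks (bytes)
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, chunk in enumerate(data):
        g = 1
        for _ in range(i):
            g = gf_mul(g, 2)       # g = 2**i in GF(2^8)
        for j, byte in enumerate(chunk):
            p[j] ^= byte
            q[j] ^= gf_mul(g, byte)
    return bytes(p), bytes(q)

data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
p, q = compute_pq(data)

# 1) Known erasure: drive 1 reports errors or is gone, so we KNOW which chunk is
#    missing. It's just the XOR of P with the surviving data chunks.
rebuilt = bytes(x ^ a ^ c for x, a, c in zip(p, data[0], data[2]))
assert rebuilt == data[1]

# 2) Silent corruption: drive 1 returns wrong bytes with no read error.
corrupt = [data[0], b"\x00\x00", data[2]]
p2, q2 = compute_pq(corrupt)
print(p2 != p, q2 != q)            # True True: a scrub reports "mismatch", but
                                   # nothing says WHICH of the five chunks is wrong.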
-
One drawback of a sane UI is that by not showing esoteric settings, users have no way of knowing whether they’re being handled on their behalf or not. I don’t see anything in here that could be set flat-out wrong enough that it would explain corruption. The write back cache flush interval of up to 12 seconds could mean a rather spectacular amount of corruption if there were also a power loss during a heavy write. However, the user manual for the R6 says the write back cache is battery backed, so that ought to mean the contents of the cache are written to the drives once power is reapplied. For there to be no data loss or corruption also requires that the drives’ own write caches are disabled. That way anything sent to the drives is committed (in theory), and anything in the controller write cache is preserved until power is restored. This doesn’t mean there will be zero corruption, but it’s significantly reduced.
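For a rough sense of what a 12 second window can mean if power is lost mid-write and the cache is not battery backed: the sustained write rates below are assumptions for illustration only, and the real exposure is also bounded by the size of the controller’s cache.

# Back-of-envelope exposure from an unflushed write-back cache at power loss.
# Write rates are assumptions; actual exposure is capped by cache size.

flush_interval_s = 12
for write_mb_s in (200, 500, 800):
    print(f"{write_mb_s} MB/s sustained -> up to {write_mb_s * flush_interval_s} MB unflushed at power loss")

# 200 MB/s -> up to 2400 MB
# 500 MB/s -> up to 6000 MB
# 800 MB/s -> up to 9600 MB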
-
EricBowen: Likely the Parity is corrupted from bad blocks in the drive in those areas would be the most likely probability to me.
Why only parity? Drives know nothing about RAID; they won’t discriminate between data and parity chunks. Bad blocks have a much greater chance of corrupting data chunks, simply because there are more of them: in an 8-drive raid6, for example, six of every eight chunks in a stripe are data, so roughly 75% of randomly placed bad blocks should land on data. If you’re really seeing parity chunks corrupted more often than the ratio of parity to data chunks would predict, that sounds like raid firmware bugs to me.
Bad blocks that are not detected or corrected by drive ECC are quite rare, and when they happen it’s not something parity raid knows about in normal operation. The usual case is the drive’s ECC detects an error and corrects it without informing the controller; another possibility is detection without correction, in which case the controller is informed with a read error. That read error includes the LBA of the bad sector, so the controller knows what data needs to be reconstructed from parity; it then sends a copy up to the application layer as if nothing had happened, and causes a copy to be written to that same (bad) LBA. Then it’s up to the drive firmware to determine whether merely overwriting the sector fixes the problem; if it’s a persistent write failure, it will remove that physical sector from use by dereferencing it, and the LBA and data get reassigned to a reserve sector. Once that happens, the old sector isn’t accessible with general-purpose commands (it has no LBA).
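Here’s that flow as a toy Python model (single parity for brevity). The classes and names are mine, purely for illustration; they are not a real drive or controller API.

# Toy model of the bad-sector path: the drive reports a read error for an LBA,
# the raid rebuilds just that chunk from the other members, rewrites it, and the
# "drive" either fixes the sector in place or remaps it.

from functools import reduce

class ToyDrive:
    def __init__(self, chunk, unreadable=False):
        self.chunk = chunk
        self.unreadable = unreadable          # models a pending/unreadable sector

    def read(self):
        return None if self.unreadable else self.chunk   # None = read error reported to controller

    def rewrite(self, data):
        self.chunk = data                     # overwriting fixes the sector, or the
        self.unreadable = False               # firmware remaps the LBA to a spare

def array_read(members):
    chunks = [d.read() for d in members]
    if None not in chunks:
        return chunks                         # normal path: nothing to repair
    i = chunks.index(None)                    # the read error tells us exactly which chunk
    survivors = [c for c in chunks if c is not None]
    rebuilt = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))
    members[i].rewrite(rebuilt)               # write the repaired copy back to the same LBA
    chunks[i] = rebuilt
    return chunks                             # the application layer never sees an error

d0 = ToyDrive(b"\x12\x34")
d1 = ToyDrive(b"\x56\x78", unreadable=True)   # this member has a bad sector
parity = ToyDrive(bytes(a ^ b for a, b in zip(d0.chunk, d1.chunk)))
print(array_read([d0, d1, parity]))           # d1's chunk is rebuilt, rewritten, and returned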
Anyway, nothing else in the storage stack discriminates between data and parity either. So if you mean to indicate a high incidence of parity corruption (either single or dual parities) compared to data corruption, that sounds like firmware problems. And that sort of thing is not unheard of:
An Analysis of Data Corruption in the Storage Stack
EricBowen: As to how the parity information is correct you would have to ask Intel or LSI. I just watch the logs that report from the web management consoles or error logs. If it says fixed then I assume it means it was fixed.
It may very well be that there are proprietary implementations that the manufacturers won’t talk about.
EricBowen: I am not miss configuring raid 5’s when I create them nor am I failing to schedule parity checks.
I take your word for it. But then, well before a raid5 unravels, scrubbing would reveal mismatches. Mismatches aren’t normal or OK. In small amounts, they can represent silent data corruption, which again raid6 doesn’t mitigate. Anything more than that indicates a problem.
EricBowen: I am talking raid 5’s with enterprise drives only which include the timeout recovery option in the firmware. They are still unraveling and until I see greater reliability to rebuild with corruption or handle corruption than raid 6 I wont suggest it over 6.
OK, but again raid6 isn’t meant to handle corruption, it’s meant to handle two drive failures. Corruption ≠ failure. There is an expected redundancy improvement in raid6 over raid5 in the use case it’s designed for, which is protection from an additional drive failure while still degraded from a one-disk failure. Here’s a presentation from NetApp.
Parity Lost and Parity Regained
Corruption occurs outside of drives a significant percentage of the time, before the data even gets to the raid controller, which will promptly write that corrupt data, and correct parities for that corrupt data, to disk (barring additional corruption).
Are Disks the Dominant Contributor for Storage Failures?
You’ve got enterprise drives: mid-range enterprise SATA drives spec an order of magnitude fewer unrecoverable read errors than consumer drives, and enterprise SAS yet another order of magnitude fewer. And on top of that, regular scrubs, which should spot mismatches before they become problems. Yet you’re reporting significantly more raid5 implosions than raid6? This is unexpected, but without logs, maybe even debug logs, the cause may remain obscure.
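For a sense of scale, here’s the back-of-envelope math using the commonly quoted spec-sheet rates (1 unrecoverable error per 10^14 bits read for consumer SATA, 10^15 for enterprise SATA, 10^16 for SAS). The 24 TB array size is just an example, not your setup.

# Expected unrecoverable read errors when reading an entire array once
# (e.g. during a rebuild), using commonly quoted spec-sheet rates.

SPECS = {                         # unrecoverable read errors per bit read
    "consumer SATA":   1e-14,
    "enterprise SATA": 1e-15,
    "enterprise SAS":  1e-16,
}

bits_read = 24e12 * 8             # 24 TB read end to end

for name, rate in SPECS.items():
    print(f"{name}: expected UREs per full read ~ {bits_read * rate:.3f}")

# consumer SATA:   ~ 1.920
# enterprise SATA: ~ 0.192
# enterprise SAS:  ~ 0.019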
-
How does it manifest? And do you get any relevant console messages on either server or client?
-
Haha, you know, I find the proper math is often surprisingly helpful!
4000GB @ 130MB/s does indeed translate into ~8.5 hours to fully write as a single block device if its sequential write performance is maximized. And raid1/10 (and 0+1) can do that. Parity raid will be slower, but how much slower greatly depends on the controller, and some even have settings that affect the trade-off between array responsiveness while degraded and rebuild performance. It’s probably within +20% for a raid6 with a single drive failure. Uncertain how much slower a two-drive failure will be.
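The arithmetic behind the ~8.5 hour figure, plus what hypothetically slower effective rates would mean (the 110 and 90 MB/s rows are illustrative guesses, not measurements of any particular controller while degraded):

# Full-capacity rewrite time at a given sustained rate.

capacity_gb = 4000
for rate_mb_s in (130, 110, 90):
    hours = capacity_gb * 1000 / rate_mb_s / 3600
    print(f"{rate_mb_s} MB/s -> {hours:.1f} hours")

# 130 MB/s -> 8.5 hours
# 110 MB/s -> 10.1 hours
#  90 MB/s -> 12.3 hours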
I think Kevin is fine choosing raid6, but he still needs to look at raid6 performance with one failed drive, and whether he can tolerate that performance level for ~10-12 hours. It’s also worth looking at the two-failure performance, just to be aware of how long it’s going to take. I still think he can skip the hot spare.
EricBowen: Many of those factors described why Raid 5 fails are not controllable by the client even with the Raid Controller management.
Sure, there are products that neither set things up correctly nor expose the settings so that the user might have a chance of doing so themselves. But papering over these mistakes with raid6 isn’t correct either. While it’s reasonable to say only the bottom line matters, it doesn’t change the fact that wrong raid5 and raid6 configurations can fail for identical reasons – reasons that neither raid5 nor raid6 was designed to mitigate.
EricBowen: Raid 6 is able to repair the corrupt parity data which I have seen done where as raid 5 has not been able to for medium to severe occurrences.
How is the parity corrupted? What enables raid6 to either detect or correct it? And is this raid detectable corruption also detected by the drive ECC?
The raid6 I’m referring to is the commonly available, non-checksumming, P+Q parity based on Galois field algebra. There are no checksums for data or parity chunks. There’s no way for this implementation to directly detect or correct corruption in data or parity chunks, nor is it designed to. It defers to drive and controller ECC, which do use checksumming, and it’s the drive ECC that detects and corrects corruption. This kind of raid6 cannot detect or correct for silent data corruption. During normal operation, only data chunks are read, and so long as the drive doesn’t report a read error, the raid doesn’t question the veracity of the data. If the drive reports a read error due to ECC detection of error but inability to correct it, then the affected data chunk is reconstructed from parity, and the data propagates up to the application layer and is also written back to the device that previously reported the read error. The overwrite of the affected sector(s) fixes the problem, either by successful overwrite of the physical sector or the drive firmware remaps to a reserve sector if there’s a persistent write failure for that sector.
There are obviously more data chunks than there are parity chunks. If either Q or P parity chunks are being corrupted somehow, then data chunks are absolutely being corrupted too. And again, in normal operation parity chunks aren’t consulted: if the drive doesn’t report a read error, corrupted data chunks propagate to the application layer undetected and uncorrected.
So why is it that raid5 instances are having so many unravelings? Because they’re configured wrong. If they’re configured correctly, the incidence of bogus ejection of drives as faulty for being unresponsive goes to essentially zero. Bad sectors are corrected on the fly, and also during normally scheduled scrubs. Yes, of course, there still could be two legitimate drive failures at the same time, and mitigating that possibility is why we have raid6. But dual drive failures are still rare in the raid sizes discussed, compared to single drive failure with a subsequent bad sector causing a 2nd disk ejection and hence array collapse.
-
If silent data corruption is a real problem needing mitigation, then we can’t consider non-checksumming parity raid6 qualified to deal with that. That realm is for ZFS, Btrfs, ReFS, and PI.
Parity isn’t a checksum, so any disagreement between a data chunk and a parity chunk, even when two parity chunks agree in a mismatch against a data chunk, is still ambiguous. Hence the “write hole” applies to raid6 every bit as much as to raid5 and raid1. Deferring to two agreeing Q and P chunks against a data chunk is an assumption, and in fact it’s the wrong strategy, because this very thing can happen in a power failure where data chunks were correctly written but parity chunks were not: they still have their old values and therefore still agree with each other.
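Here’s a toy illustration of that power-loss case in Python (one byte per “drive”, GF(2^8) with the 0x11d polynomial; made-up values, not any real implementation):

# Toy write-hole scenario: the application rewrites a data chunk, power dies
# before the new P and Q reach the parity drives, and the next scrub finds P and
# Q agreeing with each other (both describe the old stripe) while mismatching
# the data.

def gf2(x):                        # multiply by 2 in GF(2^8), polynomial 0x11d
    return ((x << 1) ^ (0x1D if x & 0x80 else 0)) & 0xFF

def stripe_pq(data):
    p, q = 0, 0
    for byte in data:              # Horner-style: q = 2*q ^ byte in GF(2^8)
        p ^= byte
        q = gf2(q) ^ byte
    return p, q

old_stripe = [0x10, 0x20, 0x30]
new_stripe = [0x10, 0x2F, 0x30]                # one data chunk updated just before power loss

p_disk, q_disk = stripe_pq(old_stripe)         # stale parity still on the parity drives
p_want, q_want = stripe_pq(new_stripe)         # what a scrub recomputes from the data

print(p_disk != p_want, q_disk != q_want)      # True True: mismatch reported
# "Trust the two parity chunks because they agree with each other" would roll the
# good, newly written data chunk back to a stale value -- exactly the wrong call.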
Further, in normal operation parity chunks aren’t even consulted, so the system doesn’t know about any mismatches. That’s why regular scrubs are important. Most cases of raid5 total collapse despite only a single drive failure are due to wrong setups: the wrong drives were spec’d, scrubs weren’t scheduled, the drive and controller timeouts weren’t set correctly. Bad sectors end up not being fixed. A drive fails, rebuild commences, and a bad sector is encountered, but since the array is degraded there’s no parity left to rebuild from, and the array collapses even though there’s only been a single drive failure. So the workaround becomes going with raid6 rather than fixing the underlying sources of the problem.
Now, there’s no question that drive sizes are growing more quickly than drive performance. Rebuild times are therefore going up a lot, and that’s why it’s sane to recommend raid6: a 2nd drive could die during the rebuild. But corruption mitigation isn’t the use case. Raid is intended to defer checksumming and corruption mitigation to the hardware: the drive does actually write checksums (ECC) with each sector, and its ECC is designed to detect and correct problems. If it can’t, it should report a read error, and then the raid can do something about it in normal operation.