Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?
Reply #104 – 2023-09-20 08:40:57
A good video I stumbled upon for anyone who comes across this topic. Either he's missed the point or I have: RAID was never for that. RAID is a means to swap out a failed HDD with minimal or no interruption to service, and (almost as an afterthought) to achieve higher read transfer rates by parallelising reads across the redundancy. Bit rot was never in the mix, if it's even a "thing".

It will take a lot to persuade me bit rot is a "thing" when it comes to spinning disks. There's a CRC for every sector, and if the CRC fails on a read (or on write verification) the operation is retried and logged. Unrecoverable errors fail the whole transfer; they do not just return a sector of data with individual bit errors. The RAID then reports a failed disk and uses the redundancy to maintain data availability.

Where there could be bit errors is in the communications through the data paths, or in bit flips in SSDs. I presume SSDs have some means of detection (although I don't know for sure – I grew up with HDDs, back when the controller was a separate card!), but data paths commonly don't even have parity; it's left to the OS to verify that data was delivered accurately. Ditto a computer's RAM: parity RAM existed at one time but was relatively expensive, and while ECC RAM survives in servers and workstations, ordinary consumer RAM has no validation on stored data at all.

If a bit flips in data on its way to the HDD, it will be stored as if it were accurate (CRC and all) – the HDD has no way to know the data is in error, and neither does the RAID. The RAID acts in good faith.

None of this amounts to a hill of beans. If it were a problem of any significance at all, the protections would be built in.
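To see why the RAID "acts in good faith", here's a minimal sketch of the point above. It uses Python's zlib.crc32 as a stand-in for whatever ECC/CRC scheme a real drive uses internally (the actual on-platter codes are different and drive-specific – this just illustrates the ordering problem): if the bit flips *before* the drive computes its check code, the check code is computed over the already-corrupted bytes, and every later read-back verifies cleanly.

```python
import zlib

# Payload on its way from the application to the disk.
original = b"important archival data"

# A single bit flips in RAM or on the data path before the drive
# ever sees the correct bytes.
corrupted = bytearray(original)
corrupted[0] ^= 0x01  # flip the lowest bit of the first byte
corrupted = bytes(corrupted)

# The drive computes its check code over the bytes it received,
# so the CRC "protects" the corrupted data, not the original.
stored_crc = zlib.crc32(corrupted)

# Every subsequent read-back passes the sector check...
assert zlib.crc32(corrupted) == stored_crc
# ...even though the data is silently wrong.
assert corrupted != original
```

The drive's CRC catches corruption that happens *on the platter* or in the drive's own read path; it cannot catch corruption that happened upstream, which is exactly why end-to-end checksumming has to live above the drive if you want it.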
If there is a problem, it is in archival data where a write data error might not get noticed for years or decades (so it is essential to have a verify cycle when writing the archive), and the archive media may suffer from degradation (so it is essential to have multiple archives, and refresh the archives on a routine basis). It's a lot of work, and a lot of cost. Accumulate enough data, and you'll spend your whole time maintaining data. The alternative is to just accept that once in a blue moon you'll lose a file, and unless you're under contract not to lose files it probably doesn't matter that much.
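The verify-and-refresh routine above can be sketched in a few lines. This is a toy: the archive is a dict of bytes standing in for real files, and SHA-256 via Python's hashlib stands in for whatever checksum tool you'd actually use. The shape is the real point – record a manifest of checksums at write time, then periodically re-read everything and compare.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Checksum used for the manifest; any strong hash would do."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical archive: path -> contents (stand-in for real files on media).
archive = {
    "photos/img001.raw": b"\x00\x01\x02" * 1000,
    "docs/notes.txt": b"remember to refresh the backups",
}

# Verify cycle at write time: store a checksum manifest alongside the data.
manifest = {name: sha256_of(data) for name, data in archive.items()}

def scrub(archive: dict, manifest: dict) -> list:
    """Routine refresh: re-read every file, return names that no longer match."""
    return [name for name, data in archive.items()
            if sha256_of(data) != manifest[name]]

# Freshly written archive passes the scrub.
assert scrub(archive, manifest) == []

# Simulate media degradation: one byte rots in one file.
archive["photos/img001.raw"] = b"\xff" + archive["photos/img001.raw"][1:]

# The next routine scrub flags exactly the degraded file, which you
# then restore from one of the other archive copies.
assert scrub(archive, manifest) == ["photos/img001.raw"]
```

This is also where the "lot of work, lot of cost" bites: the scrub has to re-read the entire archive, on every copy, on a schedule – which is exactly the maintenance burden being weighed against the occasional lost file.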