Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?
Reply #104 – 2023-09-20 08:40:57
A good video I stumbled upon for anyone who comes across this topic. Either he's missed the point or I have: RAID was never for that. RAID is a means to swap out a failed HDD with minimal or no interruption to service, and (almost as an afterthought) to achieve higher read transfer rates by parallelising reads across the redundancy. Bit rot was never in the mix, if it's even a "thing".

It will take a lot to persuade me bit rot is a "thing" when it comes to spinning disks. There's a CRC for every sector, and if the CRC fails on a read (or on write verification) the operation is retried and logged. Unrecoverable errors fail the whole transfer; they do not just return a sector of data with individual bit errors. The RAID then reports a failed disk and uses the redundancy to maintain data availability.

Where there could be bit errors is in the communications through the data paths, or in bit flips in SSDs. I presume SSDs have some means of detection (although I don't know for sure – I grew up with HDDs, back when the controller was a separate card!), but data paths commonly don't even have parity; it's left to the OS to verify that data was delivered accurately. Ditto a computer's RAM: parity RAM existed at one time but was relatively expensive, and while ECC RAM survives in servers and workstations, ordinary consumer RAM has no validation on stored data at all.

If a bit flips in data on its way to the HDD, it will be stored as if it were accurate (CRC and all) – the HDD has no way to know the data is in error, and neither does the RAID. The RAID acts in good faith.

None of this amounts to a hill of beans. If it were a problem of any significance at all, the protections would be built in.
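To see why the RAID "acts in good faith", here's a minimal sketch of the point above. It uses Python's zlib.crc32 as a stand-in for whatever ECC/CRC scheme a real drive uses internally (the actual on-platter codes are different and drive-specific – this just illustrates the ordering problem): if the bit flips *before* the drive computes its check code, the check code is computed over the already-corrupted bytes, and every later read-back verifies cleanly.

```python
import zlib

# Payload on its way from the application to the disk.
original = b"important archival data"

# A single bit flips in RAM or on the data path before the drive
# ever sees the correct bytes.
corrupted = bytearray(original)
corrupted[0] ^= 0x01  # flip the lowest bit of the first byte
corrupted = bytes(corrupted)

# The drive computes its check code over the bytes it received,
# so the CRC "protects" the corrupted data, not the original.
stored_crc = zlib.crc32(corrupted)

# Every subsequent read-back passes the sector check...
assert zlib.crc32(corrupted) == stored_crc
# ...even though the data is silently wrong.
assert corrupted != original
```

The drive's CRC catches corruption that happens *on the platter* or in the drive's own read path; it cannot catch corruption that happened upstream, which is exactly why end-to-end checksumming has to live above the drive if you want it.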
If there is a problem, it is in archival data where a write data error might not get noticed for years or decades (so it is essential to have a verify cycle when writing the archive), and the archive media may suffer from degradation (so it is essential to have multiple archives, and refresh the archives on a routine basis). It's a lot of work, and a lot of cost. Accumulate enough data, and you'll spend your whole time maintaining data. The alternative is to just accept that once in a blue moon you'll lose a file, and unless you're under contract not to lose files it probably doesn't matter that much.
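The verify-and-refresh routine above can be sketched in a few lines. This is a toy: the archive is a dict of bytes standing in for real files, and SHA-256 via Python's hashlib stands in for whatever checksum tool you'd actually use. The shape is the real point – record a manifest of checksums at write time, then periodically re-read everything and compare.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Checksum used for the manifest; any strong hash would do."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical archive: path -> contents (stand-in for real files on media).
archive = {
    "photos/img001.raw": b"\x00\x01\x02" * 1000,
    "docs/notes.txt": b"remember to refresh the backups",
}

# Verify cycle at write time: store a checksum manifest alongside the data.
manifest = {name: sha256_of(data) for name, data in archive.items()}

def scrub(archive: dict, manifest: dict) -> list:
    """Routine refresh: re-read every file, return names that no longer match."""
    return [name for name, data in archive.items()
            if sha256_of(data) != manifest[name]]

# Freshly written archive passes the scrub.
assert scrub(archive, manifest) == []

# Simulate media degradation: one byte rots in one file.
archive["photos/img001.raw"] = b"\xff" + archive["photos/img001.raw"][1:]

# The next routine scrub flags exactly the degraded file, which you
# then restore from one of the other archive copies.
assert scrub(archive, manifest) == ["photos/img001.raw"]
```

This is also where the "lot of work, lot of cost" bites: the scrub has to re-read the entire archive, on every copy, on a schedule – which is exactly the maintenance burden being weighed against the occasional lost file.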