Topic: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #50
I would have to agree with john33, keep it simple.

I have about 450GB of FLAC\MP3s.

I use a 4-bay external enclosure filled with old 2TB HDs and one portable 2TB enclosure.

Weekly, I use SyncBack Free to make a verified copy of my local music to each drive in the external enclosure, so I have 4 copies of the data, and then I turn off the external enclosure. Monthly, I use SyncBack Free to make a verified copy of my local music to the portable 2TB enclosure, which I keep at work.

I have had a drive or two die in my 4-bay external enclosure, but I just need to replace it with any drive big enough to store the data and run the SyncBack Free backup profile again.

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #51
Mired in the intricacies seems like an accurate term... ZFS is way overkill here, it's a resource hog and all that is needed here is a simple NAS with a weeny ARM CPU, 4GB RAM, and RAID5. Today it's more probable to get one's AV collection damaged by a meteorite than by bit rot or ECC-less RAM errors.
Is that a fact? How many here, plus countless other users who likely ran Windows for everything else they needed, initially grabbed onto ZFS because they weren't concerned with bit rot? Peace of mind is an exceedingly rare and precious thing, especially these days, and it's clear that most here say that ZFS is very much worth using for that reason and others to protect data.

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #52
Rocket science says that reliability is a function of simplicity. Using ZFS for simple AV storage is like using an atomic clock to soft-boil your eggs.
There is a multitude of checksums on many levels. Don't you trust hard drive sector-level checksums? FLAC checksums? EXT checksums? NTFS checksums? RAID5/6 checksums? Why? They will all scream when there's bit rot.
Yet it's OK if the hard drive's cache is not ECC RAM? And you trust SATA bus checksums? PCIe checksums? Ethernet/Wi-Fi checksums?
Why the fixation on RAM and file system checksums?

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #53
If this weren't an "audio enthusiast" forum, you may or may not have a point, but unless there's a show of hands to the contrary, I'd rather err on the side of caution. In any case, talk about screaming: it's not as if I haven't been screaming about my admitted ignorance of 80% of everything technical here. I once knew a few things about using checksums to verify data against bit errors, but forgot it all. That, and the most elementary things about RAID 0 through 10 (most of which I have also forgotten), is all I really know about file systems beyond using Windows Explorer; I'm way too busy working with others at diyaudio.com on a speaker build, my day job, preparing meals, and getting at least adequate sleep and gym time.

But am I wrong, or is another HUGE advantage of ZFS that it can be set to run any and all such checksums, alert you if there are errors and even repair the damage to files, and do it all automatically?

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #54
I'm not that sophisticated with my media backups. I use a combination of multiple devices and optical discs, complete with hash lists on each disc and device. It works out just fine; it can be a little time-consuming and tedious, but it works well. No fancy ZFS or RAIDs or NAS here.

You're overthinking it a little too much, and you seem very unwilling to try anything that involves a different OS or a command line. Get yourself a NAS or two, or better yet a bunch of external USB drives, and duplicate everything between them. There are plenty of ways to easily hash a bunch of files, and all you ever have to do is check them periodically (like once every couple of months, if that) while you're checking your external drives' health.
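
To make the "hash list" idea concrete, here is a minimal sketch in Python. It is only one way to do it: the function names (sha256_of, build_manifest, verify) are made up for this illustration, and SHA-256 is an assumption, since the post does not say which hash is used. It builds a list of digests for everything under a folder and later reports files that are missing or no longer match.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

In practice you would save the manifest next to the files (for example as JSON) when you make the copy, then run verify against each drive every couple of months. An empty result means every listed file still matches.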

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #55
Or just offload the whole nause of data management to the cloud, with a backup at home.  The encryption used on cloud storage means that if a bit flips the decryption will fail.  If my Internet wasn't so slow I would be doing just that.

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #56
But am I wrong, or is another HUGE advantage of ZFS that it can be set to run any and all such checksums, alert you if there are errors and even repair the damage to files, and do it all automatically?
Huge advantage for certain applications.
But you are not going to operate a huge server park for something that could fit on a single drive.

View it a bit more pragmatically:
* Rip logs and tags (do not underestimate the dirty job of maintaining tags!) can be stored in a free cloud account or three.
* Audio in a checksummed format. Then if a bit flips on one drive, you will know which one is correct.
* ISOs - make a checksum file that can be verified periodically.
* Disk crashes are more of a concern than single bit corruption. User errors are more of a concern than single bit corruption.
* And at worst, in case of crash: ripping 180 DVDs/BDs is much less of a job than 3100 CDs, so make an extra backup or two for the CD rips. And an extra for the tags, did I say that was a dirty job?

You can take these extra measures because you know what kind of data you will store. The cloud admins at Amazon cannot do that, since they have to store anything and everything.

Then ask yourself: what else do you need?
Hundred percent uptime? (Likely not a top priority. If something needs to be retrieved from an off-site backup, I would happily wait days for that!) A dedicated computer running full-time to scan your drives for errors? (Maybe that would be neat; if you need a separate box anyway, why can't it do that job?) A maintenance plan to replace drives? (Uh, I have to admit that in my own case it would be done by just buying bigger drives when they became less expensive, so there would always be two copies on fairly recent drives.)

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #57
And wtf is "resilvering"? Searched [ What is Resilvering a hard drive?  ] Hits there compared it to a ZFS "scrub";

Resilvering is the same as scrubbing. It involves reading every bit on the array to make sure the mirror / redundancy / whatever copies still agree with each other. Doing that helps you catch bit flips when you can still correct them. It's a play on words: re-silvering a mirror!

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #58
Although checking for errors (the scrub) and repairing errors might appear to be two sides of the same coin, at least on a file system smart enough to know, it isn't accurate to equate them. Resilvering is - as you point out about the wordplay - about rebuilding the "mirror" so that it "mirrors".

That is typically done onto a new drive (a glass without silver mirrors nothing!). If you know that this is the new drive, then in a single-redundancy RAID1/RAID5 setup you can declare "this is empty" and fill it by calculating the bits from the rest of the array.
Now if the system has no checksumming, no nothing except a parity check, it wouldn't be able to tell which one was wrong in case they don't XOR out correctly. But by declaring "this is wrong!" (and "this is empty" means it is wrong!) you can still resilver.

Smarter systems have more protection, making the distinction more opaque ... in a good way.
(That was the over-simplified version.)

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #59
But am I wrong, or is another HUGE advantage of ZFS that it can be set to run any and all such checksums, alert you if there are errors and even repair the damage to files, and do it all automatically?

ZFS is typically set to scrub the data on a schedule, every two weeks by default on some systems (the schedule comes from the distribution or NAS software, so it may differ on yours). The status command (zpool status) will tell you if and what errors were found. If a file is corrupted and there's another copy of the data, ZFS will automatically fix it. If there are no other copies, ZFS doesn't (easily) give you your file back. This is a good thing, so you're not unknowingly copying corrupted files over good files on your backups.

Say you have 350GB of FLAC files and you have 1TB+ of storage. You can simply tell ZFS to save 2 (or even 3) copies of everything (the copies property). If one copy gets corrupted, ZFS fixes it with the good copy. The downside is that you use double the storage, but with a single disk there's no RAID configuration to deal with, and you copy your files to your external backup knowing they're not corrupted.

FLAC has checksums, but afaik they aren't checked during normal playback. I have a purposely corrupted FLAC file that plays just fine. It's not until I try to decode that file to WAV that the flac program tells me it's corrupted. If this corruption occurred on a different filesystem, I'd probably never know until it was too late.
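
Since playback may not reveal a damaged file, a periodic explicit test is one option. The sketch below is a rough illustration, not anyone's actual setup: it assumes the reference flac command-line tool is installed and on PATH, and the helper names (flac_test_command, find_corrupt_flacs) are hypothetical. It runs flac in test mode (--test decodes the file and checks it without writing any output) on every .flac file under a folder and collects the ones that fail.

```python
import shutil
import subprocess
from pathlib import Path


def flac_test_command(path: Path) -> list[str]:
    """Build the command line for `flac --test --silent <file>`."""
    return ["flac", "--test", "--silent", str(path)]


def find_corrupt_flacs(root: Path) -> list[Path]:
    """Return the FLAC files under root for which `flac --test` fails."""
    if shutil.which("flac") is None:
        raise RuntimeError("the flac command-line tool is not on PATH")
    corrupt = []
    for path in sorted(root.rglob("*.flac")):
        # A non-zero exit status means flac reported a decoding or checksum error.
        result = subprocess.run(flac_test_command(path), capture_output=True)
        if result.returncode != 0:
            corrupt.append(path)
    return corrupt
```

Running find_corrupt_flacs(Path("/path/to/music")) on each copy of the collection tells you which copy of a given file is still good, which is exactly the information a plain file copy lacks.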

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #60
ZFS is cool but what are the advantages for AV storage over RAID5/6 with scrubbing that will run on a potato?

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #61
RAID scrubbing will only catch errors where the underlying system knows that something went wrong. So: the scrubbing process reads data, the hard disk returns a read error, and the scrubbing process overwrites the bad copy with the good copy. However, if there is merely a discrepancy in the data, a RAID controller will not know which copy is the good one.

ZFS and btrfs store checksums. So, they can verify those checksums to know which one is the good one. This potentially catches more errors.

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #62
@nmxny24

Also, you can use apps that support Reed-Solomon error correction. It's a very robust algorithm, and it can reconstruct damaged files up to a given percentage (which is chosen when the recovery file is created).
The very nice ICE-ECC v2.7 works under Windows (maybe on Linux under Wine, but I didn't check). It's old but works flawlessly. WinRAR also supports it. I find them very useful.
ICE-ECC can create an error correction file per folder or per single file.
I've tested both extensively, and they also support header damage reconstruction.



Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #63
RAID scrubbing will only catch errors where the underlying system knows that something went wrong. So: the scrubbing process reads data, the hard disk returns a read error, and the scrubbing process overwrites the bad copy with the good copy. However, if there is merely a discrepancy in the data, a RAID controller will not know which copy is the good one.

ZFS and btrfs store checksums. So, they can verify those checksums to know which one is the good one. This potentially catches more errors.
In case of RAID5 or RAID6 respectively 1/3 or 1/2 of capacity is for checksums spread over at least 3 or 4 disks - it inherently detects errors and rebuilds data.
As for Reed-Solomon - it already is used on many levels of hardware.

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #64
In case of RAID5 or RAID6 respectively 1/3 or 1/2 of capacity is for checksums spread over at least 3 or 4 disks - it inherently detects errors and rebuilds data.
No, it does not. RAID5/6 have parity data, no checksums. It can restore data when it knows it is bad, but when it does not know which of the 2 or 3 disks has bad data, it cannot fix it.

You can set up ZFS or btrfs as RAID0/1/5/6, but it will have checksums in addition to that. With the addition of these checksums, it can check for bad data and restore it. Of course, this assumes the error correction code of the hard disk has failed. That might seem improbable, but it has happened to me more than once in the past, so I wouldn't rely on it.

I myself run an array with two disks from different manufacturers, with btrfs managing the two volumes mirrored (so like RAID 1), for high availability. The array makes snapshots every day, so a file can quickly be restored in case of user error. This array is scrubbed once a month. An incremental backup is stored on DVD-9 daily as protection against ransomware, since it is non-rewritable. That copy is there to get access to files quickly in the event of a failure. Data is also backed up to the cloud, keeping past copies, again as protection against ransomware and in case the room the server is in burns down or something like that. Restoring from there takes longer, of course.
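
A toy illustration of the difference, in Python. This is not how ZFS or btrfs are implemented; it only models a two-way mirror of one block, with made-up function names, and uses SHA-256 as a stand-in for whatever checksum the filesystem stores.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Stand-in for the checksum a filesystem stores alongside a block."""
    return hashlib.sha256(block).hexdigest()


def repair_plain_mirror(copy_a: bytes, copy_b: bytes):
    """A two-way mirror without checksums can only notice a mismatch."""
    if copy_a == copy_b:
        return copy_a
    return None  # mismatch detected, but nothing says which copy is good


def repair_checksummed_mirror(copy_a: bytes, copy_b: bytes, stored: str):
    """With a stored checksum, the copy that still matches is the good one."""
    for candidate in (copy_a, copy_b):
        if checksum(candidate) == stored:
            return candidate
    return None  # both copies are damaged; restore from backup instead


good = b"FLAC frame data"
flipped = bytes([good[0] ^ 0x01]) + good[1:]  # one bit flipped on one disk

print(repair_plain_mirror(good, flipped))
print(repair_checksummed_mirror(flipped, good, checksum(good)))
```

The plain mirror sees that the two disks disagree and has to give up, while the checksummed version picks the copy that still matches and could overwrite the other with it.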

So, really, a proper backup is layered, as a single backup medium cannot facilitate all possible failure modes.

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #65
RAID5 can restore data when 2/3 stays intact and RAID6 when 1/2 stays intact. Just as with RAID1 with at least 3 disks, it is quite obvious which one is bad when there is a bit flip on one and the other 2 stay the same. QNAP has a scrubbing mechanism for RAID5/6 to handle exactly that.
This is all about budget and the level of risk for one's projects that makes one comfortable. Medical grade? Military grade? You can add more and more layers of protection to add more zeroes to that 0,001% risk, but 100% protection doesn't exist. ZFS is for deduplication, caching, handling many virtual machines and a lot of storage. In the context of this thread, where the topic starter just wants to store his small AV collection without much technical ado, I find anything above scrubbed RAID5/6 with backups harmful... But each to their own, of course.

EDIT: I see that Synology has both btrfs and RAID scrubbing support so it covers all fronts.

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #66
RAID5 can restore data when 2/3 stays intact
... provided the wrong one is identified, and you need to know that. If you flip one bit in one random drive, it can detect but not restore - it knows that at least one of three bits is wrong, but not which one.

Without further checks(umming), RAID1, 4 and 5 work under the presumption that each bit is in one of three states:
0 and correct
1 and correct
<unavailable>

By unavailable I mean you know that this is the (only!) one that cannot be trusted. Say if the device simply cannot be reached because the drive has died. Then you can replace it and calculate by summing up the rest of the bits (modulo 2).

If none of them are unavailable, then the single-bit parity can detect a single-bit error, but not correct it. That takes more.

Edit: Simple examples follow. In all cases you have 2 bits to actually store, in 3 bits where one is parity protection. You know they shall sum to 0 modulo 2 (that is, to even number 0 or 2).
Example 0 (for 0 bits wrong): Bits are 1, 1, 0. Evaluated as OK.
Example 1 (1 bit wrong): Bits on drive are 1, 0, 0. Detected as error, but you have no idea whether the correct value was 0, 0, 0 or 1, 1, 0 or 1, 0, 1.
Example 2 (2 bits wrong): Bits are 0, 0, 0 but should have been 0, 1, 1. Evaluated as OK even though it isn't, because two bits are wrong. Because it evaluates as OK, there is no way to correct it.
Example 3 (all three bits are wrong): Detected as error, but there is no way to know for sure whether all bits were wrong (which would allow correction!) or whether it is just one of the three cases of "one bit wrong".
You also see that there is no way for the parity bit to distinguish "correct" from "two bits wrong" - nor "one bit wrong" from "three bits wrong".
Now suppose in Example 0 that drive #2 crashes and gets "unavailable" status. You jerk it out, assign "drive number 2 needs to be filled" and insert a new one. Since the system now knows that the second bit is not to be trusted and should be calculated from the others, it solves the equation 1 + x + 0 = 0 MOD 2, i.e. x=1, and writes it. Order restored.
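
The same examples can be replayed in a few lines of Python. This is only a sketch of single-bit parity over a three-bit stripe, with made-up helper names; a real RAID works on whole blocks, but the arithmetic (XOR, i.e. addition modulo 2) is the same.

```python
from functools import reduce
from operator import xor


def parity_ok(bits: list) -> bool:
    """True when the stripe sums to 0 modulo 2."""
    return reduce(xor, bits, 0) == 0


def rebuild(bits: list, missing_index: int) -> list:
    """Recompute one bit that is KNOWN to be missing from all the others."""
    others = [b for i, b in enumerate(bits) if i != missing_index]
    restored = list(bits)
    restored[missing_index] = reduce(xor, others, 0)
    return restored


def one_bit_candidates(bits: list) -> list:
    """All stripes that differ from `bits` in exactly one position."""
    return [bits[:i] + [bits[i] ^ 1] + bits[i + 1:] for i in range(len(bits))]


print(parity_ok([1, 1, 0]))           # Example 0: passes the parity check
print(parity_ok([1, 0, 0]))           # Example 1: fails the parity check
print(one_bit_candidates([1, 0, 0]))  # the three equally plausible originals
print(parity_ok([0, 0, 0]))           # Example 2: passes although two bits are wrong
print(rebuild([1, None, 0], 1))       # drive #2 replaced: solve 1 + x + 0 = 0 mod 2
```

Every candidate listed for Example 1 passes the parity check, which is why parity alone cannot pick between them. The rebuild only works because the caller states which position is untrusted.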

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #67
RAID5 might be vulnerable to that, assuming that by blind luck such a bit also passes the HDD sector-level checksums and the basic checksums of every filesystem, and that the final data format has no error correction whatsoever... Highly improbable, but OK.
At that level of paranoia it might be worthwhile to use RAID1 with at least 3 disks, RAID6, or RAID5 with btrfs.
But please don't forget that no checksum is perfect: there is always some amount of different data that will generate the same checksum. Again, it's all about the level where "highly improbable" changes to "impossible" for you.
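
To see what a collision looks like, here is a deliberately weak example in Python. The 8-bit toy_checksum is made up for this illustration and is far weaker than anything a drive or filesystem uses; real checksums collide too, just with a vastly smaller probability.

```python
import hashlib


def toy_checksum(data: bytes) -> int:
    """A deliberately weak 8-bit checksum: the byte sum modulo 256."""
    return sum(data) % 256


original = b"\x01\x02\x03\x04"
corrupted = b"\x02\x01\x03\x04"  # two bytes swapped, so the sum is unchanged

# Different data, same weak checksum: the corruption goes unnoticed.
print(original != corrupted, toy_checksum(original) == toy_checksum(corrupted))

# A 256-bit hash tells the two apart.
print(hashlib.sha256(original).digest() == hashlib.sha256(corrupted).digest())

# With only 256 possible values, any 257 distinct inputs must contain a collision.
values = {toy_checksum(i.to_bytes(2, "big")) for i in range(257)}
print(len(values) <= 256)
```

The longer the checksum, the more inputs it takes before a collision becomes likely, which is where "highly improbable" starts to shade into "impossible" for practical purposes.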

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #68
and how many people actually test their backups? backing up corrupt data happens too

just saying
Quis custodiet ipsos custodes?  ;~)

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #69
RAID5 might be vulnerable to that, assuming that by blind luck such bit also passes HDD sector level checksums, basic checksums of every filesystem,
That's why you do the job to make sure you have any or all of those checks.

and that final data format has no error correction whatsoever... Highly improbable but OK.
Final format ... what audio format has error correction? You got error mitigation in all sane formats (looking at you, ALAC (and TTA, if anyone cares)) by muting off a corrupted frame, but that isn't "correction".


... again: the biggest causes of data loss are, in no particular order,
* Complete hard drive failure. Yep, a RAID protects against that (then hurry to resilver before the second one breaks; I once had two fail in the same week)
* WTF DID I JUST DO?! (I am sure I had more of those in a single week as well.) RAID does not protect against that.

and how many people actually test their backups? backing up corrupt data happens too
And, hard drive dying where you stored it ...

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #70
For 350GB of data, am I the only one who thinks this thread has lost the plot?!?!?!?

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #71
☝🏻


Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #73
For 350GB of data
As of now, but then to be stored as per the OP's information: an additional 3100 CDs + some music downloads (some of which in high resolution), 120 DVDs and 60 Blu-Rays.
Yet still enough to fit one drive!

Re: DAS: Which RAID Configuration to Minimize Bit Rot Error Risk?

Reply #74
I had forgotten that, but I also have 2.7TB of movies and 3TB of TV episodes on DAS and multiple external drives. If you are retaining the original discs, I really don't see the problem. For non-critical data, it just seems somewhat OCD to me, but each to his/her own. ;)