Topic: Protecting audio files from bit rot? (Read 8044 times)

Re: Protecting audio files from bit rot?

Reply #50
With MP3 there is no checksum embedded. 

FWIW, this thread says foobar can also check MP3 frame CRCs:

https://hydrogenaud.io/index.php/topic,68536.0.html

This is imperfect since you could still have truncations of the file (whole frame deleted) that might be missed, but it would notice minor errors or bit flips, at least assuming you have the CRC option enabled when encoding files.
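Whether a given MP3 carries frame CRCs at all can be read from the frame header. A minimal sketch in Python, assuming the usual MPEG audio header layout in which the lowest bit of the second header byte is the protection bit (0 means a 16-bit CRC follows the header). The function name is made up for illustration, and locating the first frame in a real file (skipping ID3 tags and so on) is left out.

```python
def frame_has_crc(header: bytes) -> bool:
    """Return True if a 4-byte MPEG audio frame header declares a CRC.

    The frame sync is 11 set bits; the protection bit is the lowest bit
    of the second byte and is 0 when a 16-bit CRC follows the header.
    """
    if len(header) < 4:
        raise ValueError("need at least 4 header bytes")
    if header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        raise ValueError("no frame sync at this offset")
    return (header[1] & 0x01) == 0


# Typical MPEG-1 Layer III headers: 0xFFFB = no CRC, 0xFFFA = CRC present.
print(frame_has_crc(bytes([0xFF, 0xFB, 0x90, 0x00])))  # False
print(frame_has_crc(bytes([0xFF, 0xFA, 0x90, 0x00])))  # True
```

If the bit says "no CRC" on every frame, an integrity checker has nothing per-frame to verify, and only structural problems can be noticed.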

Re: Protecting audio files from bit rot?

Reply #51
So why not zip your albums and put them on an external hard drive?  Sure having 2 copies of every album will take up space, but then you have a backup that is verified.
It would be pleasant if you took some of the advice given to you. There is no need to store 2 backups to protect against sudden data corruption in one copy. A PAR file will give you much better protection against data corruption than an extra copy.
Your method is not protection/space efficient.

You asked for advice. PAR is MADE for what you are asking for. It's an ECC system in file format.
Sven Bent - Denmark


Re: Protecting audio files from bit rot?

Reply #53
With MP3 there is no checksum embedded. 

FWIW, this thread says foobar can also check MP3 frame CRCs:

https://hydrogenaud.io/index.php/topic,68536.0.html

This is imperfect since you could still have truncations of the file (whole frame deleted) that might be missed, but it would notice minor errors or bit flips, at least assuming you have the CRC option enabled when encoding files.

foobar2000 can show the CRC32 checksum of an MP3 by using "verify integrity", but I just checked one of my albums ripped with LAME 3.9 and there is no CRC32 information.
So why not zip your albums and put them on an external hard drive?  Sure having 2 copies of every album will take up space, but then you have a backup that is verified.
It would be pleasant if you took some of the advice given to you. There is no need to store 2 backups to protect against sudden data corruption in one copy. A PAR file will give you much better protection against data corruption than an extra copy.
Your method is not protection/space efficient.

You asked for advice. PAR is MADE for what you are asking for. It's an ECC system in file format.
Forgive me but I thought I replied to your post.  Having 2 copies of thousands of albums is not space efficient - I agree - but ZIP still does a checksum.
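To make the ZIP checksum concrete: each archive member stores a CRC-32, and Python's zipfile module can re-check it with ZipFile.testzip(). The sketch below builds a throwaway archive in memory (the member name and contents are invented), flips one bit, and shows that the damage is detected. Note that this only detects; nothing in the archive can repair the member.

```python
import io
import zipfile


def first_bad_member(zip_bytes: bytes):
    """Return the name of the first member whose CRC-32 fails, else None."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.testzip()


# Build a small archive in memory with a stand-in "track".
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("track01.mp3", b"\x55" * 4096)
good = buf.getvalue()

# Flip one bit inside the stored payload to imitate bit rot.
bad = bytearray(good)
bad[good.index(b"\x55" * 16) + 100] ^= 0x01

print(first_bad_member(good))        # None
print(first_bad_member(bytes(bad)))  # track01.mp3
```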

Do you suggest https://multipar.eu/ ?  It seems the popular "QuickPar" has not been updated in 12 years.

Re: Protecting audio files from bit rot?

Reply #54
DeathCore, I'm not sure what you are trying to do. As I quoted, you suggested storing a zipped version as a backup and/or using it for its checksum ability.

PAR is a much better tool for that. You can use PAR to check your files for data corruption, just like running the checksum in zip.
If you do get data corruption, the PAR file can also be used to fix it, the same as if you kept an extra copy, BUT it would take up far less space.
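Why can recovery data be smaller than a full copy? A toy illustration in Python using plain XOR parity. This is a deliberately simplified stand-in, not how PAR2 works internally (PAR2 uses Reed-Solomon coding and per-block checksums to find which blocks are bad); the block contents and function names are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(blocks):
    """One parity block protects the whole set against one lost block."""
    return xor_blocks(blocks)


def rebuild(blocks, parity, lost_index):
    """Recreate one lost block from the survivors and the parity block."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return xor_blocks(survivors + [parity])


data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(data)

damaged = list(data)
damaged[2] = b"\x00\x00\x00\x00"      # pretend block 2 was corrupted
print(rebuild(damaged, parity, 2))    # b'CCCC'
```

Here the recovery data is a quarter of the size of the protected data, yet any single damaged block can be restored. A full second copy costs 100% extra and still only survives damage that does not hit both copies in the same place.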

Either I'm missing your point about what you are trying to achieve with zipping and an extra copy, or you are still not understanding what PAR can do for you. Which PAR solution you get is up to you; the PAR format should, to the best of my knowledge, be the same.
https://en.wikipedia.org/wiki/Parchive


It's just that your suggestion of storing an extra copy in a zip file is far inferior in every aspect I can think of.
- Checksum strength is lower than just using a hash.
- It takes up more space than a PAR file, or, for the same space, a PAR file would be more robust at correcting errors than an extra copy.
PAR can even self-heal, i.e. if the PAR data itself gets corrupted it can fix itself and fix the data you are trying to protect as well.


If you still want to go with the zip approach, at least use RAR instead.
RAR has a data recovery feature, so if something gets corrupted it can fix it, not just check whether things are OK.
Also, never use solid compression, as it can make data corruption have a much more severe effect on your data.
Sven Bent - Denmark

Re: Protecting audio files from bit rot?

Reply #55
DeathCore, I'm not sure what you are trying to do. As I quoted, you suggested storing a zipped version as a backup and/or using it for its checksum ability.

PAR is a much better tool for that. You can use PAR to check your files for data corruption, just like running the checksum in zip.
If you do get data corruption, the PAR file can also be used to fix it, the same as if you kept an extra copy, BUT it would take up far less space.

Either I'm missing your point about what you are trying to achieve with zipping and an extra copy, or you are still not understanding what PAR can do for you. Which PAR solution you get is up to you; the PAR format should, to the best of my knowledge, be the same.
https://en.wikipedia.org/wiki/Parchive


It's just that your suggestion of storing an extra copy in a zip file is far inferior in every aspect I can think of.
- Checksum strength is lower than just using a hash.
- It takes up more space than a PAR file, or, for the same space, a PAR file would be more robust at correcting errors than an extra copy.
PAR can even self-heal, i.e. if the PAR data itself gets corrupted it can fix itself and fix the data you are trying to protect as well.


If you still want to go with the zip approach, at least use RAR instead.
RAR has a data recovery feature, so if something gets corrupted it can fix it, not just check whether things are OK.
Also, never use solid compression, as it can make data corruption have a much more severe effect on your data.

I am just trying to learn different ways to protect my music from possible corruption.  It sounds like PAR is the second best choice if you don't want to change your computer's whole file system.

Also before making this topic, I have never seen anyone discuss backing up music with PARchive.  I have however, read many posts in the past about zipping music for checksum then storing on external media.  I never said I did this.  Also I don't just mean zip - rar, 7z, lz, dmg, etc.

Re: Protecting audio files from bit rot?

Reply #56
Maybe spoon feed him with sfv files. I'm afraid he's way in over his head at this point.

Like using zip (et al), looking into fb2k's integrity verifier in conjunction with frame CRCs that likely don't exist in any of his mp3s (not the same as the CRC in the Lame header that I was talking about, but I digress) is, well, let's be nice and just call it a wasted effort. 

For me, mp3s are entirely expendable. If they were going to be corrupted in some way I would just make new ones. If I am unable to detect a problem that may exist, then let ignorance be bliss. If I encounter what I suspect to be an audible problem, then my next course of action would be to consult the source.  This seems far more rational to me than getting sucked into a rabbit hole of paranoia. This is not to dismiss par files, rather it is about the fear that redundancy is being created for files that might already be damaged.
Is 24-bit/192kHz good enough for your lo-fi vinyl, or do you need 32/384?

Re: Protecting audio files from bit rot?

Reply #57
foobar2000 can show the CRC32 checksum of an MP3 by using "verify integrity", but I just checked one of my albums ripped with LAME 3.9 and there is no CRC32 information.
We already discussed that several times in this thread, did you see them? Use EncSpot and Mediags. In case you are unfamiliar with a command-line program like Mediags, it is easy to use a .bat file and make a shell extension yourself.

[1] Download all the exe and dll files from the Mediags website and put them in the same folder.

[2] Paste the text below into Notepad, save it as a .bat file and put it into the same folder.

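Code:
@echo off
rem %~dp0 is this script's own folder, so mediags and the log are found no matter where Send To starts us.
"%~dp0mediags" "%~1" > "%~dp0log.txt"
notepad "%~dp0log.txt"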

[3] Open Windows Explorer and type "shell:sendto" (without the quotes) in the address bar, then create a shortcut to the .bat file in the SendTo folder.

[4] Right-click any folder with your audio files >> Send to >> your .bat file. Mediags will analyze your files and open a log file after it has finished.


Re: Protecting audio files from bit rot?

Reply #58
There is nothing specific between your audio files and other data
Well, for me at least, the specific thing about music files is that I changed metadata (tags, adding ReplayGain info...) way more than with other kinds of files, which made relying on external checksums/parchives impossible in the long run.
It's good to have audio stream checksums then, as in FLAC and a few others.
That aside, I agree it's just backups as usual (with both extra copies and par files, or even par files with 100% redundancy instead of extra copy).
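To illustrate why an audio stream checksum survives retagging while a whole-file hash does not, here is a hedged Python sketch. It assumes a FLAC file that begins with the "fLaC" marker followed directly by the 34-byte STREAMINFO block, whose last 16 bytes hold the MD5 of the decoded audio. The fake_flac helper is purely hypothetical: it builds a header-only byte string for demonstration, not a playable file.

```python
import hashlib


def flac_audio_md5(data: bytes) -> bytes:
    """Return the 16-byte MD5 of the decoded audio stored in STREAMINFO."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    block_type = data[4] & 0x7F                # high bit is the last-block flag
    length = int.from_bytes(data[5:8], "big")
    if block_type != 0 or length != 34:
        raise ValueError("first metadata block is not STREAMINFO")
    streaminfo = data[8:8 + 34]
    return streaminfo[18:34]                   # last 16 bytes hold the MD5


def fake_flac(audio_md5: bytes, tag_text: bytes) -> bytes:
    """Hypothetical helper: synthetic header only (no audio frames)."""
    streaminfo = bytes(18) + audio_md5
    tags = bytes([0x84]) + len(tag_text).to_bytes(3, "big") + tag_text
    return b"fLaC" + bytes([0x00]) + (34).to_bytes(3, "big") + streaminfo + tags


md5 = hashlib.md5(b"pretend these are PCM samples").digest()
before = fake_flac(md5, b"ARTIST=Someone")
after = fake_flac(md5, b"ARTIST=Someone;REPLAYGAIN_TRACK_GAIN=-6.5 dB")

# The whole-file hash changes with the tags, the stored audio MD5 does not.
print(hashlib.sha256(before).digest() == hashlib.sha256(after).digest())  # False
print(flac_audio_md5(before) == flac_audio_md5(after))                    # True
```

Reading the stored MD5 only tells you what the audio should hash to; actually verifying it still requires decoding the file and comparing, which is what a decoder's test mode does.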

Re: Protecting audio files from bit rot?

Reply #59
Btrfs is still considered experimental, but the simple schemes (read: not RAID5) are stable. I'm using it in RAID1 mode at the moment. Even though I had a problem with the filesystem itself once, the data is safe, mainly because a broken FS refuses to mount. For those that want a non-experimental system, ZFS is an alternative.

Non-checksummed RAID1 setups died for me after I read this article. Even without RAID1, a checksummed FS is useful because you immediately notice the file is corrupted, and maybe have the chance to restore an older copy before, say, deleting the older copy to clean up some space.

Last I checked, ZFS, though an awesome filesystem, requires a LOT of RAM.  For the size of array I wanted, my motherboard did not support enough RAM.  I believe the rule of thumb is 1 GB of RAM for each TB of data in the array.  Three 4 TB drives in a RAIDZ would have required 12 GB of RAM just for the filesystem.  On a motherboard with a 16 GB maximum, that didn't leave a lot of RAM for the OS and all the services I had running on it.

ZFS is still more advanced and stable than BTRFS.  But I think once BTRFS gets "stable," its lower memory footprint may make it more desirable than ZFS.

The other thing that concerns me is that ZFS is owned by Oracle now.  Oracle is the company that started development and backed BTRFS.  I'm hoping that ZFS has a bright future under Oracle, but I am a little worried about it.

Re: Protecting audio files from bit rot?

Reply #60
I believe the rule of thumb is 1 GB of RAM for each TB of data in the array.
I thought that was for real-time de-duplication only (the ARC)? I know some NAS units offer the option to use RAID-Z and they don't have that much RAM.

Re: Protecting audio files from bit rot?

Reply #61
Quote
Use EncSpot and Mediags.

Woohoo, my project's first recommendation on HA!

On FLAC, I haven't seen mentioned the CRC-32s you get if you rip with EAC.  These EAC log CRCs are against the same data as FLAC's intrinsic MD5 (the decoded audio; the per-frame CRC-16 covers the encoded frames instead).  If you can verify these, there's no big need for a separate hash file.  Not sure if there are any tools that will check these though...  Oh wait, there is at least one

https://mediags.codeplex.com/wikipage?title=UberFLAC%20over%20WPF

*cough, cough*
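For anyone curious what such a check amounts to, a rough Python sketch: compute a CRC-32 over the raw sample data of a WAV file and compare it by eye with the value in the rip log. This assumes the log's CRC is a plain CRC-32 over the PCM samples, which may depend on ripper settings (for example whether null samples are counted), and it assumes the FLAC has already been decoded to WAV with a decoder of your choice. The function name is invented for the example.

```python
import io
import wave
import zlib


def pcm_crc32(wav_file) -> str:
    """CRC-32 of the raw sample data in a WAV file, as 8 hex digits."""
    crc = 0
    with wave.open(wav_file, "rb") as w:
        while True:
            chunk = w.readframes(65536)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)   # running CRC across chunks
    return f"{crc:08X}"


# Demo on a tiny in-memory WAV (16-bit stereo, 1000 identical frames).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b"\x01\x00\x02\x00" * 1000)
buf.seek(0)
print(pcm_crc32(buf))
```

The WAV header and any tags are skipped, so only the samples contribute, which is the property that makes the value comparable across containers.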

Re: Protecting audio files from bit rot?

Reply #62
1. When copying your files, use a utility that checks the files after copy operations (I use TeraCopy).
2. For safe, long term storage, use M-DISC. I use 100GB Blu-ray XL discs.

Re: Protecting audio files from bit rot?

Reply #63
This is simple. You create a hash of any file you need to be kept pristine and routinely validate the hash on both your primary and backup systems.
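A minimal sketch of that routine in Python, assuming SHA-256 and a manifest that maps relative paths to digests. The function names and the manifest shape are illustrative choices, not any particular tool's format.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks to keep memory use flat."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict) -> list:
    """Return the relative paths that are missing or no longer match."""
    bad = []
    for rel, digest in manifest.items():
        p = root / rel
        if not p.is_file() or file_digest(p) != digest:
            bad.append(rel)
    return bad
```

Build the manifest once from a known-good state, keep a copy of it alongside each backup, and run verify() against both the primary and the backup root on a schedule. Because the paths are relative, the same manifest works for both locations. A mismatch on one side tells you which copy to restore from the other.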

Re: Protecting audio files from bit rot?

Reply #64
Btrfs is still considered experimental, but the simple schemes (read: not RAID5) are stable. I'm using it in RAID1 mode at the moment. Even though I had a problem with the filesystem itself once, the data is safe, mainly because a broken FS refuses to mount. For those that want a non-experimental system, ZFS is an alternative.

Non-checksummed RAID1 setups died for me after I read this article. Even without RAID1, a checksummed FS is useful because you immediately notice the file is corrupted, and maybe have the chance to restore an older copy before, say, deleting the older copy to clean up some space.

Last I checked, ZFS, though an awesome filesystem, requires a LOT of RAM.

Yes and it is best if it is ECC RAM, though I am of that opinion anyway regardless of file system.

Re: Protecting audio files from bit rot?

Reply #65
Quote
Use EncSpot and Mediags.

Woohoo, my project's first recommendation on HA!

On FLAC, I haven't seen mentioned the CRC-32s you get if you rip with EAC.  These EAC log CRCs are against the same data as FLAC's intrinsic MD5 (the decoded audio; the per-frame CRC-16 covers the encoded frames instead).  If you can verify these, there's no big need for a separate hash file.  Not sure if there are any tools that will check these though...  Oh wait, there is at least one

https://mediags.codeplex.com/wikipage?title=UberFLAC%20over%20WPF

*cough, cough*

Thanks. Can you add some switches to ignore some types of errors, or to check only for a specific type of error? I have many files that always show errors (and very horrible words like "fatal") despite the fact that they are perfectly fine. For example:

xxx.flv
* Fatal: File truncated near packet header.

xxx.mkv
* Fatal: No element found with signature [7B][A9][A2]

xxx.wav
* Fatal: Missing 'data' section

For example, is it possible to show only crc errors?


Re: Protecting audio files from bit rot?

Reply #67
As others have pointed out, this thread goes off the rails of reality in the first post.

Data storage formats such as optical media, hard drives, etc. are protected from hardware faults surreptitiously causing errors by adding parity, CRC, and other checksum-type controls to the data as they store them.

The relevance of this problem can be estimated by looking at the number of times a stored program that has always worked well suddenly starts totally failing and crashing while doing the identical things that used to work well, without an accompanying error message pointing out the media errors that caused it.

In general, computer programs are far, far, far more intolerant of random errors than audio signals.

The reliability of computer systems while processing files that lack common protection schemes can be estimated by looking at the reliability of the same system processing files that do contain protection schemes.


Bottom line, widespread problems with bit rot are usually modern versions of stories about things that go bump in the night that used to be told around camp fires, etc.

Re: Protecting audio files from bit rot?

Reply #68
While not widespread, those issues do occur. It happened to me a few times already with archived data, without me moving it - it just suddenly wasn't matching the checksums on several files anymore. That's on at least two HDDs of different age and brand. Maybe all those error correction measures inside the HDD work when you regularly read the files, but it doesn't seem to work as well when the HDD is off most of the time.

In any case, better be safe than sorry afterwards, so adding an additional layer of protection is neither wrong, nor does it "go off the rails of reality". But that's just me.

Re: Protecting audio files from bit rot?

Reply #69
While not widespread, those issues do occur. It happened to me a few times already with archived data, without me moving it - it just suddenly wasn't matching the checksums on several files anymore.

Which checksums?  How do you know for sure that these problems weren't due to human failure?

Quote
That's on at least two HDDs of different age and brand. Maybe all those error correction measures inside the HDD work when you regularly read the files, but it doesn't seem to work as well when the HDD is off most of the time.

The drive checksums are created when you record and play data off the disk. They are internal to the drive and generally not visible to people using a proper O/S such as Windows to read and write the files. When the drive is powered down and idle, they are not being calculated.


Quote
In any case, better be safe than sorry afterwards, so adding an additional layer of protection is neither wrong, nor does it "go off the rails of reality". But that's just me.

The odds of failure of the same data on two different disks going bad are fantastically high. They point to a failure of a common component which could be software.  I get the feeling that significant information about these failures is not being reported. If you are storing data with some odd operating system, or system software then of course you shouldn't be using it to store and retrieve important data. But, this is not how the vast majority of data is used.


 

Re: Protecting audio files from bit rot?

Reply #70
The relevance of this problem can be estimated by looking at the number of times a stored program that has always worked well suddenly starts totally failing and crashing while doing the identical things that used to work well, without an accompanying error message pointing out the media errors that caused it.

Back in the day when I worked with people who worked with PCs, that was all part of the Windows experience! Microsoft set the computer-using bar very low, and people got used to stuff like having to reboot,  re-install programs and even reload the operating system on a regular basis. All this while the Unix machines in the server room just went on and on ...and on ...and on.

NB... My experience of Windows ended at XP. If it has improved, in the several versions since then, well good. About time too.

Quote
The odds of failure of the same data on two different disks going bad are fantastically high. They point to a failure of a common component which could be software.
Or a disk controller or... anything.

Technically, as users, all we need to know is that our systems are not perfect, and that hard disks are mortal, and that, if we do not have adequate backups, we run the risk of losing our data.  The odds of that happening are not high: it is a lucky person that does not experience one or more hard disk failures in their computing life. Actually, never mind the hardware... it is a lucky person that never gets that ohmygodwhathaveIdone feeling after a delete command.

The rest is academic.

But bit rot in music files? Yes, probably mythical. In about 15 years of regularly listening to music from a computer, I have (as I think I mentioned before) had one file that "went bad." It played, but horribly distorted. The backup was fine, and it was far more likely to have been user error (mine, but I have no idea how) than bit rot, because a data error on disk would surely be more likely to cause dropouts or an unplayable file, not one that had the same fault from beginning to end. I don't even have a count of the hard disks I've lost in that time.



The most important audio cables are the ones in the brain

Re: Protecting audio files from bit rot?

Reply #71
There is nothing specific between your audio files and other data
Well, for me at least, the specific thing about music files is that I changed metadata (tags, adding ReplayGain info...) way more than with other kinds of files, which made relying on external checksums/parchives impossible in the long run.
It's good to have audio stream checksums then, as in FLAC and a few others.
That aside, I agree it's just backups as usual (with both extra copies and par files, or even par files with 100% redundancy instead of extra copy).

You quoted me out of context, I believe; the quoted part was about data corruption.
Besides, updating metadata is not unique to audio data. It might be the only data you personally update, but that doesn't make it a valid general statement.

Nevertheless, if you do update your audio files regularly, you simply treat each update as any other new data version, i.e. verify the old checksum/hash/parity file and create a new one.
I do agree, however, that internal checksums/hashes would make it easier, as they would be based solely on the main data stream. But in reality it is not at all impossible to keep an external checksum/hash/parity file.

It is, as always, a matter of effort vs. wanted results tradeoff.
Sven Bent - Denmark

Re: Protecting audio files from bit rot?

Reply #72
While not widespread, those issues do occur. It happened to me a few times already with archived data, without me moving it - it just suddenly wasn't matching the checksums on several files anymore.

Which checksums?  How do you know for sure that these problems weren't due to human failure?
CRC32 in the file name of several video files, MD5 within several flac/tak files. Archived, verified checksums matched, no more writes done to the files since. Re-verified one day, some checksums mismatched.
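For the file-name variant, re-verification can be scripted along these lines. This Python sketch assumes the common convention of an 8-hex-digit CRC-32 in square brackets somewhere in the name; that convention, the example file name, and the function names are assumptions for illustration.

```python
import re
import zlib

# Eight hex digits in square brackets, e.g. "clip [CBF43926].mkv".
CRC_TAG = re.compile(r"\[([0-9A-Fa-f]{8})\]")


def crc_from_name(filename: str):
    """Return the last bracketed 8-hex-digit tag, upper-cased, or None."""
    tags = CRC_TAG.findall(filename)
    return tags[-1].upper() if tags else None


def matches_name(filename: str, content: bytes) -> bool:
    """Compare the CRC-32 of the content with the tag in the file name."""
    expected = crc_from_name(filename)
    if expected is None:
        raise ValueError("no CRC-32 tag in file name")
    return f"{zlib.crc32(content):08X}" == expected


# b"123456789" is the standard CRC-32 check input; its CRC is CBF43926.
print(matches_name("clip [CBF43926].mkv", b"123456789"))  # True
print(matches_name("clip [CBF43926].mkv", b"123456788"))  # False
```

For large video files the content would be read and fed to zlib.crc32 in chunks rather than held in memory at once.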

Quote
Quote
That's on at least two HDDs of different age and brand. Maybe all those error correction measures inside the HDD work when you regularly read the files, but it doesn't seem to work as well when the HDD is off most of the time.

The drive checksums are created when you record and play data off the disk. They are internal to the drive and generally not visible to people using a proper O/S such as Windows to read and write the files. When the drive is powered down and idle, they are not being calculated.
That's what I am saying. If the failure happens while the drive is off, or upon power on, it's not much of a security.

Quote
Quote
In any case, better be safe than sorry afterwards, so adding an additional layer of protection is neither wrong, nor does it "go off the rails of reality". But that's just me.

The odds of failure of the same data on two different disks going bad are fantastically high. They point to a failure of a common component which could be software.  I get the feeling that significant information about these failures is not being reported. If you are storing data with some odd operating system, or system software then of course you shouldn't be using it to store and retrieve important data. But, this is not how the vast majority of data is used.
You're trying to find things in my statements that aren't there - the failures happened on a single disk (the backup one), with NTFS, on Windows. If you think using self-healing file systems or other means of protecting your data like .par archives is useless, it's your free decision not to use them.

Like I said, I prefer my data to be safe in case something does happen. You can cry about the odds for a failure being astronomically high all you want afterwards, it won't get your data back.

Re: Protecting audio files from bit rot?

Reply #73
That's on at least two HDDs of different age and brand. Maybe all those error correction measures inside the HDD work when you regularly read the files, but it doesn't seem to work as well when the HDD is off most of the time.

That would be speculation on your part apparently based on just one or two occurrences, not verified knowledge obtained by monitoring the operation of a large number of drives.

It is practically impossible for data to change due to a failure of the media in the drive without triggering the drive's error detection and correction features. These features are in a way bypassed if an outside agency changes the data on the drive: the checksums will remain correct, because there has been no media or logic failure in the drive.

 The error detection and correction features are engaged when data is read or written. If or when a drive fails while powered off, the failure is moot until the next time the relevant data is read or written to the disk.

Quote
Quote
The drive checksums are created when you record and play data off the disk. They are internal to the drive and generally not visible to people using a proper O/S such as Windows to read and write the files. When the drive is powered down and idle, they are not being calculated.
That's what I am saying. If the failure happens while the drive is off, or upon power on, it's not much of a security.

The data is checked the next time the drive  is powered up and read or written, which suffices for all practical circumstances.

Quote
In any case, better be safe than sorry afterwards, so adding an additional layer of protection is neither wrong, nor does it "go off the rails of reality". But that's just me.

Quote
Like I said, I prefer my data to be safe in case something does happen. You can cry about the odds for a failure being astronomically high all you want afterwards, it won't get your data back.

You are talking about someone besides me, because I have multiple backups of critical information, some geographically dispersed.

The real problem is that if the drive fails and data is lost or corrupted, the only practical way to recover it in most cases is to go to your backups.

If there is bit rot, the principles of operation of the device and actual real-world experience show that it must be the error detection and correction features of the drive that give the first warnings, unless the drive is so badly failed that not even they work.

Since my (pre retirement) day job used to involve maintaining and building computers, I've seen a ton of media and equipment failures involving data storage. Bit rot definitely exists, but it is most often found in common optical media. I have had whole boxes of CD ROMs and DVDs fail on the shelf in a dry, dark area, and the same for optical drives, hard drives, and SSDs.

If a storage device is in the process of succumbing to bit rot, there is usually first a goodly number of errors detected by the drive itself.  Not all of them are explicitly reported, and sometimes they first manifest themselves as equipment slow-downs.

That all said, I don't back up anything because I fear bit rot. There are so many far more common errors with the same fatal outcome.

Re: Protecting audio files from bit rot?

Reply #74
That's on at least two HDDs of different age and brand. Maybe all those error correction measures inside the HDD work when you regularly read the files, but it doesn't seem to work as well when the HDD is off most of the time.

That would be speculation on your part apparently based on just one or two occurrences, not verified knowledge obtained by monitoring the operation of a large number of drives.

It is practically impossible for data to change due to a failure of the media in the drive without triggering the drive's error detection and correction features.
Yes. But can the error correction work in all cases? No. There is a limit to what it can recover, and sometimes it won't be able to recover the correct state.

Quote
The real problem is that if the drive fails and data is lost or corrupted, the only practical way to recover it in most cases is to go to your backups.
Or, you let your file system automatically handle that (notifying you your other drive is failing) and all other cases of weirdness that may or may not crop up.

Quote
If there is bit rot, the principles of operation of the device and actual real-world experience show that it must be the error detection and correction features of the drive that give the first warnings, unless the drive is so badly failed that not even they work.
The thing is, as you said it yourself
Quote
Not all of them are explicitly reported, and sometimes they first manifest themselves as equipment slow-downs.
And actually, I have only ever had one type of error reported to me by Windows - that is, that the drive has completely failed. So unless you run additional tools that monitor SMART values, those logged errors are rather useless. And even then, as I said above, some errors are not corrected, so those values are mostly informational in nature. They might warn you and save you from even bigger data loss, but they don't prevent it in all cases. Self-healing file systems like btrfs or zfs, on the other hand, are an additional layer of protection that only fails to protect you from a complete RAID failure (all drives die before the array can be rebuilt). But that's not something you can guard against without having another copy anyway.

Quote
That all said, I don't back up anything because I fear bit rot. There are so many far more common errors with the same fatal outcome.
Nor do I, but if I can prevent something from happening, I don't see why I shouldn't - and you haven't mentioned any reasons against it either, even though you sound like you are against it.

 