Topic: Protecting audio files from bit rot?

Re: Protecting audio files from bit rot?

Reply #75
Thanks. Can you add some switches to ignore some types of errors, or to check only for a specific type of error? I have many files that always show errors (and very horrible words like "fatal") despite the fact that they are perfectly fine. For example:

xxx.flv
* Fatal: File truncated near packet header.
...
xxx.wav
* Fatal: Missing 'data' section

For example, is it possible to show only crc errors?

Fatal errors indicate that the parser is too confused to do any CRC tests and just gives up.  These formats don't have any CRCs anyway, so I take your request to mean an option to look only at files containing intrinsic checks.  Will consider options.

xxx.flv
* Fatal: File truncated near packet header.

This is a bug in Mediags and thanks for reporting it.  Your file is probably fine.  The fix is easy, but replicating the bug is not since I have no media file that uses segment 7BA9.  Not even any of the official Matroska test files do.  MKV format is sooo difficult in so many ways.

The issue has presumably been fixed in the next release - v0.9.6

Re: Protecting audio files from bit rot?

Reply #76

Quote from: arny
That all said, I don't back up anything because I fear bit rot. There are so many far more common errors with the same fatal outcome.

Nor do I, but if I can prevent something from happening, I don't see why I shouldn't - and you haven't mentioned any reasons against it either, even though you sound like you are against it.

Separate programs that test data don't prevent anything - they just tell you when something bad has happened. The damage is done.

 The effective ways to prevent data corruption that are up to the user are primarily related to controlling the hardware and software environment.

I fear that paranoia about so-called bit rot distracts people from the larger problem which is that computer data storage is fallible in many ways.

Techniques that back up user data on remote media have become far more accessible and easy to implement due to the growth and development of "The Cloud".

Re: Protecting audio files from bit rot?

Reply #77
Maybe spoon feed him with sfv files. I'm afraid he's way in over his head at this point.

Like using zip (et al), looking into fb2k's integrity verifier in conjunction with frame CRCs that likely don't exist in any of his mp3s (not the same as the CRC in the Lame header that I was talking about, but I digress) is, well, let's be nice and just call it a wasted effort. 

For me, mp3s are entirely expendable. If they were going to be corrupted in some way I would just make new ones. If I am unable to detect a problem that may exist, then let ignorance be bliss. If I encounter what I suspect to be an audible problem, then my next course of action would be to consult the source.  This seems far more rational to me than getting sucked into a rabbit hole of paranoia. This is not to dismiss par files, rather it is about the fear that redundancy is being created for files that might already be damaged.
Forgive me for asking questions all the time greynol.  I am no expert like you - not being sarcastic here.

I get your mentality on mp3 files, it makes sense.  For me though, when I started to rip my music collection decades ago, I never thought about codec quality or even ripping lossless.  It was like... insert CD into iTunes, hit OK - it rips with default AAC settings.  Insert a track in WMP, hit OK - oops, plenty of track information is missing, but oh well...  It's only in the past 5 years or so that I started caring about this stuff as my library grew.  So while I only rip in FLAC now, the majority of my collection is still various MP3/AAC.  I could go into storage, bring out all my CDs and rip them again, but that would be a waste of time.  As you said in my other topic - "your eyes cannot hear".  Why re-rip my 128 AAC "Dark Side of the Moon" when it sounds perfect to my ears?

I think creating a par file to verify my music is not damaged is a better solution than going into storage and re-ripping every CD I own just so it can have that built-in checksum.  I really like this MultiPar program that I downloaded.  It's intuitive and does the job I need it to do.  If I just want a checksum for the MP3 files in question, it takes up maybe 20KB of space - that is nothing.  It's unlikely an entire song or album would be damaged, but if I wanted, I could set the redundancy to maybe 10% so MultiPar could try to repair the damaged MP3 file.  I think for me though, the checksum is enough.  If a song got damaged by bit rot and had a bad CHIRP or something, then fine, I'll go get my CD and rip it again but in FLAC.
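
The checksum-only approach described above can be sketched in a few lines. This is a minimal illustration, not how MultiPar works internally: it assumes SHA-256 as the hash, and `file_digest`, `build_manifest` and `verify_manifest` are hypothetical helper names. It can only detect damage, not repair it.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(folder, pattern="*.mp3"):
    """Map each matching file name in the folder to its digest."""
    return {p.name: file_digest(p) for p in sorted(Path(folder).glob(pattern))}


def verify_manifest(folder, manifest):
    """Return the names of files that are missing or whose digest changed."""
    bad = []
    for name, expected in manifest.items():
        path = Path(folder) / name
        if not path.is_file() or file_digest(path) != expected:
            bad.append(name)
    return bad
```

Because the digest covers the whole file, any tag edit also counts as a change, so the manifest has to be rebuilt after retagging.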
There is nothing specific between your audio files and other data
Well, for me at least, the specific thing about music files is that I changed metadata (tags, adding ReplayGain info...) way more than with other kinds of files, which made relying on external checksums/parchives impossible in the long run.
It's good to have audio stream checksums then, as in FLAC and a few others.
That aside, I agree it's just backups as usual (with both extra copies and par files, or even par files with 100% redundancy instead of an extra copy).
I changed a lot of metadata this past year - fixing many tags and adding high quality album art.  Unless you are tweaking ReplayGain all the time, I wouldn't say relying on parchive is impossible.  It only takes a few seconds to make a par file checksum for a large album.

Am I correct that FLAC metadata does not change the checksum calculation?
1. When copying your files, use a utility that checks the files after copy operations (I use TeraCopy).
2. For safe, long term storage, use MDISC. I use 100GB blu-ray XL discs.

Never heard of MDISC until today.  In the past I used CDs as archival storage because they were so cheap compared to hard drives or flash storage.  I still have hundreds of burned discs that are over 15 years old, and the ones I have tried in the past 2-3 years seem to work just fine.  There has always been discussion, though, that CDs and DVDs last between 15 and 30 years and that the cheaper discs could physically rot faster.  Even though I don't use CDs and DVDs anymore, it's always important to have multiple backup solutions.

 

Re: Protecting audio files from bit rot?

Reply #78
Right ... comparing audio files with executables. If I could fix a corrupted audio library by just redownloading a couple of megabytes, I wouldn't worry so much about my backups.

As for mp3s, one is stuck with lossy streams unless one is willing to store decoded/transcoded versions.
High Voltage socket-nose-avatar

Re: Protecting audio files from bit rot?

Reply #79
Maybe spoon feed him with sfv files. I'm afraid he's way in over his head at this point.
Also before making this topic, I have never seen anyone discuss backing up music with PARchive.
It appears as though you did know about par files.  I apologize for underestimating your level of knowledge on the subject.

I am no expert like you - not being sarcastic here.
Thank you, but I do not have the same level of expertise as the others who are contributing to this topic.
Is 24-bit/192kHz good enough for your lo-fi vinyl, or do you need 32/384?

Re: Protecting audio files from bit rot?

Reply #80
Am I correct that FLAC metadata does not change the checksum calculation?
If you mean the "inner" MD5 audio checksum, that is correct. But the checksum of the whole file will of course change any time you change metadata.
A possible solution is separating the audio data and metadata into separate (cue?) files, but AFAIK that has some drawbacks and limitations.
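
The difference between the "inner" MD5 and a whole-file checksum can be sketched as follows. This is a minimal illustration, assuming a plain FLAC stream that starts with the `fLaC` marker followed directly by a 34-byte STREAMINFO block whose last 16 bytes hold the MD5 of the unencoded audio. `flac_audio_md5` is a hypothetical helper name, and files with a prepended ID3v2 tag are not handled.

```python
def flac_audio_md5(data: bytes) -> str:
    """Return the audio MD5 stored in the STREAMINFO block of a FLAC stream."""
    if len(data) < 42 or data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker or too short)")
    block_type = data[4] & 0x7F                  # low 7 bits: metadata block type
    length = int.from_bytes(data[5:8], "big")    # 24-bit block length
    if block_type != 0 or length != 34:
        raise ValueError("first metadata block is not a 34-byte STREAMINFO")
    streaminfo = data[8:8 + length]
    return streaminfo[18:34].hex()               # last 16 bytes: MD5 of the audio
```

The value only reads back what the encoder stored. Detecting damage still requires decoding the audio and comparing its MD5 against this field, which is what `flac -t` does. Editing tags rewrites other metadata blocks and leaves this field alone, while a hash of the whole file changes.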

Re: Protecting audio files from bit rot?

Reply #81

Quote from: arny
That all said, I don't back up anything because I fear bit rot. There are so many far more common errors with the same fatal outcome.

Nor do I, but if I can prevent something from happening, I don't see why I shouldn't - and you haven't mentioned any reasons against it either, even though you sound like you are against it.

Separate programs that test data don't prevent anything - they just tell you when something bad has happened. The damage is done.
See, that's not what zfs and btrfs do. They can actually self-heal: when the checksum of a data block (not a file!) does not match on one of the disks, they automatically copy over the correct data from the intact copy. I've linked an article earlier in this thread that explains it in detail, and how it's different from a regular raid1/5.
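
The self-healing idea can be sketched with a toy two-copy block store. This is an illustration of the principle only, not btrfs or ZFS code: `MirroredStore` is a hypothetical class, SHA-256 stands in for whatever checksum the filesystem uses, and the checksums are assumed to be stored safely apart from the data.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Checksum of one data block (SHA-256 chosen for the sketch)."""
    return hashlib.sha256(block).hexdigest()


class MirroredStore:
    """Toy two-copy block store with per-block checksums."""

    def __init__(self, blocks):
        self.copies = [list(blocks), list(blocks)]   # two "disks"
        self.sums = [checksum(b) for b in blocks]    # recorded at write time

    def read(self, index):
        """Return a block, repairing any copy whose checksum does not match."""
        expected = self.sums[index]
        good = next(
            (c[index] for c in self.copies if checksum(c[index]) == expected),
            None,
        )
        if good is None:
            raise IOError(f"block {index}: all copies are corrupt")
        for copy in self.copies:
            if checksum(copy[index]) != expected:
                copy[index] = good                   # heal from the intact copy
        return good
```

A plain mirror without checksums can see that two copies differ but cannot tell which one is correct. The per-block checksum is what lets the read path pick the good copy.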

Parity files like the ones created by multiPAR can also be used to detect and repair damage done to the files. They are a trade-off between the amount of damage they can restore (parity file smaller than a whole copy) and file size.
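
The space-versus-repair trade-off can be sketched with single-block XOR parity. This is a deliberately simplified stand-in: PAR2 files use Reed-Solomon codes, which can rebuild several damaged blocks, whereas this sketch can rebuild exactly one. `xor_parity` and `rebuild_missing` are hypothetical helper names.

```python
def xor_parity(blocks):
    """Return one parity block: the byte-wise XOR of equal-length blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(parity):
            raise ValueError("all blocks must have the same length")
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_missing(blocks, parity):
    """Rebuild the single block marked None from the others plus the parity."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    survivors = [b for b in blocks if b is not None]
    return xor_parity(survivors + [parity])
```

Here one parity block protects any number of data blocks, but only against the loss of one of them, and the damaged block has to be identified first (for example by a per-block checksum). More recovery data buys tolerance for more damage.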

So yes, there are separate programs that not only let you detect but also repair the damage at the cost of additional space used.

Quote
I fear that paranoia about so-called bit rot distracts people from the larger problem which is that computer data storage is fallible in many ways.
Hence why users should make backups. And self-healing backups are better than "regular" backups.

Quote
Techniques that back up user data on remote media have become far more accessible and easy to implement due to the growth and development of "The Cloud".
Sadly, only a few countries of the world have a good internet infrastructure to actually allow normal users to back up to the cloud (have fun uploading terabytes with an upstream of 40KiB/s), and even if you have enough upstream bandwidth, cloud offerings that offer enough space to back the data up will cost you way more than just buying another HDD and storing it at a friend's house several tens of kilometers away.

In other news, it seems Apple will be including a new filesystem in the macOS Sierra release, but from what I can tell, it doesn't have any FS-level checksumming like btrfs or zfs.

Re: Protecting audio files from bit rot?

Reply #82

Quote from: arny
That all said, I don't back up anything because I fear bit rot. There are so many far more common errors with the same fatal outcome.

Nor do I, but if I can prevent something from happening, I don't see why I shouldn't - and you haven't mentioned any reasons against it either, even though you sound like you are against it.

Separate programs that test data don't prevent anything - they just tell you when something bad has happened. The damage is done.

 The effective ways to prevent data corruption that are up to the user are primarily related to controlling the hardware and software environment.

I fear that paranoia about so-called bit rot distracts people from the larger problem which is that computer data storage is fallible in many ways.

Techniques that back up user data on remote media have become far more accessible and easy to implement due to the growth and development of "The Cloud".


This I have to agree with.  Before I implemented a btrfs mirror, if a file was corrupt, I simply hopped on Crashplan and restored the file over the old one and was up and running 5 minutes later.  Of course having a btrfs mirror cuts that time down to seconds, instead of minutes.

Don't let copy-on-write checksummed filesystems distract you from keeping good backups.

Re: Protecting audio files from bit rot?

Reply #83
This is a bug in Mediags and thanks for reporting it.  Your file is probably fine.  The fix is easy, but replicating the bug is not since I have no media file that uses segment 7BA9.  Not even any of the official Matroska test files do.  MKV format is sooo difficult in so many ways.

The issue has presumably been fixed in the next release - v0.9.6

Found an example that uses this signature.*  Current RC build handles it okay.  (First untested attempt did not.)

On a related note, I have yet to find an example of an MKV container that actually uses intrinsic CRCs effectively.  Still, the Mediags tool is useful to verify structure and file length.

*Earlier reply was a misquote.  Meant to refer to the xxx.mkv file.

Re: Protecting audio files from bit rot?

Reply #84
FWIW, this thread says foobar can also check MP3 frame CRCs:

https://hydrogenaud.io/index.php/topic,68536.0.html

This is imperfect since you could still have truncations of the file (whole frame deleted) that might be missed, but it would notice minor errors or bit flips, at least assuming you have the CRC option enabled when encoding files.

I'd like to point out a few facts about MP3 CRC that seem to be largely unknown to the general public.

Source:
http://www.mp3-tech.org/programmer/docs/mp3_theory.pdf
Quote
This field will only exist if the protection bit in the header is set and makes it possible check the most sensitive data for transmission errors. Sensitive data is defined by the standard to be bit 16 to 31 in both the header and the side information. If these values are incorrect they will corrupt the whole frame whereas an error in the main data only distorts a part of the frame. A corrupted frame can either be muted or replaced by the previous frame.
tl;dr You can flip bytes in an MP3 file with a hex editor and most of the time CRC checks will not detect audible defects; they safeguard only a specific small part of each MP3 frame.

The MP3 CRC field is meant for preventing massive audible distortion when streaming over an unreliable medium, not for detecting storage errors.
On top of that, working on a per-frame basis, they do nothing about protecting the file as a whole, against truncation or insertion of unwanted data.
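
Whether a given frame carries the CRC field at all can be read from its header. The sketch below assumes the caller already has the 4 header bytes of one MPEG audio frame; `mp3_frame_has_crc` is a hypothetical helper name. In the header the protection bit is inverted: a value of 0 signals that a 16-bit CRC follows, and 1 means there is none.

```python
def mp3_frame_has_crc(header: bytes) -> bool:
    """Report whether a 4-byte MPEG audio frame header signals a CRC-16."""
    if len(header) < 4:
        raise ValueError("need at least 4 header bytes")
    # Frame sync: the first 11 bits of the header must all be 1.
    if header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        raise ValueError("no frame sync at this offset")
    # Protection bit: lowest bit of the second byte.
    # 0 means a 16-bit CRC follows the header, 1 means no CRC.
    return (header[1] & 0x01) == 0
```

For example, a header starting with the bytes `FF FB` has no CRC, while one starting with `FF FA` does. Even when the field is present, it covers only the part of the frame described in the quote above, not the main audio data.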

Re: Protecting audio files from bit rot?

Reply #85
Quote
most of the time CRC checks will not detect audible defects
This jibes with my testing.  I've never heard any difference from flipping single bits at random.
Quote
The MP3 CRC field is meant for preventing massive audible distortion when streaming over an unreliable medium, not for detecting storage errors.
Makes sense - it's just 16 bits.  The possibility of a false negative is too high.
Quote
they do nothing about protecting the file as a whole, against truncation or insertion of unwanted data
The LAME CRC does - within the limit of a paltry 16 bits.  LAME's CRC is over the *entire* audio segment, not just a frame.  LAME also stores the length of the entire audio segment.  The quote you provided is about audio frames.  It does not address the LAME wrapper.

 