Topic: Testing lossless audio files for data corruption (Read 10988 times)

Testing lossless audio files for data corruption

Hi, everyone.  I have been working for years to back up my CD library to lossless audio files and have encountered some issues with file corruption (mostly related to external hard disk drives) that are freaking me out.  I am seeking advice as to how I might be able to verify that all my archival copies are intact.

First, here's the problem that I've had:

1.  I converted a bunch of FLAC files to ALACs using DBPowerAmp, which verified the original FLAC files by checking their embedded MD5s against the decoded audio's, encoded the decoded audio to ALAC, decoded the ALACs and checked the decoded audio against the decoded audio from the original FLAC files.  The ALAC files were all created on an external hard disk drive.

2.  I copied the ALAC files from the external hard disk drive to a clean Windows PC via USB 2.0 using Windows Explorer drag-and-drop copy.  The files were shared via a read-only Samba share and all playback, etc. in iTunes was done from this read-only share.  This is my home music server.

3.  Less than 24 hours later, without using the external hard disk drive for any other purpose, I copied the ALAC files from the external hard disk drive to a second clean Windows PC via USB 2.0 using Windows Explorer drag-and-drop copy.  The files were shared via a read-only Samba share and all playback, etc. in iTunes was done from this read-only share.  This is my office music server.

4.  Some of the ALAC files on the office music server have corrupt audio data.  They play back with audible artifacts (sudden, loud, brief static in the middle of some of the audio).  The ALACs on my home music server have no audible defects (at least, none that I've found so far).  The byte counts for the corresponding files at home vs. in the office are the same, but there must be some bad bits in the office files.
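
Equal byte counts do not rule out flipped bits, so a byte-for-byte comparison of the two copies is the direct check. A minimal sketch, assuming both copies are reachable as ordinary file paths; the function name and chunk size are illustrative and not taken from any tool mentioned in this thread:

```python
from typing import Optional


def first_difference(path_a: str, path_b: str, chunk_size: int = 1 << 20) -> Optional[int]:
    """Return the offset of the first differing byte, or None if the files are identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Find the exact byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No mismatch in the shared part: one file is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:
                # Both files ended at the same point with no difference.
                return None
            offset += len(a)
```

Running this on the home and office copies of an affected file would show where the damage starts, which may help tell a single flipped bit from a whole bad block.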

I've repeated the process several times and changed the tools and equipment I've used and have found the same thing multiple times (only after extended listening, since only 1 in several hundred files is affected, and different files are affected each time).  I've tried encoding via iTunes, Foobar, DBPowerAmp, and have copied via Windows Explorer and Mac OS cp.  I've played back via Cog, iTunes, VLC, etc. and no tools have ever reported errors in copying files or transcoding.  The only thing I can trace it back to is that it has to be that the external hard disk drives (I've used many, Western Digital, Seagate, etc.) are corrupting the data.

Thank you for reading this far, this is driving me crazy, as you might expect.

Now, my ultimate goal is to verify that all of the audio data in any backup set of ALAC files is intact.  As far as I know, the ALACs do not have any embedded CRC information, so the audio in the ALAC files cannot be verified against the metadata in the files themselves (please correct me on this if I am wrong).  I do have, however, MD5s embedded in all of my source FLAC files and I've stored Exact Audio Copy log files with CRCs for all of the CDs I've ripped.  I can certainly write some scripts to knit these all together and compare the MD5s of the FLACs with MD5s of the ALAC audio.  I'm just wondering if anyone has suggestions as to how I might be able to do this.  Here are the things I'm looking to do:

a.  How do I best verify that the MD5s embedded in my FLAC files match the actual audio data in the files?
b.  How do I best verify that the MD5s of the FLAC files match the MD5s of the audio in the ALACs?
c.  Is there a standard method of storing CRCs or MD5s in ALACs?

I have some potential solutions to all of these, but I was hoping that you all might be able to give me some guidance.

Finally, I've also looked into volatility issues with external hard disk drives, and I've found some information, but the implications of the errors that I've consistently but unpredictably encountered over the last couple of years frighten me.  External hard disk drives are becoming the de-facto backup medium for many users (especially with the availability of tools like Apple's Time Machine, etc.), and I wonder how unreliable these cheap devices' USB-to-SATA controllers are.

Thanks for all your help.

Reply #1
You can use foobar2000 and download the "binary comparator" component to see if the audio data is identical:
http://wiki.hydrogenaudio.org/index.php?ti.../foo_bitcompare

Or you can use the "calculate audio crc" codec in dbpoweramp to recalc and compare.
http://www.dbpoweramp.com/codec-central-utility.htm

(But there may be a quicker way that looks at md5s only.)

Reply #2
Use Batch Converter and Convert To >> Test Conversion; any errors in the FLAC files will show at the end. ALAC does not have this ability.

Reply #3
NB: If we are talking foobar components I'd have suggested File Integrity Verifier.

Personally I would use a batch file and FLAC's -t switch for this.
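
The batch idea can be sketched in Python rather than as a Windows batch file. This is only a rough sketch, assuming the `flac` command-line tool is on the PATH and that it exits with a non-zero status when a test fails; `flac_test` and `find_bad_files` are made-up names:

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def flac_test(path: Path) -> bool:
    """Run `flac -t -s` (test mode, silent); True if flac reports no error."""
    result = subprocess.run(["flac", "-t", "-s", str(path)], capture_output=True)
    return result.returncode == 0


def find_bad_files(root: str, check: Callable[[Path], bool] = flac_test) -> List[Path]:
    """Walk root recursively and return every .flac file that fails the check."""
    # Note: the "*.flac" pattern is case-sensitive on some platforms.
    return [p for p in sorted(Path(root).rglob("*.flac")) if not check(p)]
```

Calling `find_bad_files` on the top-level music folder returns the files that need re-ripping or restoring from another backup. The `check` parameter is only there so another tester can be swapped in.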

In a related side note:
I back up my CDs using Wavpack to DVD with PAR2, and an external hard drive.  I recently realised that my external hard drive was near full, so I bought a new one.  I transferred all files to the new drive using XCOPY (using /V to verify), and then used a batch file to verify the Wavpack files.  3-4 failed, and on testing they were corrupt on the original drive.  Thankfully I had my DVD back-ups, which didn't require the PAR2 data as yet.  Just another reminder that you need to verify your files regularly, and - if you can - back up to more than one place.  Thanks to my new 1TB drive I now have my DVDs and two external hard drives! Belt and braces.
I'm on a horse.

Reply #4
As for (c): If you really need ALAC, and are prepared for some work, perhaps you could use something like shntool to generate the hash of the audio data, store it in a tag, and then decompress and run through shntool to verify.  Certainly not as pleasant as the built-in options that Wavpack, FLAC, TAK, etc. have, but an option, at least.
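
To illustrate the "hash of the audio data" idea, here is a rough sketch that swaps in ffmpeg for shntool (an assumption, since the exact shntool invocation for ALAC is not given here). It decodes a file to raw PCM and hashes only that, so tag edits do not change the result. It assumes 16-bit audio and that `ffmpeg` is on the PATH; the function names are hypothetical, and writing the hash into a tag is left out:

```python
import hashlib
import subprocess
from typing import List


def ffmpeg_pcm_command(path: str) -> List[str]:
    """Command that decodes the first audio stream to raw 16-bit little-endian PCM on stdout."""
    return ["ffmpeg", "-v", "error", "-i", path,
            "-map", "0:a:0", "-f", "s16le", "-acodec", "pcm_s16le", "-"]


def audio_md5(command: List[str]) -> str:
    """MD5 of whatever raw audio the decoder command writes to stdout."""
    digest = hashlib.md5()
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
        for chunk in iter(lambda: proc.stdout.read(1 << 20), b""):
            digest.update(chunk)
    # Leaving the with-block waits for the decoder, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"decoder failed: {command}")
    return digest.hexdigest()
```

`audio_md5(ffmpeg_pcm_command(path))` should give the same value for a FLAC and the ALAC made from it. For 16-bit files it may also match what `metaflac --show-md5sum` prints, but that is worth confirming on a known-good file before relying on it.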

That said, do you really need to worry about the ALAC files?  Why not just ensure that the FLAC files are correct, and if you spot a problem with an ALAC file just re-convert.  Easier to keep an eye on one set of files, than two (although maintaining two does give you redundancy either way).

Reply #5
Thanks TechVsLife and spoon.  I was thinking about using these.  I'll probably use DBPowerAmp's Test Converter on all the FLACs to verify them, then I'll use the Calculate Audio CRC on the ALACs and the FLACs and compare the results.  I just didn't know if there was a way of comparing the ALACs to the embedded FLAC MD5s.  It sounds like the ALACs don't have checksums embedded (based on everything that I know and what spoon has said).  I wonder, would the Foobar2000 binary comparator check the FLACs' MD5s while comparing them to the ALACs?  If it would, that might do it all in one step (if it doesn't check these, though, if the FLACs have been corrupted then it might not flag anything, since the ALACs were generated from the FLACs).

I'm going to have to run this process on several thousand files that I've generated over the years, so it will take my machine a couple of days.  I'm just looking to make sure I come up with a definitive process before moving forward.

The other question is that I'd like some process to validate the audio data in the ALACs in the future so that if, for example, I ran DBPowerAmp's Calculate Audio CRC on the ALACs and stored the log file, then changed the ALAC tags (to upgrade the artwork, etc.), I could run a process to compare the ALAC audio against the pre-calculated, verified CRCs in the future.  I don't know of a tool that will do this (and generating MD5s on the entire original ALAC files wouldn't let me re-verify the audio if I change the file tags).
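
One way to picture such a process is a stored manifest of audio-only hashes that gets re-checked later. A minimal sketch, assuming you supply a `hasher` function that hashes the decoded audio rather than the whole file (so tag or artwork edits do not matter); the JSON layout and function names are invented for illustration and are not dBpoweramp's log format:

```python
import json
from typing import Callable, Dict, List


def write_manifest(paths: List[str], hasher: Callable[[str], str], manifest_path: str) -> None:
    """Record an audio hash for each file while the files are known to be good."""
    manifest = {p: hasher(p) for p in paths}
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)


def verify_manifest(manifest_path: str, hasher: Callable[[str], str]) -> List[str]:
    """Re-hash every file in the manifest and return the paths whose audio changed."""
    with open(manifest_path, encoding="utf-8") as f:
        manifest: Dict[str, str] = json.load(f)
    return [p for p, expected in sorted(manifest.items()) if hasher(p) != expected]
```

The manifest would be written once, right after the ALACs have been checked against the FLACs, and kept with the backups. A file that has been moved or deleted makes the hasher raise in this sketch rather than being reported as changed.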

Thanks again to you both.

Reply #6
Thanks, your posts are helpful.  Do you think that Shntool will let me run a batch validation of all the ALACs if I embed the CRCs of the audio in their tags?  That would be great.

The motivation for this whole thing is that I love FLAC (especially since this sort of integrity check is comparatively easier with it), but I now listen to mostly ALACs (since I play back on Airport Expresses and my iPhone).  I do and always will rip to FLAC and will keep my precious FLACs, but it would be nice to be able to verify my ALACs beforehand since recently, every now and then I'll be listening to an ALAC and it will play back improperly, which is annoying (and embarrassing at parties, etc.).

Thanks again.

Reply #7
Well, it looks like shntool can handle ALAC files natively (if you place alac.exe in the same directory), so I would have said that this would be relatively easy.  That said, I know nothing of ALAC files and tagging, so I don't know how easy it is to tag with the information and retrieve it to compare to the result of shntool's hash mode.

Reply #8
NAS is much preferable to USB-based solutions. For how long have manufacturers failed to supply even the most basic ATA features, such as error reporting, without flaws? Get a cheap dual-core Atom mainboard with 64-bit support, install FreeNAS, and access it via gigabit Ethernet or wirelessly. It supports the checksummed ZFS filesystem, which supports instant error reporting (even when your drive doesn't) and even on-the-fly repair for pooled storage.

Reply #9
fwiw, sounds strikingly similar to this ubuntu/samba bug -- actually almost as if you were the op there:
https://bugs.launchpad.net/ubuntu/+source/samba/+bug/491288

note: The link above is not a win7 or usb bug. However, there's no shortage of usb corruption stories, sometimes attributable to particular hardware (typically the usb controller chip in the computer). I assume you disconnect the drive through 'safely remove hardware.' You may also want to try changing the cache setting (e.g. remove write caching).  And run a diagnostics test on the drive using the software from the manufacturer.

Reply #10
Are you sure that the files are corrupted rather than your playback being affected by something in the system?  I get playback faults from time to time, but when I go back to the same place in the music and play over, there is never a fault in the same place.  If the fault is not repeatable at the same point in the music, it is not a problem with the file.

I have had problems with tags getting corrupted on a copy to external drive from time to time.

Reply #11
Hi.  Absolutely, it is in the file.  I have played back the same file on multiple systems with different players, decoded the wave form, etc. and it's in there.  The original file copied over to these systems plays fine.  For a long time, I assumed that the issue was with the encoding process and kept trying different encoding tools, but now I've realized that the files are getting corrupted just in file copies to/reads from external hard disks.

I'm big on ZFS and am building an iSCSI SAN at work with OpenSolaris, but for my own audio files, I'd just been copying my FLAC and ALAC files to multiple external hard disks for backups and figured that if the OS and disk drive didn't report any errors, that everything was peachy.  I guess I'll be implementing better storage methodology at home, too.

Thanks.

Reply #12
Synthetic Soul, while I assume that the protocol you described will verify the Wavpack files via the batch file you mentioned, I just want other readers of this board to note that the xcopy /v flag does not actually verify the entire contents of the copied file in Windows:

http://support.microsoft.com/kb/126457
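
For readers who want a copy that really does check the contents, here is a minimal sketch of copy-then-compare using hashes. The function names are illustrative. One caveat: the read-back may be served from the operating system's cache rather than from the destination disk, so re-hashing after the drive has been safely removed and reconnected is the more trustworthy check:

```python
import hashlib
import shutil


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the whole file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src: str, dst: str) -> bool:
    """Copy src to dst, then re-read both and compare hashes."""
    shutil.copy2(src, dst)
    return file_sha256(src) == file_sha256(dst)
```

A `False` result means the copy should not be trusted. It does not say which side is wrong, so the source is worth testing with the format's own verifier as well.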

Thanks.


 

Reply #14
Thanks for the verification bostonidealist, I was unaware of what /V actually did.  Maybe I'll check out XXCOPY, although I would use Wavpack's own verification method always.