FLAC and WavPack (and likely others too) use error detection codes for the uncompressed audio data.
Why not for compressed?
I could find only 2 differences:
- You have to decompress the data to verify correctness, which reduces verification performance (sometimes greatly).
- You have more data to checksum, which slightly reduces compression and verification performance.
What am I missing?
The checksum is meant to prove that the decoded data is the same as the source (e.g. a WAV file). A checksum of the encoded data will only prove that the file hasn't changed. Of course you could have both.
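To make the difference concrete, here is a minimal sketch in Python. It assumes zlib as a stand-in for a lossless audio codec (FLAC and WavPack work differently internally), and the helper name `digests` is made up for this example.

```python
import hashlib
import zlib

def digests(pcm: bytes, level: int) -> tuple:
    """Return (MD5 of the encoded bytes, MD5 of the decoded bytes)."""
    encoded = zlib.compress(pcm, level)   # stand-in for a lossless encoder
    decoded = zlib.decompress(encoded)    # stand-in for the decoder
    return hashlib.md5(encoded).hexdigest(), hashlib.md5(decoded).hexdigest()

pcm = bytes(range(256)) * 64              # fake "audio" samples
source_md5 = hashlib.md5(pcm).hexdigest()

enc_fast, dec_fast = digests(pcm, 1)
enc_best, dec_best = digests(pcm, 9)

# The decoded hash matches the source regardless of encoder settings...
print(dec_fast == source_md5, dec_best == source_md5)
# ...while the encoded hash depends on how the file was compressed.
print(enc_fast == enc_best)
```

So a hash of the decoded audio survives re-encoding at another compression level, while a hash of the encoded bytes only says something about that one particular file.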
AFAIK (not very far) neither FLAC nor WavPack store checksums of the whole audio, but separate ones for each block.
Therefore they aren't useful for comparisons with the source.
FLAC does store the checksum of the whole audio data.
Where did you get the idea it doesn't?
WavPack by default uses block-based CRCs and, if desired, an MD5 hash of the audio data.
To my knowledge the CRCs are only used for error detection in the audio stream while decoding it.
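As a rough illustration of per-block error detection, the sketch below computes one CRC per fixed-size block and reports which blocks no longer match. CRC-32 from zlib and the 4096-byte block size are assumptions made for the example; the real formats define their own checksums and block layouts, and the function names are hypothetical.

```python
import zlib

BLOCK_SIZE = 4096  # arbitrary block size chosen for this sketch

def block_crcs(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Compute one CRC-32 per fixed-size block."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]

def damaged_blocks(data: bytes, stored: list, block_size: int = BLOCK_SIZE) -> list:
    """Return the indices of blocks whose CRC no longer matches."""
    current = block_crcs(data, block_size)
    return [i for i, (now, then) in enumerate(zip(current, stored))
            if now != then]

audio = bytes(range(256)) * 64          # 16384 bytes = 4 blocks
stored = block_crcs(audio)              # CRCs recorded at encode time

corrupted = bytearray(audio)
corrupted[5000] ^= 0x01                 # flip one bit inside block 1
print(damaged_blocks(bytes(corrupted), stored))  # [1]
```

The per-block values localize damage to a block, which is what a decoder needs, but they say nothing directly about whether the whole stream equals the source.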
The MD5 hash is more useful for verifying the entire audio content. This could be used in a couple of scenarios:
- When transcoding lossless audio, the MD5 hash can be used to verify that the same audio content is still there (intact) after the transcode. This can be useful for detecting misbehaving software or hardware.
- It could also be useful for finding duplicates in a large collection. When two MD5 hashes match, there is a very high chance that the audio content is the same.
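The duplicate-finding scenario could look roughly like this in Python. The sketch assumes the audio has already been decoded to raw PCM bytes by some other tool; `find_duplicates` and the file names are invented for illustration.

```python
import hashlib
from collections import defaultdict

def find_duplicates(decoded_tracks: dict) -> list:
    """Group track names whose decoded audio has the same MD5."""
    by_hash = defaultdict(list)
    for name, pcm in decoded_tracks.items():
        by_hash[hashlib.md5(pcm).hexdigest()].append(name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]

# Hypothetical library: names mapped to already-decoded PCM bytes.
library = {
    "album/01.flac": b"\x00\x01" * 1000,
    "backup/01.wv": b"\x00\x01" * 1000,   # same audio, different codec
    "album/02.flac": b"\x02\x03" * 1000,
}
print(find_duplicates(library))  # [['album/01.flac', 'backup/01.wv']]
```

The same comparison covers the transcoding case: hash the decoded audio before and after, and the two values should be identical.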
I hope this is what you were asking?
After reading your question a little more thoroughly, I guess what you're asking is: "Why don't they keep an MD5 of the compressed audio content instead of the decompressed audio content?".
A hash of the compressed audio wouldn't be very useful, because most people are much more interested in the integrity of the decompressed audio, which cannot be 100% guaranteed by looking at the hash of the compressed audio. I think a hash of the decompressed audio is simply much more useful because it has more use cases and is tied to the audio content alone.
OK, now I get it.
I didn't know that there are 2 kinds of EDC used simultaneously.
Yes, a checksum of the whole audio data is indeed useful. For corruption detection, though (for me that's the main point of keeping checksums), a checksum of the compressed data would work just as well and be much faster.
Thanks a lot for the answer.
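For the corruption-detection case, checking the compressed file as-is could be sketched like this in Python. It assumes a reference hash was stored while the file was known to be good; `file_md5` and the temporary file are made up for the demo, and nothing is decoded.

```python
import hashlib
import os
import tempfile

def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file's raw bytes in chunks, without decoding anything."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo with a temporary file standing in for a compressed track.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "track.bin")
    with open(path, "wb") as f:
        f.write(b"compressed audio bytes" * 1000)
    reference = file_md5(path)            # stored when the file was known good

    unchanged = file_md5(path) == reference
    with open(path, "r+b") as f:          # simulate silent corruption
        f.seek(100)
        f.write(b"\xff")
    corrupted = file_md5(path) != reference

print(unchanged, corrupted)  # True True
```

This only tells you the file has changed since the reference was taken; it does not tell you the audio matched the source in the first place.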
What is your goal? You don't need to decode and verify your files every now and then. If you ask me, a checksum of the encoded audio and a checksum of the decoded audio are both somewhat useless if you have no reference. AccurateRip is a large database that makes it possible to verify CD rips using checksums. With CUETools you can even store verification results in tags.
Silent data corruption.
I verify my music collection once in a while. I had to give up on WavPack because verification took way too long.
I'm considering switching to TAK as soon as 2.0 is out, but I wonder whether verification time will make it impractical too.
Verification time should be disk-bound. I don't think what you are asking for would change anything, unless you're running a 386.
If it were disk-bound, FLAC and WavPack would have roughly the same speed.
But WavPack was several times slower.
Just checked an old comparison (http://members.home.nl/w.speek/comparison.htm). WavPack 4 really only does about 5 MB/s on an Athlon 800. That's quite lame. I wonder whether its later asymmetrical modes could improve that considerably.