Title: Why checksum uncompressed audio?
Post by: _m²_ on 2009-10-25 07:06:26

FLAC and WavPack (likely others too) compute their error detection codes over the uncompressed audio data.
Why not over the compressed data?

I could find only two differences:
- you have to decompress the data to verify it, which (sometimes greatly) reduces verification performance
- you have more data to checksum, which slightly reduces compression and verification performance

What am I missing?

Title: Why checksum uncompressed audio?
Post by: stigc on 2009-10-25 08:12:56

The checksum is meant to prove that the decoded data is the same as the source (e.g. a WAV file). A checksum of the encoded data will only prove that the file hasn't changed. Of course, you could have both.
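
A minimal sketch of that distinction, using Python's `zlib` as a stand-in for a lossless audio codec (an assumption for illustration only; it is not how FLAC or WavPack encode audio). The same source is "encoded" at two settings: the encoded bytes differ, so a hash of the encoded file differs too, while the hash of the decoded data still matches the source.

```python
import hashlib
import zlib

# Fake "PCM" payload; any bytes will do for the illustration.
source = bytes(range(256)) * 64

# Two "encodes" of the same source at different compression settings.
# zlib is only a stand-in for a lossless audio encoder here.
fast = zlib.compress(source, 1)
small = zlib.compress(source, 9)


def md5(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


source_md5 = md5(source)

# Hash of the decoded data: the same for both encodes and equal to the source.
decoded_ok = md5(zlib.decompress(fast)) == source_md5 == md5(zlib.decompress(small))

# Hash of the encoded bytes: only tells you whether that particular file changed.
encoded_same = md5(fast) == md5(small)

print(decoded_ok, encoded_same)
```

Here `decoded_ok` is what lets you compare against the original WAV, and `encoded_same` is the check that breaks as soon as the file is re-encoded with other settings.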

Title: Why checksum uncompressed audio?
Post by: _m²_ on 2009-10-25 15:53:38

Quote from: stigc on 2009-10-25 08:12:56

The checksum is meant to prove that the decoded data is the same as the source (e.g. a WAV file). A checksum of the encoded data will only prove that the file hasn't changed. Of course, you could have both.

AFAIK (which isn't very far), neither FLAC nor WavPack stores a checksum of the whole audio, only separate ones for each block.
Therefore they aren't useful for comparisons with the source.

Title: Why checksum uncompressed audio?
Post by: kjoonlee on 2009-10-25 17:05:00

FLAC does store the checksum of the whole audio data.

Where did you get the idea it doesn't?

Title: Why checksum uncompressed audio?
Post by: Bylie on 2009-10-25 18:18:15

WavPack by default uses block-based CRCs and, if desired, an MD5 hash of the audio data.

To my knowledge, the CRCs are only used for error detection in the audio stream while decoding it.
The MD5 hash is more useful for verifying the entire audio content. This could be used in a couple of scenarios:

When transcoding lossless audio, the MD5 hash can be used to verify that the same audio content is still there (intact) after the transcode. This can be useful for detecting misbehaving software or hardware.
It could also be useful for finding duplicates in a large collection. When two MD5 hashes match, there is a very high chance that the audio content is the same.
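
The two mechanisms can be sketched roughly as follows. This is a toy layout, not the actual WavPack or FLAC format: the block size, the use of CRC32, and the function names `make_checksums` and `bad_blocks` are all assumptions made for the example. The per-block CRCs point at the damaged block, while the whole-stream MD5 only says that something, somewhere, differs.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen for the example


def make_checksums(audio: bytes):
    """Return (list of per-block CRC32 values, MD5 hex digest of the whole stream)."""
    blocks = [audio[i:i + BLOCK_SIZE] for i in range(0, len(audio), BLOCK_SIZE)]
    return [zlib.crc32(b) for b in blocks], hashlib.md5(audio).hexdigest()


def bad_blocks(audio: bytes, stored_crcs):
    """Indices of blocks whose CRC32 no longer matches the stored value."""
    current, _ = make_checksums(audio)
    return [i for i, (a, b) in enumerate(zip(current, stored_crcs)) if a != b]


audio = bytes(range(256)) * 64           # 16384 bytes, i.e. 4 blocks
crcs, whole_md5 = make_checksums(audio)

damaged = bytearray(audio)
damaged[9000] ^= 0x01                    # flip one bit inside block 2
damaged = bytes(damaged)

print(bad_blocks(damaged, crcs))                       # [2]
print(hashlib.md5(damaged).hexdigest() == whole_md5)   # False
```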

I hope this is what you were asking?

After reading your question a little bit more thoroughly I guess what you're asking is: "Why don't they keep an MD5 of the compressed audio content instead of the decompressed audio content?".

A hash of the compressed audio wouldn't be very useful because most people are much more interested in the integrity of the decompressed audio, which cannot be 100% guaranteed by looking at the hash of the compressed audio. I think a hash of the decompressed audio is just much more useful because it has more use cases and is tied to the audio content only.

Title: Why checksum uncompressed audio?
Post by: _m²_ on 2009-10-25 20:35:24

OK, now I get it.
I didn't know that there are 2 kinds of EDC used simultaneously.
Yes, a checksum of the whole audio data is indeed useful. Though for corruption detection (for me that's the main point of keeping checksums), a checksum of the compressed data would work just as well and be way faster.
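
For the corruption-detection case, a check over the compressed files needs no decoder at all. A minimal sketch, assuming a manifest of digests recorded while the files were known to be good (the helper names `file_digest` and `verify`, the manifest layout, and the choice of SHA-256 are assumptions for the example, not something any of the codecs provide):

```python
import hashlib
import os
import tempfile


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's raw bytes, read in chunks; no audio decoding involved."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(manifest: dict) -> list:
    """Return the paths whose current digest differs from the recorded one."""
    return [p for p, digest in manifest.items() if file_digest(p) != digest]


# Demo with a throwaway file standing in for a compressed track.
with tempfile.TemporaryDirectory() as tmp:
    track = os.path.join(tmp, "track.bin")
    with open(track, "wb") as f:
        f.write(b"\x00\x01\x02\x03" * 1000)

    manifest = {track: file_digest(track)}   # recorded while the file is known good
    clean = verify(manifest)                 # nothing is reported

    with open(track, "r+b") as f:            # simulate silent corruption of one byte
        f.seek(100)
        f.write(b"\xff")
    corrupted = verify(manifest)             # the track is reported

print(clean, len(corrupted))
```

The trade-off is that this digest covers the whole file, so any change to it, including a tag edit, looks the same as corruption, whereas a hash of the decoded audio is unaffected by such changes.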

Thanks a lot for the answer.

Title: Why checksum uncompressed audio?
Post by: odyssey on 2009-10-25 21:28:12

What is your goal? You don't need to decode and verify your files every now and then. If you ask me, checksums of both the encoded and the decoded audio are somewhat useless if you have no reference. AccurateRip is a large database that provides the ability to verify CD rips using checksums. With CUETools you can even store verification results in tags.

Title: Why checksum uncompressed audio?
Post by: _m²_ on 2009-10-26 13:54:33

Silent data corruption.
I verify my music collection once in a while. I had to give up on WavPack because verification took way too long.
I'm considering switching to TAK as soon as there's 2.0, but I'm wondering whether verification time will make it impractical too.

Title: Why checksum uncompressed audio?
Post by: rpp3po on 2009-10-26 14:15:08

Verification time should be disk-bound. I don't think that what you are asking for would change anything, unless you're running a 386.

Title: Why checksum uncompressed audio?
Post by: _m²_ on 2009-10-26 16:17:24

If it were disk-bound, FLAC and WavPack would have roughly the same speed.
But WavPack was several times slower.

Title: Why checksum uncompressed audio?
Post by: rpp3po on 2009-10-26 16:48:48

Just checked an old comparison (http://members.home.nl/w.speek/comparison.htm). WavPack 4 really only does about 5 MB/s on an Athlon 800. That's quite lame. I wonder if its later asymmetrical modes could improve that considerably.

HydrogenAudio

Lossless Audio Compression => Lossless / Other Codecs => Topic started by: _m²_ on 2009-10-25 07:06:26