HydrogenAudio

Lossless Audio Compression => WavPack => Topic started by: Brink on 2018-11-27 18:41:00

Title: Can I have more information about how "-m" works?
Post by: Brink on 2018-11-27 18:41:00
From http://www.wavpack.com/wavpack_doc.html we have:

Quote
-m = compute & store MD5 signature of raw audio data

Calculate and display the MD5 checksum of the uncompressed audio data and store it in the compressed file. These sums are commonly used in file trading communities to compare versions of tracks, and as such the sums generated by WavPack match those of FLAC, OptimFROG, Shntool, and get_id3(). They can also be used by WvUnpack during decompression to verify the data integrity of lossless files.

I decided to write a simple script (Linux/Unix only) to test this, but I'm getting different MD5 signatures. The script creates a WAV on the fly using sox:

Code:
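#!/usr/bin/env bash
# Generate a 2-second mono white-noise WAV
sox -c1 -n result.wav synth 00:00:02 whitenoise
# MD5 of the whole file, WAV header included
md5sum result.wav
# -hh: very high quality, -m: store MD5, -t: copy timestamp, -v: verify after write
wavpack -hh -m -t -v result.wav
# Verify the .wv without writing any output
wvunpack -v result.wv
# Unpack back to WAV and hash the whole file again
wvunpack -o result_unpacked.wav result.wv
md5sum result_unpacked.wav
rm -f result.wav result.wv result_unpacked.wav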

Output:

Quote
29e36143187f0a616f27d337fc652e5c  result.wav

 WAVPACK  Hybrid Lossless Audio Compressor Version 5.1.0
 Copyright (c) 1998 - 2017 David Bryant.  All Rights Reserved.

original md5 signature: cc1d6436f019d37dd3591a7733bcb5a4
created (and verified) result.wv in 0.06 secs (lossless, -0.96%)

 WVUNPACK  Hybrid Lossless Audio Decompressor Version 5.1.0
 Copyright (c) 1998 - 2017 David Bryant.  All Rights Reserved.

verified result.wv in 0.02 secs (lossless, -0.98%)

 WVUNPACK  Hybrid Lossless Audio Decompressor Version 5.1.0
 Copyright (c) 1998 - 2017 David Bryant.  All Rights Reserved.

restored result_unpacked.wav in 0.02 secs (lossless, -0.98%)
29e36143187f0a616f27d337fc652e5c  result_unpacked.wav

You can see that md5sum gives the same checksum for the original WAV and for the one unpacked from the .wv, but why does that checksum differ from the "original md5 signature" that wavpack reports? What am I missing here?
Post by: Case on 2018-11-27 18:46:35
The checksum calculated by audio codecs is for the raw audio data only. Your WAV file has headers that alter the results.
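To illustrate Case's point, here is a minimal sketch (not part of WavPack, and only one way to do it) that walks the RIFF chunks of a WAV file and runs md5sum over the payload of the `data` chunk alone, skipping the 12-byte RIFF/WAVE header and chunks such as `fmt ` or `LIST`. It assumes a plain little-endian RIFF file (not RF64 or Wave64) and GNU coreutils (`od --endian`, `head -c`, `tail -c`); `wav_data_md5`, `read_id` and `read_u32le` are hypothetical helper names. For ordinary integer PCM WAV files the result should match the "original md5 signature" that `wavpack -m` prints, though this has not been checked against every sample format.

```shell
#!/usr/bin/env bash
# Sketch: MD5 of only the audio samples in a WAV file, i.e. the payload of
# its "data" chunk, ignoring the RIFF header and any other chunks.
# ASSUMPTIONS: plain little-endian RIFF/WAVE (not RF64/W64), GNU coreutils.

# Print the unsigned 32-bit little-endian integer at byte offset $2 of file $1.
read_u32le() {
  od -An -t u4 --endian=little -j "$2" -N 4 "$1" | tr -d ' '
}

# Print the 4-character chunk ID at byte offset $2 of file $1.
read_id() {
  tail -c +"$(( $2 + 1 ))" "$1" | head -c 4
}

wav_data_md5() {
  local file=$1 off=12 total id size
  if [ "$(read_id "$file" 0)" != RIFF ] || [ "$(read_id "$file" 8)" != WAVE ]; then
    echo "$file: not a RIFF/WAVE file" >&2
    return 1
  fi
  total=$(wc -c < "$file")
  # Each chunk is: 4-byte ID, 4-byte little-endian size, then the payload.
  while (( off + 8 <= total )); do
    id=$(read_id "$file" "$off")
    size=$(read_u32le "$file" $(( off + 4 )))
    if [ "$id" = data ]; then
      # Hash the sample bytes only.
      tail -c +"$(( off + 9 ))" "$file" | head -c "$size" | md5sum | cut -d' ' -f1
      return 0
    fi
    # Skip this chunk; odd-sized payloads are followed by one pad byte.
    off=$(( off + 8 + size + size % 2 ))
  done
  echo "$file: no data chunk found" >&2
  return 1
}

# Print "hash  filename" for each argument, like md5sum does.
for f in "$@"; do
  if h=$(wav_data_md5 "$f"); then
    printf '%s  %s\n' "$h" "$f"
  fi
done
```

Saved as, say, `wav_data_md5.sh`, running `bash wav_data_md5.sh result.wav` prints the hash in md5sum's layout, so it can be compared directly with the signature wavpack reports, while plain `md5sum result.wav` also covers the header bytes.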
Post by: bryant on 2018-11-27 18:52:20
Ah, beat me to it. Thanks, Case!

You can use the --raw option of wvunpack to output just the audio, and the md5sum of that should match the value reported by wavpack:

Code:
wvunpack result.wv --raw -o - | md5sum -
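A sketch that wraps this one-liner into a per-file check and compares the two values instead of eyeballing them. Apart from `-y` (answer yes to the overwrite prompt), it only uses options already shown in this thread. Pulling the hash out of wavpack's console output is an assumption: it expects the "original md5 signature: ..." line to look the way it does in the 5.1.0 output above and to be written to stderr, which may not hold for other versions or with `-q`. `check_wav` and `extract_md5` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: compare the MD5 that "wavpack -m" stores with an MD5 of the raw
# audio that "wvunpack --raw" decodes from the same .wv file.
# ASSUMPTIONS: wavpack/wvunpack 5.x on PATH; wavpack prints its
# "original md5 signature: ..." line to stderr (merged in via 2>&1 below).

# Read wavpack's console output on stdin; print the 32-hex-digit signature.
extract_md5() {
  sed -n 's/.*original md5 signature: \([0-9a-f]\{32\}\).*/\1/p'
}

check_wav() {
  local wav=$1 wv=${1%.*}.wv stored raw
  # -m stores the MD5; -y overwrites an existing .wv without asking.
  stored=$(wavpack -m -y "$wav" 2>&1 | extract_md5)
  # --raw writes bare samples (no WAV header) to stdout for md5sum.
  raw=$(wvunpack "$wv" --raw -o - 2>/dev/null | md5sum | cut -d' ' -f1)
  if [ -n "$stored" ] && [ "$stored" = "$raw" ]; then
    printf 'match     %s  %s\n' "$stored" "$wav"
  else
    printf 'MISMATCH  stored=%s raw=%s  %s\n' "${stored:-none}" "$raw" "$wav" >&2
    return 1
  fi
}

status=0
for wav in "$@"; do
  check_wav "$wav" || status=1
done
# The script's exit status is non-zero if any file failed the check.
[ "$status" -eq 0 ]
```

For example, `bash check_md5.sh result.wav` should print `match` and the hash when the two agree; a mismatch or a missing signature is reported on stderr.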
Post by: Brink on 2018-11-27 19:07:43
Quote
You can use the --raw option of wvunpack to output just the audio, and the md5sum of that should match the value reported by wavpack

Indeed. Running your snippet:

Quote
21fbcc024f8ce7dc748d8dc1111585bb  result.wav

 WAVPACK  Hybrid Lossless Audio Compressor Version 5.1.0
 Copyright (c) 1998 - 2017 David Bryant.  All Rights Reserved.

original md5 signature: b7f430cb9677a58f42479c5dd6de0b59
created (and verified) result.wv in 0.04 secs (lossless, -0.96%)

 WVUNPACK  Hybrid Lossless Audio Decompressor Version 5.1.0
 Copyright (c) 1998 - 2017 David Bryant.  All Rights Reserved.

verified result.wv in 0.01 secs (lossless, -0.97%)

 WVUNPACK  Hybrid Lossless Audio Decompressor Version 5.1.0
 Copyright (c) 1998 - 2017 David Bryant.  All Rights Reserved.

restored result_unpacked.wav in 0.01 secs (lossless, -0.97%)

 WVUNPACK  Hybrid Lossless Audio Decompressor Version 5.1.0
 Copyright (c) 1998 - 2017 David Bryant.  All Rights Reserved.

unpacked result.wv in 0.01 secs (lossless, -0.97%)
b7f430cb9677a58f42479c5dd6de0b59  -
21fbcc024f8ce7dc748d8dc1111585bb  result_unpacked.wav

So, roughly speaking, when the -m and -v options are used (-v = verify output file integrity after write), wavpack does something similar to what this snippet does?

@Case's explanation was really helpful. Do you think it could be somehow inserted into the docs?
Post by: m14u on 2018-11-27 19:23:46
Quote
[...] Do you think it could be somehow inserted into the docs?
From http://www.wavpack.com/wavpack_doc.html we have:

Quote
-m = compute & store MD5 signature of raw audio data

Calculate and display the MD5 checksum of the uncompressed audio data and store it in the compressed file.[...]
Post by: Brink on 2018-11-27 21:02:56
@m14u, I'm talking about this:
Quote
Your WAV file has headers that alter the results.

https://en.wikipedia.org/wiki/WAV

Quote
WAV (...) It is the main format used on Microsoft Windows systems for raw and typically uncompressed audio.

In theory, my WAV file is exactly what the documentation calls "uncompressed audio data", but @Case's answer gives more insight into why an MD5 of an uncompressed WAV file differs from the one wavpack -m computes.
Post by: m14u on 2018-11-27 22:00:22
Quote
WAV (...) It is the main format used on Microsoft Windows systems for raw and typically uncompressed audio.
yes, but... chunks, Karl...
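The point about chunks: a WAV file is a RIFF container. After the 12-byte RIFF/WAVE header come chunks such as `fmt ` (sample format), the `data` chunk holding the samples, and possibly others (for example `LIST` metadata), and md5sum hashes all of those bytes. A minimal sketch for listing the top-level chunks with their offsets and sizes, assuming a plain little-endian RIFF file (not RF64 or Wave64) and GNU coreutils; `list_chunks` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: list the top-level chunks of a RIFF/WAVE file with their offsets
# and payload sizes, to show how much of the file is not audio samples.
# ASSUMPTIONS: plain little-endian RIFF (not RF64/W64), GNU coreutils.

list_chunks() {
  local file=$1 off=12 total id size
  [ "$(head -c 4 "$file")" = RIFF ] || { echo "$file: not RIFF" >&2; return 1; }
  total=$(wc -c < "$file")
  printf '%-6s %8s %10s\n' ID OFFSET SIZE
  while (( off + 8 <= total )); do
    id=$(tail -c +"$(( off + 1 ))" "$file" | head -c 4)
    size=$(od -An -t u4 --endian=little -j "$(( off + 4 ))" -N 4 "$file" | tr -d ' ')
    printf '%-6s %8d %10d\n' "$id" "$off" "$size"
    # Next chunk header: skip 8 header bytes + payload (+1 pad byte if odd).
    off=$(( off + 8 + size + size % 2 ))
  done
}

for f in "$@"; do
  echo "$f:"
  list_chunks "$f"
done
```

Everything listed other than the `data` payload, plus the 12-byte header, is what makes `md5sum result.wav` differ from the signature that `wavpack -m` stores.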