'Audio MD5' component (foo_audiomd5) != 'verify integrity'

Using the 'Audio MD5' component (foo_audiomd5) v0.6.5, the MD5 checksum calculated for a FLAC file differs from the one reported by foobar2000's built-in 'Verify integrity' option.

I understand that the checksum from foobar2000's built-in utility is based only on the audio samples.
Is the Audio MD5 component's not? I tried to find out by changing tags in the file, but the Audio MD5 component's checksum did not change (and it still differed from the 'Verify integrity' MD5 checksum).

Re: 'Audio MD5' component (foo_audiomd5) != 'verify integrity'

Reply #1
FLAC computes its MD5 checksum from the raw uncompressed PCM data it sees while encoding the file. The Audio MD5 component calculates its checksums from the compressed binary data on disk, but only from the actual audio bits. That is why it uses ffmpeg.exe: it parses the file format and skips all non-audio bits such as tags.
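To make the difference concrete, here is a minimal toy sketch in Python. It is not how FLAC, ffmpeg or foo_audiomd5 work internally: zlib stands in for a lossless codec, and the "file" is a hypothetical dict with a tag block and a compressed audio payload. It only illustrates that both kinds of hash ignore tags, while hashing the compressed payload and hashing the decoded PCM give different values.

```python
import hashlib
import struct
import zlib


def make_toy_file(pcm: bytes, tags: bytes) -> dict:
    # Hypothetical container: a tag block plus a losslessly compressed payload.
    # zlib is only a stand-in for a real audio codec.
    return {"tags": tags, "audio": zlib.compress(pcm)}


def md5_of_decoded_pcm(toy_file: dict) -> str:
    # "Decoded" style: hash the PCM recovered from the payload.
    return hashlib.md5(zlib.decompress(toy_file["audio"])).hexdigest()


def md5_of_compressed_audio(toy_file: dict) -> str:
    # "Compressed" style: hash the audio payload as stored, skipping tags.
    return hashlib.md5(toy_file["audio"]).hexdigest()


# Eight signed 16-bit little-endian samples as example PCM.
pcm = struct.pack("<8h", 0, 1000, -1000, 32767, -32768, 5, -5, 0)
original = make_toy_file(pcm, b"TITLE=Old")
retagged = dict(original, tags=b"TITLE=New")

# Retagging changes neither hash, because neither one covers the tags.
print(md5_of_decoded_pcm(original) == md5_of_decoded_pcm(retagged))            # True
print(md5_of_compressed_audio(original) == md5_of_compressed_audio(retagged))  # True
# The two kinds of hash do not agree with each other.
print(md5_of_decoded_pcm(original) == md5_of_compressed_audio(original))       # False
```

This matches what the original poster saw: editing tags left the component's checksum unchanged, yet it still differed from the PCM-based one.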

There is nowadays an option in advanced preferences to change the lossless checksum behavior to use the decoded output. With that option enabled the results should match native MD5 fields in various lossless formats.

I finally added some extra info on the component's download page too. The component originated from a request on IRC, so the people it was made for knew what it was about. The rest of the world, not so much.

Re: 'Audio MD5' component (foo_audiomd5) != 'verify integrity'

Reply #2
Quote: "With that option enabled the results should match native MD5 fields in various lossless formats."
Users may want to take note of (rather than worry about) some differences here.
* Monkey's Audio is an exception - it calculates on the encoded stream, but not the way foo_audiomd5 does.
* FLAC calculates its MD5 from a signed little-endian representation of the audio, whereas WavPack uses the source file's representation. So for WAVE input above 8 bits, and for AIFC-sowt input, the two agree. When you feed WavPack a big-endian source file (AIFF, AIFC-none, CAF-big), it hashes the big-endian data without translating, and when you feed it an 8-bit WAVE, it hashes the unsigned data without translating.
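The byte-order and signedness point can be checked with a few lines of Python. This is only a sketch with made-up sample values, not code from FLAC or WavPack: it shows that the same samples hash differently depending on their byte representation, and that translating to the other representation first makes the hashes agree.

```python
import hashlib
import struct


def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def swap16(data: bytes) -> bytes:
    # Swap the two bytes of every 16-bit sample (big-endian <-> little-endian).
    out = bytearray(data)
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)


# The same five 16-bit samples, stored little-endian and big-endian.
samples16 = [0, 1, -1, 12345, -12345]
little = struct.pack("<5h", *samples16)
big = struct.pack(">5h", *samples16)

print(md5(little) == md5(big))          # False: same audio, different bytes
print(md5(little) == md5(swap16(big)))  # True once the byte order is translated

# The same five 8-bit samples, stored signed and unsigned (offset by 128).
samples8 = [-128, -1, 0, 1, 127]
signed8 = struct.pack("5b", *samples8)
unsigned8 = struct.pack("5B", *(s + 128 for s in samples8))

print(md5(signed8) == md5(unsigned8))   # False
# Flipping the top bit converts unsigned 8-bit to signed two's complement.
print(md5(signed8) == md5(bytes(b ^ 0x80 for b in unsigned8)))  # True
```

So a mismatch between two tools on such input does not by itself mean the audio differs; it can simply mean they hashed different representations of it.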

Re: 'Audio MD5' component (foo_audiomd5) != 'verify integrity'

Reply #3
Thank you, makes sense.

And I see the option now for "Compute hash from decoded PCM data for lossless formats" and indeed it matches when that is checked.
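For anyone who wants to compare against FLAC's native MD5 field outside foobar2000, here is a minimal sketch that reads it straight from the file header. It relies on the FLAC layout (the "fLaC" marker, a 4-byte metadata block header, then a 34-byte STREAMINFO block whose last 16 bytes are the MD5 of the unencoded audio). The function name is my own, and it assumes the stream starts with "fLaC", so a file with an ID3v2 tag prepended would be rejected.

```python
import hashlib


def flac_streaminfo_md5(data: bytes) -> str:
    """Return the MD5 stored in a FLAC STREAMINFO block as a hex string."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (no fLaC marker at offset 0)")
    block_type = data[4] & 0x7F            # top bit is the "last block" flag
    length = int.from_bytes(data[5:8], "big")
    body = data[8:8 + length]
    if block_type != 0 or len(body) < 34:
        raise ValueError("STREAMINFO block not found or truncated")
    return body[18:34].hex()               # last 16 bytes of STREAMINFO


# Synthetic header for demonstration only: zeroed stream parameters followed
# by a made-up MD5. This is not a playable FLAC file.
fake_md5 = hashlib.md5(b"pretend PCM").digest()
streaminfo = bytes(18) + fake_md5
fake_flac = b"fLaC" + bytes([0x80]) + len(streaminfo).to_bytes(3, "big") + streaminfo

print(flac_streaminfo_md5(fake_flac) == fake_md5.hex())  # True
```

With a real file you would pass in the first few dozen bytes read in binary mode. An all-zero result means the encoder did not store a checksum.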

 

Re: 'Audio MD5' component (foo_audiomd5) != 'verify integrity'

Reply #4
I do not use it for FLAC. Why not? Is there any harm in it?
Well, for a reason I partially got wrong: I thought it wouldn't report actual corruption that happened before tagging, and it doesn't in the window. I tested it, and it does if I read the console output (which leaves you with a copy-paste-and-search job if you run it on a thousand files) ...

So, a slightly longer explanation of how it is not as reliable as the format's built-in algorithm: what if the file is already corrupted?
Scanning to add the tag won't report the corruption in the window (but it will in the console).
Scanning to verify with audiomd5 will find that the hash matches, and won't report that the file is actually corrupted (well, it will in the console).

audiomd5 simply isn't as good as FLAC's built-in integrity check. I'm not knocking it - it is good to have for less safe audio formats.
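The underlying logic can be shown with a small toy model. It is a simplification under stated assumptions, not a simulation of the component: plain bytes stand in for the audio, and a flipped byte stands in for corruption that still decodes. A hash taken after the damage will keep matching the damaged data, whereas a hash stored at encode time will not.

```python
import hashlib


def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


# Audio as the encoder saw it, and the checksum stored at encode time.
pcm_at_encode = bytes(range(16))
encoder_md5 = md5(pcm_at_encode)

# The file is damaged afterwards: one byte flipped (hypothetical corruption).
damaged = bytearray(pcm_at_encode)
damaged[3] ^= 0xFF
damaged = bytes(damaged)

# A hash tag written only now is computed from the already damaged data.
late_tag = md5(damaged)

# A later verification pass over the same damaged data.
later_scan = md5(damaged)
print(later_scan == late_tag)     # True: the late tag "verifies" fine
print(later_scan == encoder_md5)  # False: the encode-time checksum catches it
```

The late tag is still useful for one thing: if the data changes again after tagging, the next scan will no longer match it.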

By not using it for FLAC, I got myself some extra work though: formatting columns and sometimes having to type longer searches like
%audiomd5% MISSING AND %__md5% MISSING
(note the double underscore in the latter; it is a technical field).
And of course, with a tag you can see whether a file that was already corrupted has changed once again.