Comparing lossless codecs, with or without MD5

Hi all,

As some of you may know, I've done a few comparisons of lossless audio codecs over the years, and as FLAC, TAK, WavPack, Monkey's Audio and refalac have had new releases, it seems time for an update. I'm preparing one, but it might take a while: running all encoders and decoders over the whole corpus takes more than a week.

I've known for a while that for the fastest of the codecs, MD5 sum calculation is a major part of the computation; this was mentioned in revisions 3 and 4. I've attached a graph from that section below:

[Attached image: graph from revisions 3 and 4 showing the effect of MD5 checksumming on speed]

As you can see, FLAC 1.3.0 decoding sped up by 25% with checksumming disabled. As MD5 calculation is pretty much impossible to improve upon, and both decoder and encoder have improved over the past few years, this difference has only grown.
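
A rough way to get a feel for the MD5 cost in isolation is to time the hash alone over a buffer the size of the decoded audio. The following is a minimal Python sketch, not part of the comparison itself: it assumes 16-bit/44.1 kHz stereo PCM, uses a hypothetical 60-second buffer of silence, and relies on Python's hashlib, whose MD5 speed may differ from the implementation inside any given codec. The helper names pcm_bytes and time_md5 are made up for illustration.

```python
import hashlib
import time


def pcm_bytes(seconds, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Size in bytes of raw PCM audio (assumed CD format by default)."""
    return int(seconds * sample_rate * channels * bytes_per_sample)


def time_md5(data, chunk_size=1 << 16):
    """Hash data in chunks; return (elapsed seconds, hex digest)."""
    digest = hashlib.md5()
    view = memoryview(data)
    start = time.perf_counter()
    for offset in range(0, len(view), chunk_size):
        digest.update(view[offset:offset + chunk_size])
    elapsed = time.perf_counter() - start
    return elapsed, digest.hexdigest()


if __name__ == "__main__":
    audio_seconds = 60  # hypothetical test length
    data = bytes(pcm_bytes(audio_seconds))  # silence; MD5 speed does not depend on content
    elapsed, _ = time_md5(data)
    print(f"MD5 over {audio_seconds} s of PCM took {elapsed:.4f} s")
```

The absolute number depends on the machine and the MD5 implementation, so it only indicates the order of magnitude of the checksumming overhead.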

In the past, I've always taken the default settings of a codec for this. As far as I know, this means checksumming for FLAC was always on, and checksumming for TAK and WavPack was always turned off. I'm uncertain about other codecs: I've heard something about Monkey's Audio checksumming, but the command line tool doesn't seem to mention it in its built-in help.

>> I would like to hear the opinion of board members on this: should I continue comparing codecs with their default settings? Should I switch checksumming on for all codecs? Should I turn checksumming off for all codecs? <<

I think there are pros and cons for each. The default is what most users will use, but it might not be very transparent. Also, checksumming on decoding is probably something that is only done on testing or decoding to a file, and is disabled on playback. So, when using the default, decoding speed might not mirror decoding speed on playback.

Enabling checksumming across the board is not possible, because not all codecs do checksums. Disabling it for all codecs is not completely fair either, because for FLAC this is only possible with an undocumented option.


Music: sounds arranged such that they construct feelings.

Re: Comparing lossless codecs, with or without MD5

Reply #1
I'd say, if you're benchmarking decoding speed, and widespread players/decoders don't use MD5 - disable it. Use it only when benchmarking verification/testing of files, maybe transcoding etc...

Re: Comparing lossless codecs, with or without MD5

Reply #2
A few thoughts not too well ordered:

* Are MD5 times (I mean, extra time) about the same across encoders and decoders? Like, you have 450 seconds of music in that test up there, flac -8 decodes it in a second, takes +.29 to MD5, while TAK -0 takes 1.4 seconds and maybe slightly more to MD5 ... ?
(Here is where time is a superior measure to speed, I think. Also, user waits in seconds, so it is more relevant.)
But if encoders are about equally fast, you could just plot against time and say "MD5 will add approximately xx seconds."?

* RAM disk speeds are not attainable in real-world situations anyway, and switching off MD5 will make it even more unrealistic for FLAC users in particular. Not that anyone who knows you around here will distrust your tests, but if I were a FLAC dev I would be a bit cautious about highlighting speeds that are even further from realistic, obtained with an undocumented option that users should not be using.
If you include both, then fine.
And maybe you could even cite FLAC with and without - at least for decoding it will be the fastest anyway, so an extra graph will not mess up the visuals anyway - and then explain that adding MD5 to TAK/WavPack (provided WavPack MD5 workload is about the same in seconds, it can be switched on and off so that is testable) would slow them down about the same as the difference measured with FLAC?

* Monkey's has an internal checksum for the encoded data, and apparently the algorithm is MD5. With a compression ratio not far from fifty percent, is it reasonable speculation that it takes about half the time as on the PCM? As far as I understand, it cannot be switched off upon encoding, but I don't know if it is always employed upon decoding.

* OptimFROG can --check a file against MD5. But as far as I understand, it will only do that as a verification, so users who actually want it decoded with verification, must do it twice. Not much sense in a "decoding with verification" time then.
(And if you want to verify the .ofr file, you can do the fast verification that doesn't decode.)
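
The point about time versus speed in the first bullet can be made concrete with a little arithmetic. Below is a small Python sketch using the figures quoted there (450 seconds of music, 1 second for flac -8, 0.29 seconds extra for MD5, 1.4 seconds for TAK -0). The assumption that MD5 would add the same 0.29 seconds to TAK is purely hypothetical, and the function names are made up for illustration.

```python
def speed(audio_seconds, wall_seconds):
    """Decoding speed as a multiple of real time."""
    return audio_seconds / wall_seconds


def with_overhead(audio_seconds, decode_seconds, md5_seconds):
    """Return (speed without MD5, speed with MD5, relative speed drop)."""
    plain = speed(audio_seconds, decode_seconds)
    checked = speed(audio_seconds, decode_seconds + md5_seconds)
    drop = 1 - checked / plain
    return plain, checked, drop


if __name__ == "__main__":
    audio = 450.0  # seconds of music in the quoted test
    md5 = 0.29     # extra seconds for MD5, as quoted for flac -8

    # Decode times as quoted; applying the same MD5 cost to TAK is an assumption.
    for name, decode in (("flac -8", 1.0), ("TAK -0", 1.4)):
        plain, checked, drop = with_overhead(audio, decode, md5)
        print(f"{name}: {plain:.1f}x -> {checked:.1f}x realtime "
              f"(+{md5:.2f} s, speed drop {drop:.1%})")
```

Under that assumption the added time is identical for both codecs, yet the relative drop in speed is larger for the faster decoder, which is why a fixed "MD5 adds about xx seconds" note is easier to state in time than in speed.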

In the past, I've always taken the default settings of a codec for this.
Sure you remember correct here?  ;)
Your revision history (page 31 in revision 3 and page 30 in revision 4) says that "FLAC’s encoding and decoding is now with the undocumented --no-md5-sum"


Re: Comparing lossless codecs, with or without MD5

Reply #3
If playback doesn't use MD5 (players don't use flac.exe for decoding), then that's the speed people see 99% of the time when dealing with FLACs...

Re: Comparing lossless codecs, with or without MD5

Reply #4
If playback doesn't use MD5 (players don't use flac.exe for decoding), then that's the speed people see 99% of the time when dealing with FLACs...
Hi there rutra80, how's it goin'?
What is the opposite of music?

 

Re: Comparing lossless codecs, with or without MD5

Reply #5
In the past, I've always taken the default settings of a codec for this.
Sure you remember correct here?  ;)
Not anymore now, no. Apparently I misremembered that detail.

So, it seems revisions 3 and 4 disabled MD5 sum computation for FLAC. I can say for certain revision 5 did not: all codecs are at their default settings. For future reference, I'll dump my scripts here. I should have done that for revisions 3 and 4 too; now I can't seem to find what I did back then.

The scripts haven't been 'cleaned up' or anything, and they're not up to my usual coding standards (i.e. they're pretty messy but they work). I'm just posting them for future reference.