WavPack decoding complexity vs FLAC

* FLAC's "-8" is by no means the slowest setting. It is called "--best" because well, "best among what is useful".

Wow, thanks, I did not know that! (Admittedly I could have read the FLAC manual but for some reason, I have only ever read the WavPack manual that well).
That configuration is indeed extremely boring to encode--my laptop took about what, 2 hours to finish encoding that? The funny thing is that, even after waiting for that long, the FLAC encode was only 600 kilobytes smaller than the -hhx6 WavPack encode which took only several seconds. -hx6 is a bit faster than -hhx6 and the compression rate is virtually the same. Now I have bigger respect for WavPack's encode times.

* The above command will not be FLAC "subset".

I tried to make a more "subset-conformant" FLAC encode (-8per8) but that lost to WavPack again. This is really interesting stuff.

* WavPack's -x settings "do not increase decoding complexity"

Yes, I was referring to the -h and -hh settings, which affect decode performance according to the manual. But I did not know of the details so thanks for explaining them!
But reading saratoga's reply, even though FLAC is really light on decode resources with WavPack being slightly heavier, I didn't realize that AAC is that hard to decode, so I think WavPack's -h and -hh, along with other codecs such as Vorbis and Opus, are fine as they are.

Quote from: saratoga on 2023-02-05 20:35:30

Both reference encoders can re-encode in-place, so if you first are just ripping CDs you can use a fast setting and then you can run re-encodes overnight / over week-end if that is what you want.

I am using -x6b24 -cc for my music collection, but I'm considering reencoding to -hx6b24...

Thank you very much for the reply!

Re: WavPack decoding complexity vs FLAC

Reply #11 – 2023-02-07 06:39:13

FLAC is about 10-13 MHz on a slow ARM CPU for real-time decoding while wavpack highx6 was up to 50 MHz. FLAC is several times faster, but both are stupidly fast given that modern devices have multiple CPUs running at GHz frequencies.

Especially since I didn't know that AAC requires ~200MHz just to decode, this data puts things into perspective. Thank you for sharing this!

Re: WavPack decoding complexity vs FLAC

Reply #12 – 2023-02-07 06:41:01

Quote from: bryant on 2023-02-06 17:12:43

To answer the OP’s specific question, the difference between the WavPack modes is the number of decorrelation passes made. The WavPack decorrelator works by making successive passes over the audio samples using different filters. The number of passes varies from 2 in “fast” mode to 16 in “very high” mode (the default and “high” mode use 5 and 10, respectively). Obviously this extra work costs CPU cycles for both encode and decode (the passes are done in reverse order during decode to “undo” the decorrelation).

Thank you very much for the explanation! I will have to read the source code for the technical details. I like learning about audio compression technologies and this is really interesting.
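
To make the "passes are undone in reverse order" part concrete, here is a minimal toy sketch in Python. It is not WavPack's actual filter: the fixed "subtract the sample `delay` positions back" predictor and the `delays` list are assumptions made up for illustration (the real decorrelator uses adaptive filters). It only shows the structure of stacking passes on encode and peeling them off in reverse on decode.

```python
def decorrelate_pass(samples, delay):
    """One toy pass: subtract the sample `delay` positions back (integer, lossless)."""
    return [s - samples[i - delay] if i >= delay else s
            for i, s in enumerate(samples)]


def undo_pass(residuals, delay):
    """Inverse of decorrelate_pass: rebuild the samples from left to right."""
    out = []
    for i, r in enumerate(residuals):
        out.append(r + out[i - delay] if i >= delay else r)
    return out


def encode(samples, delays):
    for d in delays:             # passes applied in order
        samples = decorrelate_pass(samples, d)
    return samples


def decode(residuals, delays):
    for d in reversed(delays):   # passes undone in reverse order
        residuals = undo_pass(residuals, d)
    return residuals


pcm = [0, 3, 7, 12, 18, 25, 33, 42]
delays = [1, 1, 2]               # hypothetical pass list, not WavPack's real terms
residuals = encode(pcm, delays)
assert decode(residuals, delays) == pcm
```

More entries in `delays` mean more work on both sides, which is the encode and decode cost bryant mentions.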

Re: WavPack decoding complexity vs FLAC

Reply #13 – 2023-02-07 10:29:51

Quote from: kylxbn on 2023-02-07 06:39:13

Especially since I didn't know that AAC requires ~200MHz just to decode, this data puts things into perspective. Thank you for sharing this!

Strange again. Here is what I got with this two-disc album encoded into a single file.
https://www.discogs.com/release/2452858-Andrew-Lloyd-Webber-The-Phantom-Of-The-Opera

System:
  CPU: 12th Gen Intel(R) Core(TM) i3-12100, features: MMX SSE SSE2 SSE3 SSE4.1 SSE4.2
  App: foobar2000 v1.6.16
Settings:
  High priority: no
  Buffer entire file into memory: yes
  Warm-up: no
  Passes: 1
  Threads: 1
  Postprocessing: none

WavPack fast x4 (713kbps):

Opening time: 0:00.000
Decoding time: 0:11.177
539.596x realtime

AAC-LC (195kbps):

Opening time: 0:00.001
Decoding time: 0:02.763
2181.658x realtime

Opus (195kbps)

Opening time: 0:00.000
Decoding time: 0:12.665
476.231x realtime

flac -8 (684kbps)

Opening time: 0:00.000
Decoding time: 0:04.554
1324.298x realtime

Not just a processor architecture thing I suppose?
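
As a sanity check on the logs above, the decode time multiplied by the "x realtime" figure should give the length of the audio, and it should be the same for every codec since it is the same album. A small Python sketch of that arithmetic follows; the numbers are copied from the logs, while the function and variable names are just made up for this example.

```python
# (decode time in seconds, reported speed as a multiple of real time)
results = {
    "WavPack fast x4": (11.177, 539.596),
    "AAC-LC": (2.763, 2181.658),
    "Opus": (12.665, 476.231),
    "flac -8": (4.554, 1324.298),
}


def implied_duration(decode_seconds, speed):
    """Audio length in seconds implied by decode time and the 'x realtime' figure."""
    return decode_seconds * speed


durations = {name: implied_duration(t, s) for name, (t, s) in results.items()}
flac_time = results["flac -8"][0]
for name, (t, _) in results.items():
    print(f"{name:16s} {durations[name] / 60:6.1f} min of audio, "
          f"{t / flac_time:4.2f}x the flac -8 decode time")
```

All four logs come out at roughly 100 minutes of audio, so the figures are at least consistent with each other.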

Re: WavPack decoding complexity vs FLAC

Reply #14 – 2023-02-07 10:56:55

Quote from: kylxbn on 2023-02-07 06:37:20

my laptop took about what, 2 hours to finish encoding that?

I stacked up with a ton of slowdowns. Not at all intended to be useful. You can probably come within some kilobytes in a fraction of the time.
Here is roughly how FLAC does these things: Except for the -l switch for the maximum prediction order, the variations in "complexity" (i.e. decoding time) are peanuts. FLAC will instead try several "simple" encodes and pick the best.
Unlike WavPack's decorrelation passes as explained by bryant, where every pass squeezes out more by decorrelating (provided that there are more patterns to be found - white noise won't have any of course!), FLAC will try a different one instead and choose that if it improves.
(Reference) FLAC does that along several dimensions - or can do so:

* Stereo decorrelation. Took me years to realize what FLAC (the format) actually does, but it is simple and smart for a brute-force: several codecs (including WavPack I think?) will run dual mono, and mid+side joint stereo and pick the best. But if you have to encode dual mono, you have to encode left and right separately; and every now and then, one of the channels is more compressible than the mid. FLAC can then store side + smallest {left, mid, right}. FLAC can also force dual-mono, and there is also a "smart faster" -M switch that ... uhm, that nobody uses I guess.
For multi-channel, the codecs do quite different things. FLAC MUST use dual-mono for non-stereo (big surprise to me given that FLAC isn't bad on multi-channel, I thought it would still decorrelate the main channels because why not). WavPack can group them together as pairs. TAK uses some (apparently smart!) heuristics to get a good correlation matrix, which explains why TAK absolutely slays the competition at multichannel (though it has not implemented >6). It costs time - but not decoding!
* Windowing function. The weighting of the signal. You saw those "four" functions I gave? There are more; "subdivide_tukey(7)" removes fractions of the signal (to get rid of statistical outliers), and it makes several functions (I think seven that each remove 1/7th, seven that include 1/7th and then one for the whole thing, but I could be wrong). But it recycles calculations so much that it is not at all the same time-consumption as calculating a ton of them.
FLAC doesn't brute-force this all the way down to the encode. It actually has a guesstimation procedure that picks the "hopefully best" without encoding the residuals (I think ...?) and then does that one properly.
* Rice partitioning. Once the max has been set, FLAC will calculate using the finest partition and then see if it saves space to merge.
* "-p" and "-e" will tell the encoder to brute-force the model used among two different dimensions. -p is the most interesting for CDDA: it is supposed to trade off the predictor precision: number of bits used, vs the goodness of the prediction. In principle, savings should be very small. In practice, the successive roundoffs may actually by chance happen to yield a better-fitting predictor (in addition to saving space). This is because the least-squares optimization used to calculate the predictor vector is not "optimal" in terms of size; least-squares minimum is very well associated with "size minimum", but not completely so. More here https://hydrogenaud.io/index.php/topic,120158.0.html
-p does not increase complexity. It just changes the predictor vector to something with fewer bits.
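
The Rice partitioning item in the list above can be sketched roughly like this in Python. It is a simplified illustration and not the reference encoder's code: it ignores warm-up samples and escape codes, it simply tries every partition order from the finest down to a single partition, and the 4-bit parameter header and the maximum parameter of 14 are assumptions about the ordinary Rice coding method.

```python
def rice_bits(value, k):
    """Bits to Rice-code one signed residual with parameter k (zigzag-mapped)."""
    u = 2 * value if value >= 0 else -2 * value - 1
    return (u >> k) + 1 + k


def partition_bits(part, param_bits=4, max_k=14):
    """Cheapest cost of one partition: parameter header plus best-k payload."""
    return param_bits + min(sum(rice_bits(v, k) for v in part)
                            for k in range(max_k + 1))


def best_partition_order(residuals, max_order):
    """Try orders from the finest down to 0 (one partition); return (order, bits).

    Assumes len(residuals) is divisible by 2 ** max_order.
    """
    best = None
    for order in range(max_order, -1, -1):
        size = len(residuals) >> order
        parts = [residuals[i:i + size] for i in range(0, len(residuals), size)]
        bits = sum(partition_bits(p) for p in parts)
        if best is None or bits < best[1]:
            best = (order, bits)
    return best


quiet = [0, 1, -1, 0, 0, 1, 0, -1]
loud = [100, -90, 120, -110, 95, -105, 88, -99]
order, bits = best_partition_order(quiet + loud, max_order=2)
print(order, bits)
```

On this made-up block, order 1 (two partitions) wins: the quiet and loud halves want different Rice parameters, but splitting further only adds parameter headers. The decoder just reads whichever partitioning was stored, so the search costs encoding time only.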

ktf will probably have to correct several of my misunderstandings (again!) but this is at least some of the essence of how FLAC instead of layering up with complexity, tries different shots of "equal complexity" and picks the best. For prediction order (actual order, not max order!) it might affect complexity. And also FLAC can give up on a noisy signal and store it uncompressed.
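
To illustrate the "try several shots of equal complexity and pick the best" idea for stereo, here is a small Python sketch. The four candidate pairings follow the description above (side plus one of left, mid, right, or plain left/right); the `cost` function is a crude stand-in I made up (sum of absolute first differences), not the estimate the reference encoder actually uses.

```python
def cost(channel):
    """Crude size proxy: sum of absolute first differences (an assumption)."""
    return sum(abs(b - a) for a, b in zip(channel, channel[1:]))


def pick_stereo_mode(left, right):
    """Return the cheapest channel pairing and the cost of every candidate."""
    side = [l - r for l, r in zip(left, right)]
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    candidates = {
        "left/right": cost(left) + cost(right),
        "left/side": cost(left) + cost(side),
        "right/side": cost(right) + cost(side),
        "mid/side": cost(mid) + cost(side),
    }
    return min(candidates, key=candidates.get), candidates


left = [0, 5, 0, 5, 0, 5]
right = [0, 6, -1, 6, -1, 6]   # left plus a small wobble
mode, costs = pick_stereo_mode(left, right)
print(mode, costs)
```

In this toy input the left channel is cleaner than the mid, so left/side comes out cheapest. Whichever pairing is chosen, the decoder does the same small amount of work to rebuild left and right.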

Quote from: kylxbn on 2023-02-07 06:37:20

the -hhx6 WavPack encode which took only several seconds. -hx6 is a bit faster than -hhx6 and the compression rate is virtually the same. Now I have bigger respect for WavPack's encode times.

Well to nuance that:

* I picked some nonsensically slow FLAC option set. Don't judge a codec by its most stupid setting. (That is actually one reason for FLAC to tout its presets - you often see people asking "I want the absolute maximum compression, what is it?" and the answer is "No, you don't want that!")
That is also a reason why TAK dropped the -p5 that was around in the test version. Knowing that TAK would be judged for the performance of the most extreme setting, Thomas Becker chose to only include the most extreme "reasonably fast" setting.
Also WavPack has included a quite extreme one in -x6. -hx4 or -hhx4 offer much better value for money.

ffmpeg's WavPack encoder (don't use it, it only covers file format version 4 and has some quirks to it) has -compression_level 0 to 8. Try -compression_level 8 and compare to wavpack -hhx6. It will be slightly smaller (for one thing, it lacks the WavPack 5 block checksums), but the time taken is not worth it.

WavPack is more complex and it is only expected that this pays off in terms of size. Notice that WavPack is an older algorithm (the file format was revised later though) than the others. For a nineties construction it is damn good, and the "-x" switches showed that it could be improved upon without breaking compatibility.

Quote from: kylxbn on 2023-02-07 06:37:20

I am using -x6b24 -cc for my music collection, but I'm considering reencoding to -hx6b24...

As a FLAC user I would always want the "-m". Even if you use WavPack 5 with the block-level checksums (which can be verified without decoding using wvunpack -vv), I would use the MD5 even if only to identify the files.
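
For the "identify the files" use, the idea is that the hash covers the audio only, so retagging or re-encoding at another setting leaves it unchanged. A minimal Python sketch of that idea on a WAV file, using only the standard library, is below. The function name is made up, and I am not claiming the result is bit-identical to what flac or wavpack -m store for every bit depth; it only shows hashing the PCM frames and nothing else.

```python
import hashlib
import wave


def pcm_md5(wav_path, chunk_frames=65536):
    """MD5 over the PCM frames only, ignoring the WAV header and other chunks."""
    digest = hashlib.md5()
    with wave.open(wav_path, "rb") as wav:
        while True:
            frames = wav.readframes(chunk_frames)
            if not frames:
                break
            digest.update(frames)
    return digest.hexdigest()
```

Two WAV files with different headers or extra metadata chunks but the same samples give the same digest, which is what makes it usable as an identifier.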

Re: WavPack decoding complexity vs FLAC

Reply #15 – 2023-02-07 10:58:27

Quote from: bennetng on 2023-02-06 11:20:24

Not just a processor architecture thing I suppose?

Partly. You're comparing a processor that has vector floating point units (SSE, AVX) which can do 8 floating point operations per instruction vs a CPU that doesn't even seem to have any floating point hardware. It might very well be that a fixed-point implementation of an AAC decoder is much slower, while FLAC (and I presume WavPack too) is already fixed point to begin with.
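
For readers who have not met fixed point before, here is a tiny Python sketch of what it means: fractions are stored as scaled integers, and every multiply needs an extra shift to get back to the same scale. The Q15 format and the helper names are assumptions picked for illustration, not taken from any particular decoder.

```python
Q = 15  # Q15: 15 fractional bits, so 1.0 is stored as 32768


def to_q15(x):
    """Convert a float to a Q15 integer."""
    return int(round(x * (1 << Q)))


def q15_mul(a, b):
    """Multiply two Q15 numbers; the shift rescales the double-width product."""
    return (a * b) >> Q


product = q15_mul(to_q15(0.5), to_q15(0.25))
print(product / (1 << Q))
```

A lossless codec already works on integer samples, whereas a transform codec written with floats in mind has to do all of its filter-bank arithmetic this way on a CPU without floating point hardware.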

Re: WavPack decoding complexity vs FLAC

Reply #16 – 2023-02-07 11:02:28

Here is an example where flac beats wavpack hhx6 with a 16-bit 96kHz dual mono chiptune without using --lax. With --lax it can even be smaller. Of course, it may not be the case with other chiptune files, 24-bit or not.

I want to play that game too

1 868 458 bytes within subset (yours: 2 001 985)
Both figures after applying metaflac --dont-use-padding --remove-all to get everything equal.

(1 812 709 outside subset)

Edit: Chiptune signals are very peculiar indeed, and apparently weren't high on codec developers' radars:
1 900 633 for OptimFROG at --preset max --md5
2 030 216 for WavPack -hhmx6
2 187 250 for TAK -p4m -md5
2 499 712 for Monkey's "Extra High" (beating the 2 615 760 "Insane" which is also bigger than "High")

Re: WavPack decoding complexity vs FLAC

Reply #17 – 2023-02-07 11:08:06

Quote from: kylxbn on 2023-02-07 06:39:13

Especially since I didn't know that AAC requires ~200MHz just to decode

Strange again. Here is what I got with this two-disc album encoded into a single file.

AAC-LC

Quote from: ktf on 2023-02-07 10:58:27

Not just a processor architecture thing I suppose?

Partly. You're comparing a processor that has vector floating point units (SSE, AVX) which can do 8 floating point operations per instruction vs a CPU that doesn't even seem to have any floating point hardware. It might very well be that a fixed-point implementation of an AAC decoder is much slower, while FLAC (and I presume WavPack too) is already fixed point to begin with.

And it is ~200 MHz in aforementioned Rockbox test for AAC-HE, not LC. For LC it is ~70 MHz.
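
To put the MHz figures quoted in this thread side by side, here is a rough Python sketch that turns "MHz needed for real time" into a share of one core. The 1 GHz clock is a hypothetical value chosen for illustration, and the whole thing assumes decoding cost scales linearly with clock speed on the same kind of CPU, which is only an approximation.

```python
# MHz needed for real-time decoding, as quoted in this thread
mhz_for_realtime = {
    "FLAC": 13,              # upper end of the quoted 10-13 MHz
    "WavPack high x6": 50,   # quoted as "up to 50 MHz"
    "AAC-LC": 70,
    "HE-AAC": 200,
}


def cpu_share(required_mhz, clock_mhz):
    """Fraction of one core used for real-time playback, assuming linear scaling."""
    return required_mhz / clock_mhz


clock_mhz = 1000  # hypothetical 1 GHz core of the same kind (an assumption)
for codec, mhz in mhz_for_realtime.items():
    print(f"{codec:16s} {cpu_share(mhz, clock_mhz):5.1%} of one core, "
          f"{clock_mhz / mhz:5.1f}x realtime")
```

Even the heaviest entry stays at a fifth of such a core, which matches the point that all of these are cheap on modern hardware.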

Re: WavPack decoding complexity vs FLAC

Reply #18 – 2023-02-07 11:28:41

So here is HE-AAC SBR (61kbps), same test conditions as previous post:

Opening time: 0:00.001
Decoding time: 0:04.640
1299.489x realtime

Re: WavPack decoding complexity vs FLAC

Reply #19 – 2023-02-07 13:51:33

Here is roughly how FLAC does these things: Except for the -l switch for the maximum prediction order, the variations in "complexity" (i.e. decoding time) are peanuts. FLAC will instead try several "simple" encodes and pick the best.
Unlike WavPack's decorrelation passes as explained by bryant, where every pass squeezes out more by decorrelating (provided that there are more patterns to be found - white noise won't have any of course!), FLAC will try a different one instead and choose that if it improves.

Your knowledge about this topic really amazes me. Thank you very much for taking the time to explain stuff--I feel guilty since I shouldn't be spoon-fed information like this ^_^; That was really easy to understand, maybe partially because I have a broad but non-detailed understanding about the topic.

* I picked some nonsensically slow FLAC option set. Don't judge a codec by its most stupid setting.

Ah, yes, of course. I was just amazed that a 2-hour encode of a 3-minute song only managed to save 600 kilobytes more than a 30-second encode. I was always intrigued by how FLAC at -8 encodes audio in a couple of seconds while WavPack -mx6b24cc (my current collection) always takes somewhere around half a minute per song. But knowing that spending 2 more hours does not improve things by a significant margin, I just came to the conclusion that WavPack's encode speed is acceptable as it is. Just like bryant said, we can do an arbitrary number of decorrelation passes so nothing is stopping us from making 256 passes and waiting for a 2-hour encode, but... I think I might go with -hx4m (or -x6 since the difference isn't huge) when I reencode my collection.
I plan to stop using lossy / hybrid mode since it increases file sizes by a noticeable amount (around 1 megabyte per song) so I'll just use Vorbis when I transfer songs to my portable devices. (I actually want to use Opus but I'm wary of the patent pool stuff going on. Unless things get clearer and Fraunhofer steps out of the way, I'm a bit cautious about using Opus.) Besides, mobile players just ignore the replaygain tags I have on my WavPack files so I guess I'll just apply the gain when converting to a lossy format.

ffmpeg's WavPack encoder (don't use it, it only covers file format version 4 and has some quirks to it) has -compression_level 0 to 8. Try -compression_level 8 and compare to wavpack -hhx6. It will be slightly smaller (for one thing, it lacks the WavPack 5 block checksums), but the time taken is not worth it.

I try to use official encoders when possible. This is one reason why I don't use any of the other codecs even though they offer better compression ratios--because I avoid non-GPL programs. (I am not implying that ffmpeg is closed-source. Just that I avoid unofficial and non-GPL implementations.) I want to be able to access/play my files even after the apocalypse. So no OptimFROG or TAK for me. I have always used the -m flag when encoding for my collection but I never compared things with the ffmpeg implementation so that will definitely be fun.

As a FLAC user I would always want the "-m".

Yes, checksums for the win! I also use Btrfs as my filesystem so hard drive failure or corruption should be noticed if it happens, especially since I'm too poor to buy a backup hard disk.