

Topic: "Tested": codecs for the effect of stereo decorrelation (mid/side)

"Tested": codecs for the effect of stereo decorrelation (mid/side)

So after I made a quite thoughtless across-tracks-decorrelation experiment, and played a bit back and forth with different codecs on that to sort out my confusion, I thought, why not run a test on some files I've already thrown at @ktf's FLAC betas.

I was in particular curious about one thing: OptimFROG's claim to use a smarter decorrelation. It turns out that OptimFROG, Monkey's and TAK can get more out of stereo on CDDA than WavPack-below-x4 and FLAC do, but that is not the case for 96/24. In any case, it cannot explain OptimFROG's small file sizes; those are probably because the format is generally complex and puts your CPU to work to keep you warm during the winter.

The results should be interpreted with so much caution that I initially thought they might not be useful unless something striking would show up. Say, here you cannot expect results to go in the same direction:
If encoder X spends more time than encoder Y getting stereo file x smaller than stereo file y, we cannot tell whether it is more "efficient" or just spends more effort searching for patterns. We don't even know the theoretical compressibility of the L-R difference signal.

Anyway, I think we can take home a few findings:
* TTA cannot do mono! Yeah it can handle multi-channel, but cannot read input files that are mono .wav  *shrugs*
* The small-files compressors OFR/TAK/MAC compress both the mono well and the stereo well. OFR and TAK's heavier modes increase the stereo diff.
* WavPack at x4: Consistent with tests at , WavPack "needs" x4 to compress hirez well. Maybe it is to get differences in the ultrasonic octave compressed?
* FLAC. The beta implements double-precision calculation that improves quite a bit, especially for higher-rez. A reasonable speculation on the "good" stereo reduction for stock 1.3.1 could be that it compresses away some of the effects of a "bad roundoff" to single precision that makes for more digits common to all channels. Bad common roundoff to zeroes in common -> can in part be compressed.
* FLAC's "-M" does pretty well. With that switch it does not fully calculate L/R vs mid/side before deciding which one to use.

Columns: Mono size (my locale uses comma for decimal separator), stereo gain in ppm; then stereo gain per sub-corpus. Not displayed: gains per file to get an idea of per-file overhead, see instead the first FLAC column, and consider that there were 71 + 42 + 1 file(s).

Codec and setting | mono GB | 2chdiff ppm | CDDA rock ppm | hirez rock ppm | hirez jazz/cl. ppm
FLAC irls-2021-09-21 -8 --no-mid-side | 6,21 | 580 | 783 | 512 | 468
FLAC irls-2021-09-21 -8 -M | 6,21 | 5 500 | 12 290 | 3 073 | 1 864
FLAC irls-2021-09-21 -8 | 6,21 | 6 485 | 13 459 | 4 432 | 2 312
FLAC irls-2021-09-21 -5 | 6,26 | 6 608 | 13 670 | 4 548 | 2 364
FLAC 1.3.1 -8 | 6,38 | 7 940 | 13 567 | 8 155 | 2 707
FLAC 1.3.1 -5 | 6,44 | 8 274 | 13 978 | 8 718 | 2 747
WavPack -f | 6,58 | 4 744 | 11 979 | 3 077 | −45
WavPack default | 6,40 | 4 953 | 10 967 | 3 994 | 546
WavPack -hx1 | 6,27 | 5 782 | 16 284 | 1 987 | 199
WavPack -hhx4 | 6,24 | 15 832 | 18 105 | 22 998 | 6 667
Monkey's normal | 6,29 | 6 923 | 17 691 | 3 403 | 828
Monkey's insane | 6,22 | 6 931 | 16 887 | 4 016 | 957
TAK -p2 | 6,15 | 6 907 | 17 994 | 2 735 | 1 176
TAK -p4m | 6,09 | 7 440 | 18 453 | 3 389 | 1 654
OFR --preset 2 | 6,06 | 6 457 | 17 286 | 1 785 | 1 456
OFR --preset 10 | 5,98 | 7 604 | 18 657 | 3 706 | 1 631

Corpus in more detail:
The first two columns are the corpus from .
The "hirez jazz/cl." is one file where I merged together 106 minutes 96/24 from , jazz/classical acoustic recordings sometimes recorded in multi-ch and downmixed. Same file as mentioned at the bottom of .

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #1
Could you please elaborate a bit on the columns? I'm too dumb to figure out what "ppm" means :)

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #2
ppm = "parts per million". 1/10k of a percentage point.

So for the biggest overall effect, WavPack -hhx4, the mono files are 624/1061 of WAV size, that is 58.8 percent; the stereo files are around 1.6 percentage points less, 57.2 percent of WAV size.
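To make the unit conversion concrete, here is a small Python sketch of the arithmetic in this reply. The function names are made up for illustration; the figures are the WavPack -hhx4 numbers quoted above.

```python
def ppm_to_pctpts(ppm: float) -> float:
    """Convert parts per million to percentage points (1 ppm = 0.0001 points)."""
    return ppm / 10_000


def stereo_percent(mono_size: float, wav_size: float, gain_ppm: float) -> float:
    """Stereo size in percent of WAV size, from the mono size and the stereo gain in ppm."""
    mono_percent = 100 * mono_size / wav_size
    return mono_percent - ppm_to_pctpts(gain_ppm)


# WavPack -hhx4: mono 6.24 against WAV 10.61 (same unit), stereo gain 15 832 ppm
print(round(100 * 6.24 / 10.61, 1))                    # 58.8
print(round(stereo_percent(6.24, 10.61, 15_832), 1))   # 57.2
```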

Here you got the same with different formatting, where mono filesizes are in percent of .wav, and where the differences are in percentage points:
Codec and setting | mono compression | 2chdiff in pctpts | CDDA rock | hirez rock | hirez jazz/cl.
FLAC irls-2021-09-21 -8 --no-mid-side | 58.5% | | | |
FLAC irls-2021-09-21 -8 -M | 58.5% | 0.55 | 1.23 | 0.31 | 0.19
FLAC irls-2021-09-21 -8 | 58.5% | 0.65 | 1.35 | 0.44 | 0.23
FLAC irls-2021-09-21 -5 | 59.0% | 0.66 | 1.37 | 0.45 | 0.24
FLAC 1.3.1 -8 | 60.2% | 0.79 | 1.36 | 0.82 | 0.27
FLAC 1.3.1 -5 | 60.7% | 0.83 | 1.40 | 0.87 | 0.27
WavPack -f | 62.0% | 0.47 | 1.20 | 0.31 | 0.00
WavPack default | 60.4% | 0.50 | 1.10 | 0.40 | 0.05
WavPack -hx1 | 59.1% | 0.58 | 1.63 | 0.20 | 0.02
WavPack -hhx4 | 58.8% | 1.58 | 1.81 | 2.30 | 0.67
Monkey's normal | 59.3% | 0.69 | 1.77 | 0.34 | 0.08
Monkey's insane | 58.6% | 0.69 | 1.69 | 0.40 | 0.10
TAK -p2 | 58.0% | 0.69 | 1.80 | 0.27 | 0.12
TAK -p4m | 57.4% | 0.74 | 1.85 | 0.34 | 0.17
OFR --preset 2 | 57.1% | 0.65 | 1.73 | 0.18 | 0.15
OFR --preset 10 | 56.4% | 0.76 | 1.87 | 0.37 | 0.16
It may be surprising to see WavPack -hhx4 not out-compress FLAC, but that is because most of the corpus is high sample rate where WavPack doesn't shine as much and where the new FLAC beta improves a lot.
WAV file sizes:
CDDA rock: 3.05 GB (5h09min)
hirez rock: 3.4 GB (1h47)
hirez jazz/cl.: 3.4 GB (1h46)

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #3
2chdiff - is that full file but stereo, or is that only the extracted mid/side or l/r difference signal?

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #4
That is the difference
one file in stereo - (one file for the left channel + one file for the right channel)

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #5
Hmm the question is if codecs treat the difference as separate signal and compress it separately, or is it somehow used for predictors etc. If the latter, then I'm not sure if compressing the difference signal alone tells much - it's not music and predictors aren't tuned for it... It somehow resembles analysing lossy codecs by listening to difference signal...

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #6
No, not "difference" as in difference signal - as difference in size.

What I did was split a stereo file into a left channel file and a right channel file.
Compressed left channel file and right channel file. That is a safe way to get "dual mono" of the same audio.
Compressed the original file too, with the same setting.
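For anyone wanting to repeat the splitting step, here is a minimal Python sketch using only the standard library's wave module. It is not the tool used for this test; the function name is made up, it loads the whole file into memory, and it assumes plain PCM WAV input (hi-rez files stored as WAVE_FORMAT_EXTENSIBLE may not open on older Python versions).

```python
import wave


def split_stereo_wav(stereo_path: str, left_path: str, right_path: str) -> None:
    """Write the left and right channels of a stereo PCM WAV as two mono WAVs."""
    with wave.open(stereo_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a stereo file")
        width = src.getsampwidth()      # bytes per sample, e.g. 2 for 16-bit
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Frames are interleaved: left sample, right sample, left sample, ...
    frame_size = 2 * width
    left = bytearray()
    right = bytearray()
    for i in range(0, len(frames), frame_size):
        left += frames[i:i + width]
        right += frames[i + width:i + frame_size]

    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(bytes(data))
```

The two mono files then carry exactly the same samples as the stereo file, so any size difference after encoding comes from the encoder, not from the audio.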

Then a measure of how much use the encoder makes of channel correlation, is: how many percent does it gain when it can look at both?
A measure, but I didn't say it was a precise one. But FWIW I think it says something about the FLAC revision, about some WavPack settings - and, it suggests that OptimFrog's secret doesn't lie in exceptional handling of stereo, but rather in throwing heavy artillery at every signal.
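The measure itself is simple enough to write down. Below is a hedged Python sketch: the function name and the sign convention (positive = stereo file is smaller) are my own choices for illustration, and the sizes in the example are made up, not taken from the test.

```python
def stereo_gain_ppm(stereo_size: int, left_size: int, right_size: int, wav_size: int) -> float:
    """Size saved by encoding both channels together, in ppm of the uncompressed WAV size.

    Positive means the stereo file is smaller than the two mono files combined.
    """
    dual_mono_size = left_size + right_size
    return 1_000_000 * (dual_mono_size - stereo_size) / wav_size


# Made-up sizes in bytes, purely for illustration
print(stereo_gain_ppm(stereo_size=572_000, left_size=294_000,
                      right_size=294_000, wav_size=1_000_000))   # 16000.0
```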

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #7
Quite clear results. I only know the FLAC format very well, I just looked up Wavpack and Monkeys Audio. Put simply, it seems WavPack and Monkeys Audio only implement a conversion to mid-side audio, while FLAC also has stereo decorrelation modes called left-side and right-side. To me it seems that for WavPack and Monkeys Audio, though I'm not sure about Monkey's Audio, that either left and right or mid and side channels are treated separately after either converting from left-right to mid-side or not.

So, apparently, the gain is not in the stereo decorrelation but in the way a mid or a side channel can be compressed. The best explanation I can come up with (but please note this is purely guesswork) is that FLAC is less equipped to deal with small signals that might occur in the mid channel of highly-correlated stereo. This would also explain why FLAC's benefit is only present for 16-bit (CDDA) material and not for 24-bit signals.
Music: sounds arranged such that they construct feelings.

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #8
Not sure what are the right guesses CDDA vs hirez ... one thing is that a big part of hirez is uncorrelated noise. (Also I have not timed these. No info here about how much more effort froggy puts in stereo than in mono, for example.)

Anyway, here is the CDDA-only part, and where I have added columns that compare each mono to the FLAC beta at -8 mono, and each stereo to new -8 stereo. The final column is, say, (filesize FLAC -8 stereo minus Monkey's insane stereo) minus (filesize FLAC -8 mono minus Monkey's insane mono), that is: We know that Monkey's insane compresses more than FLAC does, and that difference is how much bigger in stereo than in dual mono?

CDDA part only | mono compression | stereo compression | 2chdiff in %pts | 1ch vs new FLAC -8 | 2ch vs new FLAC -8 | diff previous two
FLAC irls-2021-09-21 -8 --no-mid-side | 67.2% | 67.1% | 0.08 | 0.00 | −1.27 |
FLAC irls-2021-09-21 -8 -M | 67.2% | 66.0% | 1.23 | 0.00 | −0.12 |
FLAC irls-2021-09-21 -8 | 67.2% | 65.9% | 1.35 | 0.00 | 0.00 |
FLAC irls-2021-09-21 -5 | 67.6% | 66.2% | 1.37 | −0.34 | −0.32 |
FLAC 1.3.1 -8 | 67.3% | 65.9% | 1.36 | −0.08 | −0.07 |
FLAC 1.3.1 -5 | 67.7% | 66.3% | 1.40 | −0.46 | −0.40 |
WavPack -f | 68.7% | 67.5% | 1.20 | −1.47 | −1.62 | −0.15
WavPack default | 67.4% | 66.3% | 1.10 | −0.20 | −0.45 | −0.25
WavPack -hx1 | 66.9% | 65.3% | 1.63 | 0.32 | 0.60 | 0.28
WavPack -hhx4 | 66.7% | 64.9% | 1.81 | 0.55 | 1.01 | 0.46
Monkey's normal | 66.1% | 64.3% | 1.77 | 1.15 | 1.57 | 0.42
Monkey's insane | 65.1% | 63.4% | 1.69 | 2.12 | 2.47 | 0.34
TAK -p2 | 66.3% | 64.5% | 1.80 | 0.95 | 1.40 | 0.45
TAK -p4m | 65.8% | 63.9% | 1.85 | 1.47 | 1.97 | 0.50
OFR --preset 2 | 65.4% | 63.7% | 1.73 | 1.81 | 2.19 | 0.38
OFR --preset 10 | 64.4% | 62.5% | 1.87 | 2.82 | 3.34 | 0.52
We see that getting a stereo signal enables the higher-compressing codecs to increase their compression advantage over FLAC, which is not unexpected; and for WavPack, TAK and OptimFROG (but not Monkey's!) the higher modes do even better.
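As a sketch of how the last three columns relate, here is the difference-of-differences in Python. Function names are made up, and the inputs are the rounded percentages from the table, so the result only approximates the table's last column (which was presumably computed from unrounded sizes).

```python
def advantage_over_reference(ref_pct: float, codec_pct: float) -> float:
    """Percentage points by which a codec beats the reference (positive = smaller file)."""
    return ref_pct - codec_pct


def stereo_extra_advantage(ref_mono: float, ref_stereo: float,
                           codec_mono: float, codec_stereo: float) -> float:
    """How much bigger the codec's advantage is in stereo than in dual mono."""
    return (advantage_over_reference(ref_stereo, codec_stereo)
            - advantage_over_reference(ref_mono, codec_mono))


# Rounded figures: FLAC beta -8 is 67.2% mono / 65.9% stereo,
# OFR --preset 10 is 64.4% mono / 62.5% stereo.
extra = stereo_extra_advantage(67.2, 65.9, 64.4, 62.5)
print(round(extra, 2))   # about 0.6 from rounded inputs; the table shows 0.52
```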

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #9
One bad-ass track, for what it is worth:
Merzbow: "I Lead You Towards Glorious Times".

There are genres that mess up good codecs, and this is the least compressible CDDA track in my collection. Some compression tests here. ("my final" test, my ass ... too curious. Rehab is for quitters.)

Anyway, because this is noise, one would not expect much help from stereo - indeed, tests show that the stereo file isn't much compressible from PCM, so there cannot have been much eh? Still, codecs behave differently. These are sorted by stereo size (mono size maintains that order except between LA -high and -high -noseek).

stereo kB | mono kB | stereo − mono | left − right
I tried to get three settings from each encoder: normal, high and maximal. OFR 10 was chosen as maximal by mistake, and kept as high when I discovered --preset max, and then ... Other choices were a bit ... arbitrary. The new FLAC beta produces bit-identical audio to 1.3.1, so the new "-9 -p -e" was a candidate for max (it didn't squeeze much out).

* The file fools OptimFROG's --preset max big time. And the LAs come out in the wrong order.
* I've long known that TAK doesn't like this piece of music, and Monkey's is even worse. Those return bigger files than the WAV. But TAK at the very least can utilize stereo.
* Indeed fewer than half of these files can get help from stereo here. FLAC does, as these presets (which include -m) pretty much brute-force search the stereo options. TAK does even better. Two OFR modes do, so here there actually might be some support for froggy's claims that it can make pretty good sense out of stereo.
* The two good WavPacks and the two good OFRs disagree with everything else about which channel should be smallest. Maybe because they are the only ones to make good sense out of the right channel.

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #10
Sorry I’m a little late to respond here, but I have been following along. Thanks for analyzing this, it’s definitely interesting to see how the different compressors respond to stereo! I also was a little confused at first on what the columns meant, but you clarified it nicely.

I am only really familiar with how WavPack handles stereo, and how the “extra” modes work, so I’ll clarify that a little which will hopefully add something to the discussion.

In the “fast” and “normal” modes the default behavior (as was guessed) is just converting left-right to mid-side, and then treating the two channels completely independently. It can be turned off for comparison (-j0), but it’s almost always better. All of the “extra” modes check to make sure mid-side is improving things.

There is obviously still going to be some correlation between the channels even after mid-side encoding, and so the “high” and “extra high” modes take advantage of this. The filters with negative term values (-1, -2, and -3) employ this “cross-correlation”.
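For readers who have not seen it, the left-right to mid-side conversion can be done losslessly on integers. The Python sketch below shows the generic textbook form (the same idea as in the FLAC format); whether WavPack uses exactly this arithmetic is an assumption on my part, and the function names are made up.

```python
def to_mid_side(left: int, right: int) -> tuple[int, int]:
    """Integer mid/side: mid drops the lowest bit of left+right, side keeps its parity."""
    return (left + right) >> 1, left - right


def from_mid_side(mid: int, side: int) -> tuple[int, int]:
    """Invert to_mid_side exactly; the parity of side restores the bit mid lost."""
    total = (mid << 1) | (side & 1)          # equals left + right
    return (total + side) >> 1, (total - side) >> 1


# Round trip on a few sample pairs, including negative values
for left, right in [(3, -2), (-3, -2), (1000, 998), (0, 0)]:
    assert from_mid_side(*to_mid_side(left, right)) == (left, right)
```

The trick is that left+right and left-right always have the same parity, so halving the sum loses nothing that the side channel does not already carry.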

As for the “extra” modes, when I created the filters that are available at levels -x1 to -x3, there was very little high-resolution material out there (I think I had three tracks I captured somehow from a DVD) and so I didn’t use that in my corpus. Everyone was just comparing compression using CD audio and so I optimized for that.

The higher modes (-x4 to -x6) create new filters from scratch, so it makes perfect sense to me that those would be best for high-resolution (they have no preconceived notions).

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #11
So I knocked TTA for the wrong reason:
a few findings:
* TTA cannot do mono!
Yes it can do mono! What this TTA version refuses to handle, are ffmpeg-generated .wav files.

So I tested it. Same corpus. First table, the three rightmost figures are unsurprising, not far from WavPack default or -hx1: sixteen thousand ppm, five thousand ppm, and three hundred ppm.
But the Merzbow mono files fooled it. The monos sum up to 23 kbit/s worse than Monkey's normal, while stereo is 25 better.  Mono: worst by far, stereo: between flac -5 and wavpack default.  So it is a signal that fools it, and luckily it is a mono (less interesting) such that stereo finds what to do about it.

I also was a little confused at first on what the columns meant, but you clarified it nicely.
No wonder there was confusion when I cannot even make my mind up on whether size s(h)avings should be positive or negative numbers.

And ... :
while FLAC also has stereo decorrelation modes called left-side and right-side
It was only after reading this that it dawned on me that left-side and right-side are stereo decorrelation strategies - not weird channel configurations that FLAC chose to support. Explains my ignorant comment here.

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #12
It was only after reading this that it dawned on me that left-side and right-side are stereo decorrelation strategies - not weird channel configurations that FLAC chose to support.

Yes. It seems most lossless formats only do either left & right or mid & side encoding, but FLAC can also choose to encode left & side or right & side. This is beneficial if there is some form of stereo correlation, but the resulting mid channel is more complex to encode than either left or right.
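A toy illustration of choosing among the four options, in Python. The cost function here (sum of absolute first-order differences) is a crude stand-in chosen for this sketch; it is not the estimator libFLAC actually uses, and the names are made up.

```python
def residual_cost(samples: list[int]) -> int:
    """Crude cost proxy: sum of absolute differences between neighbouring samples."""
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))


def choose_stereo_mode(left: list[int], right: list[int]) -> str:
    """Pick the channel pair with the lowest combined cost proxy."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    cost = {"left": residual_cost(left), "right": residual_cost(right),
            "mid": residual_cost(mid), "side": residual_cost(side)}
    modes = {
        "left/right": cost["left"] + cost["right"],
        "left/side": cost["left"] + cost["side"],
        "right/side": cost["right"] + cost["side"],
        "mid/side": cost["mid"] + cost["side"],
    }
    return min(modes, key=modes.get)


# Left is a smooth ramp, right is the same ramp with a small wobble:
# side is cheap, but mid inherits half the wobble and costs more than left.
print(choose_stereo_mode([0, 10, 20, 30, 40, 50], [0, 12, 20, 32, 40, 52]))   # left/side
```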

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #13
Tested: 7.1 files - as they are and with channels exported to quad+quad or to stereo+stereo+stereo+stereo
Purpose: Effects of "multi-channel" decorrelation

... also purpose: lobbying at @TBeck for TAK to do 7.1, as it would slay anything that isn't awfully slow. I have included the new TAK beta too!

Caveat: who knows what a representative 7.1 signal looks like - these files turned out to be something OptimFROG doesn't like, which is uncommon.

There are some Dolby Digital trailer files at . I retrieved the 7.1 files, and remuxed the Dolby TrueHD audio streams to .mka. Note to those who want to play with the same thing: ffmpeg -acodec copy picks only the first audio stream, but luckily the lossless stream was first on all, saving me work.
Deleting a duplicate I was down to 18 files, a total of no more than 22 minutes; all but one are 48/24 but with lots of wasted bits.
Exported each to .wav in three ways:
* as-is: 7.1
* two quad files: "front" channels FL+FR+FC+LFE in one file, and "side/back" channels BL+BR+SL+SR in another (for TAK to handle it!)
* four stereos, in order 1&2, 3&4, 5&6, 7&8
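The stereo-pair export amounts to de-interleaving the channels two at a time. A minimal Python sketch of that idea on already-decoded frames (not the tool actually used here; the function name is made up):

```python
def split_into_pairs(frames: list[tuple[int, ...]]) -> list[list[tuple[int, int]]]:
    """Split multichannel frames into consecutive stereo pairs (channels 1&2, 3&4, ...)."""
    channels = len(frames[0])
    if channels % 2:
        raise ValueError("need an even channel count")
    return [[(f[c], f[c + 1]) for f in frames] for c in range(0, channels, 2)]


# Two frames of 8-channel audio; each value is just the channel number here
frames = [(1, 2, 3, 4, 5, 6, 7, 8), (1, 2, 3, 4, 5, 6, 7, 8)]
pairs = split_into_pairs(frames)
print(len(pairs))    # 4
print(pairs[1])      # [(3, 4), (3, 4)]
```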

... and then let encoders run for a couple of nights.

A major surprise emerged after OptimFROG'ing the stereos: this is material where OptimFROG performs worse than FLAC.
A not so big surprise: Monkey's performs badly, due to not utilizing wasted bits. That means a 16-bit sample in a 24-bit container (padded with zeroes) compresses much worse than 16 in 16. FLAC, WavPack and TAK compress them as well as 16 in 16; MPEG-4 ALS does so with the "-l" switch.
Several of these files appear to have parts where fewer than 24 bits are at work - but not necessarily so during the entire file.
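"Wasted bits" can be detected per block by checking how many low-order bits are zero in every sample. A hedged Python sketch of that check follows; the function name is made up, and returning 0 for an all-silent block is a convention picked for this sketch, not something taken from any of the encoders.

```python
def wasted_bits(samples: list[int]) -> int:
    """Number of low-order bits that are zero in every sample of the block."""
    combined = 0
    for s in samples:
        combined |= s                  # a bit is set if it is set in any sample
    if combined == 0:
        return 0                       # silent block: nothing to shift out (sketch convention)
    lowest_set_bit = combined & -combined
    return lowest_set_bit.bit_length() - 1


# 16-bit values padded into a 24-bit container: the low 8 bits are always zero
print(wasted_bits([s << 8 for s in (1, -3, 5)]))   # 8
print(wasted_bits([1, 2, 4]))                      # 0
```

An encoder that does this per block can shift the samples right by that amount before prediction, which is why a file whose effective bit depth varies over time can still benefit.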

Why all these ALS settings tested?
-t# is ... well what is it? Help file says it is "two modes", joint stereo and #channel decorrelation, where channels must be a multiple of the number. I take it that in a 7.1, -t4 means it tries joint stereo and tries 4ch groupings and picks the best.
At the very least, it gives an idea of what the encoder can make out of considering several channels at once.
-l makes use of wasted bits
-p is slow, -7 is awfully slow, -7 -p even worse

Encoding time here is ... often just too expensive. Monkey's Extra High encodes in a minute. TAK -p4m (on two quad files per signal) in two. The ALS encoder hasn't seen much optimization, and the slow ALS modes take over two hours on 22 minutes material (and much worse on the single 96/24 file).  The only thing slower is ffmpeg's WavPack encoder at "-compression_level 8".  No, not reference WavPack's -hhx6; ffmpeg has its own implementation, and its level "8" took 12x as much time as wavpack.exe -hhx6.

size/1024 | Codec (& fileset) & setting | Remarks
476340 | ALS -l-7-p-t8 | 150-ish minutes. -7 is the slow mode. -p is "long-term" prediction (slower)
476445 | ALS -l-7-t8 | ... and -t8 makes full 8ch decorrelation I think?
478512 | ALS -l-7-p-t4 | Hm, quad decorrelation is slightly worse than splitting in quads
481461 | TAK 4ch+4ch.-p4m_BETA232_ | Saves ~ 2 percent over four stereos
481521 | TAK 4ch+4ch.-p4m | Takes 2 minutes
482057 | ALS 4ch+4ch.-l-7-t0 | (This had a "-t0" by mistake, I realize)
485650 | TAK 5.1file_AND_stereoSLSRfile.-p4m_BETA2.3.2_ | The 5.1 part is ~ 1.5 percent smaller than three stereos
488162 | ALS -l-7-i | The "-i" is supposed to shut off joint stereo. But -7 is the heavy slow mode
490826 | ALS -l-p-t8 | No "-7", takes "only" 40 minutes.
491952 | ALS -l-t8 | t8 makes around 1.8 percent difference
494991 | ALS -l-t4 | t4 takes out 2/3rds of the "t8" effect
499654 | ALS 5.1file_AND_stereoSLSRfile.-l-t2 | Half a percent over the next.
501180 | ALS -l | =wastedbits. This ALS encodes at TAK -p4m speed
505319 | flac 2ch+2ch+2ch+2ch.ktfs_irlspost-9 | ktf's IRLSPOST build, here FLAC utilizes stereo decorrelation
505444 | ALS -l-p-i | Uses wasted bits, but no joint stereo. "Long term"
507620 | ALS -l-i | Uses wasted bits, but no joint stereo.
510731 | flac _ktfs_irlsbeta.-9 | FLAC encodes as 8x mono.
516680 | WavPack -hx4j0 | j0 is supposed to switch off channel decorr, does that work with 7.1 at -hx4?
618871 | MLP 5.1file_AND_stereoSLSRfile | ffmpeg's MLP encoder. (Does not handle 7.1) TrueHD: 1 KB smaller
710654 | Monkey 5.1file_AND_stereoSLSRfile.EXTRA | Extra high better than Insane.
714264 | Monkey -EXTRA | Takes 57sec. Weaker than 6ch decorr + stereo decorr
714990 | ALS -default | Takes 80sec. No -l, it does not utilize wasted bits.
776954 | tta 2ch+2ch+2ch+2ch | Seems that TTA also only decorrelates stereo. Four files ...
782144 | tta 5.1file_AND_stereoSLSRfile | One file is stereo
799264 | refalac | 70sec. Reassigned BL,BR to make ALAC work!
802489 | tta 4ch+4ch | ... two quads are worse than a 7.1, is that file/block overhead?
971694 | DolbyTrueHD | Muxed out of the downloaded files. Much worse than ffmpeg's TrueHD encoder
1532054 | wav | Uncompressed PCM 7.1

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #14
Interesting list. Did you perhaps check whether the TrueHD original was pure MLP or an AC3 with correction?

I really don't understand how FLAC outperforms OptimFROG here. I've only had that with chiptune before. The relatively small difference between FLAC 8 channel and FLAC 4 stereos (1%), and the modest gains achieved by the much smarter algorithms employed by TAK and ALS, suggest that this file doesn't have much interchannel correlation to begin with.

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #15
Did you perhaps check whether the TrueHD original was pure MLP or a AC3 with correction?

Oh. Damn. I'm not doing this over again :-o

13 minutes of the 22: ffmpeg -i says Stream #0:1(eng): Audio: truehd, 48000 Hz, 7.1, s32 (24 bit) (default)
9 minutes were .m2ts files where ffmpeg -i filename -acodec copy m2ts.mka gives something like
Code:
[SAR 1:1 DAR 16:9], 23.98 fps, 23.98 tbr, 90k tbn
  Stream #0:2[0x1100]: Audio: truehd (AC-3 / 0x332D4341), 48000 Hz, 7.1, s32 (24 bit)
  Stream #0:3[0x1100]: Audio: ac3 (AC-3 / 0x332D4341), 48000 Hz, 5.1(side), fltp, 640 kb/s
  Stream #0:4[0x1101]: Audio: eac3 (AC-3 / 0x332D4341), 48000 Hz, 7.1, fltp, 1664 kb/s
  Stream #0:5[0x1102]: Audio: ac3 (AC-3 / 0x332D4341), 48000 Hz, 5.1(side), fltp, 640 kb/s
Output #0, matroska, to 'm2ts.mka':
    encoder         : Lavf59.16.100
  Stream #0:0: Audio: truehd ([255][255][255][255] / 0xFFFFFFFF), 48000 Hz, 7.1, s32 (24 bit)
Stream mapping:
  Stream #0:2 -> #0:0 (copy)
What's it doing, really?

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #16
Did you perhaps check whether the TrueHD original was pure MLP or a AC3 with correction?

Oh. Damn. I'm not doing this over again :-o
That comment was mostly to put the very bad performance of TrueHD into perspective. It probably has a lossy stream as fallback embedded, which would explain why it compressed so badly.

What's it doing, really?
It copies stream 0:2 to the new (mka) file. Stream 0:2 is truehd. This doesn't tell us anything about whether that truehd stream is AC3+correction data or 'pure' MLP.

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #17
Yes, it doesn't tell when in .mka, you are right!

Code:
> ffmpeg -i .\Chameleon.m2ts -acodec copy -vn -sn m2ts-to.m2ts

  Stream #0:2[0x1100]: Audio: truehd (AC-3 / 0x332D4341), 48000 Hz, 7.1, s32 (24 bit)
  Stream #0:0: Audio: truehd (AC-3 / 0x332D4341), 48000 Hz, 7.1, s32 (24 bit)
Stream mapping:
  Stream #0:2 -> #0:0 (copy)

This one does say something about it. But then info on the output file:
Code:
> ffmpeg -i .\m2ts-to.m2ts

  Stream #0:0[0x1100]: Audio: truehd ([131][0][0][0] / 0x0083), 48000 Hz, 7.1, s32 (24 bit)
Now "truehd (AC-3 / 0x332D4341)" has become "truehd ([131][0][0][0] / 0x0083)"

Over to Matroska:

Code:
> ffmpeg -i .\m2ts-to.m2ts -acodec copy .\m2ts-to.m2ts-to.mka


  Stream #0:0[0x1100]: Audio: truehd ([131][0][0][0] / 0x0083), 48000 Hz, 7.1, s32 (24 bit)
  No Program
  Stream #0:1[0x1100]: Audio: ac3, 0 channels, fltp
Output #0, matroska, to '.\m2ts-to.m2ts-to.mka':
    encoder         : Lavf59.16.100
  Stream #0:0: Audio: truehd ([255][255][255][255] / 0xFFFFFFFF), 48000 Hz, 7.1, s32 (24 bit)
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
and info on the output file:

Code:
ffmpeg -i .\m2ts-to.m2ts-to.mka


  Stream #0:0: Audio: truehd, 48000 Hz, 7.1, s32 (24 bit)

Now it has become just "truehd". Which means that no information about AC3 does not rule out it being AC3. Oh.

Well at least it wasn't an outright transcode.  That was my worry. Not that I know whether there is anything to worry about from a testing point of view. (Maybe it is? Is there any lossy that when decoded is more friendly towards lossless compressor X than Y - without it being related to wasted bits?)

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #18
It's not that it's plain AC3. The question is if it's possible to find out if it's an AC3 elementary stream with a correction stream, or if it's pure MLP. Clearly, FFmpeg isn't telling you. Maybe that requires verbose output?

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #19
ffmpeg -i mkvfile.mkv -acodec copy -vn -sn mkv-to.m2ts
and then
ffmpeg -loglevel 40 -hide_banner -i .\mkv-to.m2ts
Code:
[mpegts @ 000002854760b3c0] max_analyze_duration 7000000 reached at 7000000 microseconds st:0
[mpegts @ 000002854760b3c0] start time for stream 1 is not set in estimate_timings_from_pts
[mpegts @ 000002854760b3c0] stream 1 : no TS found at start of file, duration not set
[mpegts @ 000002854760b3c0] Could not find codec parameters for stream 1 (Audio: ac3, 0 channels, fltp): unspecified sample rate
Consider increasing the value for the 'analyzeduration' (0) and 'probesize' (5000000) options
Input #0, mpegts, from '.\mkv-to.m2ts':
  Duration: 00:00:40.13, start: 1.400000, bitrate: 5754 kb/s
  Program 1
      service_name    : Service01
      service_provider: FFmpeg
  Stream #0:0[0x1100]: Audio: truehd ([131][0][0][0] / 0x0083), 48000 Hz, 7.1, s32 (24 bit)
  No Program
  Stream #0:1[0x1100]: Audio: ac3, 0 channels, fltp
At least one output file must be specified
[AVIOContext @ 0000028547613f00] Statistics: 4886672 bytes read, 3 seeks
... whatever that means.

(Using -codec copy yields pretty much the same audio-relevant output.)

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #20
Tested: near-mono stereo (CDDA)

Background: some time when I did the test on the least-compressible CD in my collection (the above Merzbow, compression figures here and some mildly shocking here) - I recalled that I also once tested the other end of my CD collection (an Édith Piaf compilation).
There I stumbled upon a known deficiency in WavPack 4: bad mono optimization, kept for compatibility.
WavPack 5 has sacrificed the stone-age decoder compatibility, so, why not do that test over again?

Closer inspection reveals the Piaf CD as almost mono indeed; only a few thousand samples differ between the channels, and never more than the LSB (difference peak at -90.31 dBTP).  Differences in all tracks yes, but nine of the twelve only in the last second.
Job was probably outsourced to El Cheapo Basement Mastering.
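The "never more than the LSB" figure is easy to sanity-check: one LSB of 16-bit audio sits about 90.31 dB below full scale. A small Python sketch of such a channel comparison follows; note it measures plain sample peak rather than true peak (dBTP), and the function name and sample values are made up for illustration.

```python
import math


def channel_difference(left: list[int], right: list[int], bits_per_sample: int = 16):
    """Return (number of differing samples, peak difference in dB relative to full scale)."""
    diffs = [abs(l - r) for l, r in zip(left, right)]
    differing = sum(1 for d in diffs if d)
    peak = max(diffs, default=0)
    if peak == 0:
        return differing, float("-inf")       # identical channels
    full_scale = 1 << (bits_per_sample - 1)   # 32768 for 16-bit audio
    return differing, 20 * math.log10(peak / full_scale)


# One sample differs by a single LSB
count, peak_db = channel_difference([100, -200, 300], [100, -201, 300])
print(count, round(peak_db, 2))   # 1 -90.31
```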

So as that is maybe as good as "mono encoded as stereo", I generated another near-mono that isn't mono: I took the "CDDA" corpus used here - 71 tracks arbitrarily chosen by sorting my non-classical collection by track MD5 sum - and generated one file of the first 10 seconds of each, i.e. 11 min 10 seconds.
Extracted the left channel, ffmpeg-resampled it to 88.2 kHz and back to 44.1 kHz, and made a "stereo" file with the unaltered left channel and the re-resampled channel. 
Result is very highly correlated channels, witnessed by encoders that can switch off stereo decorrelation: doing so with FLAC -4 to -8 and WavPack -f and WavPack default created 61 to 65 percent larger files.  A bit more or less with other options (actually, WavPack -h isn't too good here).

In both I included a couple of oddball codecs - not because I believe they will be used, but to get a gut feeling for what it takes to get these kinds of signal number-crunched, compared to ordinary ones. After all, there is a long development path from "works on my small development sample" to "robust enough not to make fool out of it on someone else's music", and I am sure that the developers of the alive-and-kicking codecs know.

Test number 1: 41 minutes, most samples "mono as stereo", some LSBs differing at end of every track:
size vs WAV | codec | setting | remarks
100,0% | wav | | Piaf 41 minutes almost mono
..dual mono & the like.
37,9% | wv-ffmpeg | ffmpeg-default | ffmpeg's WavPack runs dual mono.
30,2% | als | -i | -i means dual mono. Adding -l changes nothing.
..Some of these are ... ... quite underwhelming
25,4% | sac | --optimize=normal --sparse-pcm | 0.01x realtime for this. Complete failure.
20,8% | wv 4.80 | -hx | Old WavPack cannot cope, we knew that.
20,4% | wv 4.80 | -hhx6 | 2x realtime
20,3% | wv-ffmpeg | ffmpeg -compression_level 8 | 0.263 realtime and no match for WavPack 5.
20,2% | flac | -1 | -2 about the same
19,2% | wv | -fj0 | New WavPack ... but shouldn't j0 be dual mono!?
18,7% | rka | -l3 | RKAU's heaviest mode does not impress
18,0% | la | -high -noseek | LA does not impress!
17,7% | rka | -l1 | RKAU's lightest mode beats the heaviest
17,6% | la | -normal | LA normal beats high
17,4% | flac | -8 | and -8p between -8 and WavPack default
17,2% | wv | default | FLAC -5 and WavPack default shouldn't beat all of the ones above here, eh? ;-)
..TAK starts here.
17,0% | tak | -p0 | TAK files are in the right size order -p0, -p0e etc.
16,9% | flac | -8e | -e makes more sense on CDDA than -p
16,5% | flac | flake -5 |
16,4% | sac | --high --optimize=fast --sparse-pcm | 52 hours
16,4% | flac | flaccl -11 | Not as good as -8
16,4% | flac | flake -6 |
16,2% | flac | flaccl -8 |
16,0% | ofr | --preset 0 | --preset 0 has none of the frog's "optimizations" (compare WavPack's -x settings)
15,9% | flac | flake -8 |
15,8% | wv | -hhx6 | 4x realtime, is twice the speed of 4.80's -hhx6 (and saves a quarter size)
15,7% | ape | INSANE | Insane ape fooled again!
15,4% | flac | ktfs' irlspost build -9ep | also ktf's double precision build lines up here
15,4% | flac | ffmpeg-12-cholesky12 | ffmpeg wins the FLAC game, but:
15,4% | flac | ffmpeg-12-cholesky6 | ... ffmpeg-flac: 6 passes beats 12.
15,3% | ofr | --preset 1 | OptimFROGs are in the right size order except 9 beats 10
15,1% | ape | EXTRA HIGH |
15,1% | als | default | ="-l". A few KB better than with -t2
14,8% | tak | -p4m | Between TAK -p2 and -p4m there is nothing but other TAK
14,4% | als | -7 -p | Smallest ALS (also tried -z3 -p, avoid)
14,4% | ofr | --preset 2 | This is OptimFROG's default
14,3% | sac | (default, i.e. "--normal") | < 0.5x realtime speed.
13,9% | sac | --normal --sparse-pcm | < 0.4x realtime. Only improving SAC option on this file.
13,8% | ofr | --preset 6 | frog-only territory from here
13,6% | ofr | --preset 9 | beats 10 narrowly
13,5% | ofr | --preset max | 1.85x realtime
Comments for this corpus:

* WavPack: predictably, it now does well - though not ffmpeg's version.
* FLAC: Here ffmpeg (including Justin Ruggles, creator of original Flake) are doing something damn good.
* TAK: damn good.  And, it is damn hard to find signals where TAK is not in size order.

Then the ones that work more asymmetrically: OptimFROG is pretty much where you think it would be. --preset 9 beating --preset 10 is one of those little glitches in its machinery, but doesn't do much to the overall picture.

* Monkey's: we are getting used to Monkey's Insane getting fooled.  Again I think the blocksize just isn't appropriate. 
* LA, RKAU and sac: It takes more than just a run-of-the-mill development corpus!
sac in particular: It is utterly useless except for this kind of benchmarking. Using the entire battery of sac's options brings it down to 0.007x realtime.  Gnawing at the same segment for forty-five minutes without a disk write.  Spending four days to make a compressed CD image file that is unsuited for playback - but ... sometimes there is something to be learned: Even this brute force, where it can chew on a segment for an hour looking for the right model to encode, is quite worthless without some engineering craft.  While sac's "--sparse-pcm" modelling twist might be mildly interesting, its --optimize (apparently inspired by its idol OptimFROG, and which again I think the frog took from WavPack's -x?) is of no use.
... on this signal.  On the Merzbow - baffling that it is even possible to out-frog OptimFROG on a signal that Monkey's and TAK cannot distinguish from static.

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #21
Test number 2: 11 minutes with one channel an up-and-then-downsample of the other
That actually means most samples are different - but music-wise pretty much transparent to each other (admittedly I didn't do any listening test, but they should be). So highly correlated, but very few dual monos.

Since WavPack 4 vs WavPack 5 was one of the points of testing this, let me mention why they aren't both included in the table: Turns out the difference is small (WavPack 5 has slightly bigger overhead per block, some kilobytes - if you don't like that, just choose a bigger blocksize). Apparently, WavPack 4's deficiency in mono was in ... mono! Not in highly correlated stereo.
But WavPack is in for other surprises, as you will see.

Edit: Note again, this is 11 minutes, the previous one was 41. Read the times accordingly. Also, timings are not particularly rigorous, take them as "what ballpark".
..dual mono & the like.
67,9% | wv | -fj0 | Penalty for WavPack 5: <0.1 points - and vanishing at high --blocksize setting
66,6% | flac | -8p --no-mid-side | forces dual mono
66,5% | als | -i | forces dual mono
64,3% | xz | | General purpose compressors beating dual mono
59,7% | 7z | LZMA2 ultra |
..Some of these are ... ... quite underwhelming
52,9% | la | default | Again LA disappoints
51,5% | wv | -hj0 | Shouldn't j0 force dual mono?
50,8% | la | high | LA high beats default, but nothing special.
44,0% | ape | FAST | Monkey's in the right order, but ...
43,2% | wv | -h | That wasn't good?
42,4% | ape | INSANE | ... but every monkey beaten by WavPack -f?!
41,8% | wv | -f | -f beats -hh?! Same with 4.80
41,1% | flac | -8p | -4 to -8 between wv -f and here
40,5% | wv | | WavPack defaults beat -hx
40,4% | wv | -x | At least -x improves
40,4% | flac | ffmpeg-8 | -compression_level 8 beats ... ?
40,3% | wv | -x4 | Max blocksize squeezes 0.11 points
40,1% | wv | -hhx4 | 107 seconds
40,1% | wv-ffmpeg | -compression_level 4 | 183 seconds
40,1% | wv | -hhx6 | 230 seconds
40,0% | wv-ffmpeg | -compression_level 6 | 356 seconds. WavPack format 4, and beaten by WavPack.exe 4.80
40,0% | wv480 | -hhx4 | Only half as fast as 5.40, but still beats ffmpeg ...
40,0% | wv-ffmpeg | -compression_level 8 | ... except ffmpeg at speeds you do not want to endure in daily use
40,0% | alac | (refalac!) | At 7 seconds, ALAC is surprisingly good
39,9% | flake | flake -11 | 19 seconds
39,9% | flaccl | flaccl -11 | 10 seconds
39,8% | flac | flac-irls. -9 | ktf's IRLSPOST build takes 66 seconds
39,7% | flac | irls. -9p | 3 minutes. In between here: regular -8ep and double precision -8ep
..TAK starts here.
39,3% | tak | -p0 | 3 SECONDS. (And on spinning drive.)
38,7% | als | default | 12 seconds. Bitexact to "-l".
38,6% | sac | --normal --optimize=normal | SAC's "optimize" only wastes *hours* on these 11 minutes
38,4% | tak | -p1 | all TAK are nicely ordered
38,3% | rka | -l2 | RKAU 2 better than 3
38,2% | ofr | --preset 0 | Bit-exact same file as --preset 1
38,1% | tak | -p2 | TAK default shaves a point off -p0
37,9% | sac | --high --optimize=high --sparse-pcm | 51 hours compressing 11 minutes.
37,8% | tak | -p4 | 7 seconds
37,7% | tak | -p4m | 13 seconds. Two points s(h)aved off the smallest FLAC, one off the second-smallest fast codec ALS
37,4% | als | -l -7 -p | 1.2x realtime, like OptimFROG --preset max. -7 about same size
37,1% | ofr | --preset 5 | 5 and 4 worse than 2 ...
36,3% | ofr | --preset 2 | 15 seconds
36,2% | sac | --normal --sparse-pcm | "only" 26 minutes or 0.4x realtime
36,0% | sac | --high --sparse-pcm | 32 minutes
35,5% | ofr | --preset 3 | 22 seconds, and smaller files than presets up to 7
35,4% | ofr | --preset 8 | 65 seconds to improve over --preset 3
35,1% | ofr | --preset 10 | 150 seconds
35,0% | ofr | --preset max | Still > 1x realtime. Tweaking options at this level yields same file
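
To put the timings above on one scale, here is a small sketch that converts encode time into a realtime factor. It assumes the signal is exactly 11 minutes (the post only says "11 minutes", and the timings are ballpark); the function name `realtime_factor` is made up for illustration.

```python
AUDIO_SECONDS = 11 * 60  # assumed exact length of the test 2 signal

def realtime_factor(encode_seconds, audio_seconds=AUDIO_SECONDS):
    """Seconds of audio encoded per second of wall-clock time."""
    return audio_seconds / encode_seconds

# Encode times quoted in the table, converted to seconds.
timings = {
    "tak -p0": 3,
    "ofr --preset 10": 150,
    "sac --normal --sparse-pcm": 26 * 60,
    "sac --high --optimize=high --sparse-pcm": 51 * 3600,
}

for name, seconds in timings.items():
    print(f"{name}: {realtime_factor(seconds):.4g}x realtime")
```

The 26-minute sac run comes out at roughly 0.42x, which matches the "0.4x realtime" quoted in the table.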

The "faster" (de)compressors:
* WavPack: uh, this wasn't as it should be, -f beating -hh?
Also, it is most appreciated that the ffmpeg team supports WavPack encoding - but still WavPackers might as well use its default setting and then recompress with WavPack 5. Unless you use the maddest -compression_level settings, that don't really pay off that well.
* FLAC/WavPack: WavPack has only a mid/side decorrelation strategy, and I have a hunch that FLAC's willingness to try different strategies is what makes it better on this file.
* TTA: Not horrible, not impressive ... nobody cares?
* ALAC: !! What the f**c is it doing there? That is good!
* MPEG-4 ALS is kinda the second-best fast codec here: its default operation is faster than the slowest TAK -p4m and smaller than any FLAC/ALAC/WavPack (/tta)
* TAK: Again TAK starts where FLAC ends. (ALS? The 10 minutes ALS isn't a fast decoder either.)

The "heavier" ones:
* Monkey's (and LA?): no clever stereo decorrelation strategy - maybe a naive mid/side like WavPack? That is more of an objection to something that sets out to win the compression game, than to WavPack that has its priorities more balanced. But at least Monkey's and LA get their sizes in order of the presets this time.
* OptimFROG: surprise that its size orders are so mixed here. Preset 3 beating preset 5 by 1.6 points is quite a bit out of froggy character. 
* sac: Again, a good codec takes more than CPU time, it has to be spent wisely.

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #22
Oh, on TAK:
* Also tested the TAK 2.3.2 beta, since ... well it isn't targeting improvements, but if the fixes applied would matter somewhere it could very well be on oddities? Nothing to write home about, max difference 9 KB on forty-five to seventy megabytes. Although, the difference is larger on p4 than on p3 than ... etc, and none are worse (read off integer KB only though).
* Actually it isn't completely true that all TAK are in order, there is one exception: in test 1 (the most-samples-mono Édith Piaf), -p1 is slightly better than -p1e. Both in 2.3.1 and the beta. Around 0.01 percentage points.

And I see that I have included so many that they obscure the magnitudes. And also that the compression is so high that I should maybe have quoted savings in percent rather than points - well I did point out that WavPack 5 saves "a quarter" size off WavPack 4 on the essentially-mono first corpus.
But also, note how patient FLAC users can save like 16 percent over FLAC -5 on the first CD, but only four on the second signal (that is much more varied!)
While on the other hand, the savings in going FLAC to TAK or TAK to OFR are about the same percentwise ballpark in both tests. (Five plus/minus one percent (not point!) - respectively eight plus/minus one.)
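
For anyone who wants to redo the points-versus-percent arithmetic, a minimal sketch follows. It assumes the table figures are compressed size as a percentage of the uncompressed file, and it uses the test 2 values for the smallest FLAC (39,7), TAK -p4m (37,7) and OptimFROG --preset max (35,0). The function names are invented for this example.

```python
def points_saved(before_pct, after_pct):
    """Saving in percentage points of the original (uncompressed) size."""
    return before_pct - after_pct

def percent_saved(before_pct, after_pct):
    """Relative saving, in percent of the 'before' file size."""
    return 100.0 * (before_pct - after_pct) / before_pct

# Sizes read off the test 2 table (assumed: percent of uncompressed size).
flac_smallest = 39.7
tak_p4m = 37.7
ofr_max = 35.0

for label, before, after in [
    ("FLAC -> TAK", flac_smallest, tak_p4m),
    ("TAK -> OFR", tak_p4m, ofr_max),
]:
    print(f"{label}: {points_saved(before, after):.1f} points, "
          f"{percent_saved(before, after):.1f} percent")
```

On these numbers FLAC to TAK is about 5 percent and TAK to OFR about 7 percent, both inside the ranges quoted above.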

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #23
Thanks again Porcus for your meticulous testing! Like you say, not the most useful files for choosing a compressor, but interesting to get down to the nuts and bolts!

And you guessed right about the WavPack mono-optimization thing. It does only apply to stereo that’s 100% mono. The original omission was to not check for and make special provision for truly identical channels (i.e., encode only one channel) and we were instead encoding it as mid/side. Of course the “side” would have been total silence, and this would have been fine if the silence detector worked on individual channels, but it doesn’t. So there it is.

And you may very well be right about the second test results and the ability of some codecs (FLAC) to choose from more than left+right and mid+side. I suspect the other two are left+side and right+side and I think that actually very old WavPack 3 could do this (or at least I experimented with it at one point). In the end though I decided that the improvement was so rare and small that it didn’t justify the extra time to check.
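
To make the four candidate representations concrete, here is a rough sketch of an encoder picking between left+right, mid+side, left+side and right+side per block, with a check for truly identical channels first. This is not WavPack's or FLAC's actual code: the cost measure (sum of absolute first-order differences) is a stand-in for a real residual size estimate, and all names are hypothetical.

```python
def first_order_cost(channel):
    """Crude proxy for coded size: sum of |x[n] - x[n-1]|."""
    previous = 0
    total = 0
    for sample in channel:
        total += abs(sample - previous)
        previous = sample
    return total

def choose_stereo_mode(left, right):
    """Return (best_mode, costs) for one block of integer samples."""
    if left == right:
        # Truly identical channels: only one channel needs encoding.
        return "mono", {"mono": first_order_cost(left)}
    side = [l - r for l, r in zip(left, right)]
    # mid drops one bit, but it is recoverable: (l + r) and (l - r)
    # always have the same parity, so side's LSB restores it.
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    candidates = {
        "left/right": (left, right),
        "mid/side": (mid, side),
        "left/side": (left, side),
        "right/side": (right, side),
    }
    costs = {
        name: first_order_cost(a) + first_order_cost(b)
        for name, (a, b) in candidates.items()
    }
    return min(costs, key=costs.get), costs
```

Trying all four costs roughly four estimates per block instead of two, which is the extra time weighed against the gain above.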

There is one other interesting thing here, and that’s the --sparse-pcm option of sac. I believe that this refers to the situation where some PCM values are missing or there are probabilistic peculiarities in the PCM value’s distribution (the former being just a specific case of the latter).

The simplest and most universally encountered and easily handled version of this is the zeroed LSBs phenomenon that, for example, lossyWAV takes advantage of. This is trivial to implement and seems to crop up in real samples more than one would expect. Interesting fact: WavPack’s version also handles cases where the LSBs are 1s instead of 0s and the case where they’re either 1s or 0s, but still all identical for each sample. I have no idea how often those cases come up, or if other compressors bother, but it was so trivial to add that after the 0’s case that I could not resist.
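
The three LSB cases can be detected with a few bitwise reductions over a block. The following is only an illustrative sketch, not WavPack's implementation; `redundant_lsbs` and the key names are made up here, and samples are assumed to be plain signed Python integers.

```python
def _trailing(value, bit, limit):
    """Count how many low bits of `value` equal `bit`, up to `limit`."""
    count = 0
    while count < limit and ((value >> count) & 1) == bit:
        count += 1
    return count

def redundant_lsbs(samples, limit=24):
    """Count redundant low bits in a block of signed integer samples.

    zeros: low bits that are 0 in every sample
    ones:  low bits that are 1 in every sample
    dups:  low bits that merely repeat the next higher bit in every sample
    (A block of pure silence reports `limit` zeros; handle it separately.)
    """
    ored, anded, differs = 0, -1, 0
    for s in samples:
        ored |= s
        anded &= s
        differs |= s ^ (s >> 1)  # bit i set where bit i != bit i+1
    return {
        "zeros": _trailing(ored, 0, limit),
        "ones": _trailing(anded, 1, limit),
        "dups": _trailing(differs, 0, limit),
    }

print(redundant_lsbs([4, -8, 12]))    # two zeroed LSBs
print(redundant_lsbs([7, 8, 0, 15]))  # two LSBs copying bit 2
```

In each case the encoder can shift the redundant bits out, store the count and the case once per block, and put the bits back on decode.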

Funny story is that when I submitted a patch to handle this for FFmpeg, Michael resisted a little because this is unnecessarily complicating the code to handle something that should not exist. In a purist sense it’s simply wrong, which might be why some codecs refuse to deal with it altogether.

But the zeroed LSB case is just the tip of the iceberg. Imagine the cases where every represented PCM value is a multiple of some other, non-power-of-two integer, like 3 or 5. This is not that different from the zero LSBs case, except it’s no longer trivial to detect, and probably not that common.
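
The integer-multiple case can at least be checked with a greatest common divisor over the block. A minimal sketch, assuming integer samples; `common_factor`, `divide_out` and `restore` are hypothetical helper names.

```python
from functools import reduce
from math import gcd

def common_factor(samples):
    """Largest integer dividing every sample (0 if the block is all zeros)."""
    return reduce(gcd, (abs(s) for s in samples), 0)

def divide_out(samples):
    """Return (factor, reduced samples); reduction is exact, so lossless."""
    factor = common_factor(samples)
    if factor <= 1:
        return 1, list(samples)
    return factor, [s // factor for s in samples]

def restore(factor, reduced):
    return [s * factor for s in reduced]

block = [6, -9, 15, 0, 300]
factor, reduced = divide_out(block)
print(factor, reduced)
```

A power-of-two factor here is the zeroed LSBs case again; a single stray sample (a click, say) breaks the common factor for the whole block, which is part of why this is harder to exploit than it looks.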

I actually wrote a program at some point to analyze PCM audio for these kinds of uneven distributions and then ran it on a sampling of thousands of CD tracks in my collection. The cases are surprisingly common. One of the most common types I found was where the PCM values had obviously been multiplied by some value greater than 1 and just truncated (not dithered) resulting in regularly spaced missing values. I was able to squeeze these out (losslessly, of course) and get significantly better compression, in some cases then blowing WavPack past all the competitors. I even went down this road pretty far devising easy ways to detect these cases and encode the “formula” to convert back to the original sample values. This was complicated by cases I found where this process had occurred twice, and there were variations on how positive and negative values were handled.
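
As an illustration of the "multiplied and truncated" case, the sketch below fakes such a signal with a gain of 1.25 and squeezes the gaps out by ranking the values that actually occur. Note the swap: this uses an explicit table of used values rather than the encoded "formula" described above, and the names `squeeze` and `unsqueeze` are invented for the example.

```python
def squeeze(samples):
    """Map each sample to its rank among the values that actually occur."""
    used = sorted(set(samples))
    rank = {value: index for index, value in enumerate(used)}
    return [rank[s] for s in samples], used

def unsqueeze(indices, used):
    """Invert squeeze() exactly, given the table of used values."""
    return [used[i] for i in indices]

# Simulate a gain of 1.25 followed by truncation toward zero, no dither.
original = list(range(-1000, 1001))
scaled = [int(x * 1.25) for x in original]

indices, used = squeeze(scaled)
span = max(scaled) - min(scaled) + 1
print(f"{len(used)} of {span} codes in range are used, {span - len(used)} missing")
```

The squeezed signal spans about 1/1.25 of the original range, so in this idealized case roughly log2(1.25), about 0.3 bits per sample, is there for the taking. A real encoder would have to store the table or formula more cleverly than this.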

Another situation, which I also encountered, was where there were no missing codes but adjacent codes that should have been equally probable were not. So imagine a file where even values were twice as probable as odd ones. Quite a bit of entropy there to take advantage of (maybe close to half a bit per sample?) but not really obvious how to take advantage of it.
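
A quick way to size up how much there is to gain is the binary entropy of the even/odd bit. The sketch assumes a deliberately simple model: the only irregularity is the parity of each sample, and parity is independent of everything else the codec already models.

```python
from math import log2

def binary_entropy(p):
    """Entropy in bits of a two-outcome event with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def parity_saving(p_even):
    """Bits per sample saved versus spending a full bit on the parity."""
    return 1.0 - binary_entropy(p_even)

for odds in (2, 4, 8):
    p_even = odds / (odds + 1)
    print(f"even:odd = {odds}:1 -> {parity_saving(p_even):.3f} bits/sample")
```

Under this model the 2:1 case is worth somewhat under a tenth of a bit per sample, and a saving near half a bit needs a skew of around 8:1, so the size of the gain depends a lot on how lopsided the real distributions were.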

In the end I decided that this was not worthwhile. For one thing I feared that perhaps my rather old CD collection was not representative and modern mastering tools would not create audio like this any more. And of course this was going to be slow and create enormously complex code. It was an interesting exercise and it would have been fun to release a WavPack that significantly bettered existing encoders on specific files, but in the end thought better of it.

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #24
The --sparse-pcm feature of sac is also "relatively cheap" - relatively meaning a 20 to 25 percent time penalty. That isn't much compared to a factor of 50 (not percent, that would be like 5000!) for letting it do its frog-inspired --optimize thing.
Source available - didn't see a license, but ideas are out of the bag [insert rant about certain codec license here]
Also on the Merzbow track the --sparse-pcm makes much more difference than the other options to put into it. (I wrote wrong here, should be that "Normal" mode beats "High" at compression.)

But the zeroed LSB case is just the tip of the iceberg. Imagine the cases where every represented PCM value is a multiple of some other, non-power-of-two integer, like 3 or 5. This is not that different from the zero LSBs case, except it’s no longer trivial to detect, and probably not that common.
I think you mentioned this once here yes.  But you only checked for integer multiples?
Hunch: what if the "last three steps" to the final 44.1/16 file are resampling to 44.1, peak normalization and dithering down to 16.
Peak normalization is scaling.

Also I have scratched my head over ... more quiet parts (= blocks) could in principle be handled by
(1) upscaling, remembering the scaling factor, and
(2) wasted bits in the upscaled signal.
Worth it? Maybe not? Had the 14 bits DTS-CDs achieved world domination, it would have been a different thing.

For one thing I feared that perhaps my rather old CD collection was not representative and modern mastering tools would not create audio like this any more.
If my above hunch has anything to it, then we shouldn't even be surprised if "modern mastering tools" do this more often. Edit: when I hit the submit button, TBeck was already writing that he only found it in old recordings. Oh well ... that's what I get from just making layman's speculations. Which, anyway, follow unedited from here:
Modern mastering tools include the musician's own computers to an extent unheard of in 1990. And who knows whether they do things in the "correct" order, when the outcome anyway has so much more than enough resolution for nothing to be audible.

Also I have scratched my head over: suppose for example I create a 16-bit signal with peak of 8 bits below digital full scale, and then pad to 24. WavPack/FLAC/TAK handle the 8 wasted LSBs. They also certainly benefit from the 8 MSBs being zero (all numbers are smaller!), but do they exploit this fully?

Padding to 24 isn't farfetched. There must be a lot of files around that once were 16 bits, but were then imported into say Audacity for a minor adjustment (like peak normalization?!) and then exported to 24 bits ... dithered at the bottom. Which means there is a linearly transformed "bunch of wasted bits signal" that differs from this by ... some noise that anyway needs to be stored as residual.
(Insert obvious analogy to least-squares fit here.)