So after I made a quite thoughtless across-tracks-decorrelation experiment (https://hydrogenaud.io/index.php?topic=121739.msg1005004#msg1005004), and played a bit back and forth with different codecs on that to sort out my confusion, I thought, why not run a test on some files I've already thrown at @ktf's FLAC betas (https://hydrogenaud.io/index.php?topic=120158.msg1003738#msg1003738).
I was in particular curious about one thing: OptimFrog's claim to using a smarter decorrelation (http://losslessaudio.org/). Turns out, OptimFrog and Monkey's and TAK can get more out of stereo on CDDA, than do WavPack-less-than-x4 and FLAC - but that is not the case for 96/24. In any case, it cannot explain OptimFrog's small filesizes; rather they are probably because the format is generally complex and puts your CPU at work to keep you warm during the winter.
The results should be interpreted with so much caution that I initially thought they might not be useful unless something striking would show up. Say, here you cannot expect results to go in the same direction:
If encoder X spends more time than encoder Y getting stereo file x smaller than stereo file y, we cannot tell whether it is more "efficient" or just spends more effort searching for patterns. We don't even know the theoretical compressibility of the L-R difference signal.
Anyway, I think we can take home a few findings:
* TTA cannot do mono! Yeah it can handle multi-channel, but cannot read input files that are mono .wav *shrugs*
* The small-files compressors OFR/TAK/MAC compress both the mono well and the stereo well. OFR and TAK's heavier modes increase the stereo diff.
* WavPack at x4: Consistent with tests at https://hydrogenaud.io/index.php?topic=120454.msg1004854#msg1004854 , WavPack "needs" x4 to compress hirez well. Maybe it is to get differences in the ultrasonic octave compressed?
* FLAC. The beta implements double-precision calculation that improves quite a bit, especially for higher-rez. A reasonable speculation on the "good" stereo reduction for stock 1.3.1 could be that it compresses away some of the effects of a "bad roundoff" to single precision that makes for more digits common to all channels. Bad roundoff to zeroes common to the channels -> can in part be compressed.
* FLAC's "-M" does pretty well. With that switch it does not fully calculate L/R vs mid/side before deciding which one to use.
Columns: Mono size (my locale uses comma for decimal separator), stereo gain in ppm; then stereo gain per sub-corpus. Not displayed: gains per file to get an idea of per-file overhead, see instead the first FLAC column, and consider that there were 71 + 42 + 1 file(s).
|encoder / setting||mono GB||2chdiff ppm||.||CDDA, rock||hirez, rock||hirez, jazz/cl.|
|FLAC irls-2021-09-21 -8 --no-mid-side||6,21||580||.||783||512||468|
|FLAC irls-2021-09-21 -8 -M||6,21||5 500||.||12 290||3 073||1 864|
|FLAC irls-2021-09-21 -8||6,21||6 485||.||13 459||4 432||2 312|
|FLAC irls-2021-09-21 -5||6,26||6 608||.||13 670||4 548||2 364|
|FLAC 1.3.1 -8||6,38||7 940||.||13 567||8 155||2 707|
|FLAC 1.3.1 -5||6,44||8 274||.||13 978||8 718||2 747|
|WavPack -f||6,58||4 744||.||11 979||3 077||−45|
|WavPack default||6,40||4 953||.||10 967||3 994||546|
|WavPack -hx1||6,27||5 782||.||16 284||1 987||199|
|WavPack -hhx4||6,24||15 832||.||18 105||22 998||6 667|
|Monkey's normal||6,29||6 923||.||17 691||3 403||828|
|Monkey's insane||6,22||6 931||.||16 887||4 016||957|
|TAK -p2||6,15||6 907||.||17 994||2 735||1 176|
|TAK -p4m||6,09||7 440||.||18 453||3 389||1 654|
|OFR --preset 2||6,06||6 457||.||17 286||1 785||1 456|
|OFR --preset 10||5,98||7 604||.||18 657||3 706||1 631|
Corpus in more detail:
The first two columns are the corpus from https://hydrogenaud.io/index.php?topic=120158.msg1003738#msg1003738 .
The "hirez jazz/cl." is one file where I merged together 106 minutes 96/24 from http://www.2l.no/hires/ , jazz/classical acoustic recordings sometimes recorded in multi-ch and downmixed. Same file as mentioned at the bottom of https://hydrogenaud.io/index.php?topic=120158.msg1001334#msg1001334 .
Could you please elaborate a bit on the columns? I'm too dumb to figure that "ppm" meaning :)
ppm = "parts per million". 1/10k of a percentage point.
So for the biggest overall effect, WavPack -hhx4, the mono files are 624/1061 of WAV size, that is 58.8 percent; the stereo files are around 1.6 percentage points less, 57.2 percent of WAV size.
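For anyone who wants to redo that arithmetic, here is a small sketch. The function name is just made up for illustration; the numbers are the WavPack -hhx4 figures from the first table and the 10.61 GB WAV total implied by "624/1061".

```python
def ppm_to_pct_points(ppm: float) -> float:
    """1 ppm is 1e-6 of the whole, i.e. 1e-4 percentage points."""
    return ppm / 10_000


mono_gb = 6.24      # WavPack -hhx4, mono size from the first table
wav_gb = 10.61      # total WAV size implied by "624/1061"
diff_ppm = 15_832   # WavPack -hhx4, overall stereo gain in ppm of WAV size

mono_pct = 100 * mono_gb / wav_gb
stereo_pct = mono_pct - ppm_to_pct_points(diff_ppm)

print(f"mono {mono_pct:.1f}%, stereo {stereo_pct:.1f}%")
# prints: mono 58.8%, stereo 57.2%
```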
Here you got the same with different formatting, where mono filesizes are in percent of .wav, and where the differences are in percentage points:
|encoder / setting||mono compression||2chdiff in pctpts||CDDA rock||hirez rock||hirez jazz/cl.|
|FLAC irls-2021-09-21 -8 --no-mid-side||58.5%||0.06||0.08||0.05||0.05|
|FLAC irls-2021-09-21 -8 -M||58.5%||0.55||1.23||0.31||0.19|
|FLAC irls-2021-09-21 -8||58.5%||0.65||1.35||0.44||0.23|
|FLAC irls-2021-09-21 -5||59.0%||0.66||1.37||0.45||0.24|
|FLAC 1.3.1 -8||60.2%||0.79||1.36||0.82||0.27|
|FLAC 1.3.1 -5||60.7%||0.83||1.40||0.87||0.27|
|OFR --preset 2||57.1%||0.65||1.73||0.18||0.15|
|OFR --preset 10||56.4%||0.76||1.87||0.37||0.16|
It may be surprising to see WavPack -hhx4 not out-compress FLAC, but that is because most of the corpus is high sample rate where WavPack doesn't shine as much and where the new FLAC beta improves a lot.
WAV file sizes:
CDDA rock: 3.05 GB (5h09min)
hirez rock: 3.4 GB (1h47)
hirez jazz/cl.: 3.4 GB (1h46)
2chdiff - is that full file but stereo, or is that only the extracted mid/side or l/r difference signal?
That is the difference
one file in stereo - (one file for the left channel + one file for the right channel)
Hmm the question is if codecs treat the difference as separate signal and compress it separately, or is it somehow used for predictors etc. If the latter, then I'm not sure if compressing the difference signal alone tells much - it's not music and predictors aren't tuned for it... It somehow resembles analysing lossy codecs by listening to difference signal...
No, not "difference" as in difference signal - as difference in size.
What I did was split a stereo file into a left channel file and a right channel file.
Compressed left channel file and right channel file. That is a safe way to get "dual mono" of the same audio.
Compressed the original file too, with the same setting.
Then a measure of how much use the encoder makes of channel correlation, is: how many percent does it gain when it can look at both?
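The steps above can be sketched in a few lines. This is only an illustration of the measurement, not the tooling actually used for the tables: `split_stereo` and `stereo_gain_ppm` are made-up names, the split works on raw interleaved PCM rather than on .wav files, and the sizes in the usage line are invented.

```python
def split_stereo(frames: bytes, sample_width: int) -> tuple[bytes, bytes]:
    """Deinterleave raw stereo PCM (L R L R ...) into left and right byte strings."""
    frame = 2 * sample_width
    if len(frames) % frame:
        raise ValueError("not a whole number of stereo frames")
    left = bytearray()
    right = bytearray()
    for i in range(0, len(frames), frame):
        left += frames[i:i + sample_width]           # left sample of this frame
        right += frames[i + sample_width:i + frame]  # right sample of this frame
    return bytes(left), bytes(right)


def stereo_gain_ppm(stereo_size: int, left_size: int, right_size: int,
                    wav_size: int) -> float:
    """Size saved by encoding stereo instead of dual mono, in ppm of WAV size."""
    return 1_000_000 * (left_size + right_size - stereo_size) / wav_size


# Hypothetical sizes: dual mono totals 588, stereo 572, uncompressed WAV 1000.
print(stereo_gain_ppm(572, 295, 293, 1000))  # 16000.0
```

Note that the two mono files carry two sets of headers, so a little per-file overhead ends up in the number; that is why the --no-mid-side row is not exactly zero.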
A measure, but I didn't say it was a precise one. But FWIW I think it says something about the FLAC revision, about some WavPack settings - and, it suggests that OptimFrog's secret doesn't lie in exceptional handling of stereo, but rather in throwing heavy artillery at every signal.
Quite clear results. I only know the FLAC format very well; I just looked up WavPack and Monkey's Audio. Put simply, it seems WavPack and Monkey's Audio only implement a conversion to mid-side audio, while FLAC also has stereo decorrelation modes called left-side and right-side. It seems to me that WavPack and Monkey's Audio (though I'm less sure about Monkey's Audio) first decide whether or not to convert from left-right to mid-side, and then treat the two resulting channels separately.
So, apparently, the gain is not in the stereo decorrelation but in the way a mid or a side channel can be compressed. The best explanation I can come up with (but please note this is purely guesswork) is that FLAC is less equipped to deal with small signals that might occur in the mid channel of highly-correlated stereo. This would also explain why FLAC's benefit is only present for 16-bit (CDDA) material and not for 24-bit signals.
Not sure what are the right guesses CDDA vs hirez ... one thing is that a big part of hirez is uncorrelated noise. (Also I have not timed these. No info here about how much more effort froggy puts in stereo than in mono, for example.)
Anyway, here is the CDDA-only part, where I have added columns that compare each mono to the FLAC beta at -8 mono, and each stereo to new -8 stereo. The final column is, say, (filesize FLAC -8 stereo minus Monkey's insane stereo) minus (filesize FLAC -8 mono minus Monkey's insane mono), that is: We know that Monkey's insane compresses more than FLAC does, and that difference is how much bigger in stereo than in dual mono?
|CDDA part only||mono compression||stereo compression||2chdiff in %pts||1ch vs new FLAC -8||2ch vs new FLAC -8||diff previous two|
|FLAC irls-2021-09-21 -8 --no-mid-side||67.2%||67.1%||0.08||0.00||−1.27|
|FLAC irls-2021-09-21 -8 -M||67.2%||66.0%||1.23||0.00||−0.12|
|FLAC irls-2021-09-21 -8||67.2%||65.9%||1.35||0.00||0.00|
|FLAC irls-2021-09-21 -5||67.6%||66.2%||1.37||−0.34||−0.32|
|FLAC 1.3.1 -8||67.3%||65.9%||1.36||−0.08||−0.07|
|FLAC 1.3.1 -5||67.7%||66.3%||1.40||−0.46||−0.40|
|OFR --preset 2||65.4%||63.7%||1.73||1.81||2.19||0.38|
|OFR --preset 10||64.4%||62.5%||1.87||2.82||3.34||0.52|
We see that getting a stereo signal enables the higher-compressing codecs to increase their compression advantage over FLAC, which is not unexpected; and for WavPack, TAK and OptimFrog (but not Monkey!) the higher modes do even better.
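To spell out how the last three columns hang together, here is a sketch using the OFR --preset 2 row. The function name is made up, and since I feed it the rounded percentages from the table, the results agree with the table's 1.81 / 2.19 / 0.38 only to the first decimal.

```python
def advantage_over_reference(ref_mono: float, ref_stereo: float,
                             other_mono: float, other_stereo: float):
    """Return (mono advantage, stereo advantage, stereo minus mono advantage),
    all in percentage points of WAV size; positive means 'other' is smaller."""
    mono_adv = ref_mono - other_mono
    stereo_adv = ref_stereo - other_stereo
    return mono_adv, stereo_adv, stereo_adv - mono_adv


# Reference: FLAC irls-2021-09-21 -8 (67.2% mono, 65.9% stereo).
# Other: OFR --preset 2 (65.4% mono, 63.7% stereo).
mono_adv, stereo_adv, extra = advantage_over_reference(67.2, 65.9, 65.4, 63.7)
print(round(mono_adv, 1), round(stereo_adv, 1), round(extra, 1))
# prints: 1.8 2.2 0.4
```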
Bad-ass track, for what it is worth: Merzbow: "I Lead You Towards Glorious Times". (https://www.youtube.com/watch?v=OzWNJtN86kU)
There are genres that mess up good codecs, and this is the least compressible CDDA track in my collection. Some compression tests here. (https://hydrogenaud.io/index.php?topic=120158.msg1001917#msg1001917) ("my final" test, my ass ... too curious. Rehab is for quitters.)
Anyway, because this is noise, one would not expect much help from stereo - indeed, tests show that the stereo file isn't much compressible from PCM, so there cannot have been much eh? Still codecs behave differently. These are sorted by stereo size (mono size maintains that order except between LA -high and -high -noseek).
|stereo kB||mono kB||stereo − mono||left − right|
I tried to get three settings from each encoder: normal, high and maximal. OFR 10 was chosen as maximal by mistake, and kept as high when I discovered --preset max, and then ... Other choices were a bit ... arbitrary. The new FLAC beta produces bit-identical audio to 1.3.1, so the new "-9 -p -e" was a candidate for max (it didn't squeeze much out).
* The file fools OptimFrog's --preset max big time. And the LA's come out in the wrong order.
* I've known since long that TAK doesn't like this piece of music, and Monkey's is even worse. Those return bigger files than the WAV. But TAK in the very least can utilize stereo.
* Indeed less than half these files can get help from stereo here. FLAC does, as these presets (which include -m) pretty much brute-force search the stereo options. TAK does even better. Two OFR modes do, so here there actually might be some support for froggy's claims that it can make pretty good sense out of stereo.
* The two good WavPacks and the two good OFRs disagree with everything else about what channel should be smallest. Maybe because they are the only ones to make good sense out of the right channel.
Sorry I’m a little late to respond here, but I have been following along. Thanks for analyzing this, it’s definitely interesting to see how the different compressors respond to stereo! I also was a little confused at first on what the columns meant, but you clarified it nicely.
I am only really familiar with how WavPack handles stereo, and how the “extra” modes work, so I’ll clarify that a little which will hopefully add something to the discussion.
In the “fast” and “normal” modes the default behavior (as was guessed) is just converting left-right to mid-side, and then treating the two channels completely independently. It can be turned off for comparison (-j0), but it’s almost always better. All of the “extra” modes check to make sure mid-side is improving things.
There is obviously still going to be some correlation between the channels even after mid-side encoding, and so the “high” and “extra high” modes take advantage of this. The filters with negative term values (-1, -2, and -3) employ this “cross-correlation”.
As for the “extra” modes, when I created the filters that are available at levels -x1 to -x3, there was very little high-resolution material out there (I think I had three tracks I captured somehow from a DVD) and so I didn’t use that in my corpus. Everyone was just comparing compression using CD audio and so I optimized for that.
The higher modes (-x4 to -x6) create new filters from scratch, so it makes perfect sense to me that those would be best for high-resolution (they have no preconceived notions).
So I knocked TTA for the wrong reason:
a few findings:
* TTA cannot do mono!
Yes it can do mono! What this TTA version refuses to handle, are ffmpeg-generated .wav files (https://hydrogenaud.io/index.php?topic=121924).
So I tested it. Same corpus. First table, the three rightmost figures are unsurprising, not far from WavPack default or -hx1: sixteen thousand ppm, five thousand ppm, and three hundred ppm.
But the Merzbow mono files fooled it. The monos sum up to 23 kbit/s worse than Monkey's normal, while stereo is 25 better. Mono: worst by far, stereo: between flac -5 and wavpack default. So it is a signal that fools it, and luckily it is a mono (less interesting) such that stereo finds what to do about it.
I also was a little confused at first on what the columns meant, but you clarified it nicely.
No wonder there is confusion when I cannot even make my mind up on whether size s(h)avings should be positive or negative numbers.
And ... :
while FLAC also has stereo decorrelation modes called left-side and right-side
It was only after reading this that it dawned on me that left-side and right-side are stereo decorrelation strategies - not weird channel configurations that FLAC chose to support. Explains my ignorant comment here (https://hydrogenaud.io/index.php?topic=121478.msg1004551#msg1004551).
It was only after reading this that it dawned on me that left-side and right-side are stereo decorrelation strategies - not weird channel configurations that FLAC chose to support.
Yes. It seems most lossless formats only do either left & right or mid & side encoding, but FLAC can also choose to encode left & side or right & side. This is beneficial if there is some form of stereo correlation, but the resulting mid channel is more complex to encode than either left or right.
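To make the four options concrete, here is a toy sketch. It is not how any real encoder estimates sizes: the cost function is just the sum of absolute first differences, chosen for illustration, and all function names are made up. The mid/side arithmetic (mid = (L+R)>>1, side = L-R, with the lost low bit of the sum recovered from side) is the usual lossless integer form.

```python
def candidates(left, right):
    """The four stereo representations as (channel_a, channel_b) pairs."""
    side = [l - r for l, r in zip(left, right)]
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    return {
        "left-right": (left, right),
        "left-side": (left, side),
        "right-side": (right, side),
        "mid-side": (mid, side),
    }


def undo_mid_side(mid, side):
    """Recover left/right exactly; the low bit of (l + r) equals side & 1."""
    left, right = [], []
    for m, s in zip(mid, side):
        total = (m << 1) | (s & 1)   # l + r
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right


def cost(channel):
    """Crude stand-in for encoded size: sum of absolute first differences."""
    prev, total = 0, 0
    for sample in channel:
        total += abs(sample - prev)
        prev = sample
    return total


def pick_mode(left, right):
    """Try all four representations and keep the cheapest one."""
    cands = candidates(left, right)
    return min(cands, key=lambda name: sum(cost(ch) for ch in cands[name]))


# Left is a clean ramp, right is the same ramp plus a little wobble:
print(pick_mode([0, 8, 16, 24], [0, 10, 16, 26]))  # left-side
```

In that example the wobble ends up in both mid and side, so mid costs more than the clean left channel and left-side wins by a hair over mid-side.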