"Tested": codecs for the effect of stereo decorrelation (mid/side)

2021-11-23 10:00:01

So after I made a quite thoughtless across-tracks-decorrelation experiment, and played a bit back and forth with different codecs on that to sort out my confusion, I thought, why not run a test on some files I've already thrown at @ktf's FLAC betas.

I was in particular curious about one thing: OptimFrog's claim to using a smarter decorrelation. Turns out, OptimFrog and Monkey's and TAK can get more out of stereo on CDDA, than do WavPack-less-than-x4 and FLAC - but that is not the case for 96/24. In any case, it cannot explain OptimFrog's small filesizes; rather they are probably because the format is generally complex and puts your CPU at work to keep you warm during the winter.

The results should be interpreted with so much caution that I initially thought they might not be useful unless something striking would show up. Say, here you cannot expect results to go in the same direction:
If encoder X spends more time than encoder Y getting stereo file x smaller than stereo file y, we cannot tell whether it is more "efficient" or just spends more effort searching for patterns. We don't even know the theoretical compressibility of the L-R difference signal.

Anyway, I think we can take home a few findings:
* TTA cannot do mono! Yeah it can handle multi-channel, but cannot read input files that are mono .wav *shrugs*
* The small-files compressors OFR/TAK/MAC compress both the mono well and the stereo well. OFR and TAK's heavier modes increase the stereo diff.
* WavPack at x4: Consistent with tests at https://hydrogenaud.io/index.php?topic=120454.msg1004854#msg1004854 , WavPack "needs" x4 to compress hirez well. Maybe it is to get differences in the ultrasonic octave compressed?
* FLAC. The beta implements double-precision calculation, which improves compression quite a bit, especially for higher-rez. A reasonable speculation on the "good" stereo reduction for stock 1.3.1 could be that it compresses away some of the effects of a "bad roundoff" to single precision that makes for more digits common to all channels. Bad roundoff to zeroes in common -> can in part be compressed.
* FLAC's "-M" does pretty well. With that switch it does not fully calculate L/R vs mid/side before deciding which one to use.

Columns: Mono size (my locale uses comma for decimal separator), stereo gain in ppm; then stereo gain per sub-corpus. Not displayed: gains per file to get an idea of per-file overhead, see instead the first FLAC column, and consider that there were 71 + 42 + 1 file(s).

	mono GB	2chdiffppm	.	CDDA, rock	hirez, rock	hirez, jazz/cl.
			.
WAVE	10,61	2,9	.	7,4	1,8	0,0
			.
FLAC irls-2021-09-21 -8 --no-mid-side	6,21	580	.	783	512	468
FLAC irls-2021-09-21 -8 -M	6,21	5 500	.	12 290	3 073	1 864
FLAC irls-2021-09-21 -8	6,21	6 485	.	13 459	4 432	2 312
FLAC irls-2021-09-21 -5	6,26	6 608	.	13 670	4 548	2 364
FLAC 1.3.1 -8	6,38	7 940	.	13 567	8 155	2 707
FLAC 1.3.1 -5	6,44	8 274	.	13 978	8 718	2 747
			.
WavPack -f	6,58	4 744	.	11 979	3 077	−45
WavPack default	6,40	4 953	.	10 967	3 994	546
WavPack -hx1	6,27	5 782	.	16 284	1 987	199
WavPack -hhx4	6,24	15 832	.	18 105	22 998	6 667
			.
Monkey's normal	6,29	6 923	.	17 691	3 403	828
Monkey's insane	6,22	6 931	.	16 887	4 016	957
			.
TAK -p2	6,15	6 907	.	17 994	2 735	1 176
TAK -p4m	6,09	7 440	.	18 453	3 389	1 654
			.
OFR --preset 2	6,06	6 457	.	17 286	1 785	1 456
OFR --preset 10	5,98	7 604	.	18 657	3 706	1 631

Corpus in more detail:
The first two columns are the corpus from https://hydrogenaud.io/index.php?topic=120158.msg1003738#msg1003738 .
The "hirez jazz/cl." is one file where I merged together 106 minutes 96/24 from http://www.2l.no/hires/ , jazz/classical acoustic recordings sometimes recorded in multi-ch and downmixed. Same file as mentioned at the bottom of https://hydrogenaud.io/index.php?topic=120158.msg1001334#msg1001334 .

Reply #1 – 2021-11-24 12:35:21

Could you please elaborate a bit on the columns? I'm too dumb to figure out what "ppm" means.

Reply #2 – 2021-11-24 12:52:25

ppm = "parts per million". 1/10k of a percentage point.

So for the biggest overall effect, WavPack -hhx4, the mono files are 624/1061 of WAV size, that is 58.8 percent; the stereo files are around 1.6 percentage points less, 57.2 percent of WAV size.
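For anyone who wants to redo that conversion, here is a minimal Python sketch. The function name is made up for illustration; the input figures are the WavPack -hhx4 row and the WAVE row of the first table.

```python
def ppm_to_pct_points(ppm: float) -> float:
    """1 ppm is one millionth of the whole, i.e. 1/10 000 of a percentage point."""
    return ppm / 10_000

# Figures from the first table: WavPack -hhx4 mono size, WAVE mono size, 2chdiff
mono_gb, wav_gb, gain_ppm = 6.24, 10.61, 15_832

mono_pct = 100 * mono_gb / wav_gb        # mono size in percent of WAV
gain_pts = ppm_to_pct_points(gain_ppm)   # stereo gain in percentage points
stereo_pct = mono_pct - gain_pts         # stereo size in percent of WAV

print(f"mono {mono_pct:.1f}%, gain {gain_pts:.2f} pts, stereo {stereo_pct:.1f}%")
```

The three printed numbers should line up with the 58.8, roughly 1.6 and 57.2 quoted above.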

Here you got the same with different formatting, where mono filesizes are in percent of .wav, and where the differences are in percentage points:

	mono compression	2chdiff in pctpts	CDDA rock	hirez rock	hirez jazz/cl.

WAVE	100.0%	2.94E−04	7.42E−04	1.84E−04	4.36E−06

FLAC irls-2021-09-21 -8 --no-mid-side	58.5%	0.06	0.08	0.05	0.05
FLAC irls-2021-09-21 -8 -M	58.5%	0.55	1.23	0.31	0.19
FLAC irls-2021-09-21 -8	58.5%	0.65	1.35	0.44	0.23
FLAC irls-2021-09-21 -5	59.0%	0.66	1.37	0.45	0.24
FLAC 1.3.1 -8	60.2%	0.79	1.36	0.82	0.27
FLAC 1.3.1 -5	60.7%	0.83	1.40	0.87	0.27

WavPack -f	62.0%	0.47	1.20	0.31	0.00
WavPack default	60.4%	0.50	1.10	0.40	0.05
WavPack -hx1	59.1%	0.58	1.63	0.20	0.02
WavPack -hhx4	58.8%	1.58	1.81	2.30	0.67

Monkey's normal	59.3%	0.69	1.77	0.34	0.08
Monkey's insane	58.6%	0.69	1.69	0.40	0.10

TAK -p2	58.0%	0.69	1.80	0.27	0.12
TAK -p4m	57.4%	0.74	1.85	0.34	0.17

OFR --preset 2	57.1%	0.65	1.73	0.18	0.15
OFR --preset 10	56.4%	0.76	1.87	0.37	0.16

It may be surprising to see WavPack -hhx4 not out-compress FLAC, but that is because most of the corpus is high sample rate where WavPack doesn't shine as much and where the new FLAC beta improves a lot.
WAV file sizes:
CDDA rock: 3.05 GB (5h09min)
hirez rock: 3.4 GB (1h47)
hirez jazz/cl.: 3.4 GB (1h46)
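
As a sanity check on those WAV sizes, a small Python sketch. It assumes plain stereo PCM, ignores headers, takes the durations as rounded to the minute, and assumes that "GB" here means 2^30 bytes; the function name is only for illustration.

```python
def pcm_gib(hours: int, minutes: int, rate_hz: int, bits: int, channels: int = 2) -> float:
    """Uncompressed PCM size in GiB (2**30 bytes), header overhead ignored."""
    seconds = hours * 3600 + minutes * 60
    n_bytes = seconds * rate_hz * (bits // 8) * channels
    return n_bytes / 2**30

cdda_rock = pcm_gib(5, 9, 44_100, 16)    # 5h09min of 44.1 kHz / 16 bit
hirez_rock = pcm_gib(1, 47, 96_000, 24)  # 1h47 of 96 kHz / 24 bit

print(f"CDDA rock {cdda_rock:.2f}, hirez rock {hirez_rock:.2f}")
```

Under those assumptions the durations and the sizes listed above agree to within rounding.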

Reply #3 – 2021-11-24 21:41:19

2chdiff - is that full file but stereo, or is that only the extracted mid/side or l/r difference signal?

Reply #4 – 2021-11-24 21:42:46

That is the difference
one file in stereo - (one file for the left channel + one file for the right channel)

Reply #5 – 2021-11-24 22:04:34

Hmm the question is if codecs treat the difference as separate signal and compress it separately, or is it somehow used for predictors etc. If the latter, then I'm not sure if compressing the difference signal alone tells much - it's not music and predictors aren't tuned for it... It somehow resembles analysing lossy codecs by listening to difference signal...

Reply #6 – 2021-11-24 22:31:08

No, not "difference" as in difference signal - as difference in size.

What I did was split a stereo file into a left channel file and a right channel file.
Compressed left channel file and right channel file. That is a safe way to get "dual mono" of the same audio.
Compressed the original file too, with the same setting.

Then a measure of how much use the encoder makes of channel correlation, is: how many percent does it gain when it can look at both?
A measure, but I didn't say it was a precise one. But FWIW I think it says something about the FLAC revision, about some WavPack settings - and, it suggests that OptimFrog's secret doesn't lie in exceptional handling of stereo, but rather in throwing heavy artillery at every signal.
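
The measure described above can be sketched in a few lines of Python. The function name and the byte counts are made up for illustration. Two things are assumptions on my part rather than something spelled out in the posts: that the denominator is the size of the uncompressed stereo WAV, and that a positive number means the stereo encode is the smaller one (as in the tables).

```python
def stereo_gain_ppm(stereo_bytes: int, left_bytes: int, right_bytes: int, wav_bytes: int) -> float:
    """Bytes saved by encoding both channels jointly, per million bytes of WAV."""
    dual_mono = left_bytes + right_bytes
    return 1_000_000 * (dual_mono - stereo_bytes) / wav_bytes

# Hypothetical sizes, only to show the arithmetic
gain = stereo_gain_ppm(
    stereo_bytes=57_200_000,
    left_bytes=29_500_000,
    right_bytes=29_300_000,
    wav_bytes=100_000_000,
)
print(gain)
```

With these made-up sizes the dual-mono pair is 58.8 percent of the WAV and the stereo file 57.2 percent, so the gain comes out as 1.6 percentage points expressed in ppm.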

Reply #7 – 2021-11-28 19:00:50

Quite clear results. I only know the FLAC format very well; I just looked up WavPack and Monkey's Audio. Put simply, it seems WavPack and Monkey's Audio only implement a conversion to mid-side audio, while FLAC also has stereo decorrelation modes called left-side and right-side. It seems to me that WavPack and (though I'm less sure here) Monkey's Audio treat the two channels separately after deciding whether or not to convert from left-right to mid-side.

So, apparently, the gain is not in the stereo decorrelation but in the way a mid or a side channel can be compressed. The best explanation I can come up with (but please note this is purely guesswork) is that FLAC is less equipped to deal with small signals that might occur in the mid channel of highly-correlated stereo. This would also explain why FLAC's benefit is only present for 16-bit (CDDA) material and not for 24-bit signals.

Reply #8 – 2021-11-29 00:14:29

Not sure what are the right guesses CDDA vs hirez ... one thing is that a big part of hirez is uncorrelated noise. (Also I have not timed these. No info here about how much more effort froggy puts in stereo than in mono, for example.)

Anyway, here is the CDDA-only part, where I have added columns that compare each mono to the FLAC beta at -8 mono, and each stereo to the new -8 stereo. The final column is, say, (filesize FLAC -8 stereo minus Monkey's insane stereo) minus (filesize FLAC -8 mono minus Monkey's insane mono). That is: we know that Monkey's insane compresses more than FLAC does; how much bigger is that difference in stereo than in dual mono?

CDDA part only	mono compression	stereo compression	2chdiff in %pts	1ch vs new FLAC -8	2ch vs new FLAC -8	diff previous two

WAVE	100.0%	100.0%	7.42E−04

FLAC irls-2021-09-21 -8 --no-mid-side	67.2%	67.1%	0.08	0.00	−1.27
FLAC irls-2021-09-21 -8 -M	67.2%	66.0%	1.23	0.00	−0.12
FLAC irls-2021-09-21 -8	67.2%	65.9%	1.35	0.00	0.00
FLAC irls-2021-09-21 -5	67.6%	66.2%	1.37	−0.34	−0.32
FLAC 1.3.1 -8	67.3%	65.9%	1.36	−0.08	−0.07
FLAC 1.3.1 -5	67.7%	66.3%	1.40	−0.46	−0.40

WavPack -f	68.7%	67.5%	1.20	−1.47	−1.62	−0.15
WavPack default	67.4%	66.3%	1.10	−0.20	−0.45	−0.25
WavPack -hx1	66.9%	65.3%	1.63	0.32	0.60	0.28
WavPack -hhx4	66.7%	64.9%	1.81	0.55	1.01	0.46

Monkey's normal	66.1%	64.3%	1.77	1.15	1.57	0.42
Monkey's insane	65.1%	63.4%	1.69	2.12	2.47	0.34

TAK -p2	66.3%	64.5%	1.80	0.95	1.40	0.45
TAK -p4m	65.8%	63.9%	1.85	1.47	1.97	0.50

OFR --preset 2	65.4%	63.7%	1.73	1.81	2.19	0.38
OFR --preset 10	64.4%	62.5%	1.87	2.82	3.34	0.52

We see that getting a stereo signal enables the higher-compressing codecs to increase their compression advantage over FLAC, which is not unexpected; and for WavPack, TAK and OptimFrog (but not Monkey's!) the higher modes do even better.
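
To make the last three columns concrete, here is a small Python sketch of how they are formed. The helper name is invented. The inputs are the one-decimal percentages printed in the table, so the results can differ in the second decimal from the table, which was presumably computed from unrounded file sizes.

```python
# Percent-of-WAV figures as printed in the CDDA table
flac8 = {"mono": 67.2, "stereo": 65.9}        # FLAC irls-2021-09-21 -8
ape_insane = {"mono": 65.1, "stereo": 63.4}   # Monkey's insane

def advantage_over_reference(codec: dict, reference: dict) -> tuple:
    """Return (1ch advantage, 2ch advantage, extra advantage gained from stereo)."""
    one_ch = reference["mono"] - codec["mono"]
    two_ch = reference["stereo"] - codec["stereo"]
    return one_ch, two_ch, two_ch - one_ch

one_ch, two_ch, extra = advantage_over_reference(ape_insane, flac8)
print(f"1ch {one_ch:.1f}, 2ch {two_ch:.1f}, extra from stereo {extra:.1f}")
```

A positive last figure means the codec pulls further ahead of FLAC -8 when it gets to see both channels at once.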

Reply #9 – 2021-12-03 17:13:55

One bad-ass track, for what it is worth:
Merzbow: "I Lead You Towards Glorious Times".

There are genres that mess up good codecs, and this is the least compressible CDDA track in my collection. Some compression tests here. ("my final" test, my ass ... too curious. Rehab is for quitters.)

Anyway, because this is noise, one would not expect much help from stereo - indeed, tests show that the stereo file isn't much compressible from PCM, so there cannot have been much, eh? Still, codecs behave differently. These are sorted by stereo size (mono size maintains that order except between LA -high and -high -noseek).

stereo kB	mono kB	stereo − mono	codec	setting	left − right
58782	58722	60	ape	normal	−280
58716	58652	63	ape	extrahigh	−269
58691	58647	43	ape	insane	−290
58354	58545	−191	tak	default	−216
58214	58362	−148	tak	p3e	−230
58184	58340	−157	tak	p4m	−233
*58177*	*58177*	0	*wav*	*uncompressed*	0
57879	57699	180	wv	default	−41
57571	57615	−44	flac	5	−258
57508	57552	−44	flac	8	−285
57507	57551	−44	flac	9pe	−286
54306	54693	−387	ofr	presetmax	−221
53942	53916	25	wv	hx4	80
53921	53914	7	wv	hhx6	83
53909	53587	322	la	highnoseek	−311
53793	53588	204	la	high	−310
53757	53564	193	la	default	−299
52224	52149	75	ofr	default	744
51401	51888	−486	ofr	preset10	1064

I tried to get three settings from each encoder: normal, high and maximal. OFR 10 was chosen as maximal by mistake, and kept as high when I discovered --preset max, and then ... Other choices were a bit ... arbitrary. The new FLAC beta produces bit-identical audio to 1.3.1, so the new "-9 -p -e" was a candidate for max (it didn't squeeze much out).

* The file fools OptimFROG's --preset max big time. And the LAs come out in the wrong order.
* I've long known that TAK doesn't like this piece of music, and Monkey's is even worse. Those return bigger files than the WAV. But TAK at the very least can utilize stereo.
* Indeed, less than half of these files get any help from stereo here. FLAC does, as these presets (which include -m) pretty much brute-force search the stereo options. TAK does even better. Two OFR modes do too, so here there actually might be some support for froggy's claims that it can make pretty good sense out of stereo.
* The two good WavPacks and the two good OFRs disagree with everything else about which channel should be smallest. Maybe because they are the only ones to make good sense out of the right channel.

Reply #10 – 2021-12-05 01:13:05

Sorry I’m a little late to respond here, but I have been following along. Thanks for analyzing this, it’s definitely interesting to see how the different compressors respond to stereo! I also was a little confused at first on what the columns meant, but you clarified it nicely.

I am only really familiar with how WavPack handles stereo, and how the “extra” modes work, so I’ll clarify that a little which will hopefully add something to the discussion.

In the “fast” and “normal” modes the default behavior (as was guessed) is just converting left-right to mid-side, and then treating the two channels completely independently. It can be turned off for comparison (-j0), but it’s almost always better. All of the “extra” modes check to make sure mid-side is improving things.
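
For readers who have not seen a lossless left-right to mid-side conversion, here is a generic Python sketch. It is not taken from WavPack's source (or any other encoder's); it only shows one common way to make the transform exactly reversible on integers, and the function names are made up.

```python
def to_mid_side(left, right):
    """Integer mid/side: mid is the floored average, side is the difference."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def from_mid_side(mid, side):
    """Exact inverse: l + r and l - r share parity, so side's low bit restores the sum."""
    left, right = [], []
    for m, s in zip(mid, side):
        total = (m << 1) | (s & 1)    # recover l + r, including the bit lost in >> 1
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right

left = [1000, -1200, 3, 32767]
right = [998, -1195, -4, -32768]
mid, side = to_mid_side(left, right)
restored = from_mid_side(mid, side)
print(mid, side, restored == (left, right))
```

For highly correlated channels the side values stay small, which is where the size saving comes from; the two resulting channels can then be compressed independently.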

There is obviously still going to be some correlation between the channels even after mid-side encoding, and so the “high” and “extra high” modes take advantage of this. The filters with negative term values (-1, -2, and -3) employ this “cross-correlation”.

As for the “extra” modes, when I created the filters that are available at levels -x1 to -x3, there was very little high-resolution material out there (I think I had three tracks I captured somehow from a DVD) and so I didn’t use that in my corpus. Everyone was just comparing compression using CD audio and so I optimized for that.

The higher modes (-x4 to -x6) create new filters from scratch, so it makes perfect sense to me that those would be best for high-resolution (they have no preconceived notions).

Reply #11 – 2021-12-30 20:42:24

So I knocked TTA for the wrong reason:

Quote from: Porcus on 2021-11-23 10:00:01

a few findings:
* TTA cannot do mono!

Yes it can do mono! What this TTA version refuses to handle, are ffmpeg-generated .wav files.

So I tested it. Same corpus. In the first table, the three rightmost figures are unsurprising, not far from WavPack default or -hx1: sixteen thousand ppm, five thousand ppm, and three hundred ppm.
But the Merzbow mono files fooled it. The monos sum up to 23 kbit/s worse than Monkey's normal, while stereo is 25 better. Mono: worst by far; stereo: between flac -5 and wavpack default. So it is a signal that fools it, and luckily it is a mono (less interesting), so that in stereo it finds out what to do about it.

Quote from: bryant on 2021-12-05 01:13:05

I also was a little confused at first on what the columns meant, but you clarified it nicely.

No wonder there was confusion when I cannot even make my mind up on whether size s(h)avings should be positive or negative numbers.

And ... :

Quote from: ktf on 2021-11-28 19:00:50

while FLAC also has stereo decorrelation modes called left-side and right-side

It was only after reading this that it dawned on me that left-side and right-side are stereo decorrelation strategies - not weird channel configurations that FLAC chose to support. Explains my ignorant comment here.

Reply #12 – 2021-12-31 10:12:46

Quote from: Porcus on 2021-12-30 20:42:24

It was only after reading this that it dawned on me that left-side and right-side are stereo decorrelation strategies - not weird channel configurations that FLAC chose to support.

Yes. It seems most lossless formats only do either left & right or mid & side encoding, but FLAC can also choose to encode left & side or right & side. This is beneficial if there is some form of stereo correlation, but the resulting mid channel is more complex to encode than either left or right.
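
A toy Python illustration of why the extra left & side and right & side options can pay off. The cost function is a deliberately crude stand-in (sum of absolute sample-to-sample differences) and is not how libFLAC or any other encoder makes the decision; all names are made up. The example signal is a right channel that is simply a louder copy of the left.

```python
def cost(channel):
    """Crude size proxy: sum of absolute first-order differences."""
    return sum(abs(b - a) for a, b in zip(channel, channel[1:]))

def pick_stereo_mode(left, right):
    """Return the cheapest of the four channel pairings under the crude proxy."""
    side = [l - r for l, r in zip(left, right)]
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    candidates = {
        "left/right": cost(left) + cost(right),
        "left/side": cost(left) + cost(side),
        "right/side": cost(right) + cost(side),
        "mid/side": cost(mid) + cost(side),
    }
    return min(candidates, key=candidates.get), candidates

left = [0, 5, 0, 5, 0, 5]
right = [0, 7, 0, 7, 0, 7]   # same shape as left, but louder
mode, candidates = pick_stereo_mode(left, right)
print(mode, candidates)
```

Here the side channel is cheap because the channels are correlated, but mid is costlier than the quieter left channel, so pairing left with side comes out ahead of mid with side.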

Reply #13 – 2022-04-06 18:16:35

Tested: 7.1 files - as they are and with channels exported to quad+quad or to stereo+stereo+stereo+stereo
Purpose: Effects of "multi-channel" decorrelation
... also purpose: lobbying at @TBeck for TAK to do 7.1, as it would slay anything that isn't awfully slow. I have included the new TAK beta too!

Caveat: who knows what a representative 7.1 signal looks like - these files turned out to be something OptimFROG doesn't like, which is uncommon.

There are some Dolby Digital trailer files at https://thedigitaltheater.com/dolby-trailers/ . I retrieved the 7.1 files, and remuxed the Dolby TrueHD audio streams to .mka. Note to those who want to play with the same thing: ffmpeg -acodec copy picks only the first audio stream, but luckily the lossless stream was first on all, saving me work.
Deleting a duplicate I was down to 18 files, a total of no more than 22 minutes; all but one are 48/24 but with lots of wasted bits.
Exported each to .wav in three ways:
* as-is: 7.1
* two quad files: "front" channels FL+FR+FC+LFE in one file, and "side/back" channels BL+BR+SL+SR in another (for TAK to handle it!)
* four stereos, in order 1&2, 3&4, 5&6, 7&8
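
The regrouping of channels listed above amounts to deinterleaving the PCM frames. Below is a minimal Python sketch that works on a raw buffer of interleaved samples rather than on actual .wav files; the function name is hypothetical, channel indices are 0-based, and which index carries which speaker depends on the channel order of the source file, which is an assumption you would have to check.

```python
def split_channel_groups(frames: bytes, n_channels: int, sample_width: int, groups):
    """Deinterleave raw PCM frames into one interleaved byte string per channel group."""
    frame_size = n_channels * sample_width
    if len(frames) % frame_size:
        raise ValueError("buffer does not hold a whole number of frames")
    outputs = [bytearray() for _ in groups]
    for start in range(0, len(frames), frame_size):
        frame = frames[start:start + frame_size]
        for out, group in zip(outputs, groups):
            for ch in group:  # 0-based channel index within the frame
                out += frame[ch * sample_width:(ch + 1) * sample_width]
    return [bytes(out) for out in outputs]

# Two frames of dummy 8-channel data, one byte per sample: channel k holds the value k
frames = bytes(range(8)) * 2
pairs = split_channel_groups(frames, 8, 1, [(0, 1), (2, 3), (4, 5), (6, 7)])
quads = split_channel_groups(frames, 8, 1, [(0, 1, 2, 3), (4, 5, 6, 7)])
print(pairs, quads)
```

Each returned buffer would still need a WAV header with the new channel count before an encoder could read it.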

... and then let encoders run for a couple of nights.

A major surprise emerged after OptimFROG'ing the stereos: this is material where OptimFROG performs worse than FLAC.
A not so big surprise: Monkey's performs badly - due to not utilizing wasted bits. That means a 16-bit sample in a 24-bit container (padded with zeroes) compresses much worse than 16 in 16. FLAC, WavPack and TAK compress them as well as 16 in 16; MPEG-4 ALS does so with the "-l" switch.
Several of these files appear to have parts where fewer than 24 bits are at work - but not necessarily so during the entire file.
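
What "wasted bits" means can be shown with a short Python sketch: count the low-order bits that are zero in every sample of a block, so that the block could be shifted down before compression and shifted back up on decoding. This is only the idea, not any particular encoder's code, and the function name is made up. Doing it per block rather than per file is what lets an encoder benefit when only parts of a file use fewer bits.

```python
def wasted_bits(samples) -> int:
    """Number of low-order bits that are zero in every sample of the block."""
    combined = 0
    for s in samples:
        combined |= s          # a bit is set here if it is set in any sample
    if combined == 0:
        return 0               # all-zero block: treated as no wasted bits in this sketch
    shift = 0
    while combined & 1 == 0:
        combined >>= 1
        shift += 1
    return shift

# 16-bit values padded into a 24-bit container (shifted up by 8 bits)
padded_block = [s << 8 for s in (1000, -1200, 3)]
full_block = [1000, -1200, 3]
print(wasted_bits(padded_block), wasted_bits(full_block))
```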

Why all these ALS settings tested?
-t# is ... well what is it? Help file says it is "two modes", joint stereo and #channel decorrelation, where channels must be a multiple of the number. I take it that in a 7.1, -t4 means it tries joint stereo and tries 4ch groupings and picks the best.
At the very least, it gives an idea of what the encoder can make out of considering several channels at once.
-l makes use of wasted bits
-p is slow, -7 is awfully slow, -7 -p even worse

Encoding time here is ... often just too expensive. Monkey's Extra High encodes in a minute. TAK -p4m (on two quad files per signal) in two. The ALS encoder hasn't seen much optimization, and the slow ALS modes take over two hours on 22 minutes material (and much worse on the single 96/24 file). The only thing slower is ffmpeg's WavPack encoder at "-compression_level 8". No, not reference WavPack's -hhx6; ffmpeg has its own implementation, and its level "8" took 12x as much time as wavpack.exe -hhx6.

size/1024	Codec (& fileset) & setting	Remarks
476340	ALS-l-7-p-t8	150-ish minutes. -7 is the slow mode. -p is "long-term" prediction (slower)
476445	ALS-l-7-t8	... and -t8 makes full 8ch decorrelation I think?
477487	ALS4ch+4ch.-l-7-p-t4
478512	ALS-l-7-p-t4	Hm, quad decorrelation is slightly worse than splitting in quads
478654	ALS-l-7-t4
480668	ALS2ch+2ch+2ch+2ch.-l-7-p-t2
481019	ALS4ch+4ch.-l-7-p-t2
481134	ALS5.1file_AND_stereoSLSRfile-l-7-p-t2
481410	ALS-l-7-p-t2
481461	TAK4ch+4ch.-p4m_BETA232_	Saves ~ 2 percent over four stereos
481521	TAK4ch+4ch.-p4m	Takes 2 minutes
482031	ALS-l-7
482050	ALS5.1file_AND_stereoSLSRfile.-l-7
482057	ALS4ch+4ch.-l-7-t0	(This had a "-t0" by mistake, I realize)
482062	ALS4ch+4ch.-l-7-t1
482083	ALS2ch+2ch+2ch+2ch.-l-7
483200	TAK4ch+4ch.-p4_BETA232_
483263	TAK4ch+4ch.-p4
485650	TAK5.1file_AND_stereoSLSRfile.-p4m_BETA2.3.2_	The 5.1 part is ~ 1.5 percent smaller than three stereos
485712	TAK5.1file_AND_stereoSLSRfile.-p4m
486690	TAK5.1file_AND_stereoSLSRfile.-p4
488162	ALS-l-7-i	The "-i" is supposed to shut off joint stereo. But -7 is the heavy slow mode
489350	TAK2ch+2ch+2ch+2ch.-p4m_BETA232_
489415	TAK2ch+2ch+2ch+2ch.-p4m
489509	TAK5.1file_AND_stereoSLSRfile.-p2_BETA232_
489540	TAK5.1file_AND_stereoSLSRfile.-p2
490276	TAK2ch+2ch+2ch+2ch.-p4_BETA232_
490340	TAK2ch+2ch+2ch+2ch.-p4
490826	ALS-l-p-t8	No "-7", takes "only" 40 minutes.
491952	ALS-l-t8	t8 makes around 1.8 percent difference
494991	ALS-l-t4	t4 takes out 2/3rds of the "t8" effect
493132	TAK2ch+2ch+2ch+2ch.-p2_BETA232_
493164	TAK2ch+2ch+2ch+2ch.-p2
499654	ALS5.1file_AND_stereoSLSRfile.-l-t2	Half a percent over the next.
501180	ALS-l=wastedbits	This ALS encodes at TAK -p4m speed
505319	flac2ch+2ch+2ch+2ch.ktfs_irlspost-9	ktf's IRLSPOST build, here FLAC utilizes stereo decorrelation
505444	ALS-l-p-i	Uses wasted bits, but no joint stereo. "Long term"
507620	ALS-l-i	Uses wasted bits, but no joint stereo.
509926	flac2ch+2ch+2ch+2ch.-8ep
510731	flac_ktfs_irlsbeta.-9	FLAC encodes as 8x mono.
510856	flac2ch+2ch+2ch+2ch.-8e
512365	flac2ch+2ch+2ch+2ch.-8p
513169	flac5.1file_AND_stereoSLSRfile.-8e
515495	flac-8pe
516245	WavPack-hx4
516298	WavPack4ch+4ch.-hx4
516441	flac-8e
516680	WavPack-hx4j0	j0 is supposed to switch off channel decorr, does that work with 7.1 at -hx4?
517032	WavPack5.1file_AND_stereoSLSRfile.-hx4
518110	flac-8p
520595	OptimFROG2ch+2ch+2ch+2ch.--presetmax	Whiskey.Tango.Frogxtrot?!
521995	OptimFROG2ch+2ch+2ch+2ch.--preset10
523309	WavPack2ch+2ch+2ch+2ch.-hx4
523589	flac.-5
523604	OptimFROG2ch+2ch+2ch+2ch.--preset8
526597	OptimFROG2ch+2ch+2ch+2ch.--preset5
532411	OptimFROG2ch+2ch+2ch+2ch.--preset2
562507	WavPack-f
618871	MLP5.1file_AND_stereoSLSRfile	ffmpeg's MLP encoder. (Does not handle 7.1) TrueHD: 1 KB smaller

below:	CODECS W/O WASTED BITS CAPABILITY
710654	Monkey5.1file_AND_stereoSLSRfile.EXTRA	Extra high better than Insane.
711777	Monkey5.1file_AND_stereoSLSRfile.INSANE
714264	Monkey-EXTRA	Takes 57sec. Weaker than 6ch decorr + stereo decorr
714990	ALS-default	Takes 80sec. No -l, it does not utilize wasted bits.
715475	Monkey-INSANE
719188	Monkey2ch+2ch+2ch+2ch.EXTRA
719587	Monkey2ch+2ch+2ch+2ch.INSANE
757712	Monkey4ch+4ch.EXTRA
759119	Monkey4ch+4ch.INSANE
776954	tta2ch+2ch+2ch+2ch	Seems that TTA also only decorrelates stereo. Four files ...
782144	tta5.1file_AND_stereoSLSRfile	One file is stereo
797231	tta
799264	refalac	70sec. Reassigned BL,BR to make ALAC work!
802489	tta4ch+4ch	... two quads are worse than a 7.1, is that file/block overhead?
803519	refalac5.1file_AND_stereoSLSRfile
803806	refalac2ch+2ch+2ch+2ch
971694	DolbyTrueHD	Muxed out of the downloaded files. Much worse than ffmpeg's TrueHD encoder
1532054	wav	Uncompressed PCM 7.1

Reply #14 – 2022-04-06 19:03:49

Interesting list. Did you perhaps check whether the TrueHD original was pure MLP or an AC3 with correction?

I really don't understand how FLAC outperforms OptimFROG here. I've only had that with chiptune before. The relatively small difference between FLAC 8 channel and FLAC 4 stereos (1%), and the modest gains achieved by the much smarter algorithms employed by TAK and ALS, suggest that this file doesn't have much interchannel correlation to begin with.

Reply #15 – 2022-04-06 19:32:04

Quote from: ktf on 2022-04-06 19:03:49

Did you perhaps check whether the TrueHD original was pure MLP or an AC3 with correction?

Oh. Damn. I'm not doing this over again :-o

13 minutes of the 22: ffmpeg -i says Stream #0:1(eng): Audio: truehd, 48000 Hz, 7.1, s32 (24 bit) (default)
9 minutes were .m2ts files where ffmpeg -i filename -acodec copy m2ts.mka gives something like

Code:

[SAR 1:1 DAR 16:9], 23.98 fps, 23.98 tbr, 90k tbn
  Stream #0:2[0x1100]: Audio: truehd (AC-3 / 0x332D4341), 48000 Hz, 7.1, s32 (24 bit)
  Stream #0:3[0x1100]: Audio: ac3 (AC-3 / 0x332D4341), 48000 Hz, 5.1(side), fltp, 640 kb/s
  Stream #0:4[0x1101]: Audio: eac3 (AC-3 / 0x332D4341), 48000 Hz, 7.1, fltp, 1664 kb/s
  Stream #0:5[0x1102]: Audio: ac3 (AC-3 / 0x332D4341), 48000 Hz, 5.1(side), fltp, 640 kb/s
Output #0, matroska, to 'm2ts.mka':
  Metadata:
    encoder         : Lavf59.16.100
  Stream #0:0: Audio: truehd ([255][255][255][255] / 0xFFFFFFFF), 48000 Hz, 7.1, s32 (24 bit)
Stream mapping:
  Stream #0:2 -> #0:0 (copy)

What's it doing, really?

Reply #16 – 2022-04-06 20:12:58

Quote from: Porcus on 2022-04-06 19:32:04

Quote from: ktf on 2022-04-06 19:03:49
Did you perhaps check whether the TrueHD original was pure MLP or an AC3 with correction?

Oh. Damn. I'm not doing this over again :-o

That comment was mostly to put the very bad performance of TrueHD into perspective. It probably has a lossy stream as fallback embedded, which would explain why it compressed so badly.

Quote

What's it doing, really?

It copies stream 0:2 to the new (mka) file. Stream 0:2 is truehd. This doesn't tell us anything about whether that truehd stream is AC3+correction data or 'pure' MLP.

Reply #17 – 2022-04-06 21:12:44

Yes, it doesn't tell when in .mka, you are right!

Code:

> ffmpeg -i .\Chameleon.m2ts -acodec copy -vn -sn m2ts-to.m2ts

[...] 
  Stream #0:2[0x1100]: Audio: truehd (AC-3 / 0x332D4341), 48000 Hz, 7.1, s32 (24 bit)
[...] 
  Stream #0:0: Audio: truehd (AC-3 / 0x332D4341), 48000 Hz, 7.1, s32 (24 bit)
Stream mapping:
  Stream #0:2 -> #0:0 (copy)
[...]

This one does say something about it. But then info on the output file:

Code:

> ffmpeg -i .\m2ts-to.m2ts

[...] 
  Stream #0:0[0x1100]: Audio: truehd ([131][0][0][0] / 0x0083), 48000 Hz, 7.1, s32 (24 bit)

Now "truehd (AC-3 / 0x332D4341)" has become "truehd ([131][0][0][0] / 0x0083)"

Over to Matroska:

Code:

> ffmpeg -i .\m2ts-to.m2ts -acodec copy .\m2ts-to.m2ts-to.mka

[...] 

  Stream #0:0[0x1100]: Audio: truehd ([131][0][0][0] / 0x0083), 48000 Hz, 7.1, s32 (24 bit)
  No Program
  Stream #0:1[0x1100]: Audio: ac3, 0 channels, fltp
Output #0, matroska, to '.\m2ts-to.m2ts-to.mka':
  Metadata:
    encoder         : Lavf59.16.100
  Stream #0:0: Audio: truehd ([255][255][255][255] / 0xFFFFFFFF), 48000 Hz, 7.1, s32 (24 bit)
Stream mapping:
  Stream #0:0 -> #0:0 (copy)

and info on the output file:

Code:

ffmpeg -i .\m2ts-to.m2ts-to.mka

[...]

  Stream #0:0: Audio: truehd, 48000 Hz, 7.1, s32 (24 bit)

Now it has become just "truehd". Which means that the absence of any mention of AC3 does not rule out it being AC3. Oh.

Well at least it wasn't an outright transcode. That was my worry. Not that I know whether there is anything to worry about from a testing point of view. (Maybe it is? Is there any lossy that when decoded is more friendly towards lossless compressor X than Y - without it being related to wasted bits?)

Reply #18 – 2022-04-07 10:37:38

It's not that it's plain AC3. The question is if it's possible to find out if it's an AC3 elementary stream with a correction stream, or if it's pure MLP. Clearly, FFmpeg isn't telling you. Maybe that requires verbose output?

Reply #19 – 2022-04-07 15:41:42

ffmpeg -i mkvfile.mkv -acodec copy -vn -sn mkv-to.m2ts
and then
ffmpeg -loglevel 40 -hide_banner -i .\mkv-to.m2ts
yields

Code:

[mpegts @ 000002854760b3c0] max_analyze_duration 7000000 reached at 7000000 microseconds st:0
[mpegts @ 000002854760b3c0] start time for stream 1 is not set in estimate_timings_from_pts
[mpegts @ 000002854760b3c0] stream 1 : no TS found at start of file, duration not set
[mpegts @ 000002854760b3c0] Could not find codec parameters for stream 1 (Audio: ac3, 0 channels, fltp): unspecified sample rate
Consider increasing the value for the 'analyzeduration' (0) and 'probesize' (5000000) options
Input #0, mpegts, from '.\mkv-to.m2ts':
  Duration: 00:00:40.13, start: 1.400000, bitrate: 5754 kb/s
  Program 1
    Metadata:
      service_name    : Service01
      service_provider: FFmpeg
  Stream #0:0[0x1100]: Audio: truehd ([131][0][0][0] / 0x0083), 48000 Hz, 7.1, s32 (24 bit)
  No Program
  Stream #0:1[0x1100]: Audio: ac3, 0 channels, fltp
At least one output file must be specified
[AVIOContext @ 0000028547613f00] Statistics: 4886672 bytes read, 3 seeks

... whatever that means.

(Using -codec copy yields pretty much the same audio-relevant output.)

Reply #20 – 2022-05-02 09:21:02

Tested: near-mono stereo (CDDA)

Background: some time ago, when I did the test on the least-compressible CD in my collection (the above Merzbow, compression figures here and some mildly shocking here), I recalled that I had also once tested the other end of my CD collection (an Édith Piaf compilation).
There I stumbled upon a known deficiency in WavPack 4: bad mono optimization, kept for compatibility.
WavPack 5 has sacrificed the stone-age decoder compatibility, so, why not do that test over again?

Closer inspection reveals the Piaf CD as almost mono indeed; only a few thousand samples differ between the channels, and never more than the LSB (difference peak at -90.31 dBTP). Differences in all tracks yes, but nine of the twelve only in the last second.
Job was probably outsourced to El Cheapo Basement Mastering.
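
The -90.31 figure quoted above is what one would expect for a difference of a single LSB in 16-bit audio. A quick Python check, computing plain sample peak relative to full scale (a true-peak meter could in principle report something slightly different); the function name is just for illustration.

```python
import math

def lsb_peak_db(bits: int = 16) -> float:
    """Level of a +/-1 LSB signal relative to full scale, in dB."""
    full_scale = 2 ** (bits - 1)        # 32768 for 16-bit audio
    return 20 * math.log10(1 / full_scale)

print(round(lsb_peak_db(16), 2))
```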

So as that is maybe as good as "mono encoded as stereo", I generated another near-mono that isn't mono: I took the "CDDA" corpus used here - 71 tracks arbitrarily chosen by sorting my non-classical collection by track MD5 sum - and generated one file of the first 10 seconds of each, i.e. 11 min 10 seconds.
Extracted the left channel, ffmpeg-resampled it to 88.2 kHz and back to 44.1 kHz, and made a "stereo" file with the unaltered left channel and the re-resampled channel.
Result is very highly correlated channels, witnessed by encoders that can switch off stereo decorrelation: doing so with FLAC -4 to -8 and WavPack -f and WavPack default created 61 to 65 percent larger files. A bit more or less with other options (actually, WavPack -h isn't too good here).

In both I included a couple of oddball codecs - not because I believe they will be used, but to get a gut feeling for what it takes to get these kinds of signals number-crunched, compared to ordinary ones. After all, there is a long development path from "works on my small development sample" to "robust enough not to make a fool of itself on someone else's music", and I am sure that the developers of the alive-and-kicking codecs know.

Test number 1: 41 minutes, most samples "mono as stereo", some LSBs differing at end of every track:

100,0%	wav	Piaf 41 minutes almost mono
.	.	dual mono & the like	.
44,4%	shn	(default)
40,4%	alac	--fast
40,1%	flac	-0
37,9%	wv-ffmpeg	ffmpeg-default	ffmpeg's WavPack runs dual mono.
36,9%	flac	-3
31,5%	7z	ultra
30,2%	als	-i	-i means dual mono. Adding -l changes nothing.
.	.	Some of these are ...	... quite underwhelming
25,4%	sac	--optimize=normal --sparse-pcm	0.01x realtime for this. Complete failure.
20,8%	wv 4.80	-hx	Old WavPack cannot cope, we knew that.
20,4%	wv 4.80	-hhx6	2x realtime
20,3%	wv-ffmpeg	ffmpeg -compression_level 8	0.263 realtime and no match for WavPack 5.
20,2%	flac	-1	-2 about the same
19,4%	tta
19,2%	wv	-fj0	New WavPack ... but shouldn't j0 be dual mono!?
18,7%	rka	-l3	RKAU's heaviest mode does not impress
18,5%	alac
18,4%	flac	-5
18,0%	la	-high-noseek	LA does not impress!
17,9%	ape	FAST
17,8%	wv	-fx
17,7%	rka	-l1	RKAU's lightest mode beats the heaviest
17,6%	la	-normal	LA normal beats high
17,4%	flac	-8	and -8p between -8 and WavPack default
17,2%	wv	default	FLAC -5 and WavPack default shouldn't beat all of the ones above here, eh? ;-)
.	.	TAK starts here	.
17,0%	tak	-p0	TAK files are in the right size order -p0, -p0e etc.
16,9%	flac	-8e	-e makes more sense on CDDA than -p
16,7%	flac	-8ep
16,5%	tak	-p0m
16,5%	flac	flake -5
16,4%	sac	--high --optimize=fast --sparse-pcm	52 hours
16,4%	flac	flaccl -11	Not as good as -8
16,4%	flac	flake -6
16,2%	flac	flaccl -8
16,0%	ofr	--preset0	--preset 0 has none of the frog's "optimizations" (compare WavPack's -x settings)
16,0%	wv	-hx
15,9%	flac	flake -8
15,9%	wv	-hx4
15,9%	flac	flake -11
15,8%	wv	-hhx6	4x realtime, is twice the speed of 4.80's -hhx6 (and saves a quarter size)
15,7%	ape	INSANE	Insane ape fooled again!
15,4%	flac	ktf's IRLSPOST build -9ep	also ktf's double precision build lines up here
15,4%	flac	ffmpeg-11
15,4%	flac	ffmpeg-12-cholesky12	ffmpeg wins the FLAC game, but:
15,4%	ape	NORMAL
15,4%	flac	ffmpeg-12-cholesky6	... ffmpeg-flac: 6 passes beats 12.
15,3%	ape	HIGH
15,3%	ofr	--preset1	OptimFROGs are in the right size order except 9 beats 10
15,1%	ape	EXTRA HIGH
15,1%	tak	-p1
15,1%	als	default	="-l". A few KB better than with -t2
15,0%	als	-p
15,0%	tak	-p2
14,8%	tak	-p4m	Between TAK -p2 and -p4m there is nothing but other TAK
14,4%	als	-7 -p	Smallest ALS (also tried -z3 -p, avoid)
14,4%	ofr	--preset2	This is OptimFROG's default
14,3%	sac	(default, i.e. "--normal")	< 0.5x realtime speed.
13,9%	sac	--normal --sparse-pcm	< 0.4x realtime. Only improving SAC option on this file.
13,8%	ofr	--preset6	frog-only territory from here
13,6%	ofr	--preset9	beats 10 narrowly
13,5%	ofr	--preset max	1.85x realtime

.
Comments for this corpus:

* WavPack: predictably, it now does well - though not ffmpeg's version.
* FLAC: Here ffmpeg (including Justin Ruggles, creator of original Flake) are doing something damn good.
* TAK: damn good. And, it is damn hard to find signals where TAK is not in size order.

Then the ones that work more asymmetrically: OptimFROG is pretty much where you think it would be - --preset 9 beating --preset 10 is one of those little glitches in its machinery, but doesn't do much to the overall picture.

* Monkey's: we are getting used to Monkey's Insane getting fooled. Again I think the blocksize just isn't appropriate.
* LA, RKAU and sac: It takes more than just a run-of-the-mill development corpus!
sac in particular: It is utterly useless but for this kind of benchmarking. Using the entire battery of sac's options brings it down to 0.007x realtime. Gnawing at the same segment for forty-five minutes without a disk write. Spending four days to make a compressed CD image file that is unsuited for playback - but ... sometimes there is something to be learned: Even this brute force, where it can chew on a segment for an hour looking for the right model to encode, is quite worthless without some engineering craft. While sac's "--sparse-pcm" modelling twist might be mildly interesting, its --optimize (apparently inspired by its idol OptimFROG, which in turn I think took it from WavPack's -x?) is of no use.
... on this signal. On the Merzbow - baffling that it is even possible to out-frog OptimFROG on a signal that Monkey's and TAK cannot distinguish from static.

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #21 – 2022-05-02 09:50:22

Test number 2: 11 minutes with one channel an up-and-then-downsample of the other
That actually means most samples are different - but music-wise pretty much transparent to each other (admittedly I didn't do any listening test, but they should be). So highly correlated, but very few dual monos.

Since WavPack 4 vs WavPack 5 was one of the points of testing this, let me mention why they aren't both included in the table: Turns out the difference is small (WavPack 5 has slightly bigger overhead per block, some kilobytes - if you don't like that, just choose a bigger blocksize). Apparently, WavPack 4's deficiency in mono was in ... mono! Not in highly correlated stereo.
But WavPack is in for other surprises, as you will see.

Edit: Note again, this is 11 minutes, the previous one was 41. Read the times accordingly. Also, timings are not particularly rigorous, take them as "what ballpark".

.	.	dual mono & the like	.
71,4%	bz2
71,1%	flac	-0
71,0%	shn
68,0%	alac	--fast
67,9%	wv	-fj0	Penalty for WavPack 5: <0.1 points - and vanishing at high --blocksize setting
67,8%	wv-ffmpeg	ffmpeg-default
67,4%	flac	-3
66,6%	wv	-j0
66,6%	flac	-8p --no-mid-side	forces dual mono
66,5%	als	-i	forces dual mono
64,3%	xz		General purpose compressors beating dual mono
61,4%	7z	PPMd
59,7%	7z	LZMA2 ultra
.	.	Some of these are ...	... quite underwhelming
52,9%	la	default	Again LA disappoints
51,5%	wv	-hj0	Shouldn't j0 force dual mono?
50,8%	la	high	LA high beats default, but nothing special.
47,6%	flac	-1
44,0%	ape	FAST	Monkey's in the right order, but ...
43,7%	tta	tta
43,2%	wv	-h	That wasn't good?
43,0%	ape	NORMAL
42,8%	ape	HIGH
42,6%	wv	-hh
42,5%	ape	EXTRA
42,4%	ape	INSANE	... but every monkey beaten by WavPack -f?!
41,8%	wv	-f	-f beats -hh?! Same with 4.80
41,1%	flac	-8p	-4 to -8 between wv -f and here
40,9%	wv	-hx
40,5%	wv		WavPack defaults beats -hx
40,4%	flac	ffmpeg-12-cholesky6
40,4%	wv	-x	At least -x improves
40,4%	flac	ffmpeg-8	-compression_level 8 beats ... ?
40,3%	wv	-x4	Max blocksize squeezes 0.11 points
40,2%	ffflac	ffmpeg-12
40,1%	wv	-hhx4	107 seconds
40,1%	wv-ffmpeg	-compression_level 4	183 seconds
40,1%	wv	-hhx6	230 seconds
40,0%	wv-ffmpeg	-compression_level 6	356 seconds. WavPack format 4, and beaten by WavPack.exe 4.80
40,0%	wv480	-hhx4	Only half as fast as 5.40, but still beats ffmpeg ...
40,0%	wv-ffmpeg	-compression_level 8	... except ffmpeg at speeds you do not want to endure in daily use
40,0%	alac	(refalac!)	At 7 seconds, ALAC is surprisingly good
39,9%	flake	flake -11	19 seconds
39,9%	flaccl	flaccl -11	10 seconds
39,8%	flac	flac-irls.-9	ktf's IRLSPOST build takes 66 seconds
39,7%	flac	irls.-9p	3 minutes. In between here: regular -8ep and double precision -8ep
.	.	TAK starts here	.
39,3%	tak	-p0	3 SECONDS. (And on spinning drive.)
38,7%	als	default	12 seconds. Bitexact to "-l".
38,6%	sac	--normal --optimize=normal	SAC's "optimize" only wastes hours on these 11 minutes
38,4%	tak	-p1	all TAK are nicely ordered
38,3%	rka	-l2	RKAU 2 better than 3
38,2%	ofr	--preset0	Bit-exact same file as --preset 1
38,1%	tak	-p2	TAK default shaves a point off -p0
37,9%	sac	--high --optimize=high --sparse-pcm	51 hours compressing 11 minutes.
37,8%	tak	-p4	7 seconds
37,7%	tak	-p4m	13 seconds. Two points s(h)aved off the smallest FLAC, one off the second-smallest fast codec ALS
37,4%	als	-l -7 -p	1.2x realtime, like OptimFROG --preset max. -7 about same size
37,1%	ofr	--preset5	5 and 4 worse than 2 ...
36,3%	ofr	--preset2	15 seconds
36,2%	sac	--normal --sparse-pcm	"only" 26 minutes or 0.4x realtime
36,0%	sac	--high --sparse-pcm	32 minutes
35,5%	ofr	--preset3	22 seconds, and smaller files than presets up to 7
35,4%	ofr	--preset8	65 seconds to improve over --preset 3
35,1%	ofr	--preset10	150 seconds
35,0%	ofr	--preset max	Still > 1x realtime. Tweaking options at this level yields same file

.

The "faster" (de)compressors:
* WavPack: uh, this wasn't as should be, -f beating -hh?
Also, it is most appreciated that the ffmpeg team supports WavPack encoding - but still WavPackers might as well use its default setting and then recompress with WavPack 5. Unless you use the maddest -compression_level settings, that don't really pay off that well.
* FLAC/WavPack: WavPack has only a mid/side decorrelation strategy, and I have a hunch that FLAC's willingness to try different strategies is what makes it better at this file.
* TTA: Not horrible, not impressive ... nobody cares?
* ALAC: !! What the f**c is it doing there? That is good!
* MPEG-4 ALS is kinda the second-best fast codec here: its default operation is faster than the slowest TAK -p4m and smaller than any FLAC/ALAC/WavPack (/tta)
* TAK: Again TAK starts where FLAC ends. (ALS? The 10 minutes ALS isn't a fast decoder either.)

The "heavier" ones:
* Monkey's (and LA?): no clever stereo decorrelation strategy - maybe a naive mid/side like WavPack? That is more of an objection to something that sets out to win the compression game, than to WavPack that has its priorities more balanced. But at least Monkey's and LA get their sizes in order of the presets this time.
* OptimFROG: surprise that its size orders are so mixed here. Preset 3 beating preset 5 by 1.6 points is quite a bit out of froggy character.
* sac: Again, a good codec takes more than CPU time, it has to be spent wisely.

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #22 – 2022-05-03 11:45:35

Oh, on TAK:
* Also tested the TAK 2.3.2 beta, since ... well it isn't targeting improvements, but if the fixes applied would matter somewhere it could very well be on oddities? Nothing to write home about, max difference 9 KB on forty-five to seventy megabytes. Although, the difference is larger on p4 than on p3 than ... etc, and none are worse (read off integer KB only though).
* Actually it isn't completely true that all TAK are in order, there is one exception: in test 1 (the most-samples-mono Édith Piaf), -p1 is slightly better than -p1e. Both in 2.3.1 and the beta. Around 0.01 percentage points.

And I see that I have included so many that they obscure the magnitudes. And also that the compression is so high that I should maybe have quoted savings in percent rather than points - well I did point out that WavPack 5 saves "a quarter" size off WavPack 4 on the essentially-mono first corpus.
But also, note how patient FLAC users can save like 16 percent over FLAC -5 on the first CD, but only four on the second signal (that is much more varied!)
While on the other hand, the savings in going FLAC to TAK or TAK to OFR are about the same percentwise ballpark in both tests. (Five plus/minus one percent (not point!) - respectively eight plus/minus one.)
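
To make the percent-versus-points distinction concrete, here is a small Python sketch that turns two "percent of WAV size" figures into a relative saving. The function name is hypothetical, and the inputs are the rounded figures from the two tables, so the results are only approximate:

```python
def relative_saving(before_pct: float, after_pct: float) -> float:
    """Saving in percent of the bigger file, given two 'percent of WAV size' figures."""
    return 100 * (before_pct - after_pct) / before_pct

# Rounded table figures, so treat the outputs as ballpark values.
print(round(relative_saving(20.4, 15.8), 1))  # wv 4.80 -hhx6 to wv 5 -hhx6, test 1: 22.5
print(round(relative_saving(18.4, 15.4), 1))  # flac -5 to smallest FLAC, test 1: 16.3
print(round(relative_saving(15.4, 14.8), 1))  # smallest FLAC to tak -p4m, test 1: 3.9
print(round(relative_saving(39.7, 37.7), 1))  # smallest FLAC to tak -p4m, test 2: 5.0
print(round(relative_saving(14.8, 13.5), 1))  # tak -p4m to ofr --preset max, test 1: 8.8
print(round(relative_saving(37.7, 35.0), 1))  # tak -p4m to ofr --preset max, test 2: 7.2
```

So 4.6 points on the first corpus is roughly a quarter of the file, while 2 points on the second is only about five percent.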

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #23 – 2022-05-05 04:43:14

Thanks again Porcus for your meticulous testing! Like you say, not the most useful files for choosing a compressor, but interesting to get down to the nuts and bolts!

And you guessed right about the WavPack mono-optimization thing. It does only apply to stereo that’s 100% mono. The original omission was to not check for and make special provision for truly identical channels (i.e., encode only one channel) and we were instead encoding it as mid/side. Of course the “side” would have been total silence, and this would have been fine if the silence detector worked on individual channels, but it doesn’t. So there it is.
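
A toy Python illustration of that omission (not WavPack's actual code; all function names here are invented for the sketch): with identical channels the side signal is pure silence, but a silence check that only looks at the block as a whole never fires, whereas a per-channel check or an explicit identical-channels test would catch it.

```python
def mid_side(left, right):
    """Plain mid/side transform: mid is the floored average, side the difference."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def block_is_silent(channels):
    """Block-level check: true only if every channel is all zeros."""
    return all(s == 0 for ch in channels for s in ch)

def silent_channels(channels):
    """Per-channel check: flags each all-zero channel separately."""
    return [all(s == 0 for s in ch) for ch in channels]

def is_true_mono(left, right):
    """Explicit test for bit-identical channels."""
    return left == right

left = [3, -7, 12, 0, 5]
right = list(left)                    # "mono encoded as stereo"
mid, side = mid_side(left, right)

print(block_is_silent([mid, side]))   # False: the mid channel still carries the music
print(silent_channels([mid, side]))   # [False, True]: side alone is silent
print(is_true_mono(left, right))      # True: only one channel needs encoding
```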

And you may very well be right about the second test results and the ability of some codecs (FLAC) to choose from more than left+right and mid+side. I suspect the other two are left+side and right+side and I think that actually very old WavPack 3 could do this (or at least I experimented with it at one point). In the end though I decided that the improvement was so rare and small that it didn’t justify the extra time to check.
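
For illustration, choosing among the four channel pairings could look like the Python sketch below. The cost function (sum of absolute first differences) is only a crude stand-in chosen for this example; real encoders estimate the size of their actual prediction residual, and nothing here is taken from FLAC or WavPack source.

```python
def cost(channel):
    """Crude size proxy: first sample plus absolute first differences (an assumption)."""
    if not channel:
        return 0
    return abs(channel[0]) + sum(abs(b - a) for a, b in zip(channel, channel[1:]))

def pick_stereo_mode(left, right):
    """Return the cheapest pairing under the toy cost, plus all four costs."""
    side = [l - r for l, r in zip(left, right)]
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    costs = {
        "left+right": cost(left) + cost(right),
        "left+side": cost(left) + cost(side),
        "right+side": cost(right) + cost(side),
        "mid+side": cost(mid) + cost(side),
    }
    return min(costs, key=costs.get), costs

# Smooth left channel; right is the same plus a small wobble.
left = [0, 10, 20, 30, 40, 50]
right = [0, 14, 20, 34, 40, 54]
mode, costs = pick_stereo_mode(left, right)
print(mode)   # left+side
print(costs)  # {'left+right': 104, 'left+side': 70, 'right+side': 74, 'mid+side': 72}
```

In this made-up case the wobble ends up in the side channel either way, but mid inherits half of it while left stays clean, which is why left+side edges out mid+side.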

There is one other interesting thing here, and that’s the --sparse-pcm option of sac. I believe that this refers to the situation where some PCM values are missing or there are probabilistic peculiarities in the PCM value’s distribution (the former being just a specific case of the latter).

The simplest and most universally encountered and easily handled version of this is the zeroed LSBs phenomenon that, for example, lossyWAV takes advantage of. This is trivial to implement and seems to crop up in real samples more than one would expect. Interesting fact: WavPack’s version also handles cases where the LSBs are 1s instead of 0s and the case where they’re either 1s or 0s, but still all identical for each sample. I have no idea how often those cases come up, or if other compressors bother, but it was so trivial to add that after the 0’s case that I could not resist.
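
The three cases can be detected in a single pass with bitwise accumulators. The Python below is a sketch under my own naming, not a copy of any encoder's code: it reports how many low bits are zero in every sample, how many are one in every sample, and how long the run of low bits is that merely repeats each sample's own LSB.

```python
def trailing_zeros(value, limit):
    """Number of zero bits at the bottom of `value`, capped at `limit`."""
    n = 0
    while n < limit and not (value >> n) & 1:
        n += 1
    return n

def redundant_low_bits(samples, limit=15):
    """Scan a block for low bits that carry no information (illustrative only)."""
    or_all = 0     # a bit stays 0 here only if it is 0 in every sample
    and_all = -1   # a bit stays 1 here only if it is 1 in every sample
    run_all = 0    # low zeros here mark bits equal to each sample's own LSB
    for s in samples:
        or_all |= s
        and_all &= s
        run_all |= s ^ -(s & 1)  # flip all bits of samples whose LSB is 1
    return {
        "zeros": trailing_zeros(or_all, limit),
        "ones": trailing_zeros(~and_all, limit),
        "lsb_run": trailing_zeros(run_all, limit),  # run of n means n - 1 spare bits
    }

print(redundant_low_bits([8, -16, 24]))  # {'zeros': 3, 'ones': 0, 'lsb_run': 3}
print(redundant_low_bits([7, 15, -1]))   # {'zeros': 0, 'ones': 3, 'lsb_run': 3}
print(redundant_low_bits([7, 8, -8]))    # {'zeros': 0, 'ones': 0, 'lsb_run': 3}
```

An all-silent block returns the cap for "zeros", so it would need its own handling before any shifting is applied.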

Funny story is that when I submitted a patch to handle this for FFmpeg Michael resisted a little because this is unnecessarily complicating the code to handle something that should not exist. In a purist sense it’s simply wrong, which might be why some codecs refuse to deal with it altogether.

But the zeroed LSB case is just the tip of the iceberg. Imagine the cases where every represented PCM value is a multiple of some other, non-power-of-two integer, like 3 or 5. This is not that different from the zero LSBs case, except it’s no longer trivial to detect, and probably not that common.
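
In its simplest form, the "every value is a multiple of k" case comes down to a greatest common divisor over the block. A minimal Python sketch, with hypothetical function names, that finds the common factor and divides it out losslessly:

```python
from functools import reduce
from math import gcd

def common_factor(samples):
    """GCD of all sample values; 0 for an all-silent block."""
    return reduce(gcd, samples, 0)

def divide_out(samples):
    """Return (factor, reduced samples); factor 1 means nothing was gained."""
    g = common_factor(samples)
    if g <= 1:
        return 1, list(samples)
    return g, [s // g for s in samples]  # exact: every sample is a multiple of g

def restore(factor, reduced):
    """Undo divide_out."""
    return [q * factor for q in reduced]

block = [6, -9, 15, 0, 300]
factor, reduced = divide_out(block)
print(factor, reduced)  # 3 [2, -3, 5, 0, 100]
```

This only catches exact integer multiples around zero; any DC offset or a single stray sample breaks it, which is part of why detection is less trivial than for zeroed LSBs.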

I actually wrote a program at some point to analyze PCM audio for these kinds of uneven distributions and then ran it on a sampling of thousands of CD tracks in my collection. The cases are surprisingly common. One of the most common types I found was where the PCM values had obviously been multiplied by some value greater than 1 and just truncated (not dithered) resulting in regularly spaced missing values. I was able to squeeze these out (losslessly, of course) and get significantly better compression, in some cases then blowing WavPack past all the competitors. I even went down this road pretty far devising easy ways to detect these cases and encode the “formula” to convert back to the original sample values. This was complicated by cases I found where this process had occurred twice, and there were variations on how positive and negative values were handled.
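
A toy reproduction of the gain-then-truncate case in Python: scaling by 1.25 and truncating leaves every fifth code unused, and the gaps can be squeezed out losslessly. The sketch uses a plain lookup table of used codes as a stand-in for the "formula" approach described above; the gain value and all names are assumptions for the example.

```python
def apply_gain_truncated(samples, gain):
    """Multiply and truncate toward zero, without dithering."""
    return [int(s * gain) for s in samples]

def missing_codes(samples):
    """Codes inside the occupied range that never occur."""
    used = set(samples)
    return [c for c in range(min(used), max(used) + 1) if c not in used]

def squeeze(samples):
    """Map each used code to its rank so the gaps disappear."""
    codes = sorted(set(samples))
    rank = {c: i for i, c in enumerate(codes)}
    return [rank[s] for s in samples], codes

def unsqueeze(indices, codes):
    """Lossless inverse of squeeze."""
    return [codes[i] for i in indices]

original = list(range(0, 17))
scaled = apply_gain_truncated(original, 1.25)
print(scaled)                 # [0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18, 20]
print(missing_codes(scaled))  # [4, 9, 14, 19]

indices, codes = squeeze(scaled)
print(max(indices))           # 16, down from a peak value of 20
```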

Another situation, which I also encountered, was where there were no missing codes but adjacent codes that should have been equally probable were not. So imagine a file where even values were twice as probable as odd ones. Quite a bit of entropy there to take advantage of (maybe close to half a bit per sample?) but not really obvious how to take advantage of it.
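
As a rough size check on that guess, here is the arithmetic under a deliberately simple model that is my assumption, not something stated in the post: the skew sits entirely in the lowest bit, independent of everything else, and that bit would otherwise cost one full bit per sample.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a two-outcome source with probabilities p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def lsb_saving(p_even: float) -> float:
    """Bits per sample saved if the LSB is even with probability p_even (toy model)."""
    return 1.0 - binary_entropy(p_even)

print(round(lsb_saving(2 / 3), 3))  # 0.082
print(round(lsb_saving(8 / 9), 3))  # 0.497
```

Under this model a 2:1 skew is worth somewhat under a tenth of a bit per sample, and it takes something nearer 8:1 to reach half a bit. Real files may of course show skew in more than the lowest bit, which this sketch ignores.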

In the end I decided that this was not worthwhile. For one thing I feared that perhaps my rather old CD collection was not representative and modern mastering tools would not create audio like this any more. And of course this was going to be slow and create enormously complex code. It was an interesting exercise and it would have been fun to release a WavPack that significantly bettered existing encoders on specific files, but in the end I thought better of it.

Re: "Tested": codecs for the effect of stereo decorrelation (mid/side)

Reply #24 – 2022-05-05 11:29:16

The --sparse-pcm feature of sac is also "relatively cheap" - relatively meaning a 20 to 25 percent time penalty. That isn't much compared to a factor of 50 (not percent, that would be like 5000!) for letting it do its frog-inspired --optimize thing.
Source available - didn't see a license, but ideas are out of the bag [insert rant about certain codec license here]
Also on the Merzbow track the --sparse-pcm makes much more difference than the other options to put into it. (I wrote wrong here, should be that "Normal" mode beats "High" at compression.)

Quote from: bryant on 2022-05-05 04:43:14

But the zeroed LSB case is just the tip of the iceberg. Imagine the cases where every represented PCM value is a multiple of some other, non-power-of-two integer, like 3 or 5. This is not that different from the zero LSBs case, except it’s no longer trivial to detect, and probably not that common.

I think you mentioned this once here yes. But you only checked for integer multiples?
Hunch: what if the "last three steps" to the final 44.1/16 file are resampling to 44.1, peak normalization and dithering down to 16.
Peak normalization is scaling.

Also I have scratched my head over ... more quiet parts (= blocks) could in principle be handled by
(1) upscaling, remembering the scaling factor, and
(2) wasted bits in the upscaled signal.
Worth it? Maybe not? Had the 14 bits DTS-CDs achieved world domination, it would have been a different thing.

Quote from: bryant on 2022-05-05 04:43:14

For one thing I feared that perhaps my rather old CD collection was not representative and modern mastering tools would not create audio like this any more.

If my above hunch has anything to it, then we shouldn't even be surprised if "modern mastering tools" do this more often. Edit: when I hit the submit button, TBeck was already writing that he only found it in old recordings. Oh well ... that's what I get from just making layman's speculations. Which, anyway, follow unedited from here:
Modern mastering tools include the musician's own computers to an extent unheard of in 1990. And who knows whether they do things in the "correct" order, when the outcome anyway has so much more than enough resolution for nothing to be audible.

Also I have scratched my head over: suppose for example I create a 16-bit signal with peak of 8 bits below digital full scale, and then pad to 24. WavPack/FLAC/TAK handle the 8 wasted LSBs. They also certainly benefit from the 8 MSBs being zero (all numbers are smaller!), but do they exploit this fully?

Padding to 24 isn't farfetched. There must be a lot of files around that once were 16 bits, but were then imported into say Audacity for a minor adjustment (like peak normalization?!) and then exported to 24 bits ... dithered at the bottom. Which means there is a linearly transformed "bunch of wasted bits signal" that differs from this by ... some noise that anyway needs to be stored as residual.
(Insert obvious analogy to least-squares fit here.)

Notice