
Tested: codec performance on upsampled ("fake hi-res") material & more

What is this?
As with the FLAC test in https://hydrogenaud.io/index.php/topic,123025.msg1034721.html#msg1034721 , I created upsamples from my usual CDs (less than full CDs this time).
Objective: test how lossless codecs fare on such "fake high resolution" signals.  Part of the motivation was that we have seen reference flac's "-e" switch make much more impact on those - now, is that because the encoder does a bad job?  Spoiler alert: it actually does quite well up against the competition. 

This time, in addition to the upsamples, I also created pitch-shifted files to achieve "about as many empty octaves at the top", keeping the tempo and the sample rate. To make something analogous to upsampling by 1.1, 2.2 and 3.7 octaves respectively, I transposed down by similar amounts, so the most extreme files got their highest frequencies taken down from 22 kHz to around 1660 Hz plus filter rolloff. 

Both compression rounds turned up some surprises - including in how the results differed.


So I split this into four:
(0) This introduction with some considerations common to the rest.
(1) Results for upsampled material: From CDDA to 44.1/24 (resampled with dithering), 96/24, 192/24 and - unlike the thread above - also going as far as 576/24. 
(2) Then transposed material.  This time only 16 bits. 
(3) Particular considerations for FLAC: the impact of the 5-bit Rice method from 2007, and of the precision improvement from 1.3.x to 1.4.x.


Why do this at all?
  • We have long experience with CDDA resolution, but there is "something weird about all high resolution".  What's up there in the ultrasonic range?  Clipping artefacts, tape hiss or other noise?  Tape bias?  Or nothing because it is artificially created to be sold as high resolution?  Or maybe: nothing, masked by dithering?
    The first of the tests sets out to emulate the latter in a "controlled" manner: we do not know nearly as well what a "representative" hi-res corpus is, since there is both the actual audible music and all these unknown inaudibles. And in @ktf's comparison studies there are not that many high resolution signals - and we know that performance differs quite significantly over them; compare, for example, WavPack's performance on pages 4 and 7 here: http://audiograaf.nl/losslesstest/revision%206/All%20hi-res%20sources.pdf . The latter is most likely due to WavPack's extended handling of "wasted bits" (more here https://hydrogenaud.io/index.php/topic,121770.msg1024616.html#msg1024616 and in reply 47).
    What I am doing is not at all striving for anything "balanced" or "representative", but exploring, in a "sort of controlled" way, one kind of "high resolution", namely fake high resolution - and we know there is quite a bit of it out there.
  • After having done that, I thought: what happens if I instead push the music down in pitch and create an "empty octave" or three within the audible range, while keeping the sampling rate?  I don't know whether the required DSPing is guaranteed to clean the top octave(s) of any signal, and it is not going to create precisely the same artefacts in the music either - does that matter?  If it does, then maybe my first test will fail if fake high resolution is "faked" in a different way than I did it.
  • Then there was a motivation from testing the improvements in flac 1.4.0.  Some of the improvements we know, and the flac-to-flac comparisons will be covered separately, but how good are the codecs against each other?
    Especially when taking it to the extreme. 
It turns out that some observations are in line with CDDA, some are in line with "earlier surprises", but some may leave you scratching your head.


What signals?
The first ten and a half minutes from each of my 38 CDDAs.  (The shortest of those, an EP, was only ten and a half minutes to begin with.)
So unlike my other tests, which are weighted by duration, this is the same length per work - matching the length of the shortest one, the Jordan Rudess charity EP. 
The impact of the resampling is so big that there isn't much sense in discussing genre, but surely the difficult signals still stuck out: harpsichord, thrash metal live recording, ...



[Spectrogram image: all 38 signals, in the 96/24 upsample version.  Would you buy high-rez from this vendor?]

Edit: Unsorted imgur gallery at https://imgur.com/a/WoCwnUO

Re: Tested: codec performance on upsampled ("fake hi-res") material & more

Reply #1
(1) Upsamples.

Signals treated as follows:
 * As-is, i.e. CDDA just cut down in length
 * Then 44.1/24, obtained by going "back and forth" through a resampling like the three following:
 * sox-resampled (to 24 bits with dither) to 96, 192 and 576 kHz.

You may ask about wasted bits.  No, nothing of the sort: tested with MPEG-4 ALS, which can toggle that feature, and it makes less than a kB of difference per file.
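
For the record, "wasted bits" here means low-order bits that are zero in every sample of a block, so the encoder can shift them out for free.  A minimal sketch of the detection idea (my illustration, not any codec's actual implementation; the samples are made up):

def wasted_bits(samples):
    # OR all samples together; the lowest set bit of the result tells how
    # many low-order zero bits are common to the whole block
    acc = 0
    for s in samples:
        acc |= s
    if acc == 0:
        return 0  # all-zero block: nothing meaningful to count
    return (acc & -acc).bit_length() - 1

print(wasted_bits([0, 8, 16, -24]))  # 3 - every sample divisible by 8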


The charts may be information overload, so let me first omit several curves and take the best compressor (OptimFROG --preset max) as the benchmark.
There are brown dotted curves showing compressed size in % of WAVE: one for OptimFROG max and one for a fixed-predictor (= fast but "weak") flac. 
Everything else is in log2(file size / frog file size).  That means:
* A difference of 1 means double the file size.  Yes, it is that extreme.
* An increasing curve means: going to higher upsampling, it loses even more (in percentage terms) to the frog.
The steeper the curve, the worse it exploits the waste of space.  When you see the weakest FLAC ending up as flat as OptimFROG --preset 2, it means that the file size grows more in bytes, but both grow by about the same percentage from their respective 192 sizes.  (45 vs 48 percent, for tripling the WAVE size.)
* Why are they clustered for 44.1/24 compared to CDDA?
Because resampling 16 bits to 24 makes for eight more bits with a lot of noise in them - and another rather incompressible byte pushes every file's ratio closer to 1, and thus all the ratios closer to each other.  And thereby: the logs closer to 0.
I put the y axis where it is because the CDDA likely is the least interesting in the comparison.
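
To spell out the y-axis metric (a minimal sketch; the sizes are made-up round numbers):

from math import log2

def y_value(codec_size, frog_size):
    # log2(file size / OptimFROG max file size): +1 means double the size
    return log2(codec_size / frog_size)

print(y_value(200_000_000, 100_000_000))  # 1.0: twice the frog's size
print(y_value(100_000_000, 100_000_000))  # 0.0: ties the reference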



576 gives some WTFs.
* The simple fixed-predictor FLAC setting does quite well - especially from 192 to 576, where it is the second flattest.
* ... and it creates smaller files than any Monkey (also, not graphed, Extra High).
* Also, WavPack -hhx4 dislikes this signal.  It narrowly beats flac -5 in the end.  (Also, -hhx6 loses to flac -8p.)
* ALS -7, another extremely slow runner, ends up being beaten by ALS default.

Of course, cusps may look "bad" or "good" due to the whims of the reference; and actually, even OptimFROG --preset max struggles to kick out the waste of space from 192 to 576 (though it does very well up to 192).
On the other hand, when the reference curve itself gets steeper, you don't easily spot how the others convexify. 

So let me use a different reference to illustrate that - and include several more curves, at the risk of information overflow. 
It was tempting to use that flac fixed-predictor setting, but ... hm, nah, I chose one more likely to be used by those who care about compression levels.  (Afterwards I saw that 1.3.4 -5 followed it quite closely.)
Anyway, if you ask why I didn't use "-0" straight: -r0 to keep it simple and "not too good" - sometimes the partitioning does some magic, and I wanted to avoid that, to see what a simple codec does without such "tricks" that on rare occasions make FLAC outcompress very heavy codecs.  Block size 4096 because all the other flacs (except ffmpeg) use that.

So, coming up:
flac -8p as reference. 

Several more curves also, including ffmpeg's ALAC and WavPack at the bad end.  (Even though, for the latter, I gave it optimize_mono and allowed it to use joint stereo automatically.)
* TAK does not support 576, but it seems to handle the material well at default and up.  Its weakest mode does not have enough juice for it; I used -p0m to see if the additional processing would help (it did, but not dramatically).
* WavPack is not happy about this, but at least the patient user can save bytes with x4 and up.
* Nothing of the sort is possible for Monkey's and TTA.  Monkey's Extra High is narrowly beaten by Insane, so these are not among those signals that fool the latter.
* ALAC is worst overall.  It isn't that ALAC is universally horrible on high resolution: it could do quite well on this 768 kHz publicity stunt file: https://hydrogenaud.io/index.php/topic,122179.msg1011846.html#msg1011846 .  That one was recorded digitally, transferred to analog tape, and then digitized, so there is certainly some ultrasonic noise in it.  Here there is none, and the codec cannot exploit the waste of space.

And I threw in a couple more flac curves.

Re: Tested: codec performance on upsampled ("fake hi-res") material & more

Reply #2
(2) Pitch shift.
Signals treated with ffmpeg -i infile -filter:a "rubberband=pitch=<some number>".  That shit is buggy and returns a much longer file if the number is low, so the biggest transposition was done in two steps (from one of the others).
Numbers were chosen to match the upper frequency to 9999, 4999 and 1666 Hz - i.e. 0.45347 etc.  (A sketch of the arithmetic follows below.)
Unlike the upsampling, this was kept at 16 bits, because that is what ffmpeg defaults to.  Hence one column fewer. 
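
The pitch factors, reconstructed (my sketch; "in.wav"/"out....wav" are placeholders, and per the above the most extreme shift was actually done in two steps to dodge the length bug):

NYQUIST = 22050  # CDDA upper frequency

for target in (9999, 4999, 1666):
    factor = target / NYQUIST  # e.g. 9999/22050 = 0.45347
    print(f'ffmpeg -i in.wav -filter:a "rubberband=pitch={factor:.5f}" out{target}.wav')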

Since all files are "CDDA", I could include TAK all the way. Here are a few:

The graph could be misleading when compared to the previous one!  It is effectively "only half as tall" as the upsample graph; it goes to 1.5 rather than 2.5, and a difference of 1 is a doubling in file size. 

Look at the most extreme file, the rightmost data points:
 * OptimFROG --preset max compressed the most extreme file to 7.8 % of WAVE; for the most extreme upsample, the 576/24, it was 8.1 % of WAVE.
 * The FLAC curves now end up clustered at 14 percent of WAVE for the most extreme signal.  That is about the same as 1.4.3 had for the upsample, except that the -0b was as high as 20 there.
 * WavPack -hhx4 is also around the same, though actually a slightly worse percentage: 15.2 % of WAVE, versus 15.0 for the upsample.
And the ones that fare worse on the graph are still compressing better on this signal than on the upsample. 
 * Monkey's compress to 13.8 - Normal beats Insane here.  For the 576/24, it was 22.6 versus 24.5.
 * ALAC and WavPack -hx end up around 17 percent - much better than the 45 and 34 percent of WAVE they scored on the upsample.


Rather than posting another graph with more curves, let me explain where they would be found:
* TTA and ffmpeg-flac: somewhat worse than flac -5; and the ultra-slow flac a bit better, as expected (beating every Monkey - that is maybe not equally expected).
* Old FLAC does worse ... except at the end, where they converge.
* Also at the end, the two ALAC implementations converge - slightly surprisingly, with ffmpeg winning.

I did several more tests with FLAC specifically.  I'll leave those for the next post.


Anyway, these pitch-shifted signals behave much more normally.  Or possibly: more codecs behave more normally on these.

Re: Tested: codec performance on upsampled ("fake hi-res") material & more

Reply #3
(3) FLAC-centric considerations

This was a heads-up.  Sure, we can point at -e making big savings in bytes per file - but when you are already serving files that shave 1/3 off a Monkey, then maybe we should rather applaud than complain.
I will go a bit more into the versions, especially for the most extreme cases - and also the settings. 

Using 1.3.4 -5 as the reference here, when I used 1.4.3 -8p above ... could be questioned.  Oh well.


Upsamples:

For the ultra-high upsample, FLAC maintains the "expected size order": most juice --> smallest files; fixed predictors --> biggest files; new -5 beats ffmpeg beats 1.3.4.
One exception is that -5l32 loses to -5.  Again suggesting that the encoder's model selection algorithm isn't quite ready to go all the way up to order 32, as we have pointed out earlier - but compared to what other codecs do on these signals ... nothing to whine too loudly over?

More in these graphs (edit: fixed diagram ... it still says "same" scale, but fails to tell you that it does not go as high up):

1.4.x, with its double-precision calculations in the predictor, makes a big impact on upsamples.  "Already" at 96/24, 1.4.3 at -5 compresses to 18 percent smaller files than 1.3.4 at -5.
1.3.4, in turn, improved file size by 0.001% to 0.010% over 1.2.1.  Still -5 vs -5.
Then a mild surprise: 1.2.1 vs 1.1.4.  New residual compression method - no, not just a new algorithm, but allowing for the 5-bit Rice method of storing the residual.  Of course that helps ... sometimes.
44.1/24 improved by 8 percent.  But above that?  96/24 only by 0.16 percent.  (That's 0.07 percentage points, relative to WAVE.)


On to the pitch transposes.  Digging slightly deeper into it, there are some surprises that are not visible on the following graphs:


* Pitch shifted down by ~ an octave: as expected. 
The -8xx settings produce the smallest files.  (And with one exception, -8el32 -r8 does best.  While -8p typically beats -8e on CDDA, -8el32 can go far outside subset.)
1.4.3 beats ffmpeg overall and on 32 of the 38 files (the six exceptions are metal).
And the dual-mono fixed predictors (-0something) produce the biggest files - with one exception, where they beat ffmpeg - and 1.3.4 the second-biggest. 

* Transpose down another octave: No drastic changes.

* But when transposed down 3.7 octaves - highest frequency now 1667 Hz, if not for the filter artifacts:
The -8e and -8el32 -r8 are the two best for 25 of the 38 clips.
But for the remaining 13 - which include 11 of the 14 classical clips (all but the harpsichord, the Valen 12-tone music and the Cage percussion), plus Jan Johansson (piano jazz) and Tom Waits:
- those two settings (-8e with and without -l32 -r8 --lax) compress the worst except for ffmpeg,
- and -5 compresses best, beating -8p.
But those are narrow calls.  Overall for classical music the -8e's still win; yet over the classical part of the corpus (the leftmost in the spectrogram), fixed predictors beat -8p.  File sizes, classical music only, worst to best:
208 010 951 bytes for ffmpeg
183 713 323 bytes for -8p
183 660 550 bytes for -0b4096 -r0
183 657 767 bytes for 1.3.4 at -5
183 611 568 bytes for -5
182 214 901 bytes for -8e
181 921 171 bytes for -8el32 -r8

Strange.  Not big effects, but still spawning questions like: does the model selection algorithm estimate how well the fixed predictors would work?

I also tried older flac.exe versions, not graphed: 1.1.4 -5 compressed about a kilobyte worse than 1.3.4 -5, and 1.2.1 another kilobyte-ish worse - but still better than 1.4.3 at -0b4096 -r0.



All files had no tags and no padding.

Re: Tested: codec performance on upsampled ("fake hi-res") material & more

Reply #4
I'm not yet quite sure what to make of this.  It is a lot of data.  Indeed surprising that while codecs perform in a pretty narrow band on CDDA (within 10% of each other), the differences grow to 500% with extreme lowpass.

Of course, we also see such extremes on other 'specific signals' where some codecs are equipped to handle them and others aren't - among them wasted bits and sparse PCM.  However, those two are phenomena purely in the digital domain, which isn't the case for this lowpass.

What I noticed when working on the FLAC precision increase (single to double precision) was that very smooth signals like sine waves tend to push the linear algebra the encoder uses towards singularities.  I assume the same happens for lowpassed signals: at some point the rounding errors get a significant grip on the predictor, and at some point the numeric precision is just insufficient.
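
One way to see this effect (my illustration, not taken from the encoder): compare the condition number of the autocorrelation matrix behind the LPC normal equations for white noise versus a smooth narrowband signal.

import numpy as np
from scipy.linalg import toeplitz

def lpc_condition(x, order=8):
    # Autocorrelation at lags 0..order-1, arranged as the Toeplitz matrix
    # that the LPC normal equations (conceptually) invert
    r = np.correlate(x, x, mode='full')[len(x)-1 : len(x)-1+order]
    return np.linalg.cond(toeplitz(r))

n = 4096
rng = np.random.default_rng(0)
noise = rng.standard_normal(n)
sine = np.sin(2 * np.pi * 1000 / 44100 * np.arange(n))  # smooth, narrowband

print(f"white noise: cond ~ {lpc_condition(noise):.1e}")  # modest
print(f"pure sine:   cond ~ {lpc_condition(sine):.1e}")   # enormous: rounding errors bite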

Re: Tested: codec performance on upsampled ("fake hi-res") material & more

Reply #5
Yeah, it is hard to get a grip on all this.  Especially: what is going to be "smoother" from the flac.exe algorithm's point of view?
(Hunch: did you try even higher precision than double?  If there's a build declaring it as much more, then pass it over and I'll test.  But flac the format limits the precision of the coefficients themselves.)

For the upsamples, a different visualization:
Here is another plot where, for each codec, the performance appears not to vary so much when the bloat increases - with the visible exceptions.  It is essentially the slope of the compression curves: d(compressed file size) / d(WAVE file size).
Idea behind this representation:
* Upsampling essentially puts in a lot of "samples with no information in them [except for the dither?]", and in principle that is just bloat that could be compressed away.  (OptimFROG --preset max actually compresses away more than 23/24ths of the bloat.)
* The following graph shows how well this extra bloat is compressed away, for the 24-bit files.  Follow the black TTA plot: going up from 44.1/24 to 96/24 adds ~7 GiB to the WAVE size, and TTA compresses those down to 29 percent (i.e. compresses away 71), calculated by noting that the compressed file size increases by 2.0 GiB.
At the other end, going up from 192/24 to 576/24 means 51 GiB of extra WAVE, and the compressed file size increases by 12.6 GiB over the 192.  That is 25 percent.  (A sketch of this arithmetic follows below.)
* One "would" expect the curves to be reasonably close to horizontal, and most of them are.  Maybe there is a natural explanation why the leftmost point is higher for every codec, in that the 44.1 was resampled twice (I found out that asking sox to resample to the source sample rate does nothing).  So maybe that is slightly deceiving.
* But, but: flac -0r0 is a quite simple thing, and does not behave "horizontally" at all:

In comparison: going up from 44.1/16 to 44.1/24 by resampling twice leaves a lot of incompressible noise in the last 8 bits, and only one codec manages to compress away as much as 9 percent of the extra bytes.  (And that could be because it isn't particularly good on CDDA.)
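
The slope arithmetic, spelled out (a minimal sketch; the GiB figures are the TTA numbers quoted above):

def marginal(extra_wave_gib, extra_compressed_gib):
    # d(compressed size) / d(WAVE size): the share of the added bytes
    # that survives compression
    return extra_compressed_gib / extra_wave_gib

print(f"44.1/24 -> 96/24:  {marginal(7.0, 2.0):.0%} kept")    # ~29 %
print(f"192/24  -> 576/24: {marginal(51.0, 12.6):.0%} kept")  # ~25 %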

First a big bad chart.  New for FLAC is that I included -0me -b4096 -r0, to verify that stereo doesn't make much of a difference, and that -0 doesn't pick a too-bad one among the fixed predictors:



Zooming in on the better ones - those which at least manage to compress some of the bloat down to 18 percent or better (making Monkey's Insane the best that doesn't make it into the chart):



Why flac -0r0 does so much better on the heavier upsamples ... it could be that the upsampling algorithm works out "very fixed-predictor friendly" when upsampling by as much as 4.35x or 13x ... speculation!
* Some tilt upwards at the end.  That makes some impact in the plots of reply #2, because the "extra bloat from 192 to 576" is 60 percent of the WAVE size.  WavPack -hhx4 increases from 9.53% to 11.05%, and the 1.52-point increase makes for 0.78 GiB extra.  But even the slowest flac must go up by 1.2 percentage points.  Which is something, when this 51 GiB of WAVE bloat only fattens the slowest frog by 1.7 GB.
* TAK ... the bloat introduced by doubling the WAVE size from 96 to 192 is apparently compressed better with -p2 than with -p4m.  It may of course be that -p2 "misses exploitable patterns" in the 96/24 and finds them in the 192 ... hard to tell.


Re: Tested: codec performance on upsampled ("fake hi-res") material & more

Reply #6
Oh, and for flac: I guess the hard thing is to explain why the fixed predictors end up doing so well.
The higher settings end up close to the fixed predictors - and they do resort to using fixed predictors to a larger degree.

I can give some stats on FIXED subframes.  There is not a single VERBATIM subframe among those that follow.
Numbers are for the slowest setting, -8el32 -r8, and then for 1.1.4 -5, because I accidentally overwrote the 1.3.4 numbers after compiling these.  But the 1.1.4 and 1.3.4 sizes are virtually the same for the following cases - only for 44.1/24 does the absence of the 5-bit Rice method make for anything to write about (ten percent larger files); for 96/24 and up it seems not to matter, likely because the signals are indeed easier to predict and the residuals get so small that a Golomb exponent of 15 is enough.

CDDA and the pitch shifts are 512 924 subframes each, of which around 2000 are CONSTANT, without much variation:
CDDA: 9315 FIXED subframes, out of around 500k.  1.1.4 -5 has more FIXED subframes: 38 807 of them.
Two-octave pitch shift: 58 642 FIXED (that's much more).  But 1.1.4 -5 has as many as 403 366 FIXED - nearly eighty percent.
Most extreme pitch shift: 166 764 FIXED - that's a third of them, but way short of the > 95 percent that 1.1.4 -5 produces (489 795).

Upsamples:
The 192 upsample has 2 233 032 subframes: 4 FIXED, 9k CONSTANT.  Old -5 has eighty percent FIXED subframes.
The 576 upsample has 6 698 944 subframes: 481 FIXED, 27k CONSTANT.  Old -5 is down to only 42k LPC subframes. 

Wasted bits, just to make clear that they aren't much to talk about:
15 subframes in the 192k upsample, 64 subframes in the 576k.
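
For anyone wanting to reproduce such counts (my sketch, assuming flac's analysis mode: "flac -a file.flac" writes an .ana report with a "type=..." field per subframe and a "wasted_bits=" field - check your version's documentation for the exact format):

import re
import sys
from collections import Counter

counts, wasted = Counter(), 0
with open(sys.argv[1]) as ana:  # e.g. the .ana file produced by flac -a
    for line in ana:
        m = re.search(r'type=(\w+)', line)
        if m:
            counts[m.group(1)] += 1  # CONSTANT / VERBATIM / FIXED / LPC
        w = re.search(r'wasted_bits=(\d+)', line)
        if w and int(w.group(1)) > 0:
            wasted += 1  # subframes with wasted bits

print(dict(counts), '- subframes with wasted bits:', wasted)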