Topic: Test in progress: compressing near-quiet sine tones (Read 2123 times)

Test in progress: compressing near-quiet sine tones

Before I spend too much time on this: is there any obvious reason why compressing low-volume sine waves should result in size differences like here?


Signals: after compressing (multichannel) silence, which seems affected by wasted bits, I went to near-silence mono: sine waves peaking at 0.000064 (so, 15th and 16th bit doing the work, undithered ... I think!)
Generated 23 frequencies (approx. half an octave apart) as 48/16 mono. 30 seconds each = 1 440 000 mono samples.
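
For anyone wanting to reproduce signals like these, here is a minimal Python sketch. It assumes plain rounding to 16-bit integers with no dither and a zero starting phase; the original generator and its settings are not stated, and `quiet_sine` is a made-up name.

```python
import math

def quiet_sine(freq_hz, rate=48000, seconds=30, peak=0.000064):
    """Undithered 16-bit mono sine; `peak` is relative to full scale (assumed 32768).

    freq_hz must be an integer so the phase can be reduced exactly.
    """
    scale = peak * 32768  # about 2.1 LSB, so only the lowest bits are exercised
    samples = []
    for n in range(rate * seconds):
        # Reduce the phase with integer arithmetic so long files do not drift.
        phase = (freq_hz * n) % rate
        samples.append(round(scale * math.sin(2 * math.pi * phase / rate)))
    return samples

tone = quiet_sine(10000, seconds=1)
print(sorted(set(tone)))  # the handful of sample values the tone uses
```

With a peak this low, every sample should land within two steps of zero, which matches the "15th and 16th bit doing the work" description.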

3+1 bars from two codecs (three settings of one, the green is another)


As you can see, the 10 kilohertz tone and the 20 kilohertz tone are nearly compressed away. And then there is a green peak ...

Here are two settings of a third codec:

Quite even. Except the red peak at ... the same frequency as the green above! Compressing worse, though.


Last two months' worth of foobar2000.org ad revenue has been donated to support war refugees from Ukraine: https://www.foobar2000.org/

Re: Test in progress: compressing near-quiet sine tones

Reply #1
Quote
Before I spend too much time on this: is there any obvious reason why compressing low-volume sine waves should result in size differences like here?

For the 10kHz and 20kHz sines, I would guess the codecs in question have managed to find a predictor that can determine the next sample's value with 100% accuracy, so the residual takes up almost no space at all.

I have no idea why they're unable to come up with a good predictor for the 7kHz sine, though.

Re: Test in progress: compressing near-quiet sine tones

Reply #2
I also noticed something similar when saving sine waves as FLAC and it (also) depends on the file's sample rate.
A sine tone that is 1/4 of the sample rate compresses very well, for example a 12kHz sine wave in a 48kHz file.
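
The fs/4 case can be checked by hand. An ideal sine satisfies x[n] = 2*cos(2*pi*f/fs)*x[n-1] - x[n-2]; at f = fs/4 the cosine is zero, leaving x[n] = -x[n-2], which needs no fractional coefficients. The Python sketch below (assumed amplitude, zero phase, plain rounding) checks that the residual of that two-tap predictor stays zero even after quantization. It is not what any encoder actually computes, only an illustration of why a perfect predictor exists here.

```python
import math

def quantized_sine(freq_hz, rate, count, amplitude=1000.0):
    """Rounded integer samples of a sine; amplitude and zero phase are assumptions."""
    return [round(amplitude * math.sin(2 * math.pi * ((freq_hz * n) % rate) / rate))
            for n in range(count)]

def residual_fs4(samples):
    """Residual of the predictor x[n] = -x[n-2] (skipping the two warm-up samples)."""
    return [samples[n] + samples[n - 2] for n in range(2, len(samples))]

tone = quantized_sine(12000, 48000, 4800)
print(any(residual_fs4(tone)))  # False when every residual is zero
```

The same predictor applied to a tone that is not at fs/4 leaves a non-zero residual, so the effect is tied to the frequency/sample-rate ratio and not to the frequency itself.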

Re: Test in progress: compressing near-quiet sine tones

Reply #3
If FLAC compressibility depends on audio frequency it also depends on the sampling rate, because it defines a block as a number of samples and not as a timespan. So if you double the sample rate, then each block is only half the timespan.
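
To put numbers on that: a block's duration is its sample count divided by the sampling rate. A small Python sketch, using 4096 samples purely as an example block size (`block_duration_ms` is a made-up helper):

```python
def block_duration_ms(block_size, sample_rate):
    """Timespan covered by one block, in milliseconds."""
    return 1000.0 * block_size / sample_rate

for rate in (48000, 96000):
    print(rate, round(block_duration_ms(4096, rate), 1), "ms")
```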
ffmpeg's flac will choose bigger blocks for higher sampling rates though.

But for more complex signals, sampling rate seems not to matter for reference FLAC. Tested here: https://hydrogenaud.io/index.php?topic=122056.0

Re: Test in progress: compressing near-quiet sine tones

Reply #4
There is one patch currently in FLAC git, but not yet in any release, that massively affects the compression FLAC can reach on sines: this one. Actually, the whole reason I started looking for compression improvements in FLAC is that it had abysmal performance on simple inputs like sines.

The reason for this lies in the way FLAC does some linear algebra. For very clean inputs (like sines) the process FLAC uses to find a predictor gets extremely sensitive to rounding errors. Increasing precision reduces the rounding errors and therefore improves compression. This patch therefore increases compression the most on clean signals like a single piano playing, or in fact a sine wave, while not changing anything on music which has a spectrum similar to white noise like metal (especially tracks with crash cymbals being hit a lot).
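
One way to see where that sensitivity comes from, assuming the predictor is found by solving normal equations built from the signal's autocorrelation (the textbook LPC approach; the post itself only says "linear algebra"): for a pure sine the normalized autocorrelation at lag k is cos(k*w), and the resulting Toeplitz matrix becomes singular as soon as it is larger than 2x2. A Python sketch with an illustrative 3x3 case:

```python
import math

def toeplitz_det3(r0, r1, r2):
    """Determinant of the symmetric Toeplitz matrix [[r0,r1,r2],[r1,r0,r1],[r2,r1,r0]]."""
    return r0 * (r0 * r0 - r1 * r1) - r1 * (r1 * r0 - r1 * r2) + r2 * (r1 * r1 - r0 * r2)

def sine_autocorr(freq_hz, rate, lag):
    """Normalized autocorrelation of an ideal, infinitely long sine."""
    return math.cos(2 * math.pi * freq_hz * lag / rate)

r = [sine_autocorr(1000, 48000, k) for k in range(3)]
print(toeplitz_det3(*r))         # essentially zero: the system is singular
print(toeplitz_det3(1.0, 0, 0))  # white-noise-like autocorrelation: well conditioned
```

With a near-singular system, tiny rounding errors in the autocorrelation can move the solution a lot, which is consistent with extra precision helping most on very clean signals.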
Music: sounds arranged such that they construct feelings.

Re: Test in progress: compressing near-quiet sine tones

Reply #5
Well, FLAC's "abysmal" performance isn't that bad in this test. But note again, the signals are very quiet. What I set out to test was actually whether there were big differences in "top N bits all zero" even if there is no outright "wasted bits" feature for the most significant ones. Then something strange showed up.

* The top chart above repeated here: three TAKs (p0, p2, p4m) and then ALAC. ALAC compresses bass well, but behaves even stranger at 7, 10, 14 and 20 kilohertz.


* Monkey's "fast" and "insane". Insane having the insane red leap at 7k.


* Here is another animal known for leaping. The blue and red frogs are --preset 0 and --preset 2. Of course the high presets are nearly invisible, but there is some variation of the kind "good an octave off a good", etc.


* A couple of WavPack settings: -h, -hx4, -hx6. Look how the 20k signal is nearly compressed away:

 
* And then it was FLAC.
The blue is -5, and the rest are -8pe with respective blocksizes 256 (the red bad), 1152, 4096 (=default), 16384 and 32768. But you see something happening at 10000 and 20000 here too, like on the top TAK/TAK/TAK/ALAC chart.



Not sure if FLAC is the most puzzling here.

Re: Test in progress: compressing near-quiet sine tones

Reply #6
A couple more:

MPEG-4 ALS (default) is nicely flat around 5. Half size of ape.
WavPack -fx6 does not catch the 20 kilohertz tone like -hx4 -hx6 -hhx6 do. So it is not only the "x".
LA dips down to nearly compress away 5k/10k/20k like high OptimFROG - although not as impressive.
TTA compresses bass like FLAC/TAK but overall it was the only thing possibly worse than an insane monkey.

Edit: Of course general purpose compressors squeeze these files to pretty much nothing, but since I have NTFS-compressed my "silence & near-silence test folder": NTFS compression (of Windows 10) gets 19 of 23 down to 8 to 1 - i.e. compresses away the all-zero 14 MSBs. Then using /exe:lzx, the overall size is 1/20th of that again, down to around .7 percent (guesstimating the impact of rounding-up to file cluster size!). That is half the size of OptimFROG default, but 4x the size of OptimFROG --preset 8.


I guess I won't bother working more on this.

Re: Test in progress: compressing near-quiet sine tones

Reply #7
Quote
* The top chart above repeated here: three TAKs (p0, p2, p4m) and then ALAC. ALAC compresses bass well, but behaves even stranger at 7, 10, 14 and 20 kilohertz.

I'd say this is pretty strong evidence that TAK uses a linear predictor, similar to ALAC and FLAC.

It's interesting that FLAC was unable to come up with a predictor of the same quality as ALAC, given the similarity between the two. I'm not sure if this is a limitation of the encoder or the format, but I recall some recent conversations about increasing the encoder's precision when coming up with predictors...

Re: Test in progress: compressing near-quiet sine tones

Reply #8
FLAC encoder matters, but ... (my, I said I shouldn't do more of this, rehab is for quitters)

In Reply #4, @ktf mentions an improvement in the works. But, his double precision build posted in this thread doesn't matter much. His IRLS build in the same thread squeezes a little bit.

FLAC bitrates as reported by fb2k:
58 kbit/s for ktf's IRLS build at -9pe -b 32768.  -9pe beats -8pe at every file, but only by .2 percent or so.
59 kbit/s for -9pe, for 1.3.4 -8pe and for ktf's double precision build -8pe
64 for CUETools Flake --lax -11. It is the 10k and 20k signals where it does worse (twice as bad!) than reference
Now look at these:
86 for FLACCL --lax -11
86 for ffmpeg -compression_level 12 -lpc_type cholesky -lpc_passes 4, should do something similar to IRLS -9. Number of passes 2/4/6/10 don't matter much, but look at 11 below.
89 for ffmpeg -compression_level 12. It is all fine that lpc_passes squeeze out a little bit.
103 for ffmpeg -compression_level 12 -lpc_type cholesky -lpc_passes 11. 4 to 10 passes give 86, but the 11th just hurts the bass signals big time. The 12th even more.


Re: Test in progress: compressing near-quiet sine tones

Reply #9
So the difference in bitrate between different flac.exe settings seems to be up in the treble.
-0 is 76 and skyrockets at the top kHz, still beating flaccl and ffmpeg
-4 to -8 are all 72 and the same as -8.
-3e to -5e are 63
-8e is 61 and so is -3pe (this is mono!).


But between different FLAC implementations, the difference is the bass.  Isolating the eight 20 to 110 Hz files and quoting bitrate:

50 for 1.3.4 (whether it is -2 or -8pe) and flake
90 for flaccl and for ffmpeg both at default, at -compression_level 12, and with lpc_passes 2 to 10 on top of -compression_level 12
147 for that lpc_passes 11

Re: Test in progress: compressing near-quiet sine tones

Reply #10
Quote
In Reply #4, @ktf mentions an improvement in the works. But, his double precision build posted in this thread doesn't matter much. His IRLS build in the same thread squeezes a little bit.

Ok, I'm guessing it doesn't work here because the quantization error makes sure the signal is not too clean to be properly coded by FLAC.

Re: Test in progress: compressing near-quiet sine tones

Reply #11
But yeah, I have fired sine waves at FLAC before.

This time the purpose was initially to test "wasted MSBs" - I mean, they don't have an "explicit facility for it" so therefore, how do they handle it?
Hence the low volume. And since it all turned out to be very sensitive to frequency, I got a totally new rabbit hole around my head.



Re: Test in progress: compressing near-quiet sine tones

Reply #12
Sensitive to frequency may also be because it doesn't have a really great way to handle the LPC needing to be higher time frequency than the actual sample rate to account for the resulting aliasing or anti-aliasing from the signal not lining up perfectly with the samples.

There would therefore be an effective need for a resampler which is entirely in integer space, using a specific algorithm, and any tables would need to be pre-calculated using a consistent encoding process to produce consistent compressed data. This reproducible resampler would be used to calculate intersample values for the waveform, to determine if an intersample LPC could be used at an even multiple of the actual sample rate, to produce a more consistent waveform for the LPC to track, and thus produce a more compressible output file.

The resulting LPC may or may not need the same resampler to downsample its output by the same factor as the upsample used to produce it, and would produce the signal that the final output residue corrects against. It should be determined whether skipping LPC output samples, or downsampling it using the same algorithm as input, produces a better signal that produces a finer residue that compresses better.

Naturally, this process would add extra complexity to the encode process, and should really only be used for highly tonal signals.

Re: Test in progress: compressing near-quiet sine tones

Reply #13
Quote
This time the purpose was initially to test "wasted MSBs" - I mean, they don't have an "explicit facility for it" so therefore, how do they handle it?

Most of the data in a typical FLAC file is the residual. FLAC encodes the residual with Rice codes, which include a unary portion that indicates the most significant used bit in a depth-agnostic way, so FLAC is largely unaffected by changes in bit depth. ALAC works in much the same way, although ALAC uses adaptive Rice codes instead of the fixed per-partition Rice codes in FLAC.
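
A minimal Python sketch of a Rice code, to show why the shortest possible codeword is one bit. The zigzag mapping of signed values and the zeros-then-one unary convention are written from memory of how FLAC-style Rice coding is usually described, so treat the exact bit layout as an assumption and not as a reading of the spec.

```python
def rice_encode(value, k):
    """Return the Rice codeword for a signed residual as a string of '0'/'1'."""
    # Fold signed values onto non-negative ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4.
    folded = 2 * value if value >= 0 else -2 * value - 1
    quotient, remainder = folded >> k, folded & ((1 << k) - 1)
    unary = "0" * quotient + "1"  # grows with the magnitude, not with the bit depth
    low_bits = format(remainder, "b").zfill(k) if k else ""
    return unary + low_bits

for v in (0, 1, -3, 100):
    print(v, rice_encode(v, 2))
```

Even with k = 0 a zero residual costs one bit, so a block whose residuals are all zero still costs one bit per sample.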

Quote
And since it all turned out to be very sensitive to frequency, I got a totally new rabbit hole around my head.

FLAC and ALAC (and presumably TAK) use a linear predictor, which is a fancy way of saying each sample is predicted by adding up fixed multiples of the most recent samples. FLAC's predictor is constant throughout each block, whereas ALAC's predictor is adaptive and updates its coefficients throughout each block.

FLAC can use up to 32 of the most recent samples for prediction, but subset FLAC at 48kHz or lower is limited to the most recent 12 samples. ALAC can also use up to 32 of the most recent samples; I'm not sure if there are any situations where the encoder might be forced to use fewer.

While digging through FFMPEG to learn more about ALAC, I figured out why its files are so much smaller than FLAC despite being so similar: ALAC can use run-length encoding to compress long runs of zero residual to less than 1 bit per sample, and FLAC can't. (Rice codes are always at least one bit long.) With the 10kHz and 20kHz sines, the signal repeats often enough that the predictor is 100% accurate and the residual is entirely zero, so it's coded as a single long run. For lower frequency sines, there are long stretches of zero residual between each change in amplitude which are also compressed this way.

I'm not sure how TAK encodes the residual. It must be doing something more clever than FLAC if it can achieve less than one bit per sample, but it's tough to guess what it could be...

Re: Test in progress: compressing near-quiet sine tones

Reply #14
Zero residual ... yeah, that is a point. 1 bit per sample is 1/16th, which is pretty close to the minimum bitrate you see for FLAC in the above chart.
FLAC actually started out encouraging more residual coding methods to be added later. I have a hunch that will never happen ...


The order is not enough to explain much. As for TAK, it uses a variable-order predictor, where the default -p2 can use up to 32 if I have understood correctly, so it should then beat ALAC (it doesn't until midrange). However, that argument isn't foolproof, as TAK trades off order against speed and could miss cases where a higher order would work miracles.

Order 32 with FLAC on these files: Adding "--lax -l 32" to flac -8pe changes all the files, but
* files up to & including 1768 Hz are the same size to the byte, and the same goes for 3536 Hz and for 10 kHz and up
* the rest become smaller, and most dramatically: the 5 kHz file drops from 278 220 to 200 616.
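
One possible explanation for the 5 kHz result, offered as a guess and not as a description of what flac computes: at 48 kHz, an undithered tone at a "round" frequency repeats exactly with inverted sign after half its repetition period, so the single-tap predictor x[n] = -x[n-P] leaves a zero residual. The Python sketch below searches for the smallest such P, assuming plain rounding, zero phase and the roughly 2 LSB peak from the first post. If P is above the 12-sample limit of subset FLAC, only an order reachable with --lax -l 32 could use it.

```python
import math

def quiet_sine(freq_hz, rate=48000, count=4800, scale=0.000064 * 32768):
    """Undithered integer samples; the scale of about 2.1 LSB is taken from the first post."""
    return [round(scale * math.sin(2 * math.pi * ((freq_hz * n) % rate) / rate))
            for n in range(count)]

def smallest_exact_lag(samples, max_lag=32):
    """Smallest P with x[n] == -x[n-P] or x[n] == x[n-P] for all n, else None."""
    for lag in range(1, max_lag + 1):
        pairs = list(zip(samples[lag:], samples))  # (x[n], x[n-lag])
        if all(a == -b for a, b in pairs) or all(a == b for a, b in pairs):
            return lag
    return None

for freq in (5000, 10000, 20000):
    print(freq, smallest_exact_lag(quiet_sine(freq)))
```

Under those assumptions the lag comes out as 24 for 5 kHz, 12 for 10 kHz and 6 for 20 kHz, which would fit the observation that the 10k and 20k files are already as small as they get while 5 kHz only benefits from the higher order. Real encoders may find other, multi-tap predictors, so this is at best part of the story.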

 

Re: Test in progress: compressing near-quiet sine tones

Reply #15
Quote
Zero residual ... yeah, that is a point. 1 bit per sample is 1/16th, which is pretty close to the minimum bitrate you see for FLAC in the above chart.

1/16th definitely because it is mono - but as far as I can read from the format specification, it is 1/16th for any 16-bit signal regardless of channel count? (Huh, even the format specification solicits development of other-than-Rice encodings.)


And I guess @TBeck is busy 64-bitting up TAK, but maybe he has time to explain what it does and correct uninformed n00bs.

Re: Test in progress: compressing near-quiet sine tones

Reply #16
Quote
FLAC actually started out encouraging more residual coding methods to be added later. I have a hunch that will never happen ...

As adding methods would break compatibility with existing decoders that doesn't seem like a good idea. Maybe if FLAC would ever support floats or DSD these reserved bits can be used yes, but definitely not for CDDA.

Quote
1/16th definitely because it is mono - but as far as I can read from the format specification, it is 1/16th for any 16-bit signal regardless of channel count?

Minimum FLAC bitrate is 1bit/sample, so 88.2kbit/s for CDDA. Only when the signal is completely constant this can be lower.

Quote
(Huh, even the format specification solicits development of other-than-Rice encodings.)
Yes, that was probably added when FLAC was all new and shiny and never removed.

Re: Test in progress: compressing near-quiet sine tones

Reply #17
Quote
1/16th definitely because it is mono - but as far as I can read from the format specification, it is 1/16th for any 16-bit signal regardless of channel count?

There's one subframe per channel, and each subframe is coded independently, so you can get below 1/16 if many of the subframes are constant. For example, mono encoded as stereo will be decorrelated into a mid channel and a silent side channel, giving a minimum of 1/32 instead of 1/16.
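
The floor figures in the last few replies follow from simple arithmetic: one bit per sample for every non-constant subframe, compared against channels x bit depth for raw PCM. A Python sketch (frame and subframe header overhead is ignored, and the function names are made up):

```python
from fractions import Fraction

def min_bitrate_kbps(sample_rate, active_channels):
    """Floor in kbit/s: one bit per sample for each non-constant subframe."""
    return sample_rate * active_channels / 1000

def min_ratio(channels, active_channels, bits=16):
    """Same floor as a fraction of the uncompressed PCM size."""
    return Fraction(active_channels, channels * bits)

print(min_bitrate_kbps(44100, 2))  # CDDA, both channels non-constant
print(min_ratio(1, 1))             # 16-bit mono
print(min_ratio(2, 1))             # mono stored as stereo: silent side channel
```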

Re: Test in progress: compressing near-quiet sine tones

Reply #18
Quote
While digging through FFMPEG to learn more about ALAC, I figured out why its files are so much smaller than FLAC despite being so similar: ALAC can use run-length encoding to compress long runs of zero residual to less than 1 bit per sample, and FLAC can't. (Rice codes are always at least one bit long.)

Just thought over this: https://hydrogenaud.io/index.php/topic,122222.msg1008886.html#msg1008886
Quoting myself:
I found out how horrible ALAC is at compressing silence. (Like, it does not get below 1/3 of uncompressed PCM - that's mono though. And then NTFS compression gets the .m4a down by 94 percent.)

And yet here, ALAC does quite well.

Codecs are engineering compromises, but obviously not everything is ... engineered at all. Like how the FLAC spec still encourages people to do something different than Rice - never going to be implemented. But runlength isn't witchcraft - or is it? The above observation that ALAC outcompresses TAK at bass ...

(I am not at all saying that anyone wants to listen to sine tones, so they are maybe not so interesting musically. But, being a lazy ass who cannot do software development and therefore is entitled to point out the "mistakes" of everyone who does actual hard work (yes that was an irony!) - I would have thought that when developing a lossless codec, probably even at a very early stage, one would throw a bunch of sines at it to see, "can it handle the simple stuff with no nasty surprises?")

Re: Test in progress: compressing near-quiet sine tones

Reply #19
Quote
I also noticed something similar when saving sine waves as FLAC and it (also) depends on the file's sample rate.
A sine tone that is 1/4 of the sample rate compresses very well, for example a 12kHz sine wave in a 48kHz file.

This part should be easier to explain from a non-audio-specific angle. For example, in the screenshot below, the left channel is a 6kHz sine and the right channel is a 4567Hz sine, at 48kHz sample rate and 16 bits. Neither tone is dithered. You can see that the 6kHz tone only uses 5 amplitude values in the whole file. 4567, on the other hand, is a prime number, so that tone uses many more amplitude values and repeats far less often. In the FFT view, the 6kHz tone's quantization errors all accumulate at 18kHz, while the 4567Hz tone's quantization errors spread more evenly across the whole spectrum due to the longer repetition sequence.
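
Both observations are easy to reproduce without an audio editor. The Python sketch below assumes zero starting phase, plain rounding and an arbitrary half-scale amplitude (the post does not give the level), and reports the number of samples after which each tone repeats together with the count of distinct sample values.

```python
import math

def repeat_period(freq_hz, rate=48000):
    """Samples until an integer-frequency tone repeats exactly."""
    return rate // math.gcd(freq_hz, rate)

def distinct_values(freq_hz, rate=48000, amplitude=16384):
    """Distinct 16-bit sample values over one full repetition of the tone."""
    period = repeat_period(freq_hz, rate)
    return len({round(amplitude * math.sin(2 * math.pi * ((freq_hz * n) % rate) / rate))
                for n in range(period)})

for freq in (6000, 4567):
    print(freq, repeat_period(freq), distinct_values(freq))
```

Because 4567 shares no factor with 48000, that tone only repeats after a full second of samples, while the 6kHz tone repeats every 8 samples.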