Topic: Lossless codecs in frequency domain?

Lossless codecs in frequency domain?

Reply #25
Quote
How much correlation is there (typically) left for an ideal lossless encoder that does not have to take (as many) practical considerations into account?

Depends on how you determine the correlation. An easy and interesting task is to fire up Matlab/Octave and apply a time-variant LPC filter of, say, order 16 to your input audio signal. If you listen to the result (the prediction residual, the signal that lossless encoders code), you'll notice that a) it's much quieter than the original, b) you can still recognize the original music/speech. b) implies that there is still quite some correlation left, but usually this remaining correlation has very little significance during compression. IOW, it's usually not worth removing the remaining correlation.
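
For concreteness, here is roughly what that experiment looks like in Python/NumPy (the original suggestion was Matlab/Octave; this is only a sketch of the same idea, and the file names, order and block size are placeholders):

import numpy as np
from scipy.io import wavfile
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

ORDER, BLOCK = 16, 4096
fs, x = wavfile.read("input.wav")              # assumption: mono 16-bit PCM
x = x.astype(np.float64)
residual = np.zeros_like(x)

for start in range(ORDER, len(x) - BLOCK, BLOCK):
    blk = x[start:start + BLOCK]
    # autocorrelation method: solve the Toeplitz normal equations for the LPC coefficients
    r = np.array([np.dot(blk[:BLOCK - k], blk[k:]) for k in range(ORDER + 1)])
    if r[0] == 0:                              # silent block, nothing to predict
        continue
    a = solve_toeplitz(r[:ORDER], r[1:ORDER + 1])
    # residual e[n] = x[n] - sum_k a[k]*x[n-1-k], using the tail of the previous block as history
    e = lfilter(np.concatenate(([1.0], -a)), [1.0], x[start - ORDER:start + BLOCK])
    residual[start:start + BLOCK] = e[ORDER:]

wavfile.write("residual.wav", fs, np.clip(residual, -32768, 32767).astype(np.int16))

Playing residual.wav back is the quickest way to hear both a) and b) for yourself.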

Chris
If I don't reply to your reply, it means I agree with you.

Lossless codecs in frequency domain?

Reply #26
Quote
How much correlation is there (typically) left for an ideal lossless encoder that does not have to take (as many) practical considerations into account?

Depends on how you determine the correlation. An easy and interesting task is to fire up Matlab/Octave and apply a time-variant LPC filter of, say, order 16 to your input audio signal. If you listen to the result (the prediction residual, the signal that lossless encoders code), you'll notice that a) it's much quieter than the original, b) you can still recognize the original music/speech. b) implies that there is still quite some correlation left, but usually this remaining correlation has very little significance during compression. IOW, it's usually not worth removing the remaining correlation.

Chris

So I should probably have asked "is there some practical method for determining the 'true entropy' of a finite-length file?", assuming that it is the result of a stochastic process.

How does a time-variant order-16 LPC compare to the decorrelation in FLAC?

When you say that the remaining audible correlation is insignificant for compression, is that because it is small, or because our imperfect man-made algorithms are not able to exploit it properly?
 
-k

Lossless codecs in frequency domain?

Reply #27
I don't think you can ever measure a "true" entropy (I guess not in polynomial time). Take this example:

I make a C program using a standard rand() function seeded using some value, then produce a binary file where I just dump one billion pseudorandom numbers.
If I gave you the file, you would have no idea how the file was created and would probably fail to compress it much (if at all).
However, I could easily "remember" the whole file: I just need the seed and the size...
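
The same idea sketched in Python rather than C (seed, count and file name here are arbitrary): the entire file is pinned down by two small numbers, yet a compressor that doesn't know the generator sees essentially incompressible data.

import random, struct

def dump_noise(path, seed, count):
    rng = random.Random(seed)
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(struct.pack("<I", rng.getrandbits(32)))

dump_noise("noise.bin", seed=12345, count=1_000_000)   # ~4 MB of pseudorandom 32-bit words
# Anyone who knows (12345, 1_000_000) can regenerate noise.bin exactly -- a few bytes of
# "description" -- while gzip/xz will barely shrink the file at all.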

If you could easily determine/identify/describe the underlying process (system) which generated any observed output sequence (a song, for example), you could probably compress it all the way down to its true entropy. Since this is next to impossible on a general level, you're out of luck.

Lossless codecs in frequency domain?

Reply #28
Quote
I don't think you can ever measure a "true" entropy (I guess not in polynomial time). Take this example:

I make a C program using a standard rand() function seeded using some value, then produce a binary file where I just dump one billion pseudorandom numbers.
If I gave you the file, you would have no idea how the file was created and would probably fail to compress it much (if at all).
However, I could easily "remember" the whole file: I just need the seed and the size...

If you could easily determine/identify/describe the underlying process (system) which generated any observed output sequence (a song, for example), you could probably compress it all the way down to its true entropy. Since this is next to impossible on a general level, you're out of luck.

But your example is more like table-lookup, is it not? A given "random" generator might return some output from a small seed that, for all purposes concerning us here, is "random" but really is not. A lossless audio "encoder" could identify a file as "track #7 of this and that CD", simply transmitting a short index to the decoder (assuming that the decoder has access to all CDs ever made). That is kind of trivial, but less useful, I think.

The question is whether there is some (probably exhaustive, hopefully doable) search method that is guaranteed to find the least costly parameters in an excitation + N-tap filter model, where the number of taps and their updates can be chosen at will, provided that it minimizes the file size while having to transmit both parameters and residue. Might one approach be to build a 2-D array of autocorrelation functions taken from larger and larger windows of the whole file (going from 2-sample windows to the whole file), navigating this to extract some kind of ideal short-term and long-term model parameters that minimize the total cost?
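
As a toy version of that search, one could simply brute-force a small grid of block sizes and predictor orders and keep whichever minimizes an estimate of (coefficient bits + residual bits). The Rice-style size estimate and the 15 bits per coefficient below are assumptions for illustration, not what any real codec uses:

import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(block, order):
    r = np.array([np.dot(block[:len(block) - k], block[k:]) for k in range(order + 1)])
    return solve_toeplitz(r[:order], r[1:order + 1]) if r[0] > 0 else np.zeros(order)

def estimated_bits(x, block, order, coef_bits=15):
    total = 0
    for s in range(order, len(x) - block, block):
        a = lpc(x[s:s + block], order)
        e = lfilter(np.concatenate(([1.0], -a)), [1.0], x[s - order:s + block])[order:]
        v = np.abs(np.rint(e)).astype(np.int64)
        k = max(int(np.log2(v.mean() + 1)), 0)         # crude Rice parameter choice
        total += int(np.sum(v >> k)) + v.size * (k + 2) + order * coef_bits
    return total

# x = float array of samples; pick the cheapest (block size, order) pair on a small grid:
# best = min((estimated_bits(x, b, p), b, p) for b in (1024, 4096, 16384) for p in (2, 8, 16, 32))

Of course this only searches inside the usual "block-wise LPC" family; it says nothing about whether that family is the right model, which is really the question being asked.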

Or maybe the noise -> linear time-variant filter model is a limitation in itself, and some other modelling should be used instead. Cepstrum? Neural nets? Volterra? Perhaps the most compact representation turns out to be a compressed MIDI file along with instrument samples, just like much music is synthesized. In that case, audio compression, music recognition, instrument separation and a few other interesting topics blend quite nicely. No one knows how to make the encoder for regular "CD" input, though.

-k

Lossless codecs in frequency domain?

Reply #29
Quote
If you could easily determine/identify/describe the underlying process (system) which generated any observed output ...

Exactly. That's e.g. what speech coders try to do, which is why they sound quite good at low bit rates. For music, it's much harder.

Quote
But your example is more like table-lookup, is it not? ... That is kind of trivial, but less useful, I think.

Well, if you knew all audio signals ever produced on planet Earth (at least), it would be an incredibly efficient codec. Every sound could be represented by an index and maybe a length, sampling rate, etc., giving you a compression ratio of 1000000:1 or so. The problem is, you cannot know every signal in advance, and your encoder and decoder software would be terabytes large. So yes, quite useless.

Quote
How does a time-variant order-16 LPC compare to the decorrelation in FLAC?

Very similar. FLAC uses up to order 12 and updates the filter every 4096 samples (in the current reference encoder, since analysis is block-based). If you updated the filter for every sample, you'd get smoother/better filtering, but compression wouldn't increase, since you'd have to transmit more filter settings over time.
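
Just to illustrate what "update the filter for every sample" looks like, here is a minimal normalized-LMS predictor sketch (not FLAC's algorithm; the order and step size are arbitrary assumptions). The coefficients drift a little after each sample instead of being recomputed once per block:

import numpy as np

def nlms_residual(x, order=16, mu=0.01, eps=1e-6):
    w = np.zeros(order)                      # predictor coefficients, adapted continuously
    e = np.zeros(len(x))
    for n in range(order, len(x)):
        past = x[n - order:n][::-1]          # most recent sample first
        pred = np.dot(w, past)
        e[n] = x[n] - pred                   # prediction residual
        w += mu * e[n] * past / (np.dot(past, past) + eps)   # per-sample coefficient update
    return e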

Quote
When you say that the remaining audible correlation is insignificant for compression, is that because it is small, or because our imperfect man-made algorithms are not able to exploit it properly?

Both, I guess. It's always a side-info vs. compression-ratio tradeoff. With longer LPC filters, you can usually improve your prediction (up to a certain limit due to stationarity constraints etc. blabla), but you'll also need more bits to transmit the filter.

Chris
If I don't reply to your reply, it means I agree with you.

Lossless codecs in frequency domain?

Reply #30
Quote
Quote
How does a time-variant order-16 LPC compare to the decorrelation in FLAC?

Very similar. FLAC uses up to order 12 and updates the filter every 4096 samples (in the current reference encoder, since analysis is block-based). If you updated the filter for every sample, you'd get smoother/better filtering, but compression wouldn't increase, since you'd have to transmit more filter settings over time.

So a crude mental model of FLAC would be:
The signal spectrum is modelled as up to 12 "bumps", updated every 4096 samples. The frequency, height and width of these bumps are compressed and transmitted along with the residual error after using it to model the signal.
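
To see those "bumps" literally, one can look at the magnitude response of the all-pole synthesis filter 1/A(z) fitted to a block. A self-contained toy example (synthetic two-tone signal; all values arbitrary):

import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz

fs = 44100
t = np.arange(4096) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1320 * t) + 0.01 * np.random.randn(4096)

order = 12
r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
a = solve_toeplitz(r[:order], r[1:order + 1])

w, h = freqz([1.0], np.concatenate(([1.0], -a)), worN=2048, fs=fs)
env_db = 20 * np.log10(np.abs(h) + 1e-12)   # smooth spectral envelope; peaks sit near 440 and 1320 Hz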

Do you have any gut feeling about the typical ratio of filter parameters vs. residue bandwidth? Do the filter parameters consume 1/100 of the total encoded bitrate, or 1/2?

I still think that LPC is a very explicit model that is being used. It would be interesting if it was possible to use a more generalized algorithm and then possibly learn that something LPC-like was close to "optimal" after all.

-k

Lossless codecs in frequency domain?

Reply #31
Quote
So a crude mental model of FLAC would be:
The signal spectrum is modelled as up to 12 "bumps", updated every 4096 samples. The frequency, height and width of these bumps are compressed and transmitted along with the residual error after using it to model the signal.

Do you have any gut feeling about the typical ratio of filter parameters vs. residue bandwidth? Do the filter parameters consume 1/100 of the total encoded bitrate, or 1/2?

I still think that LPC is a very explicit model that is being used. It would be interesting if it was possible to use a more generalized algorithm and then possibly learn that something LPC-like was close to "optimal" after all.

-k

Crude model: yes, that's basically it. Typical ratio: let's assume you need around 10 bits per LPC coefficient. 10x12 bits x 10.8 frames/s = 1.3 kbps + maybe some additional side info = less than 2 kbps. Compare that to the 500 or more kbps of a lossless file.

More generalized algorithm: not only model "bumps" but also "valleys", i.e. use zeros in addition to poles in your LPC filter. Done here: http://www.aes.org/e-lib/browse.cfm?elib=7364 for lossless packing on DVD-Audio. I never tried whether it gives an advantage for 44-48 kHz sampling rates; it probably doesn't for typical CD audio.

Chris
If I don't reply to your reply, it means I agree with you.

Lossless codecs in frequency domain?

Reply #32
Here's some talk about MDCT and similar (and not similar) tricks.

Lossless codecs in frequency domain?

Reply #33
First, google IMDCT; there are precisely reversible integer transforms.

Second, you get the same coding gain from the time or frequency domains. Google "Spectral Flatness Measure" for that one.

It's roughly 4 of one and substantially less than a half-dozen of the other.
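
For reference, the Spectral Flatness Measure is just the geometric mean of the power spectrum divided by its arithmetic mean. A quick sketch (window length arbitrary):

import numpy as np

def spectral_flatness(x, nfft=4096):
    frame = x[:nfft] * np.hanning(len(x[:nfft]))
    psd = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12
    return np.exp(np.mean(np.log(psd))) / np.mean(psd)

# SFM near 1.0 means noise-like (little to gain), near 0 means highly predictable;
# the achievable prediction/coding gain is roughly 10*log10(1/SFM) dB in either domain.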

You might also look at Microsoft's IMDCT codecs whose name escapes me at the minute but whose creator is Jin Li.
-----
J. D. (jj) Johnston

Lossless codecs in frequency domain?

Reply #34
Quote
Crude model: yes, that's basically it. Typical ratio: let's assume you need around 10 bits per LPC coefficient. 10x12 bits x 10.8 frames/s = 1.3 kbps + maybe some additional side info = less than 2 kbps. Compare that to the 500 or more kbps of a lossless file.

So model parameters are basically no cost BW-wise. If one can significantly improve the modelling (thereby shrinking the residue), doing so at the cost of doubling the transmitted parameters is ok.

For "harmonic" content, like speech or pitched instruments, the residue should contain semi-discrete harmonics in the frequency domain, not random noise, right? Might there be some benefit in modelling this using cepstrum or something similar?

-k

Lossless codecs in frequency domain?

Reply #35
Quote
First, google IMDCT; there are precisely reversible integer transforms.

Second, you get the same coding gain from the time or frequency domains. Google "Spectral Flatness Measure" for that one.

It's roughly 4 of one and substantially less than a half-dozen of the other.

You might also look at Microsoft's IMDCT codecs whose name escapes me at the minute but whose creator is Jin Li.


Sorry, I should say "practically speaking" the same gain. There isn't much maximum-phase content in real instruments.
-----
J. D. (jj) Johnston