HydrogenAudio

Lossless Audio Compression => Lossless / Other Codecs => Topic started by: 0x3c0 on 2010-11-19 19:31:27

Title: Lossless codecs in frequency domain?
Post by: 0x3c0 on 2010-11-19 19:31:27
Per the title, are there any useful lossless codecs that operate in the frequency domain rather than the time domain? There are two aspects of this I'm interested in: one is whether this achieves much better compression than linear prediction (as used by most lossless codecs now); the other is the theory behind such methods. From what I've read so far (which admittedly isn't a lot), the key drawback to using the Fourier transform is that it tends not to be exact due to rounding error, and hence not truly lossless (unless extra steps are taken, which might hurt compressibility).
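
To make the rounding issue concrete, here is a small NumPy sketch (block size, word length, and seed are arbitrary choices of mine): the float round trip is not bit-exact, although for 16-bit input in double precision the error is so small that simply rounding the output happens to recover the samples.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(-32768, 32768, size=1024)   # one block of 16-bit PCM samples

X = np.fft.fft(x)          # coefficients are complex doubles, not integers
y = np.fft.ifft(X).real    # round trip in double precision

err = np.max(np.abs(y - x))                               # tiny, but generally nonzero
recovered = np.array_equal(x, np.rint(y).astype(x.dtype))
print(err, recovered)      # error well below 0.5, so rounding recovers x here
```

The catch for lossless coding is less the round trip itself than the fact that the coefficients are no longer small integers: they would have to be stored exactly (or quantized, with the error coded separately), which is where the extra steps come in.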
Title: Lossless codecs in frequency domain?
Post by: DVDdoug on 2010-11-19 22:34:37
Quote
From what I've read so far (which admittedly isn't a lot) the key drawback towards using the Fourier transform is that these tend to not be accurate due to rounding error, and hence not exactly lossless
As I mentioned in another thread, "I'm not a math whiz"... But I think you're right. On the one hand, I've read that the Fourier transform is completely reversible; on the other hand, I read about windowing and overlapping windows (which suggests we can't get the original data back).

Hopefully, someone who is a math whiz can explain the real-world limitations (in a way that those of us who are mathematically challenged can understand).
Title: Lossless codecs in frequency domain?
Post by: JapanAudio on 2010-11-19 23:10:21
Hmm actually you would probably use an integer DCT instead of the DFT. For lossless the quantization noise is taken care of in the residual coding. For lossy you would use MDCT which has overlapping windows to reduce the blocking effect. The main drawback of transform coding is cost.
Title: Lossless codecs in frequency domain?
Post by: saratoga on 2010-11-20 00:34:44
Hmm actually you would probably use an integer DCT instead of the DFT. For lossless the quantization noise is taken care of in the residual coding. For lossy you would use MDCT which has overlapping windows to reduce the blocking effect. The main drawback of transform coding is cost.

Not really sure what you mean by "cost", but I think the actual problem with frequency domain lossless coding is that there's no obvious way to exploit the frequency domain representation to gain in compression without loss.  The main advantage of frequency domain coding in MDCT or subband codecs is that it enables you to relatively easily compute the masking thresholds and then hide your quantization noise in frequency bins that are masked.  But if you don't have any quantization noise because your compression is lossless, I don't know that there's any point in transforming.
Title: Lossless codecs in frequency domain?
Post by: JapanAudio on 2010-11-20 00:56:05
Hmm actually you would probably use an integer DCT instead of the DFT. For lossless the quantization noise is taken care of in the residual coding. For lossy you would use MDCT which has overlapping windows to reduce the blocking effect. The main drawback of transform coding is cost.

Not really sure what you mean by "cost", but I think the actual problem with frequency domain lossless coding is that there's no obvious way to exploit the frequency domain representation to gain in compression without loss.  The main advantage of frequency domain coding in MDCT or subband codecs is that it enables you to relatively easily compute the masking thresholds and then hide your quantization noise in frequency bins that are masked.  But if you don't have any quantization noise because your compression is lossless, I don't know that there's any point in transforming.

Because depending on your algorithm you can exploit correlation in the frequency domain, plus the data compaction properties of the DCT come in handy. You could also apply linear prediction in the frequency domain. There are many possibilities, but usually transform coding is more costly in terms of CPU cycles (i.e., encoding time). Just look at how LTAC (transform) was replaced by LPAC (predictive), both lossless.
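
As a rough illustration of the energy-compaction claim (entirely my own sketch: an orthonormal DCT-II matrix built from its definition, applied to a random-walk test signal, both arbitrary choices):

```python
import numpy as np

N = 256
k = np.arange(N)[:, None]
n = np.arange(N)[None, :]
C = np.sqrt(2.0 / N) * np.cos(np.pi * k * (n + 0.5) / N)  # orthonormal DCT-II
C[0] /= np.sqrt(2.0)

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(N))   # random walk: a strongly correlated signal

X = C @ x
energy = np.sort(X ** 2)[::-1]
frac = energy[:N // 8].sum() / energy.sum()
print(frac)   # the largest eighth of the coefficients carries nearly all the energy
```

Because C is orthogonal, C.T inverts it exactly (up to float precision), so the transform itself discards nothing; compaction just concentrates where the bits are needed.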
Title: Lossless codecs in frequency domain?
Post by: saratoga on 2010-11-20 01:29:55
Because depending on your algorithm you can exploit correlation in the frequency domain, plus the data compaction properties of the DCT come in handy. You could also apply linear prediction in the frequency domain.

Or you can exploit it in the time domain.

You CAN do all sorts of things.  But usually you only do them if there's a reason to.  That's why we don't really consider the infinite number of possible transform domains, but just the handful that are actually useful. Do you have any particular reason to think one domain is better than another?

There are many possibilities, but usually transform coding is more costly in terms of CPU cycles (i.e., encoding time).

Not really.  It's about 5-10MHz to do an MDCT in real time.  I wouldn't consider that "costly".  Not when people gladly spend orders of magnitude more CPU time trying to squeeze the last 1% of compression out of FLAC or APE.  If there were actually any real compression advantage to be had, no one would even think twice about the tiny bit of extra CPU time spent transforming.

Title: Lossless codecs in frequency domain?
Post by: JapanAudio on 2010-11-20 03:29:45
First, I never said you would get better compression results with transform coding; I said that it was slower. Second, real-time isn't really fast; I wouldn't want to wait 4 minutes to encode a 4 min song. So when I refer to cost, I mean that linear predictive coding is faster than transform coding for similar results. Nobody wants to wait 10x longer for a 1% compression increase.

I should add that you can't really compare the time and frequency domains to determine which is "better". Because if you use transforms you've got both domains at hand, in a way revealing the naturally occurring two-dimensional features of audio data. But the problem is that no general theory of 'decorrelation' has been formulated to find patterns within a general audio signal. With transforms, you can potentially exploit redundancy in subbands, on wider time spans, etc., but the cost compromise is just too much for little to no improvement over the simple and fast LPC.

That being said, I believe it is only a matter of time before more efficient and novel algorithms replace the primitive linear glottal model that underlies LPC, which was designed for speech coding in the first place.
Title: Lossless codecs in frequency domain?
Post by: saratoga on 2010-11-20 04:22:33
First, I never said you would get better compression results with transform coding; I said that it was slower.

Yeah and I explained that this is not so.

Second, real-time isn't really fast; I wouldn't want to wait 4 minutes to encode a 4 min song.

Then don't use a 5-10MHz CPU!

Nobody wants to wait 10x longer for a 1% compression increase.

ahahahaha you've obviously never looked at any lossless audio codecs in detail if you think this is so.  Take a look at what people are willing to put up with:

http://synthetic-soul.co.uk/comparison/lossless/

Hell, people put up with more than that to get fractions of 1%.

So when I refer to cost, I mean that linear predictive coding is faster than transform coding for similar results. Nobody wants to wait 10x longer for a 1% compression increase.

My original point is that you're talking out of your butt if you think running a ~1k FFT or DCT on audio data is going to make your compression take 10x longer.  FFTs are really, really, really fast.  Seriously.  Go look up how long it takes to compute an FFT on a modern PC.  You seem to think it takes hundreds of millions of clock cycles given your estimates above, but that's just not so.

I should add that you can't really compare the time and frequency domains to determine which is "better". Because if you use transforms you've got both domains at hand, in a way revealing the naturally occurring two-dimensional features of audio data.

This is just nonsense.  Also, I'm not sure I would call a signal and its Fourier transform 'two-dimensional'.  The two halves of a time-frequency distribution are really just different ways of looking at the same underlying thing.  They're not independently determined.

But the problem is that no general theory of 'decorrelation' has been formulated to find patterns within a general audio signal. With transforms, you can potentially exploit redundancy in subbands, on wider time spans, etc.,

Can you rephrase this more coherently?  I'm not sure if you're just being scatterbrained or really have no idea what you're talking about.

but the cost compromise is just too much for little to no improvement over the simple and fast LPC.

There is no "cost compromise".  And as I said before, computational cost is a non-issue for lossless compression.  Just decoding an APE file in real time can take the better part of a 1GHz CPU.  Some of the formats people on this board use are even slower.  If there was something here, people would use it, no matter the cost.
Title: Lossless codecs in frequency domain?
Post by: JapanAudio on 2010-11-20 04:35:20
Whatever man, I don't even know why I answered your post to begin with. I don't really want to argue with you. So take it for what it is.
Title: Lossless codecs in frequency domain?
Post by: saratoga on 2010-11-20 04:38:20
To put this in perspective, according to the FFTW benchmark page, a ~$100 Intel CPU can compute about 1.5 million 1024-point FFTs per second.  That means every second, it could Fourier transform almost 5 continuous hours of CD audio.

Or for a 4 minute track, it would take 3.4 milliseconds!

Of course, if you want to do the MDCT, you'd have to pre/post rotate your data, which might add another millisecond to that runtime!  Hope you can spare it while you wait 5 minutes an album for FLAC's linear prediction and entropy coding.

(Yes, I realize that in the real world disk speed and memory latency would matter more than raw FFT performance, but that just underscores how silly this argument is.)
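
A back-of-envelope check of the "almost 5 hours" figure, treating CD audio as stereo at 44.1 kHz with non-overlapping 1024-point blocks:

```python
ffts_per_second = 1.5e6                  # the FFTW benchmark figure quoted above
samples_per_second = ffts_per_second * 1024
cd_rate = 2 * 44100                      # stereo CD audio, samples per second
hours = samples_per_second / cd_rate / 3600
print(round(hours, 2))                   # prints 4.84
```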
Title: Lossless codecs in frequency domain?
Post by: saratoga on 2010-11-20 04:43:54
Whatever man, I don't even know why I answered your post to begin with. I don't really want to argue with you.

I know why, same reason you kept at it in that thread where you didn't understand masking.  You didn't want to admit that you were mistaken even though you realized you were in over your head.  So you doubled down and hoped no one would notice.  Not the best idea around here.
Title: Lossless codecs in frequency domain?
Post by: C.R.Helmrich on 2010-11-20 10:11:30
That being said, I believe it is only a matter of time before more efficient and novel algorithms replace the primitive linear glottal model that underlies LPC, which was designed for speech coding in the first place.

Actually, LPC is not that primitive after all, as I found out during my work. But you're right about the novel algorithms: an integer MDCT is used in HD-AAC (which by the way can also operate in pure-lossless mode).

Or for a 4 minute track, it would take 3.4 milliseconds!

Of course, if you want to do the MDCT, you'd have to pre/post rotate your data, which might add another millisecond to that runtime! Hope you can spare it while you wait 5 minutes an album for FLAC's linear prediction and entropy coding.

A little more polite and careful, please, saratoga! An (M)DCT of a waveform alone doesn't give you any compression; it's just a transform. So for a fair comparison, you would have to either compare only (M)DCT vs. LPC analysis and filtering, or compare (M)DCT + entropy coding vs. LPC + entropy coding. From my work experience in that field I can tell you that most of the time, entropy coding is more expensive than LPC or the T/F transform, especially in the encoder. That taken into account, LPC- and transform-based codecs are probably quite close to each other speed-wise (with LPC-based codecs probably being a little faster on the decoder side since usually, no analysis operation is needed there).

Also, a common flaw I see here on HA is that people tend to think only of PC-based applications of audio codecs. On portable devices with relatively weak processors, battery life is critical, and the industry is looking for codecs with minimum decoder complexity. Now take a guess why e.g. APE hasn't reached that market. Nobody likes it when your portable's battery is empty after playing two lossless songs (if it can play them at all).

(No, in the real world, with the rise of flash storage, disk speed and memory latency soon won't matter more than raw FFT performance.)

Quote
Can you rephrase this more coherently? I'm not sure if you're just being scatterbrained or really have no idea what you're talking about.

JapanAudio's comment made perfect sense to me. Do you know what (s)he is talking about?

Chris
Title: Lossless codecs in frequency domain?
Post by: knutinh on 2010-11-20 10:43:27
Also, a common flaw I see here on HA is that people tend to think only of PC-based applications of audio codecs. On portable devices with relatively weak processors, battery life is critical, and the industry is looking for codecs with minimum decoder complexity. Now take a guess why e.g. APE hasn't reached that market. Nobody likes it when your portable's battery is empty after playing two lossless songs (if it can play them at all).

I can't see any sensible battery-constrained applications for lossless codecs. AAC for iPhones, FLAC for PC/squeezebox/...

Granted, embedded boxes can be used for hifi and be similar to battery-powered devices in many ways, but I don't think the constraint is that hard.

-k
Title: Lossless codecs in frequency domain?
Post by: 0x3c0 on 2010-11-20 12:58:41
Chill guys, no need to get so upset.

As mentioned in my post I'm more interested in the theoretical aspects than practicality. We can just assume for now that we're going to do all the encoding and decoding on an average modern computer; speed optimisation can always come later.

I've thought about using linear prediction in the frequency domain before; given that the Fourier transform and related ones like the DCT are linear transformations, this isn't any different from using linear prediction on a block of samples, with the added disadvantage of using lots of buffer space (at least two past blocks), which also leads to reduced seekability. In addition, if we're talking about 1024-sample DCTs, the matrix involved will be so much larger than regular matrices used in LPC (I think 32x32 is the maximum for FLAC) that encoding will take significantly longer even if we're not using the DCT (which would be comparatively fast anyway). Should have included this in the first post, but this kind of slipped my mind. (Note that I'm not very sure what I'm talking about here and I might be completely off. Correct me if I'm wrong.)

What I was thinking of was more along the lines of applying the DCT then coding the results of the transform directly. I'm not a good enough programmer to whip up a consistent DCT (and an exact IDCT) over a sufficiently large number of samples (say, 256), so I can't test how this works myself. There might be gains from using prediction, but as mentioned above, a complete linear prediction is out of the question. The MDCT would be useless, since its main purpose (reducing boundary artifacts in lossy encoding) is irrelevant here, although it wouldn't add much of a speed hit.

The alternative to using an exact inverse transform would be to use an inexact one and code the differences, but I suspect this will hurt compressibility quite a bit since the differences are effectively random.

Also, there's no need to confine ourselves to Fourier-related transforms; how about wavelet transforms? Wavelets aren't quite representative of natural audio, so they don't concentrate the energy as well into a small number of coefficients, but they're apparently much easier to implement with just integer arithmetic.
Title: Lossless codecs in frequency domain?
Post by: C.R.Helmrich on 2010-11-20 19:11:12
... which also leads to reduced seekability. In addition, if we're talking about 1024-sample DCTs, the matrix involved will be so much larger than regular matrices used in LPC (I think 32x32 is the maximum for FLAC) that encoding will take significantly longer even if we're not using the DCT (which would be comparatively fast anyway).

Don't know what you mean by 32x32 and reduced seekability. FLAC's default block size is 4096 samples. I don't think that computing the LPC filter in FLAC via autocorrelation, and then applying the filter, is much faster than doing a, say, 1024-sample (M)DCT.

Quote
What I was thinking of was more along the lines of applying the DCT then coding the results of the transform directly.

If you partition the DCT spectrum and use an integer DCT, then that should be possible. In fact, IIRC, that's what HD-AAC is doing in pure lossless mode.

Regarding your question whether this is much more efficient than time-domain compression: that probably depends on the signal. For tonal signals (e.g. harpsichord), it might be. For speech, it probably isn't.

Chris
Title: Lossless codecs in frequency domain?
Post by: 0x3c0 on 2010-11-21 18:22:31
Don't know what you mean by 32x32 and reduced seekability. FLAC's default block size is 4096 samples. I don't think that computing the LPC filter in FLAC via autocorrelation, and then applying the filter, is much faster than doing a, say, 1024-sample (M)DCT.

What I was considering was using linear prediction in the frequency domain (as stated in my post), not just applying a DCT and coding the coefficients. Using linear prediction with n past samples generates an n-by-n matrix (which normally wouldn't take O(n^2) memory, since there are at most 2n+1 distinct elements), which is subsequently solved. According to Wikipedia, such Toeplitz matrices can be solved in O(n^2) time. With n-sample DCT blocks, and at least 2 past blocks with which to perform linear prediction, there would be n matrices of size at least (2n)x(2n) to invert (per frame). Note that the definition of "block" here differs from the one used in FLAC; unless we're going to forgo linear prediction post-DCT, we're going to need some kind of "superblock" containing multiple DCT blocks (which cannot be decoded out of order). That leads to the reduced seekability mentioned earlier. (This might still be okay if the "superblocks" have around 16 blocks and each block has 256 samples.) I brought these numbers up to illustrate the disadvantage of naively trying to use LPC in the frequency domain; there are probably better ways than applying linear prediction across all the DCT coefficients, which, as I mentioned, wouldn't be any different from just using linear prediction over a similar number of past samples.
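
For reference, the O(n^2) Toeplitz solver alluded to above is the Levinson-Durbin recursion. A minimal sketch for the Yule-Walker case (sign conventions vary between texts; this one predicts x[n] from -(a_1 x[n-1] + ... + a_p x[n-p])):

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Yule-Walker system toeplitz(r[:order]) @ a = -r[1:order+1]
    in O(order**2) time; returns the coefficients and the final prediction
    error power."""
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        acc = r[i + 1] + np.dot(a[:i], r[i:0:-1])  # next forward-prediction error term
        k = -acc / err                             # reflection coefficient
        a[:i] = a[:i] + k * a[:i][::-1]            # update lower-order coefficients
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# r[k] = 2 * 0.5**k is the autocorrelation of a simple AR(1) process,
# so one coefficient suffices and the higher ones come out as zero
a, e = levinson_durbin(np.array([2.0, 1.0, 0.5, 0.25]), 3)
print(a, e)
```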

Quote
If you partition the DCT spectrum and use an integer DCT, then that should be possible. In fact, IIRC, that's what HD-AAC is doing in pure lossless mode.

Doesn't HD-AAC use a lossy core and encode the difference? That's rather straightforward, but leads to the issue that I pointed out: using an inexact IDCT would leave an essentially random difference which will be nigh incompressible. Also, what do you mean by partitioning the spectrum? If I'm understanding you right (coding parts of the spectrum separately), how would doing so improve compressibility?

Quote
Regarding your question whether this is much more efficient than time-domain compression: that probably depends on the signal. For tonal signals (e.g. harpsichord), it might be. For speech, it probably isn't.

Chris

Thanks for the input! Now here's another crazy idea (as if we don't have enough already): if we could somehow separate the "speech" part (highly compressible with LPC) from the "tonal" part (more compressible with DCT + entropy coding), that should produce somewhat better compression than either alone, right? (The difficulty probably lies in the separation part.)
Title: Lossless codecs in frequency domain?
Post by: JapanAudio on 2010-11-21 23:32:15
The idea behind partitioning the spectrum is subband analysis. You can decompose it into equal segments or use logarithmic spacing to model hearing more accurately.

The "inexactitude" of the rounded DCT is not really an issue, primarily because reversible intDCTs do exist, but more importantly because there is no interest in reversing the DCT before applying some sort of quantization, as mentioned previously.
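
To back up the point that reversible integer DCTs exist: the usual building block is a plane rotation factored into three lifting steps, each of which rounds but can be undone exactly. A minimal sketch (angle and test values are arbitrary); as far as I understand, integer MDCTs like the one in HD-AAC are assembled from this kind of factorization:

```python
import math

def rot_fwd(x, y, theta):
    """Integer approximation of a rotation by theta, via three lifting steps."""
    p = -math.tan(theta / 2.0)
    s = math.sin(theta)
    x += round(p * y)
    y += round(s * x)
    x += round(p * y)
    return x, y

def rot_inv(x, y, theta):
    """Exact inverse: undo the three lifting steps in reverse order."""
    p = -math.tan(theta / 2.0)
    s = math.sin(theta)
    x -= round(p * y)
    y -= round(s * x)
    x -= round(p * y)
    return x, y

u, v = rot_fwd(12345, -6789, 0.7)
print(rot_inv(u, v, 0.7))   # prints (12345, -6789): bit-exact despite the rounding
```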

Question: Do you have access to MATLAB? You could try many of your ideas and see what kind of results you get. Also, I would suggest searching through scientific journals and papers to see what has already been done and what advances have been made recently; that could save you a lot of time. Google Scholar is useful for this purpose.
Title: Lossless codecs in frequency domain?
Post by: Soap on 2010-11-21 23:44:56
The idea behind partitioning the spectrum is subband analysis. You can decompose it into equal segments or use logarithmic spacing to model hearing more accurately.

Please correct me where I'm mistaken, but we're talking lossless here, not a perceptual lossy format.

What is the benefit of modeling (or mirroring) human hearing?
Title: Lossless codecs in frequency domain?
Post by: JapanAudio on 2010-11-22 00:01:55
The idea behind partitioning the spectrum is subband analysis. You can decompose it into equal segments or use logarithmic spacing to model hearing more accurately.

Please correct me where I'm mistaken, but we're talking lossless here, not a perceptual lossy format.

What is the benefit of modeling (or mirroring) human hearing?

Yes, you're right. But you can also operate on different bands to apply decorrelation algorithms, and the natural way to divide the spectrum is logarithmic. This would actually be the case in many ways you would implement frequency-domain linear prediction (FDLP).

We're not trying to design a psychoacoustic model here. It's just a basic property of sound: frequency increases exponentially with linearly increasing pitch.
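
A one-line illustration of that property in equal temperament (note choice arbitrary):

```python
# Each semitone multiplies frequency by 2**(1/12), so pitch is linear in
# log-frequency: 12 semitones above A4 (440 Hz) is exactly one octave.
f = 440.0 * 2 ** (12 / 12)
print(f)   # prints 880.0
```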
Title: Lossless codecs in frequency domain?
Post by: C.R.Helmrich on 2010-11-22 00:26:42
Doesn't HD-AAC use a lossy core and encode the difference?

Not if it runs in pure lossless mode. In that mode, the "difference" signal represents the input signal.

I think I understand your envisioned coding approach now. I guess you're right with the seekability etc. then.

Quote
Also, what do you mean by partitioning the spectrum? If I'm understanding you right (coding parts of the spectrum separately), how would doing so improve compressibility?

Correct. I was thinking along the lines of FLAC, which segments the time-domain prediction residual and adapts its entropy coder to each segment. You could do the same in the frequency domain, for example like JapanAudio suggested (actually, you could use any arbitrary segmentation if it gives good compression).
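
For concreteness, the per-segment adaptation amounts to picking, for each partition, the Rice parameter that minimizes its code length. A hypothetical sketch (function name and details are mine, not FLAC's exact scheme):

```python
def best_rice_param(residual, kmax=14):
    """Pick the Rice parameter k minimizing the total code length of one
    partition. A value v is coded as (v >> k) unary bits, a stop bit, and
    k binary bits; signed residuals are zigzag-mapped to unsigned first."""
    u = [(-2 * v - 1) if v < 0 else (2 * v) for v in residual]
    def total_bits(k):
        return sum((v >> k) + 1 + k for v in u)
    return min(range(kmax + 1), key=total_bits)

print(best_rice_param([0, 1, -1, 2]))         # small residuals favor a small k
print(best_rice_param([100, -90, 120, -80]))  # larger residuals favor a larger k
```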

Quote
Thanks for the input! Now here's another crazy idea (as if we don't have enough already): if we could somehow separate the "speech" part (highly compressible with LPC) from the "tonal" part (more compressible with DCT + entropy coding), that should produce somewhat better compression than either alone, right? (The difficulty probably lies in the separation part.)

Exactly.

Chris
Title: Lossless codecs in frequency domain?
Post by: saratoga on 2010-11-22 00:29:10
That being said, I believe it is only a matter of time before more efficient and novel algorithms replace the primitive linear glottal model that underlies LPC, which was designed for speech coding in the first place.

Actually, LPC is not that primitive after all, as I found out during my work. But you're right about the novel algorithms: an integer MDCT is used in HD-AAC (which by the way can also operate in pure-lossless mode).

Or for a 4 minute track, it would take 3.4 milliseconds!

Of course, if you want to do the MDCT, you'd have to pre/post rotate your data, which might add another millisecond to that runtime! Hope you can spare it while you wait 5 minutes an album for FLAC's linear prediction and entropy coding.

A little more polite and careful, please, saratoga! An (M)DCT of a waveform alone doesn't give you any compression; it's just a transform. So for a fair comparison, you would have to either compare only (M)DCT vs. LPC analysis and filtering, or compare (M)DCT + entropy coding vs. LPC + entropy coding. From my work experience in that field I can tell you that most of the time, entropy coding is more expensive than LPC or the T/F transform, especially in the encoder. That taken into account, LPC- and transform-based codecs are probably quite close to each other speed-wise (with LPC-based codecs probably being a little faster on the decoder side since usually, no analysis operation is needed there).

Isn't that what I said?  You "might add another millisecond" to the minutes you'll spend on the rest of the codec.

Also, a common flaw I see here on HA is that people tend to think only of PC-based applications of audio codecs. On portable devices with relatively weak processors, battery life is critical, and the industry is looking for codecs with minimum decoder complexity.

I'd like to point out that I wrote much of the current rockbox iMDCT, which is as far as I am aware the fastest embedded open source iMDCT, so I think I am well aware of these constraints

http://www.hydrogenaudio.org/forums/index.php?showtopic=82125

Now take a guess why e.g. APE hasn't reached that market. Nobody likes it when your portable's battery is empty after playing two lossless songs (if it can play them at all).

This isn't entirely correct.  APE hasn't reached those markets because, until Rockbox wrote a free decoder and optimized it for mobile use, there was no decoder available.  Now that we provide one, several companies support APE on battery-powered devices.  But yes, the battery hit is considerable for APE compared to FLAC.
Title: Lossless codecs in frequency domain?
Post by: saratoga on 2010-11-22 00:46:34
Correct. I was thinking along the lines of FLAC, which segments the time-domain prediction residual and adapts its entropy coder to each segment. You could do the same in the frequency domain, for example like JapanAudio suggested (actually, you could use any arbitrary segmentation if it gives good compression).

I remember there was once a thread where someone compared lowpassed FLAC or WavPack to lowpassed and resampled (to around Nyquist) versions.  In theory they should have the same size, since there's the same amount of information, but for at least the encoder tested, the resampled file was smaller.  No idea if this is true in general, but it might suggest that some kind of critically sampled filterbank followed by linear prediction or something similar could work.  That said, I have no idea whether it'd actually beat simple time-domain coding.  Given how similar such a scheme would be to the subband coding in MPEG and ATRAC, I suspect it's been tried and abandoned before.
Title: Lossless codecs in frequency domain?
Post by: C.R.Helmrich on 2010-11-22 00:57:04
Isn't that what I said?  You "might add another millisecond" to the minutes you'll spend on the rest of the codec.

Then I misunderstood you. Though, adding to your comparison, FLAC running on a RAM disk would probably be in the range of seconds or milliseconds as well.

Quote
I'd like to point out that I wrote much of the current rockbox iMDCT, which is as far as I am aware the fastest embedded open source iMDCT, so I think I am well aware of these constraints

True. Anyway, it's not your fault that APE decoding is so expensive.  I remember an Athlon 900 MHz having problems decoding APE files at the highest compression setting in real time while surfing the web. That's expensive! Then again, the developer probably didn't give it much thought when he came up with APE, not knowing/expecting that it would become so popular.

Chris
Title: Lossless codecs in frequency domain?
Post by: saratoga on 2010-11-22 01:13:18
True. Anyway, it's not your fault that APE decoding is so expensive.  I remember an Athlon 900 MHz having problems decoding APE files with highest compression in real-time while surfing the web. That's expensive! Then again, the developer probably didn't spend too much thought on that when he came up with APE, not knowing/expecting that it would become so popular.

Yes, it's not so bad now.  About 500MHz on ARM11 for the highest compression level, much less on Cortex processors where NEON is available.  c2000 and maybe c3000 are probably fast enough now that a cell-phone-type device won't take an unreasonable battery-life hit.
Title: Lossless codecs in frequency domain?
Post by: knutinh on 2010-11-22 14:13:26
What is the only fundamental mechanism that lossless coding can use? Signal correlation/redundancy, right? Typically the signal can be modelled by a (relatively) static, simple filter excited by a random noise source. Other correlations may exist at musically relevant distances, but those distances may render them irrelevant due to memory/computation cost.

I guess the larger one makes the "search space" of signal representations that can express a part of the signal, the more compression is (ideally) achievable. But costs increase rapidly, while compression gains increase very modestly, right?

So what does the frequency domain bring us? Isn't the power spectrum just the Fourier transform of the temporal correlation function? Is it a computationally more efficient way of increasing the "search space", or what? I did calculate Yule-Walker equations by hand back in school, but I now have trouble formulating what it is, fundamentally, that lossless compression does, and why FLAC does it better for typical audio than Lempel-Ziv.
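
For the discrete, circular case, that power-spectrum/correlation relation is the Wiener-Khinchin theorem, which is easy to check numerically (my sketch; signal choice arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(256)

power = np.abs(np.fft.fft(x)) ** 2
# circular autocorrelation, computed directly from its definition
r = np.array([np.dot(x, np.roll(x, -k)) for k in range(len(x))])

print(np.allclose(np.fft.fft(r), power))   # prints True
```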

To my eyes, waveforms like this seem quite "correlated", in the sense that they are far from random. How much of that correlation is a (low-order?) FLAC encoder able to remove? If compression is only 50%, it means that the residual is going to have quite significant power compared to the input signal, right? How much correlation is there (typically) left for an ideal lossless encoder that does not have to take (as many) practical considerations?

-k
Title: Lossless codecs in frequency domain?
Post by: C.R.Helmrich on 2010-11-22 23:07:09
Quote
How much correlation is there (typically) left for an ideal lossless encoder that does not have to take (as many) practical considerations?

Depends on how you determine the correlation. An easy and interesting task is to fire up Matlab/Octave and apply a time-variant LPC filter of, say, order 16 to your input audio signal. If you listen to the result (the prediction residual, the signal that lossless encoders code), you'll notice that a) it's much quieter than the original, b) you can still recognize the original music/speech. b) implies that there is still quite some correlation left, but usually this remaining correlation has very little significance during compression. IOW, it's usually not worth removing the remaining correlation.
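
Roughly that experiment in Python (my sketch, with arbitrary choices throughout; for brevity it fits a single order-16 filter to one block of a synthetic tonal signal rather than a time-variant filter):

```python
import numpy as np

def lpc_residual(x, order=16):
    """Autocorrelation-method LPC: solve the Yule-Walker equations, then
    return the residual x[n] + a[0]*x[n-1] + ... + a[order-1]*x[n-order]."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    e = x.copy()
    for m in range(1, order + 1):
        e[m:] += a[m - 1] * x[:-m]
    return e

rng = np.random.default_rng(0)
t = np.arange(4096) / 44100.0
x = np.sin(2 * np.pi * 441.0 * t) + 1e-3 * rng.standard_normal(t.size)

e = lpc_residual(x)
ratio = np.sum(e[16:] ** 2) / np.sum(x[16:] ** 2)
print(ratio)   # a tiny fraction: the residual is far quieter than the original
```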

Chris
Title: Lossless codecs in frequency domain?
Post by: knutinh on 2010-11-23 06:53:01
Quote
How much correlation is there (typically) left for an ideal lossless encoder that does not have to take (as many) practical considerations?

Depends on how you determine the correlation. An easy and interesting task is to fire up Matlab/Octave and apply a time-variant LPC filter of, say, order 16 to your input audio signal. If you listen to the result (the prediction residual, the signal that lossless encoders code), you'll notice that a) it's much quieter than the original, b) you can still recognize the original music/speech. b) implies that there is still quite some correlation left, but usually this remaining correlation has very little significance during compression. IOW, it's usually not worth removing the remaining correlation.

Chris

So I should probably have asked "is there some practical method for determining the 'true entropy' of a finite-length file?", assuming that it is the result of a stochastic process.

How does a time-variant order-16 LPC compare to the decorrelation in FLAC?

When you say that the remaining audible correlation is insignificant for compression, is that because it is small, or because our imperfect man-made algorithms are not able to exploit it properly?

-k
Title: Lossless codecs in frequency domain?
Post by: Martel on 2010-11-23 07:44:10
I don't think you can ever measure a "true" entropy (I guess not in polynomial time). Take this example:

I make a C program using a standard rand() function seeded with some value, then produce a binary file where I just dump one billion pseudorandom numbers.
If I gave you the file, you would have no idea how the file was created and would probably fail to compress it much (if at all).
However, I could easily "remember" the whole file, I just need the seed and the size...

If you could easily determine/identify/describe the underlying process (system) which generated any observed output sequence (a song, for example), you could probably compress it all the way down to its true entropy. Since this is next to impossible on a general level, you're out of luck.
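In Python, the point can be made concrete in a few lines (the seed and sizes are arbitrary): a seeded PRNG "compresses" a large blob down to just (seed, length), but only because we know the generating program.

```python
import random

def generate(seed, n):
    """Dump n pseudorandom bytes from a deterministically seeded generator."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))

blob = generate(42, 10_000)   # looks incompressible to a generic codec...
again = generate(42, 10_000)  # ...yet (seed, length) reproduces it exactly
print(blob == again)          # True
```

A generic compressor sees only the output bytes, not the two integers behind them, which is exactly why "true entropy" is not measurable in general.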
Title: Lossless codecs in frequency domain?
Post by: knutinh on 2010-11-23 08:47:02
I don't think you can ever measure a "true" entropy (I guess not in polynomial time). Take this example:

I make a C program using a standard rand() function seeded with some value, then produce a binary file where I just dump one billion pseudorandom numbers.
If I gave you the file, you would have no idea how the file was created and would probably fail to compress it much (if at all).
However, I could easily "remember" the whole file, I just need the seed and the size...

If you could easily determine/identify/describe the underlying process (system) which generated any observed output sequence (a song, for example), you could probably compress it all the way down to its true entropy. Since this is next to impossible on a general level, you're out of luck.

But your example is more like table lookup, is it not? A given "random" generator might return some output from a small seed that, for all purposes concerning us here, is "random" but really is not. A lossless audio "encoder" could identify a file as "track #7 of this-and-that CD", simply transmitting a short index to the decoder (assuming that the decoder has access to all CDs ever made). That is kind of trivial, but less useful I think.

The question is whether there is some (probably exhaustive, hopefully doable) search method that is guaranteed to find the least costly parameters in an excitation + N-tap filter model, where the number of taps and their updates can be chosen at will, provided that it minimizes the filesize while having to transmit both parameters and residue. Might one approach be to build a 2-D array of autocorrelation functions taken from larger and larger windows of the file (going from 2-sample windows to the whole file), navigating this to extract some kind of ideal short-term and long-term model parameters that minimize total cost?

Or maybe the noise -> linear time-variant filter model is a limitation in itself, and some other modelling should be used instead. Cepstrum? Neural nets? Volterra? Perhaps the most compact representation turns out to be a compressed MIDI file along with instrument samples, just like much music is synthesized. In that case, audio compression, music recognition, instrument separation and a few other interesting topics blend quite nicely. No-one knows how to make the encoder for regular "CD" input, though.

-k
Title: Lossless codecs in frequency domain?
Post by: C.R.Helmrich on 2010-11-23 22:15:57
Quote
If you could easily determine/identify/describe the underlying process (system) which generated any observed output ...

Exactly. That's e.g. what speech coders try to do, which is why they sound quite good at low bit rates. For music, it's much harder.

Quote
But your example is more like table-lookup is it not? ... That is kind of trivial, but less useful I think.

Well, if you knew all audio signals ever produced on planet earth (at least), it would be an incredibly efficient codec. Every sound could be represented by an index and maybe a length, sampling rate, etc., giving you a compression ratio of 1000000:1 or so. The problem is, you cannot know every signal in advance, and your encoder and decoder software would be terabytes large.  So yes, quite useless.

Quote
How does a time-variant order-16 LPC compare to the decorrelation in FLAC?

Very similar. FLAC files use up to order 12, and update the filter every 4096 samples (in the current reference encoder, since analysis is block-based). If you updated the filter for every sample, you'd get smoother/better filtering, but compression wouldn't increase, since you'd have to transmit more filter settings over time.

Quote
When you say that the remaining audible correlation is insignificant for compression, is that because it is small, or because our imperfect man-made algorithms are not able to exploit it properly?

Both I guess. It's always a side-info-compression-ratio tradeoff. With longer LPC filters, you can usually improve your prediction (up to a certain limit due to stationarity constraints etc. blabla), but you'll also need more bits to transmit the filter.

Chris
Title: Lossless codecs in frequency domain?
Post by: knutinh on 2010-11-24 07:38:47
Quote
Quote
How does a time-variant order-16 LPC compare to the decorrelation in FLAC?

Very similar. FLAC files use up to order 12, and update the filter every 4096 samples (in the current reference encoder, since analysis is block-based). If you updated the filter for every sample, you'd get smoother/better filtering, but compression wouldn't increase, since you'd have to transmit more filter settings over time.

So a crude mental model of FLAC would be:
The signal spectrum is modelled as up to 12 "bumps", updated every 4096 samples. The frequency, height and width of these bumps are compressed and transmitted, along with the residual error left after using them to model the signal.

Do you have any gut feeling about the typical ratio of filter parameters vs. residue bandwidth? Do the filter parameters consume 1/100 of the total encoded bitrate, or 1/2?

I still think that LPC is a very explicit model that is being used. It would be interesting if it was possible to use a more generalized algorithm and then possibly learn that something LPC-like was close to "optimal" after all.

-k
Title: Lossless codecs in frequency domain?
Post by: C.R.Helmrich on 2010-11-24 18:03:47
So a crude mental model of FLAC would be:
The signal spectrum is modelled as up to 12 "bumps", updated every 4096 samples. The frequency, height and width of these bumps are compressed and transmitted, along with the residual error left after using them to model the signal.

Do you have any gut feeling about the typical ratio of filter parameters vs. residue bandwidth? Do the filter parameters consume 1/100 of the total encoded bitrate, or 1/2?

I still think that LPC is a very explicit model that is being used. It would be interesting if it was possible to use a more generalized algorithm and then possibly learn that something LPC-like was close to "optimal" after all.

-k

Crude model: yes, that's basically it. Typical ratio: let's assume you need around 10 bits per LPC coefficient. 10x12 bits x 10.8 frames/s = 1.3 kbps + maybe some additional side info = less than 2 kbps. Compare that to the 500 or more kbps of a lossless file.
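In numbers (assuming 44.1 kHz sampling, which the frame rate above implies; these are the same back-of-envelope figures as in the text, not exact FLAC bitstream costs):

```python
# Back-of-envelope cost of the LPC side info, per the assumptions above.
bits_per_coeff = 10   # assumed coefficient quantization
order = 12            # filter order
fs = 44100            # samples per second (CD rate, assumed)
frame = 4096          # samples per filter update

frames_per_sec = fs / frame                          # ~10.77 frames/s
side_info_bps = bits_per_coeff * order * frames_per_sec

print("filter side info: %.2f kbps" % (side_info_bps / 1000))                  # 1.29 kbps
print("share of a 500 kbps stream: %.2f%%" % (100 * side_info_bps / 500_000))  # 0.26%
```

So the filter description is well under 1% of the total bitrate, which answers the "1/100 or 1/2" question above rather decisively.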

More generalized algorithm: not only model "bumps" but also "valleys", i.e. use zeros in addition to poles in your LPC filter. Done here: http://www.aes.org/e-lib/browse.cfm?elib=7364 (http://www.aes.org/e-lib/browse.cfm?elib=7364) for lossless packing on DVD-Audio. I never tried whether it gives an advantage for 44-48 kHz sampling rates. Probably doesn't for typical CD audio.

Chris
Title: Lossless codecs in frequency domain?
Post by: _m²_ on 2010-11-24 19:04:46
Here (http://encode.ru/threads/1137-Sac-(State-of-the-Art)-Lossless-Audio-Compression?p=22654&viewfull=1#post22654)'s some talk about MDCT and similar (and not similar  ) tricks.
Title: Lossless codecs in frequency domain?
Post by: Woodinville on 2010-11-24 19:12:11
First, google on IMDCT; there are precisely reversible integer transforms.

Second, you get the same coding gain from the time or frequency domains. Google "Spectral Flatness Measure" for that one.
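A toy illustration of the Spectral Flatness Measure (geometric mean over arithmetic mean of the power spectrum): a flat spectrum gives SFM near 1 and little to gain from decorrelation; a peaky spectrum gives SFM near 0 and a lot to gain. Naive DFT, synthetic signals, all names illustrative.

```python
import cmath, math, random

def power_spectrum(x):
    """Naive O(n^2) DFT power spectrum (fine for a toy example)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in range(n // 2)]

def sfm(p, eps=1e-12):
    """Spectral Flatness Measure: geometric / arithmetic mean of the spectrum."""
    geo = math.exp(sum(math.log(v + eps) for v in p) / len(p))
    return geo / (sum(p) / len(p))

random.seed(1)
n = 256
noise = [random.gauss(0, 1) for _ in range(n)]          # flat-ish spectrum
tone = [math.sin(2 * math.pi * 16 * t / n) for t in range(n)]  # one spectral peak

sfm_noise = sfm(power_spectrum(noise))
sfm_tone = sfm(power_spectrum(tone))
print("SFM, white noise: %.3f" % sfm_noise)  # well above the tone's
print("SFM, pure tone:   %.2e" % sfm_tone)   # essentially zero
```

The classic result is that the maximum decorrelation gain is roughly 1/SFM, and the bound is the same whether the redundancy is removed by a predictor in time or a transform in frequency, which is the point being made above.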

It's roughly 4 of one and substantially less than a half-dozen of the other.

You might also look at Microsoft's IMDCT codecs whose name escapes me at the minute but whose creator is Jin Li.
Title: Lossless codecs in frequency domain?
Post by: knutinh on 2010-11-24 20:28:17
Crude model: yes, that's basically it. Typical ratio: let's assume you need around 10 bits per LPC coefficient. 10x12 bits x 10.8 frames/s = 1.3 kbps + maybe some additional side info = less than 2 kbps. Compare that to the 500 or more kbps of a lossless file.

So model parameters are basically no cost BW-wise. If one can significantly improve the modelling (thereby shrinking the residue), doing so at the cost of doubling the transmitted parameters is ok.

For "harmonic" content, like speech or pitched instruments, the residue should contain semi-discrete harmonics in the frequency domain, not random noise, right? Might there be some benefit in modelling this using cepstrum or something similar?

-k
Title: Lossless codecs in frequency domain?
Post by: Woodinville on 2010-11-25 20:57:56
First, google on IMDCT; there are precisely reversible integer transforms.

Second, you get the same coding gain from the time or frequency domains. Google "Spectral Flatness Measure" for that one.

It's roughly 4 of one and substantially less than a half-dozen of the other.

You might also look at Microsoft's IMDCT codecs whose name escapes me at the minute but whose creator is Jin Li.

Sorry, I should say "practically speaking" the same gain. There isn't much maximum-phase content in real instruments.