Skip to main content

Notice

Please note that most of the software linked on this forum is likely to be safe to use. If you are unsure, feel free to ask in the relevant topics, or send a private message to an administrator or moderator. To help curb the problems of false positives, or in the event that you do find actual malware, you can contribute through the article linked here.
Topic: Lossless codecs in frequency domain? (Read 19217 times) previous topic - next topic
0 Members and 1 Guest are viewing this topic.

Lossless codecs in frequency domain?

Per the title, are there any useful lossless codecs that operate in the frequency domain rather than the time domain? There are two aspects of this I'm interested in; one is whether this accomplishes much better compression than linear prediction (as used by most lossless codecs now), another would be the theory behind such methods. From what I've read so far (which admittedly isn't a lot) the key drawback towards using the Fourier transform is that these tend to not be accurate due to rounding error, and hence not exactly lossless (unless extra steps are taken, which might hurt compressibility).

Lossless codecs in frequency domain?

Reply #1
Quote
From what I've read so far (which admittedly isn't a lot) the key drawback towards using the Fourier transform is that these tend to not be accurate due to rounding error, and hence not exactly lossless
As I mentioned in another thread, "I'm not a math whiz"...    But, I think you're right.    On one hand I think I've read that Fourier is completely reversible, and on the other hand I read about windowing & overlapping windows (which tells me we can't get the original data back).

Hopefully, someone who is a math whiz can explain the real-world limitations (in a way that those of us who are mathematically challenged can understand).

Lossless codecs in frequency domain?

Reply #2
Hmm actually you would probably use an integer DCT instead of the DFT. For lossless the quantization noise is taken care of in the residual coding. For lossy you would use MDCT which has overlapping windows to reduce the blocking effect. The main drawback of transform coding is cost.

Lossless codecs in frequency domain?

Reply #3
Hmm actually you would probably use an integer DCT instead of the DFT. For lossless the quantization noise is taken care of in the residual coding. For lossy you would use MDCT which has overlapping windows to reduce the blocking effect. The main drawback of transform coding is cost.


Not really sure what you mean by "cost", but I think the actual problem with frequency domain lossless coding is that theres no obvious way to exploit the frequency domain representation to gain in compression without loss.  The main advantage of frequency domain coding in the MDCT or subband codecs is that it enables you to relatively easily compute the masking thresholds and then hide your quantization noise in frequency bins that are masked.  But if you don't have any quantization noise because your compression is lossless, I don't know that theres any point in transforming.

Lossless codecs in frequency domain?

Reply #4
Hmm actually you would probably use an integer DCT instead of the DFT. For lossless the quantization noise is taken care of in the residual coding. For lossy you would use MDCT which has overlapping windows to reduce the blocking effect. The main drawback of transform coding is cost.


Not really sure what you mean by "cost", but I think the actual problem with frequency domain lossless coding is that theres no obvious way to exploit the frequency domain representation to gain in compression without loss.  The main advantage of frequency domain coding in the MDCT or subband codecs is that it enables you to relatively easily compute the masking thresholds and then hide your quantization noise in frequency bins that are masked.  But if you don't have any quantization noise because your compression is lossless, I don't know that theres any point in transforming.

Because depending on your algorithm you can exploit correlation in the frequency domain, plus the data compaction properties of the DCT come in handy. You could also apply linear prediction in the frequency domain. There are many possibilities, but usually transform coding is more costly in terms of cpu cycles (say encoding time). Just look at how LTAC (transform) was replaced by LPAC (predictive), both lossless.

Lossless codecs in frequency domain?

Reply #5
Because depending on your algorithm you can exploit correlation in the frequency domain, plus the data compaction properties of the DCT come in handy. You could also apply linear prediction in the frequency domain.


Or you can exploit it in the time domain 

You CAN do all sorts of things.  But usually you only do them if theres a reason to.  Thats why we don't really consider the infinite number of possible transform domains, but just the handful of ones that are actually useful. Do you have any particular reason to think one domain is better then another?

There are many possibilities, but usually transform coding is more costly in terms of cpu cycles (say encoding time).


Not really.  Its about 5-10MHz to do an MDCT in real time.  I wouldn't consider that "costly".  Not when people gladly spend orders of magnitude more CPU time trying to squeeze the last 1% compression out of flac or APE.  If there was actually any real compression advantage to be had, no one would even think twice about the tiny bit of extra CPU time spent transforming. 


Lossless codecs in frequency domain?

Reply #6
First, I never said you would get better compression results with transform coding; I said that it was slower. Second, real-time isn't really fast; I wouldn't want to wait 4 minutes to encode a 4 min song. So when I refer to cost, I mean that linear predictive coding is faster than transform coding for similar results. Nobody wants to wait 10x longer for a 1% compression increase.

I should add that you can't really compare the time and frequency domains to determine which is "better". Because if you use transforms you've got both domains at hand, in a way revealing the naturally occurring two-dimensional features of audio data. But the problem is that no general theory of 'decorrelation' has been formulated to find patterns within a general audio signal. With transforms, you can potentially exploit redundancy in subbands, on wider time spans, etc., but the cost compromise is just too much for little to no improvement over the simple and fast LPC.

That being said, I believe it is only a matter of time before more efficient and novel algorithms replace the primitive linear glottal model, that leads to LPC, which was meant for speech coding in the first place.

Lossless codecs in frequency domain?

Reply #7
First, I never said you would get better compression results with transform coding; I said that it was slower.


Yeah and I explained that this is not so.

Second, real-time isn't really fast; I wouldn't want to wait 4 minutes to encode a 4 min song.


Then don't use a 5-10MHz CPU! 

Nobody wants to wait 10x longer for a 1% compression increase.


ahahahaha you've obviously never looked at any lossless audio codecs in detail if you think this is so.  Take a look at what people are willing to put up with:

http://synthetic-soul.co.uk/comparison/lossless/

Hell people put up with more then that to get fractions of 1%.

So when I refer to cost, I mean that linear predictive coding is faster than transform coding for similar results. Nobody wants to wait 10x longer for a 1% compression increase.


My original point is that you're talking out of your butt if you think running a ~1k FFT or DCT on audio data is going to make your compression take 10x longer.  FFTs are really, really, really fast.  Seriously.  Go look up how long it takes to computer an FFT on a modern PC.  You seem to think it takes hundreds of millions of clock cycles given your estimates above, but thats just not so. 

I should add that you can't really compare the time and frequency domains to determine which is "better". Because if you use transforms you've got both domains at hand, in a way revealing the naturally occurring two-dimensional features of audio data.


This is just nonsense.  Also I'm not sure I would call a signal and its Fourier transform 'two-dimensional'.  The halves of the time frequency distribution are really just different ways of looking at the same underlying thing.  They're not actually separately determined. 

But the problem is that no general theory of 'decorrelation' has been formulated to find patterns within a general audio signal. With transforms, you can potentially exploit redundancy in subbands, on wider time spans, etc.,


Can you rephrase this more coherently?  I'm not sure if you're just being scatterbrained or really have no idea what you're talking about.

but the cost compromise is just too much for little to no improvement over the simple and fast LPC.


There is no "cost compromise".  And as I said before, computational cost is a non-issue for lossless compression.  Just decoding an APE file in real time can take the better part of a 1GHz CPU.  Some of the formats people on this board use are even slower.  If there was something here, people would use it, no matter the cost. 

 

Lossless codecs in frequency domain?

Reply #8
Whatever man, I don't even know why I answered your post to begin with. I don't really want to argue with you. So take it for what it is.

Lossless codecs in frequency domain?

Reply #9
To put this in perspective, according to the FFTW benchmark page, a ~$100 Intel CPU can compute about 1.5 million 1024 point FFTs a second.  That means every second, it could Fourier transform almost 5 continuous hours of CD audio. 

Or for a 4 minute track, it would take 3.4 milliseconds!

Of course, if you want to do the MDCT, you'd have to pre/post rotate your data, which might add another millisecond to that runtime!  Hope you can spare it while you wait 5 minutes an album for FLAC's linear prediction and entropy coding.

(Yes I realize that in the real world disk speed and memory latency would matter more then raw fft performance, but that just underscores how silly this argument is)

Lossless codecs in frequency domain?

Reply #10
Whatever man, I don't even know why I answered your post to begin with. I don't really want to argue with you.


I know why, same reason you kept at it in that thread where you didn't understand masking.  You didn't want to admit that you were mistaken even though you realized you were in over your head.  So you doubled down and hoped no one would notice.  Not the best idea around here.

Lossless codecs in frequency domain?

Reply #11
That being said, I believe it is only a matter of time before more efficient and novel algorithms replace the primitive linear glottal model, that leads to LPC, which was meant for speech coding in the first place.

Actually, LPC is not that primitive after all, as I found out during my work. But you're right about the novel algorithms: an integer MDCT is used in HD-AAC (which by the way can also operate in pure-lossless mode).

Or for a 4 minute track, it would take 3.4 milliseconds!

Of course, if you want to do the MDCT, you'd have to pre/post rotate your data, which might add another millisecond to that runtime! Hope you can spare it while you wait 5 minutes an album for FLAC's linear prediction and entropy coding.

A little more polite and careful, please, saratoga! An (M)DCT of a waveform alone doesn't give you any compression, it's just a transform. So for a fair comparison, you would have to either compare only (M)DCT vs. LPC analysis and filtering, or compare (M)DCT + entropy coding vs. LPC + entropy coding. From my work experience in that field I can tell you that most of the time, entropy coding is more expensive that LPC or T/F transform, especially in the encoder. That taken into account, LPC- and transform-based codecs are probably quite close to each other speed-wise (with LPC-based codecs probably being a little faster on decoder side since usually, no analysis operation is needed there).

Also, a common flaw I see here on HA audio is that people tend to think only of PC-based applications of audio codecs. On portable devices with relatively weak processors, battery life is critical, and the industry is looking for codecs with minimum decoder complexity. Now take a guess why e.g. APE hasn't reached that market. Nobody likes it when your portable's battery is empty after playing two lossless songs (if it can play them at all).

(No, in the real world, with the rise of flash storage, disk speed and memory latency soon won't matter more then raw fft performance.)

Quote
Can you rephrase this more coherently? I'm not sure if you're just being scatterbrained or really have no idea what you're talking about.

JapanAudio's comment made perfect sense to me. Do you know what (s)he is talking about?

Chris
If I don't reply to your reply, it means I agree with you.

Lossless codecs in frequency domain?

Reply #12
Also, a common flaw I see here on HA audio is that people tend to think only of PC-based applications of audio codecs. On portable devices with relatively weak processors, battery life is critical, and the industry is looking for codecs with minimum decoder complexity. Now take a guess why e.g. APE hasn't reached that market. Nobody likes it when your portable's battery is empty after playing two lossless songs (if it can play them at all).

I cant see any sensible battery-constrained applications for lossless codecs? AAC for iPhones, FLAC for PC/squeezebox/...

Granted, embedded boxes can be used for hifi and be similar to battery-powered devices in many ways, but I dont think the constraint is that hard.

-k

Lossless codecs in frequency domain?

Reply #13
Chill guys, no need to get so upset.

As mentioned in my post I'm more interested in the theoretical aspects than practicality. We can just assume for now that we're going to do all the encoding and decoding on an average modern computer; speed optimisation can always come later.

I've thought about using linear prediction in the frequency domain before; given that the Fourier transform and related ones like the DCT are linear transformations, this isn't any different from using linear prediction on a block of samples, with the added disadvantage of using lots of buffer space (at least two past blocks), which also leads to reduced seekability. In addition, if we're talking about 1024-sample DCTs, the matrix involved will be so much larger than regular matrices used in LPC (I think 32x32 is the maximum for FLAC) that encoding will take significantly longer even if we're not using the DCT (which would be comparatively fast anyway). Should have included this in the first post, but this kind of slipped my mind. (Note that I'm not very sure what I'm talking about here and I might be completely off. Correct me if I'm wrong.)

What I was thinking of was more along the lines of applying the DCT then coding the results of the transform directly. I'm not a good enough programmer to throw up a consistent DCT (and an exact IDCT) over a sufficiently large number of samples (like, say, 256), so I can't test how this works myself. There might be gains from using prediction, but as mentioned above, a complete linear prediction is out of the question. The MDCT would be useless since the main purpose (to reduce boundary artifacts with lossy encoding) is irrelevant here, although it wouldn't provide much of a speed hit.

The alternative to using an exact inverse transform would be to use an inexact one and code the differences, but I suspect this will hurt compressibilty quite a bit since the differences are effectively random.

Also, there's no need to confine ourselves to discussing about using Fourier-related transforms; how about wavelet transforms? Wavelets aren't quite representative of natural audio, so they don't concentrate the energy as well in a small number of areas, but they're apparently much easier to implement with just integer arithmetic.

Lossless codecs in frequency domain?

Reply #14
... which also leads to reduced seekability. In addition, if we're talking about 1024-sample DCTs, the matrix involved will be so much larger than regular matrices used in LPC (I think 32x32 is the maximum for FLAC) that encoding will take significantly longer even if we're not using the DCT (which would be comparatively fast anyway).

Don't know what you mean by 32x32 and reduced seekability. FLAC's default block size is 4096 samples. I don't think that computing the LPC filter in FLAC via autocorrelation, and then applying the filter, is much faster than doing a, say, 1024-sample (M)DCT.

Quote
What I was thinking of was more along the lines of applying the DCT then coding the results of the transform directly.

If you partition the DCT spectrum and use an integer DCT, then that should be possible. In fact, IIRC, that's what HD-AAC is doing in pure lossless mode.

Regarding your question whether this is much more efficient than time-domain compression: that probably depends on the signal. For tonal signals (e.g. harpsichord), it might be. For speech, it probably isn't.

Chris
If I don't reply to your reply, it means I agree with you.

Lossless codecs in frequency domain?

Reply #15
Don't know what you mean by 32x32 and reduced seekability. FLAC's default block size is 4096 samples. I don't think that computing the LPC filter in FLAC via autocorrelation, and then applying the filter, is much faster than doing a, say, 1024-sample (M)DCT.


What I was considering was using linear prediction in frequency domain (as stated in my post), and not just applying a DCT and coding the coefficients. In using linear prediction with n past samples an n-by-n matrix (which normally wouldn't use up O(n^2) memory since there are only at most 2n+1 distinct elements) would be generated, and subsequently solved. According to Wikipedia, these matrices can be solved with O(n^2) time. With that, using an n-sample DCT blocks, and at least 2 past blocks with which to perform linear prediction, there would be n matrices of size at least (2n)x(2n) to invert (per frame). Note that the definition of "block" here differs as used in FLAC; unless we're going to forgo linear prediction post-DCT, we're going to need some kind of "superblock" containing multiple DCT blocks (which cannot be decoded out-of-order). Which leads to the reduced seekability, as mentioned. (This might still be okay if the "superblocks" have around 16 blocks and each block has 256 samples.) I pulled up these to illustrate the disadvantage of naively trying to use LPC in the frequency domain; there are probably better ways than applying linear prediction across all the DCT coefficients, which, as I mentioned, wouldn't be any different from just using linear prediction over a similar number of past samples.

Quote
If you partition the DCT spectrum and use an integer DCT, then that should be possible. In fact, IIRC, that's what HD-AAC is doing in pure lossless mode.


Doesn't HD-AAC use a lossy core and encode the difference? That's rather straightforward, but leads to the issue that I pointed out: using an inexact IDCT would leave an essentially random difference which will be nigh incompressible. Also, what do you mean by partitioning the spectrum? If I'm understanding you right (coding parts of the spectrum separately), how would doing so improve compressibility?

Quote
Regarding your question whether this is much more efficient than time-domain compression: that probably depends on the signal. For tonal signals (e.g. harpsichord), it might be. For speech, it probably isn't.

Chris


Thanks for the input! Now here's another crazy idea (as if we don't have enough already ): if we could somehow separate the "speech" part (highly compressible with LPC) and the "tonal" part (more compressible with DCT+entropy coding), this should produce somewhat better compression than either alone right? (The difficulty probably lies in the separation part.)

Lossless codecs in frequency domain?

Reply #16
The idea behind partitioning the spectrum is subband analysis. You can decompose it into equal segments or use logarithmic spacing to model hearing more accurately.

The "inexactitude" of the rounded DCT is not really an issue, primarily because reversible intDCTs do exist, but more importantly because there is no interest in reversing the DCT before applying some sort of quantization, as mentioned previously.

Question: Do you have access to MATLAB? You could try many of your ideas and see what kind of results you get. Also, i would suggest searching through scientific journals and papers to see what has already been done and what kind of advances have made recently; that could save you a lot of time. Google scholar is useful for this purpose.

Lossless codecs in frequency domain?

Reply #17
The idea behind partitioning the spectrum is subband analysis. You can decompose it into equal segments or use logarithmic spacing to model hearing more accurately.

Please correct me where I'm mistaken, but we're talking lossless here, not a perceptual lossy format.

What is the benefit of modeling (or mirroring) human hearing?
Creature of habit.

Lossless codecs in frequency domain?

Reply #18
The idea behind partitioning the spectrum is subband analysis. You can decompose it into equal segments or use logarithmic spacing to model hearing more accurately.

Please correct me where I'm mistaken, but we're talking lossless here, not a perceptual lossy format.

What is the benefit of modeling (or mirroring) human hearing?

Yes you're right. But you can also operate on different bands to apply decorrelation algorithms; and the natural way to divide the spectrum is logarithmically. This would actually be the case in many ways you would implement frequency-domain linear prediction (FDLP).

We're not trying to design a psychoacoustic model here. It's just a basic property of sound: frequency increases exponentially with a linearly increasing pitch.

Lossless codecs in frequency domain?

Reply #19
Doesn't HD-AAC use a lossy core and encode the difference?

Not if it runs in pure lossless mode. In that mode, the "difference" signal represents the input signal.

I think I understand your envisioned coding approach now. I guess you're right with the seekability etc. then.

Quote
Also, what do you mean by partitioning the spectrum? If I'm understanding you right (coding parts of the spectrum separately), how would doing so improve compressibility?

Correct. I was thinking along the lines of FLAC, which segments the time-domain prediction residual and adapts its entropy coder to each segment. You could do the same in the frequency domain, for example like JapanAudio suggested (actually, you could use any arbitrary segmentation if it gives good compression).

Quote
Thanks for the input! Now here's another crazy idea (as if we don't have enough already ): if we could somehow separate the "speech" part (highly compressible with LPC) and the "tonal" part (more compressible with DCT+entropy coding), this should produce somewhat better compression than either alone right? (The difficulty probably lies in the separation part.)

Exactly. (Exactly)

Chris
If I don't reply to your reply, it means I agree with you.

Lossless codecs in frequency domain?

Reply #20
That being said, I believe it is only a matter of time before more efficient and novel algorithms replace the primitive linear glottal model, that leads to LPC, which was meant for speech coding in the first place.

Actually, LPC is not that primitive after all, as I found out during my work. But you're right about the novel algorithms: an integer MDCT is used in HD-AAC (which by the way can also operate in pure-lossless mode).

Or for a 4 minute track, it would take 3.4 milliseconds!

Of course, if you want to do the MDCT, you'd have to pre/post rotate your data, which might add another millisecond to that runtime! Hope you can spare it while you wait 5 minutes an album for FLAC's linear prediction and entropy coding.

A little more polite and careful, please, saratoga! An (M)DCT of a waveform alone doesn't give you any compression, it's just a transform. So for a fair comparison, you would have to either compare only (M)DCT vs. LPC analysis and filtering, or compare (M)DCT + entropy coding vs. LPC + entropy coding. From my work experience in that field I can tell you that most of the time, entropy coding is more expensive that LPC or T/F transform, especially in the encoder. That taken into account, LPC- and transform-based codecs are probably quite close to each other speed-wise (with LPC-based codecs probably being a little faster on decoder side since usually, no analysis operation is needed there).


Isn't that what I said?  You "might add another millisecond" to the minutes you'll spend on the rest of the codec.

Also, a common flaw I see here on HA audio is that people tend to think only of PC-based applications of audio codecs. On portable devices with relatively weak processors, battery life is critical, and the industry is looking for codecs with minimum decoder complexity.


I'd like to point out that I wrote much of the current rockbox iMDCT, which is as far as I am aware the fastest embedded open source iMDCT, so I think I am well aware of these constraints

I've written about this in some detail here:

http://www.hydrogenaudio.org/forums/index....showtopic=82125

Edit:  And you asked me about it in that thread.  Funny you think I need reminding about it now!

Now take a guess why e.g. APE hasn't reached that market. Nobody likes it when your portable's battery is empty after playing two lossless songs (if it can play them at all).


This isn't entirely correct.  Ape hasn't reached those markets because until Rockbox wrote a free decoder and optimized it for mobile use, there was no decoder available.  Now that we provide one, several companies support APE on battery powered devices.  But yes, the battery hit is considerable for Ape compared to FLAC.

Lossless codecs in frequency domain?

Reply #21
Correct. I was thinking along the lines of FLAC, which segments the time-domain prediction residual and adapts its entropy coder to each segment. You could do the same in the frequency domain, for example like JapanAudio suggested (actually, you could use any arbitrary segmentation if it gives good compression).


I remember there was once a thread where someone compared lowpassed FLAC or Wavepack to lowpassed and resampled (to around Nyquist).  In theory they should have the same size since theres the same amount of information, but for at least the encoder tested the resampled file was smaller.  No idea if this was true in general, but it might suggest that some kind of critically sampled filterbank followed by linear prediction or something similar could work.  That said, I have no idea if it'd actually beat simple time domain coding.  Given how similar such a scheme would be to subband coding in MPEG and ATRAC, I suspect its been tried and abandoned before.

Lossless codecs in frequency domain?

Reply #22
Isn't that what I said?  You "might add another millisecond" to the minutes you'll spend on the rest of the codec.

Then I misunderstood you. Though - adding to your comparison - FLAC running on a RAM disk would probably be in the range of seconds or ms as well.

Quote
I'd like to point out that I wrote much of the current rockbox iMDCT, which is as far as I am aware the fastest embedded open source iMDCT, so I think I am well aware of these constraints

True. Anyway, it's not your fault that APE decoding is so expensive.  I remember an Athlon 900 MHz having problems decoding APE files with highest compression in real-time while surfing the web. That's expensive! Then again, the developer probably didn't spend too much thought on that when he came up with APE, not knowing/expecting that it would become so popular.

Chris
If I don't reply to your reply, it means I agree with you.

Lossless codecs in frequency domain?

Reply #23
True. Anyway, it's not your fault that APE decoding is so expensive.  I remember an Athlon 900 MHz having problems decoding APE files with highest compression in real-time while surfing the web. That's expensive! Then again, the developer probably didn't spend too much thought on that when he came up with APE, not knowing/expecting that it would become so popular.


Yes its not so bad now.  About 500MHz on ARM11 for the highest compression level, much less on Cortex processors where Neon is available.  c2000 and maybe c3000 are probably fast enough now that a cell phone type device won't have an unreasonable battery life hit. 

Lossless codecs in frequency domain?

Reply #24
What is the only fundamental mechanism that lossless coding can use? Signal correlation / redundancy, right? Typically the signal can be modelled by a (relatively) static, simple filter excitated by a random noise-source. Other correlations may exist at musically relevant distances, but those distances may render them irrelevant due to memory/computation cost.

I guess the larger one makes the "search space" for signal representation vectors that can express a part of the signal, the more compression is (ideally) achievable. But cost increase rapidly, while compression gain increase very modestly, right?

So what does FD bring us? Isnt the power spectrum just the FT of the temporal correlation function? Is it a computationally more efficient way of increasing the "search space" or what? I did calculate Yule-Walker equations by hand back in school, but I now have problems formulating what it is fundamentally that lossless compression does, and why flac does it better for typical audio than Lempel-Ziv.


To my eyes, waveforms like this seems quite "correlated", in the sense that it is far from random. How much of that correlation is a (low-order?) flac encoder able to remove? If compression is only 50%, it means that the residue is going to be quite significant power compared to the input signal, right? How much correlation is there (typically) left for an ideal lossless encoder that does not have to take (as many) practical considerations?

-k