Topic: 'No Free Lunch'-Algorithms in Lossy formats?

'No Free Lunch'-Algorithms in Lossy formats?

While daydreaming today I asked myself whether lossy formats use techniques to recognise which 'compression' technique should be used on which sample. Somewhat in the vein of the "No Free Lunch" theorem (Link), implying that there is no optimal technique for all samples.

Assuming that there are different kinds of compression techniques (/psymodels?) and each sample has its own optimal technique, it would follow that an optimal compression algorithm would be an adaptive one, where for each sample a prediction is made as to which technique to use.
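To make the daydream concrete, here is a toy sketch of what I mean. It is purely hypothetical: the "techniques" and the feature used to choose between them are invented for illustration and don't correspond to any real codec.

Code:
# Toy sketch of per-block adaptive technique selection (hypothetical).
# Neither the "techniques" nor the selection feature correspond to any real codec.
import numpy as np

def spectral_flatness(block: np.ndarray) -> float:
    """Geometric mean / arithmetic mean of the power spectrum (0..1)."""
    power = np.abs(np.fft.rfft(block)) ** 2 + 1e-12
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

def encode_tonal(block):  # placeholder for a technique tuned to tonal content
    return ("tonal", block)

def encode_noisy(block):  # placeholder for a technique tuned to noise-like content
    return ("noisy", block)

def adaptive_encode(signal: np.ndarray, block_size: int = 1024):
    """Pick a 'technique' per block based on a simple signal feature."""
    out = []
    for start in range(0, len(signal) - block_size + 1, block_size):
        block = signal[start:start + block_size]
        # Flat spectrum -> noise-like; peaky spectrum -> tonal.
        technique = encode_noisy if spectral_flatness(block) > 0.5 else encode_tonal
        out.append(technique(block))
    return out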

Could anyone shed some light on whether similar techniques are already being used, or whether I'm talking complete rubbish? I'm new and a bit rusty in the computer science department, so my deepest apologies.

'No Free Lunch'-Algorithms in Lossy formats?

Reply #1
Quote
Assuming that there are different kinds of compression techniques (/psymodels?) and each sample has its own optimal technique, it would follow that an optimal compression algorithm would be an adaptive one, where for each sample a prediction is made as to which technique to use.


I am confused; is rate-distortion theory what you are thinking of?
budding I.T professional

'No Free Lunch'-Algorithms in Lossy formats?

Reply #2
Quote
I am confused; is rate-distortion theory what you are thinking of?


No, I don't think so. Although I'm not very familiar with this theory.

I'm wondering, for instance, whether lossy encoders, like LAME, use multiple perceptual models when compressing a single track. Or whether they use a single perceptual model (perhaps on the assumption that a single model is optimal for all samples/tracks).

I hope I made it clearer this time.

Edit: Perhaps I haven't got a good idea of what a perceptual model is. I'm assuming there are multiple models, and perhaps that encoders use only one in general.

'No Free Lunch'-Algorithms in Lossy formats?

Reply #3

Quote
I am confused; is rate-distortion theory what you are thinking of?


No, I don't think so. Although I'm not very familiar with this theory.

I'm wondering, for instance, whether lossy encoders, like LAME, use multiple perceptual models when compressing a single track. Or whether they use a single perceptual model (perhaps on the assumption that a single model is optimal for all samples/tracks).

I hope I made it clearer this time.

Edit: Perhaps I haven't got a good idea of what a perceptual model is. I'm assuming there are multiple models, and perhaps that encoders use only one in general.


Well, a psy model is a way of deciding mathematically the best way to encode something. So you're suggesting there would be some way of deciding which way of deciding the best way of encoding something would be? How do you propose to do that?

My point is, the psy model does exactly what you are describing. You'd need some sort of psy model to decide between different psy models!
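Just to make the circularity concrete, a hypothetical sketch (nothing here corresponds to a real encoder): even if you trial-encode a block with every candidate model and keep the "best" result, whatever judges "best" has to be a perceptual model itself.

Code:
# Hypothetical "meta model": encode with each candidate psy model and keep the
# result that a quality metric likes best. The catch: judge_quality() has to be
# a perceptual model itself, so the circularity never goes away.
def meta_encode(block, candidate_models, judge_quality):
    best = None
    for model in candidate_models:             # each model is a callable: block -> encoded
        encoded = model(block)
        score = judge_quality(block, encoded)  # ...but this judge is just another psy model
        if best is None or score > best[0]:
            best = (score, encoded)
    return best[1]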

'No Free Lunch'-Algorithms in Lossy formats?

Reply #4
Perhaps certain practical issues have to be considered when speccing a codec, such as finite complexity, delay, CPU demands, etc.

Perhaps this means that a state-of-the-art codec performs worse for certain material and certain bitrates than the current state-of-the-art knowledge about human perception would allow.

It has been said that although MP3 is generally more efficient at low bitrates, MP2 may(?) be better at very high bitrates.

Perhaps knowledge of this gathered through extensive listening tests means that an ensemble of multiple codecs can gain a little compared to each on its own.

However, I am guessing that the benefit, if any, would be very small compared to the effort. And that the knowledge gained through those large-scale listening tests should rather be pumped into improving and replacing codecs?

-k

'No Free Lunch'-Algorithms in Lossy formats?

Reply #5
Quote
I'm wondering, for instance, whether lossy encoders, like LAME, use multiple perceptual models when compressing a single track. Or whether they use a single perceptual model (perhaps on the assumption that a single model is optimal for all samples/tracks).


Yes, it's a single psychoacoustics model called gpsycho with LAME. Basically, I think what you are asking is whether the psychoacoustics model is adjusted for different samples and different types of music? The answer would be no; that would be far too complex if you take into account ATH levels, masking, etc.
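For context on what ATH means here, a rough sketch of the absolute-threshold-of-hearing approximation commonly quoted in psychoacoustics papers (a Terhardt-style curve fit). This is illustrative only and is not LAME's actual gpsycho code.

Code:
# Rough approximation of the absolute threshold of hearing (ATH) in dB SPL,
# as commonly quoted in the psychoacoustics literature (Terhardt-style fit).
# Components below this curve are inaudible in quiet, so an encoder can
# discard or coarsely quantize them before masking is even considered.
import math

def ath_db_spl(freq_hz: float) -> float:
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# Example: the ear is most sensitive around 3-4 kHz, far less so at 50 Hz or 16 kHz.
print(ath_db_spl(50), ath_db_spl(3500), ath_db_spl(16000))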
budding I.T professional

'No Free Lunch'-Algorithms in Lossy formats?

Reply #6
What is a sample? It is a measure of signal level, nothing more. Individual samples have no audio meaning; they must be utilized within the gestalt of what went before and what comes next.

'No Free Lunch'-Algorithms in Lossy formats?

Reply #7
Thanks for the answers. Generally, that was the idea I had in mind.

Quote
Well, a psy model is a way of deciding mathematically the best way to encode something. So you're suggesting there would be some way of deciding which way of deciding the best way of encoding something would be? How do you propose to do that?


I was thinking these models can (could) be highly specialized: that they work better in different circumstances, so that at some point psymodel X gives better results than Y. Like in the No Free Lunch analogy: there is no generally optimal search algorithm for all search spaces, or here, no best psymodel for all music (or, as knutinh pointed out, all bitrates), and every specific instance would have its own best algorithm.
I have to remind you, of course, that I was daydreaming.

Quote
You'd need some sort of psy model to decide between different psy models!


Haha! Yeah, I hadn't thought of that. You're right: the meta model. I'd almost think this automatically reduces my idea to rubbish, but the option of having multiple specialized models could still be useful, I think.

Quote
Perhaps certain practical issues have to be considered when speccing a codec, such as finite complexity, delay, CPU demands, etc.


Encoding would be more complex, yes. But first of all, I'm wondering whether it would be a good idea, independent of whether it would be computationally too complex.
And besides, to a certain extent computational/complexity issues could be solved over time (computers becoming faster, hyperthreading technologies, etc.).

Quote
The answer would be no; that would be far too complex if you take into account ATH levels, masking, etc.


Could you provide me with links on these subjects? I'm still thinking it should be possible to have a solution with finite complexity.

Quote
What is a sample? It is a measure of signal level, nothing more. Individual samples have no audio meaning; they must be utilized within the gestalt of what went before and what comes next.


I admit I was thinking about individual samples in the first place, but I understand that psymodels don't work that way.
So, a 'sample' would be the collection of samples that a psymodel takes into account. This could be the whole track, or there could be an algorithm to decide, probably on the basis of complexity, what the optimal size would be: the higher the complexity, the smaller the collection. (I'm just thinking as I go here; I have no idea how these things actually work.)

On the other hand, perhaps it is only the bitrate which would decide what would be the best psymodel. Like knutinh said.

Quote
However, I am guessing that the benefit, if any, would be very small compared to the effort. And that the knowledge gained through those large-scale listening tests should rather be pumped into improving and replacing codecs?


I was hoping this could be a way to avoid certain encoding artifacts (implying that those are related to the psymodels, which I don't know to be the case; so my next question is whether these artifacts are related to the psymodels). And perhaps also achieve better compression rates (and, of course, higher quality).

This is just out-of-the-box thinking, and at this point I'm not (yet) interested in whether the effort would be equally rewarded.

'No Free Lunch'-Algorithms in Lossy formats?

Reply #8
The most important point is that lossy codecs are already highly specialized. The number of signals they can meaningfully compress without destroying is very small compared to the size of the problem domain. Generally, No-Free-Lunch is used to argue against the application of limited domain algorithms to problems where the domain is not well understood. In the case of lossy codecs, the problem domain is fairly well understood.

From the Wikipedia article:
Quote
For practising engineers and other optimization professionals, the theorem justifies the view that as much prior domain knowledge should be utilized as possible, and custom optimization routines constructed for particular domains.
That's precisely what LAME and other lossy codecs do - apply large amounts of knowledge of the problem domain to find a very specific solution. For example, MP3 works nicely for compressing music samples, but would be a horrible solution for compressing radar signals.

'No Free Lunch'-Algorithms in Lossy formats?

Reply #9
This "no free lunch" "theorem" (which is not a theorem but a conjecture) is just saying, in the context of audio coding, that all "fixed" algorithms would, on average, produce the same results.
In our context, a fixed algorithm would mean something like a basic RD-based selection, with distortion computed as a trivial SNR value.

First, we know that a basic SNR-based decision does not work well in current audio codecs, so the conjecture still holds.
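To illustrate what such a "fixed" RD-style selection with trivial SNR distortion could look like, here is a deliberately naive sketch (not taken from any real codec): per block, pick the coarsest quantizer whose plain SNR clears a fixed target, with no masking or ATH involved.

Code:
# Deliberately naive rate-distortion style selection: per block, choose the
# coarsest quantizer whose plain SNR still clears a target. This is "fixed"
# in the NFL sense: no masking, no ATH, no adaptation to how the block sounds.
import numpy as np

def snr_db(original: np.ndarray, coded: np.ndarray) -> float:
    noise = original - coded
    return 10.0 * np.log10(np.sum(original ** 2) / (np.sum(noise ** 2) + 1e-12))

def naive_rd_select(block: np.ndarray, step_sizes=(0.5, 0.1, 0.02), target_snr_db=30.0):
    """Return (step, quantized) for the largest step that still meets the SNR target."""
    for step in step_sizes:  # coarse (cheap) to fine (expensive)
        quantized = np.round(block / step) * step
        if snr_db(block, quantized) >= target_snr_db:
            return step, quantized
    # Nothing met the target: fall back to the finest quantizer.
    finest = step_sizes[-1]
    return finest, np.round(block / finest) * finest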

Now, let's have a look at current encoders:

*The input corpus is supposed to be audio targeted at humans, and not noise. This is already a very big reduction in the distribution of the input corpus, so the "overall" of the conjecture does not apply in our context.

*A psymodel is a set of adaptive rules applied according to an analysis of the input signal, not a fixed thing. The psymodel selects between different coding modes (long/short, ms/lr, pns, ...) and adjusts its computations based on the signal (ATH, masking, ...).
You can already consider the different coding modes to be different coding algorithms, so this broadens your good-efficiency area over the corpus of input signals.
As the psymodel adjusts its computed values dynamically, this broadens your efficiency area even more.
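To give a feel for what such mode decisions can look like, a very rough sketch: the energy-based transient detector, the correlation test and all thresholds are invented for illustration, and real encoders are far more elaborate.

Code:
# Very rough sketch of two of the mode decisions mentioned above.
# The transient detector and all thresholds are invented for illustration;
# real encoders (LAME, AAC encoders, ...) are far more elaborate.
import numpy as np

def choose_block_length(block: np.ndarray, subblocks: int = 8, ratio: float = 4.0) -> str:
    """Long blocks for steady signals, short blocks when energy jumps (a transient)."""
    parts = np.array_split(block, subblocks)
    energies = np.array([np.sum(p ** 2) + 1e-12 for p in parts])
    return "short" if np.max(energies[1:] / energies[:-1]) > ratio else "long"

def choose_stereo_mode(left: np.ndarray, right: np.ndarray, threshold: float = 0.8) -> str:
    """Mid/side coding pays off when the two channels are strongly correlated."""
    corr = np.corrcoef(left, right)[0, 1]
    return "mid/side" if corr > threshold else "left/right"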

'No Free Lunch'-Algorithms in Lossy formats?

Reply #10
Different psymodels do not make sense in this area. A psymodel does not target a piece of music - it targets a single human!

It does not make sense to propose different psymodels "because depending on the sample one may perform better than the other". This doesn't add up, because psymodels aren't about samples, but about listeners.

Psymodels target humans.
The compression technique targets samples.

Theoretically, it is still true that no psymodel will perform optimally for every HUMAN. But in practice, how do you propose to change that? Does the LISTENER need to do some kind of hearing test, so that a psymodel just for him can be created? But what about other people listening to this listener's music? The point is: in our current music world, it is actually DESIRABLE to have a psymodel which performs "averagely well" for every LISTENER. We actually want music which sounds acceptable to the highest number of people. It does not need to be perfect for everyone; it just needs to be acceptable for everyone. So, a general-purpose psymodel.

Regarding compression techniques..... that's a different matter. Here, your theorem may make sense. But the problem here is COST. For a compression technique to become popular, it needs to be resource-friendly, so that it gets implemented in as much software as possible.

- Lyx
I am arrogant and I can afford it because I deliver.

 

'No Free Lunch'-Algorithms in Lossy formats?

Reply #11
Thanks, guys. The responses were very interesting, to say the least.

If my understanding is correct, cabbagerat makes a point which Gabriel makes as well: namely, that lossy codecs are already highly specialized. So there's no need to try and look for algorithms which are even more specialized; the domain itself is already too specific to have any meaningful subdomains for which there could be specialized algorithms.
Although that is highly likely, or rather a given fact, I'd like to explore the theoretical possibility that some advance could still be made at this point. Classical music just seems way too different to be treated the same as metal. I know I'm reasoning from my own perceptions instead of from a technical point of view; this is just poking around to see whether there are still some hidden loose ends.

This is where Gabriel's second point kicks in: "A psymodel is a set of adaptive rules applied according to an analysis of the input signal". Well, that's the end of the story for me on one hand, but the beginning of a new one on the other.

Can anyone point me in the direction of more technical information on these psymodels? I'm interested to see the algorithms that are used in these models. My background is in artificial intelligence, so my first interest is in how these models adapt. Given certain features of the input signal, how does the model decide to select certain coding modes? Second, what are these different coding modes? And finally, what information do the models use from the signal (and how do they extract it)?

I'm hoping for some online papers. So if anyone could point me in the direction of some papers/studies, I'd be really thankful.

This is purely from interest, btw. I'm not doing this for study or work.

Finally, in response to Lyx: you've made an interesting point. I admit I hadn't seen this target point of view. But although I agree with you that the psymodel should be aimed at the average human (and there's little or no use in being able to change these models for individual users with specific needs), the fact that the psymodel is developed with the average human being in mind doesn't imply that the result will be equally good for every input.
My point being: the quality of a psymodel is not necessarily dependent on how well it represents the average human. A perfect psymodel, in terms of representing the target, could very well give bad results on certain input signals (at least, based on what I know of these models at this point). I don't think representing the target is the entire story in determining the quality of a psymodel. But you can prove me wrong, of course!