Lossy encoding NOT based on perceptual encoding?

2009-07-20 02:08:53

I was wondering if there has been any work done on lossy audio encoding that is NOT based on perceptual encoding?

I can't think of a specific situation where you would need to deliver compressed audio without regard to its (humanly perceived) quality. Maybe something to do with sonar, whales, or general atmospheric research?

Reply #1 – 2009-07-20 02:40:53

You're asking about something that makes no sense. The point of lossy encoding is to throw out the parts that humans cannot hear. Lossy is intrinsically aimed at some target audience. Without a target audience, you cannot begin to make intelligent decisions about what data should be thrown out. Therefore, it makes no sense to consider lossy audio that has no target audience.

Reply #2 – 2009-07-20 05:07:57

Bit shaving combined with lossless compression would count as "lossy encoding NOT based on perceptual encoding", and is often discussed here. lossyWAV is but one example. See link for discussion not only of the implementation itself, but also the considerations made, pros/cons, and the rationale. Far from making no sense, IMHO.
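
To make the bit-shaving idea concrete, here is a minimal Python sketch, assuming 16-bit integer PCM held in a plain list. `shave_bits` and `wasted_bits` are hypothetical helpers, not lossyWAV's code: the shaving step uses a fixed, caller-chosen number of bits with no analysis of the signal at all. The second function shows why the result compresses better losslessly: a block whose samples all share k zero low bits can be stored as `sample >> k` plus the count k (FLAC, for example, has a per-subframe "wasted bits" field for exactly this).

```python
def shave_bits(samples, k):
    """Round 16-bit integer samples to the nearest multiple of 2**k.

    Hypothetical helper: k is fixed by the caller. Real bit-shaving tools
    choose k per block from an analysis of the signal; that part is omitted.
    """
    if k <= 0:
        return list(samples)
    step = 1 << k
    half = step >> 1
    top = (1 << 15) - step  # largest multiple of step that still fits in int16
    # (s + half) >> k is floor((s + half) / step): round to the nearest multiple
    return [min(((s + half) >> k) << k, top) for s in samples]


def wasted_bits(samples):
    """Number of low-order bits that are zero in every sample of the block.

    A lossless coder can store this count once and then code sample >> count,
    which is where the size saving from bit shaving actually comes from.
    """
    combined = 0
    for s in samples:
        combined |= s  # a low bit set in any sample survives the OR
    if combined == 0:
        return 0
    count = 0
    while combined & 1 == 0:
        combined >>= 1
        count += 1
    return count


if __name__ == "__main__":
    block = [1000, -1003, 2047, 5, -32768]
    shaved = shave_bits(block, 3)
    print(shaved, wasted_bits(block), wasted_bits(shaved))
```

Note that nothing in this sketch knows anything about hearing; whether a tool like lossyWAV is "perceptual" depends entirely on how it picks k, which is what the rest of the thread argues about.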

Reply #3 – 2009-07-20 07:41:49

Stock ADPCM (remember that?) is a non-perceptual lossy algorithm.
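
A toy 4-bit ADPCM codec in Python shows what "non-perceptual" means in practice: the only decision rule is how well the decoder can track the waveform. This is a simplified, hypothetical scheme (previous-sample predictor, step size grown or shrunk from the last code, in the spirit of Jayant-style adaptive quantization); it is not IMA ADPCM or any other standard, whose step tables and bit layouts differ, and the multipliers below are arbitrary.

```python
import math


def _adapt(step, code):
    """Adapt the step size from the last code (hypothetical multipliers)."""
    if abs(code) >= 6:
        step *= 2              # near the edge of the code range: grow quickly
    elif abs(code) <= 2:
        step = step * 3 // 4   # small codes: shrink to reduce granular noise
    return max(1, min(step, 1 << 14))


def adpcm_encode(samples, initial_step=16):
    """Return (codes, reconstruction). Codes are 4-bit signed ints in [-8, 7]."""
    pred, step = 0, initial_step
    codes, recon = [], []
    for s in samples:
        code = max(-8, min(7, round((s - pred) / step)))
        pred += code * step    # track exactly what the decoder will compute
        codes.append(code)
        recon.append(pred)
        step = _adapt(step, code)
    return codes, recon


def adpcm_decode(codes, initial_step=16):
    """Rebuild the waveform from codes using the same adaptation as the encoder."""
    pred, step = 0, initial_step
    out = []
    for code in codes:
        pred += code * step
        out.append(pred)
        step = _adapt(step, code)
    return out


if __name__ == "__main__":
    tone = [round(10000 * math.sin(2 * math.pi * n / 100)) for n in range(1000)]
    codes, recon = adpcm_encode(tone)
    print("decoder matches encoder:", adpcm_decode(codes) == recon)
    print("mean abs error:", sum(abs(a - b) for a, b in zip(tone, recon)) / len(tone))
```

Because the step adaptation depends only on the transmitted codes, the decoder reproduces the encoder's reconstruction exactly without any side information.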

Reply #4 – 2009-07-20 08:12:30

Isn't WavPack lossy also a non-perceptual lossy encoder?

Reply #5 – 2009-07-20 08:51:16

ADPCM and WavPack Lossy could be considered lossy encoders with a very simple perceptual algorithm, because they still reduce size on the assumption that humans will accept the difference from the original, if they hear any difference at all.

Reply #6 – 2009-07-20 09:29:47

Quote from: muaddib on 2009-07-20 08:51:16

ADPCM and WavPack Lossy could be considered lossy encoders with a very simple perceptual algorithm, because they still reduce size on the assumption that humans will accept the difference from the original, if they hear any difference at all.

I'd say the same for LossyWAV, given the discussions there have been regarding fine-tuning?

Reply #7 – 2009-07-20 11:11:45

Quote from: odyssey on 2009-07-20 09:29:47

Quote from: muaddib on 2009-07-20 08:51:16
ADPCM and WavPack Lossy could be considered lossy encoders with a very simple perceptual algorithm, because they still reduce size on the assumption that humans will accept the difference from the original, if they hear any difference at all.

I'd say the same for LossyWAV, given the discussions there have been regarding fine-tuning?

Clearly LossyWAV can be used in a perceptual manner, and IIRC the default is to shave the bitdepth only to the edge of audability.
Its fundamental technique (of decreasing bitdepth), though, is not dependent on human audio perception - unlike any lossy encoder which tosses masked frequencies and the like.

Reply #8 – 2009-07-20 11:31:39

Quote from: Soap on 2009-07-20 11:11:45

Clearly LossyWAV can be used in a perceptual manner, and IIRC the default is to shave the bitdepth only to the edge of audability. Its fundamental technique (of decreasing bitdepth), though, is not dependent on human audio perception - unlike any lossy encoder which tosses masked frequencies and the like.

Not true - the fundamental parameters of LossyWAV's operation (the widths of the spectrum analysis windows and the per-band masking estimation) are purely psychoacoustic in derivation. (One could imagine aliens with radically different cochlear structure, perhaps with a much coarser/finer cilia density or whatnot, for which the FFT window sizes used in LossyWAV would be suboptimal.)

The key point about LossyWAV is that it makes far, far fewer assertions about the nature of human hearing compared to other lossy encoding schemes - so that we can presumably be more certain about its transparency, for all possible inputs, than many other codecs.

Reply #9 – 2009-07-20 12:38:20

Current WavPack lossy uses auto noise shaping and mid-side switching, so there is a basic model there. That it often reaches near transparency at 250 kbps or even lower is a testament to that.
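
For the mid-side part, a Python sketch of reversible integer mid/side coding plus a per-block switch between L/R and M/S. WavPack's actual joint-stereo code is not described in this thread; the transform below is one common lossless integer form (FLAC uses essentially this one), and `choose_stereo_mode` with its sum-of-magnitudes test is a hypothetical heuristic. The transform itself is lossless and has nothing perceptual about it; it just concentrates the energy of correlated channels in `mid`, leaving a small `side`.

```python
import random


def ms_encode(left, right):
    """Integer mid/side: side = L - R, mid = floor((L + R) / 2)."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Exact inverse of ms_encode."""
    left, right = [], []
    for m, s in zip(mid, side):
        # L + R and L - R have the same parity, so the bit dropped by the
        # shift in ms_encode can be restored from the low bit of side.
        total = (m << 1) | (s & 1)
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right


def choose_stereo_mode(left, right):
    """Hypothetical switch: pick the representation with the smaller magnitudes."""
    mid, side = ms_encode(left, right)
    lr_cost = sum(abs(x) for x in left) + sum(abs(x) for x in right)
    ms_cost = sum(abs(x) for x in mid) + sum(abs(x) for x in side)
    return ("MS", mid, side) if ms_cost < lr_cost else ("LR", left, right)


if __name__ == "__main__":
    rng = random.Random(0)
    left = [rng.randint(-20000, 20000) for _ in range(8)]
    right = [x + rng.randint(-50, 50) for x in left]  # strongly correlated channels
    print(choose_stereo_mode(left, right)[0])
```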

Reply #10 – 2009-07-20 12:43:21

Quote from: Axon on 2009-07-20 11:31:39

Not true - the fundamental parameters of LossyWAV's operation (the widths of the spectrum analysis windows and the per-band masking estimation) are purely psychoacoustic in derivation. (One could imagine aliens with radically different cochlear structure, perhaps with a much coarser/finer cilia density or whatnot, for which the FFT window sizes used in LossyWAV would be suboptimal.)

The key point about LossyWAV is that it makes far, far fewer assertions about the nature of human hearing compared to other lossy encoding schemes - so that we can presumably be more certain about its transparency, for all possible inputs, than many other codecs.

I'm not sure what statement of mine you are disagreeing with. I stated that LossyWAV uses a basic perceptual model to determine audibility (though I spelled audibility wrong in that post), but that the basic technique in which LossyWAV throws away data is not itself based on perceptual encoding.
Shaving bitdepth and increasing the noise floor is a technique not dependent on an understanding of human hearing, and as such has little in common with most perceptual encoding techniques which exploit "holes" in human hearing.

Reply #11 – 2009-07-20 13:25:17

They are not perceptual according to Florin Ghido:

Disadvantages of OptimFROG DualStream compared with transform coders (TC), such as MPC, OGG, MP3, AAC, WMA etc.:
- as it does not take into account the human auditory system limitations, it needs a much higher bitrate (up to twice) to achieve perceptual transparency. However, together with reaching perceptual transparency, many other important audio qualities are preserved

http://www.losslessaudio.org/DualStream.php

Reply #12 – 2009-07-20 13:51:06

In a sense, even optimizing RMS error is perceptual, in that it assumes people are less likely to hear smaller errors.

More often a codec is considered perceptual if it's taking advantage of the specific time/bandwidth limits of human hearing, or of some other effect that's specific to how the ear works and not generally applicable to most systems (i.e. higher SNR == better).

Reply #13 – 2009-07-20 15:50:21

Quote from: Soap on 2009-07-20 12:43:21

Shaving bitdepth and increasing the noise floor is a technique not dependent on an understanding of human hearing, and as such has little in common with most perceptual encoding techniques which exploit "holes" in human hearing.

No, but choosing what data to throw away is still based on human hearing. That's the point I'm making. The codecs that simply increase the noise floor without taking into account human hearing are not based on perceptual encoding, but they're roughly equivalent to simply decreasing bit-depth anyhow.

Edit: Put another way, is 16-bit PCM lossy? It's certainly lossy when starting with 24-bit sources. And 4-bit ADPCM is lossy when starting with just about any source. But are these "lossy codecs"? That seems to be the core of this discussion. What is a lossy codec, precisely? What is perceptual encoding? How far are we willing to stretch these terms?

I acknowledge that I'm probably wrong with respect to WavPack lossy and similar. However, that loss is basically equivalent to simply decreasing the bit-depth. Are we considering that to be "lossy encoding"? That makes sense to me in one regard, but by that same reasoning, it would make 16-bit PCM necessarily a "lossy codec" in certain frames of reference.

Reply #14 – 2009-07-20 15:59:13

Wait. You're telling me that shaving bitdepth is roughly equivalent to simply decreasing bit-depth?

Reply #15 – 2009-07-20 16:00:13

Quote from: shadowking on 2009-07-20 13:25:17

They are not perceptual according to Florin Ghido:

Quote

[...]
Advantages of OptimFROG DualStream compared to WavPack hybrid:
- true separate quantization levels for each channel
- quality mode maintains constant quality throughout the whole file
- advanced noise shaping option, improving transparency
[...]

They most certainly are. How can noise shaping not be a perceptual technique?
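
For readers unfamiliar with the term: noise shaping moves quantization noise around in frequency without changing the quantizer itself. Below is a textbook first-order error-feedback quantizer in Python (not OptimFROG's or WavPack's shaper, whose filters aren't given here). Feeding the previous sample's rounding error back makes the total output error e[n] - e[n-1], i.e. the noise is filtered by (1 - z^-1): pushed up in frequency and removed at DC. This fixed filter involves no hearing model; a perceptual shaper would choose its filter from one, which is arguably where the perceptual part comes in.

```python
def quantize(x, step):
    """Plain rounding quantizer."""
    return step * round(x / step)


def quantize_noise_shaped(samples, step):
    """First-order error-feedback quantizer.

    Output error per sample is e[n] - e[n-1], where e[n] is the rounding error
    made at sample n, so the noise spectrum is shaped by (1 - z^-1).
    """
    out = []
    prev_err = 0.0
    for x in samples:
        v = x - prev_err          # pre-compensate for the last rounding error
        y = quantize(v, step)
        prev_err = y - v          # rounding error made on this sample
        out.append(y)
    return out


if __name__ == "__main__":
    x = [0.3] * 1000              # a constant below half a step
    plain = [quantize(v, 1.0) for v in x]
    shaped = quantize_noise_shaped(x, 1.0)
    print("plain mean:", sum(plain) / len(plain))
    print("shaped mean:", sum(shaped) / len(shaped))
```

With a constant input of 0.3 step, plain rounding outputs all zeros, while the shaped output toggles between 0 and 1 with a mean close to 0.3: the error has been moved out of DC into high frequencies.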

Reply #16 – 2009-07-20 16:04:21

Quote from: Soap on 2009-07-20 15:59:13

Wait. You're telling me that shaving bitdepth is roughly equivalent to simply decreasing bit-depth?

Yup. By increasing the noise floor, you decrease the amount of information needed to reconstruct the signal in a manner precisely analogous to decreasing bit-depth. Unless I've got some massive misunderstanding going on...
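
A quick numerical check of that equivalence in Python: rounding a near-full-scale sine to b bits gives an SNR close to the textbook figure of about 6.02·b + 1.76 dB, so every bit shaved off (or every bit less of word length) raises the noise floor by about 6 dB, whatever the bits are called. The 997 Hz tone, 44.1 kHz rate, and amplitude are arbitrary choices for illustration.

```python
import math


def requantize(x, bits):
    """Round x (nominally in [-1, 1)) to a uniform grid of step 2 / 2**bits, no dither."""
    step = 2.0 / (1 << bits)
    return step * round(x / step)


def snr_db(bits, n=44100, freq=997.0, rate=44100.0, amp=0.999):
    """Measured SNR of a near-full-scale sine rounded to the given bit depth."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = amp * math.sin(2 * math.pi * freq * i / rate)
        err = requantize(x, bits) - x
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    for bits in (16, 12, 8, 4):
        print(f"{bits:2d} bits: measured {snr_db(bits):5.1f} dB, "
              f"rule of thumb {6.02 * bits + 1.76:5.1f} dB")
```

At very low bit depths the error stops behaving like uniform noise, so the measured figure can drift from the rule of thumb there.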

Reply #17 – 2009-07-20 16:06:05

Quote from: Soap on 2009-07-20 12:43:21

I'm not sure what statement of mine you are disagreeing with. I stated that LossyWAV uses a basic perceptual model to determine audibility (though I spelled audibility wrong in that post), but that the basic technique in which LossyWAV throws away data is not itself based on perceptual encoding.
Shaving bitdepth and increasing the noise floor is a technique not dependent on an understanding of human hearing, and as such has little in common with most perceptual encoding techniques which exploit "holes" in human hearing.

You really seem totally confused.

What do you think MP3 encoders do besides determining audibility and shaving off bits where they can afford to?

What do you think a perceptual model does besides finding "holes in the human hearing"?

Reply #18 – 2009-07-20 16:19:57

Quote from: Garf on 2009-07-20 16:06:05

What do you think MP3 encoders do besides determining audibility and shaving off bits where they can afford to?

What do you think a perceptual model does besides finding "holes in the human hearing"?

I'm the one confused?
I don't know where I went wrong with my explanation - but clearly I did, for you are the third to tell me I'm wrong and then basically state what I've already stated.

I said bit depth, not rate (though we can argue all day that is a distinction w/o a difference).

LossyWAV is an example of decreasing bitdepth (increasing noise), not tossing content which the human system has trouble hearing (unless you want to argue low-amplitude sounds are just that) due to masking and that ilk. This technique is not judging the audibility of transients, masked frequencies, etc.
The fact LossyWAV does attempt (successfully) to determine audibility is something I mentioned in my first post. The point was, though, that LossyWAV is an example of decreasing bitdepth, which is an easy technique and can be applied blindly, can be considered a lossy encoding style, and need not be based on human perception.

Reply #19 – 2009-07-20 16:39:23

Quote from: Soap on 2009-07-20 16:19:57

LossyWAV is an example of decreasing bitdepth (increasing noise), not tossing content which the human system has trouble hearing (unless you want to argue low-amplitude sounds are just that) due to masking and that ilk.

But these really are the same thing, which is why I asked you those 2 questions.

People think the MP3 encoders "remove sounds", but they do no such thing. What they really do is just add noise (by reducing bitdepth/rate to code each band!). They use a perceptual model to know where they can do this so it affects audibility the least.

There really is no difference at all between the two.
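
To illustrate the point that a transform codec "adds noise by reducing bitdepth per band", here is a deliberately tiny Python sketch: a naive O(N²) DCT-II, a quantizer whose step size differs per band, and the inverse transform. The band edges and step sizes are hypothetical inputs; in MP3 they would come from the psychoacoustic model and the bit budget, and MP3 itself uses a hybrid filterbank/MDCT with nonuniform quantization and Huffman coding, none of which is modelled here.

```python
import math


def dct(x):
    """Naive DCT-II: X[k] = sum_n x[n] cos(pi/N (n + 1/2) k)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(c)
    return [c[0] / n + (2.0 / n) * sum(c[k] * math.cos(math.pi / n * (i + 0.5) * k)
                                       for k in range(1, n))
            for i in range(n)]


def quantize_bands(block, band_edges, steps):
    """Quantize each band of DCT coefficients with its own step size.

    band_edges must start at 0 and end at len(block); band b covers
    coefficients band_edges[b] .. band_edges[b + 1] - 1 and uses steps[b].
    A larger step means fewer bits for that band and more noise in it.
    """
    assert band_edges[0] == 0 and band_edges[-1] == len(block)
    assert len(steps) == len(band_edges) - 1
    coeffs = dct(block)
    for b, step in enumerate(steps):
        for k in range(band_edges[b], band_edges[b + 1]):
            coeffs[k] = step * round(coeffs[k] / step)
    return idct(coeffs)


if __name__ == "__main__":
    block = [math.sin(0.3 * i) + 0.05 * math.sin(2.5 * i) for i in range(32)]
    out = quantize_bands(block, [0, 4, 16, 32], [0.01, 0.1, 1.0])
    print("max error:", max(abs(a - b) for a, b in zip(block, out)))
```

Nothing is "removed" here except in the sense that a coefficient smaller than half its band's step rounds to zero; everything else is the same add-noise-by-coarser-quantization operation as bit shaving, just applied per band.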

Reply #20 – 2009-07-20 16:49:36

So MP3 and co. use a much more aggressive method of adding noise to reduce bitrate, as the psymodel is more advanced. I think that's the way I understand it.

Reply #21 – 2009-07-20 16:50:28

They lowpass, they do all sorts of tricks (which their model tells them are unlikely to be audible) to improve compression efficiency. They do not simply decrease precision.

Across the board decreasing the bitdepth isn't the same game at all. There are similarities, but the differences are so great that comparisons are invalid IMHO.
And as I stated from my first post (I'll work on my clarity), LossyWAV attempts to decrease bitdepth with applied intelligence (a basic psymodel if you will), but decreasing bitdepth ("bitshaving") need not be done with the aid of a model.

Reply #22 – 2009-07-20 17:18:42

Quote from: shadowking on 2009-07-20 16:49:36

So MP3 and co. use a much more aggressive method of adding noise to reduce bitrate, as the psymodel is more advanced. I think that's the way I understand it.

Exactly. Plus a little more.

This "adding noise by quantization" works if the signal is "self-dithering". In transform codecs you also have nonlinear quantization artefacts for very low signal-to-noise ratios (for "nonlinear artefacts" read "not really noise anymore"). This is where the "metallic/watery" sounds come from. Also, the really high frequencies are usually chopped off ("quantized to zero" so to speak).

Finally, you can go further in the parametric coding direction (perceptual noise substitution, intensity stereo, ...).

Still, the "adding noise by quantization"-idea (reduced precision) is the most important one that is involved, Soap. Pretty much anything else that is part of such a codec (filterbank, Huffman coding, etc.) I would classify as "noiseless" building blocks.
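
The "self-dithering" caveat can be seen with a very quiet tone. In this Python sketch, a sine whose amplitude is 0.4 of a quantizer step rounds to all zeros without dither (the tone is simply "quantized to zero", and the error is the signal itself), whereas adding TPDF dither first keeps the tone present on average, at the cost of extra noise. The amplitude, length, period and seed are arbitrary; this illustrates the general principle, not what any particular codec does (reply #24 argues against dithering inside a codec's quantizer).

```python
import math
import random


def quantize(x, step):
    """Plain rounding quantizer."""
    return step * round(x / step)


def tpdf(step, rng):
    """Triangular-PDF dither spanning +/- one step (sum of two uniform values)."""
    return (rng.random() - 0.5) * step + (rng.random() - 0.5) * step


def quiet_tone_demo(amp_in_steps=0.4, n=10000, step=1.0, seed=0):
    """Quantize a sub-step sine with and without dither."""
    rng = random.Random(seed)
    signal = [amp_in_steps * step * math.sin(2 * math.pi * i / 97) for i in range(n)]
    plain = [quantize(x, step) for x in signal]
    dithered = [quantize(x + tpdf(step, rng), step) for x in signal]
    return signal, plain, dithered


def gain_estimate(signal, output):
    """Least-squares gain of output against signal: about 1 if the tone survives."""
    return sum(s * y for s, y in zip(signal, output)) / sum(s * s for s in signal)


if __name__ == "__main__":
    signal, plain, dithered = quiet_tone_demo()
    print("gain without dither:", gain_estimate(signal, plain))
    print("gain with TPDF dither:", gain_estimate(signal, dithered))
```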

Cheers!
SG

Reply #23 – 2009-07-20 17:41:31

Every AD conversion is already a lossy process. You choose quantization word length and rate according to your needs. There are plenty of use cases where those parameters are not chosen based on perceptual models. Take two sonars, A with a very long range but coarse, B with a short range but very precise. If both systems are able to process about the same data rate, you'd choose a higher sensitivity and word length for A and a higher temporal resolution for B while sacrificing the other.
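
The sonar trade-off is plain arithmetic once the data rate is fixed. A Python sketch with a hypothetical budget of 192,000 bit/s per channel: system A spends it on long words, system B on a high sample rate. The ~6.02 dB-per-bit dynamic range figure is the standard ideal-quantizer approximation, and usable bandwidth is taken as half the sample rate (Nyquist).

```python
import math

BUDGET_BITS_PER_S = 192_000  # hypothetical per-channel data rate


def sample_rate_for(word_length_bits, budget=BUDGET_BITS_PER_S):
    """Highest sample rate the budget allows at this word length."""
    return budget // word_length_bits


def dynamic_range_db(word_length_bits):
    """Ideal uniform quantizer: about 6.02 dB of range per bit."""
    return 20 * math.log10(2) * word_length_bits


if __name__ == "__main__":
    for name, bits in (("A (long range, coarse)", 24), ("B (short range, precise)", 8)):
        fs = sample_rate_for(bits)
        print(f"{name}: {bits}-bit words, {fs} Hz sample rate, "
              f"{fs // 2} Hz bandwidth, ~{dynamic_range_db(bits):.1f} dB range")
```

Neither choice involves a model of hearing: the split between word length and rate follows from what each sonar needs to resolve.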

Reply #24 – 2009-07-20 17:42:14

Quote from: SebastianG on 2009-07-20 17:18:42

Quote from: shadowking on 2009-07-20 16:49:36
So MP3 and co. use a much more aggressive method of adding noise to reduce bitrate, as the psymodel is more advanced. I think that's the way I understand it.

Exactly. Plus a little more.

This "adding noise by quantization" works if the signal is "self-dithering". In transform codecs you also have nonlinear quantization artefacts for very low signal-to-noise ratios (for "nonlinear artefacts" read "not really noise anymore"). This is where the "metallic/watery" sounds come from. Also, the really high frequencies are usually chopped off ("quantized to zero" so to speak).

Finally, you can go further in the parametric coding direction (perceptual noise substitution, intensity stereo, ...).

Still, the "adding noise by quantization"-idea (reduced precision) is the most important one that is involved, Soap. Pretty much anything else that is part of such a codec (filterbank, Huffman coding, etc.) I would classify as "noiseless" building blocks.

Cheers!
SG

Well, that's a bit off, I think. MP3 and the usual run of perceptual encoders change the bit resolution in a filterbank representation based on a perceptual model crossed with the bit rate requirements.

Dithering in the quantizer is not desirable; it would introduce more noise at high frequencies. Controlling watery/jingling sounds comes about more from making sure that you don't have lines popping on and off, on and off, across successive blocks.

Some other non-perceptual methods (by the definition I use in my tutorials, i.e. those that have no explicit perceptual model, as opposed to some passive accommodation) are most any ADPCM (but not including the work done at Lucent that did perceptual noise shaping), G.722, most CELP for voice (postfiltering is a kinda-sorta-perceptual technique), APCM (does anyone really use it?), delta-mod of various kinds (CVSD, etc.), zip archiving, Unix compress ...

All non-perceptual at the heart, they don't actively change noise injection according to an explicit perceptual model.
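
Delta modulation is about the simplest member of that list. A minimal Python sketch of linear (fixed-step) DM, one bit per sample: the encoder only asks "is the signal at or above my running estimate?", with no notion of hearing at all. CVSD and other adaptive variants change the step size from the recent bit pattern; that part is not shown, and the step sizes used below are arbitrary. With a fixed step, the error stays within one step as long as the signal moves less than one step per sample; faster signals cause slope overload.

```python
import math


def dm_encode(samples, step):
    """Linear delta modulation: 1 if the sample is at or above the estimate."""
    estimate = 0.0
    bits = []
    for s in samples:
        bit = 1 if s >= estimate else 0
        estimate += step if bit else -step   # the decoder makes the same move
        bits.append(bit)
    return bits


def dm_decode(bits, step):
    """Integrate the bit stream back into a staircase approximation."""
    estimate = 0.0
    out = []
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out


if __name__ == "__main__":
    sig = [math.sin(2 * math.pi * n / 200) for n in range(1000)]
    for step in (0.05, 0.005):              # above and below the max slope (~0.031)
        rec = dm_decode(dm_encode(sig, step), step)
        print(step, max(abs(a - b) for a, b in zip(sig, rec)))
```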
