Topic: Lossy encoding NOT based on perceptual encoding?

Lossy encoding NOT based on perceptual encoding?

Reply #25
LossyWAV is an example of decreasing bitdepth (increasing noise), not tossing content which the human auditory system has trouble hearing due to masking and that ilk (unless you want to argue low-amplitude sounds are just that).


But these really are the same thing, which is why I asked you those 2 questions.

People think the MP3 encoders "remove sounds", but they do no such thing. What they really do is just add noise (by reducing bitdepth/rate to code each band!). They use a perceptual model to know where they can do this so it affects audibility the least.

There really is no difference at all between the two.


Well, in that many lines in the spectrum are reduced to zero bits, and thereby zeroed in the spectrum, this is a way of removing parts of the signal. What you're really doing is removing INFORMATION that isn't audible, by injecting noise through the process of using large quantizer step sizes.

And rate loops, by and large, are not strictly speaking "bit allocation", rather they allocate noise and count the bits that result.
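As a rough illustration of that point, here is a toy numpy sketch (made-up spectrum and step size, not any real codec's rate loop): a uniform quantizer with a large step size both zeroes the small lines and injects noise into the rest, and the bit count simply falls out of what survives.

import numpy as np

# Toy "spectral lines" with decaying magnitudes (made-up numbers, not a real MDCT)
rng = np.random.default_rng(0)
spectrum = rng.normal(scale=[8.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.1, 0.05])

step = 1.5                               # large, hypothetical quantizer step size
indices = np.round(spectrum / step)      # uniform mid-tread quantizer
reconstructed = indices * step

noise = reconstructed - spectrum         # the "injected" quantization noise
zeroed = int(np.sum(indices == 0))       # lines that vanish from the spectrum entirely
# crude bit estimate: ~log2(|index|)+1 bits per surviving line, zeros nearly free
bits = float(np.sum(np.log2(np.abs(indices[indices != 0]) + 1.0) + 1.0))

print(f"zeroed lines: {zeroed}/{spectrum.size}, noise RMS: {noise.std():.3f}, ~bits: {bits:.1f}")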
-----
J. D. (jj) Johnston

Lossy encoding NOT based on perceptual encoding?

Reply #26
You're asking about something that makes no sense. The point of lossy encoding is to throw out the parts that humans cannot hear. Lossy is intrinsically aimed at some target audience. Without a target audience, you cannot begin to make intelligent decisions about what data should be thrown out. Therefore, it makes no sense to consider lossy audio that has no target audience.


Sure it makes sense.  Lossless compressors decrease size by taking advantage of the redundancy in the data.  There is no model of perception there (although there is a degree of modelling the content).

You could take a lossless codec like FLAC and do something like add rate-distortion optimization on the coded residual, without using a psychoacoustic metric for distortion, instead just optimizing for the best SNR.

I'm not sure *why* you'd want this... music for dolphins perhaps.  But it's not an illogical question.
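As a minimal sketch of what that could look like (hypothetical Python/numpy; FLAC itself does nothing of the sort): per block, drop as many residual LSBs as possible while still meeting a plain SNR target, with no perceptual weighting anywhere in the loop.

import numpy as np

def quantize_residual_for_snr(residual, target_snr_db):
    """Drop as many LSBs of a prediction residual as possible while still
    meeting a plain SNR target -- no psychoacoustics, just signal-to-noise."""
    signal_power = np.mean(residual ** 2) + 1e-12
    for shift in range(15, -1, -1):                    # try coarse to fine
        step = float(1 << shift)
        quantized = np.round(residual / step) * step
        noise_power = np.mean((quantized - residual) ** 2) + 1e-12
        if 10 * np.log10(signal_power / noise_power) >= target_snr_db:
            return quantized, shift
    return residual, 0

rng = np.random.default_rng(1)
residual = rng.laplace(scale=200.0, size=4096)         # toy FLAC-like prediction residual
_, shift = quantize_residual_for_snr(residual, target_snr_db=40.0)
print(f"coarsest quantization meeting 40 dB SNR: drop {shift} LSBs per sample")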


Lossy encoding NOT based on perceptual encoding?

Reply #27
Well, that's a bit off, I think. MP3 and the usual run of perceptual encoders change the bit resolution in a filterbank representation based on a perceptual model crossed with the bit rate requirements.

I focused on the parts that are lossy. Of course, this is done in some domain that is favorable w.r.t. energy compaction (or "diagonalization") and allows clever distribution of the noise.

Dithering in the quantizer is not desirable; it would introduce more noise at high frequencies. Controlling watery/jingling sounds comes about more from making sure that you don't have lines popping on and off, on and off, across successive blocks.

I think you should reconsider this. I actually expect subtractive dithering to increase the perceived quality-per-bit ratio when applied correctly and selectively. You could use subtractive dithering to blend smoothly between low-SNR quantization and PNS for noise-like signal parts. If it's supposed to sound noisy, I want my artefacts to be noisy as well and not just some "lines popping on and off". Those weird high-frequency artefacts can be really annoying...

Some other non-perceptual methods (by the definition I use in my tutorials, i.e. those that have no explicit perceptual model, as opposed to some passive accommodation) are most any ADPCM (but not the work done at Lucent that did perceptual noise shaping), G722, most CELP for voice (postfiltering is a kinda-sorta-perceptual technique), [...]

I think most CELP coders come with a code book search that applies perceptual weighting -- albeit a very simple one (that is possibly based on the LPC filter). I guess this is somewhere in the gray area between "perceptual" and what you call "passive accomodation". :-)
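For reference, the kind of simple LPC-derived weighting being described is usually written W(z) = A(z/gamma1) / A(z/gamma2), i.e. a bandwidth-expanded version of the LPC polynomial. A toy sketch (the gamma values are typical textbook choices, not taken from any particular standard):

import numpy as np
from scipy.signal import lfilter

def perceptual_weighting(x, lpc_a, gamma1=0.92, gamma2=0.6):
    """CELP-style weighting filter W(z) = A(z/gamma1) / A(z/gamma2), built by
    bandwidth-expanding the LPC polynomial lpc_a (with lpc_a[0] == 1).
    Measuring the codebook-search error through W(z) lets more of the coding
    noise sit under the formant peaks, where it is harder to hear."""
    a = np.asarray(lpc_a, dtype=float)
    k = np.arange(len(a))
    return lfilter(a * gamma1 ** k, a * gamma2 ** k, x)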

Cheers!
SG

Lossy encoding NOT based on perceptual encoding?

Reply #28
I think you should reconsider this. I actually expect subtractive dithering to increase the perceived quality-per-bit ratio when applied correctly and selectively.


It will raise the bit rate of the regions of the spectrum that are being subtractively dithered. This is not necessarily what you want or need in order to get the best result.

As to the fixed-Q noise shaping in some CELPs, I suppose you could call it an attempt. It certainly improved quality.
-----
J. D. (jj) Johnston

Lossy encoding NOT based on perceptual encoding?

Reply #29
I'm not sure *why* you'd want this... music for dolphins perhaps.
That's the point I'm getting at. Unless you have a reason to do it, what sense does doing it make?

Lossy encoding NOT based on perceptual encoding?

Reply #30
I think you should reconsider this. I actually expect subtractive dithering to increase the perceived quality-per-bit ratio when applied correctly and selectively.

It will raise the bit rate of the regions of the spectrum that are being subtractively dithered.

I think you're missing something. The idea of subtractive dithering is to use the same dither signal on both sides (encoder and decoder). This has two implications: Since the noise is subtracted again at the decoder side "it is not part of" the overall error (--> higher SNR compared to additive dithering using the same linear quantizer). Also, it allows you to account for shifted PDFs the noiseless coding stage has to deal with. Actually, selecting code books (same quantizer step size, different offsets) in a pseudo-random fashion is equivalent to subtractive dithering.
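A toy numpy sketch of that idea (not from any particular codec; the shared-seed uniform dither is just for illustration): encoder and decoder generate the same pseudo-random dither, the encoder adds it before quantizing, and the decoder subtracts it after reconstruction, so the dither itself never shows up in the error.

import numpy as np

def subtractive_dither_roundtrip(x, step, seed=1234):
    """Toy subtractive dithering: encoder and decoder share the dither
    (same seed), so the decoder can subtract it back out."""
    dither = np.random.default_rng(seed).uniform(-step / 2, step / 2, size=x.shape)
    indices = np.round((x + dither) / step)      # encoder: add dither, then quantize
    decoded = indices * step - dither            # decoder: reconstruct, subtract dither
    return indices, decoded

x = np.random.default_rng(7).normal(size=100000)
step = 0.5
idx, y = subtractive_dither_roundtrip(x, step)
print(f"error RMS: {np.std(y - x):.4f}  (ideal step/sqrt(12) = {step / np.sqrt(12):.4f})")
print(f"fraction of nonzero indices: {np.mean(idx != 0):.2f}")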

Cheers!
SG

Lossy encoding NOT based on perceptual encoding?

Reply #31
I think you should reconsider this. I actually expect subtractive dithering to increase the perceived quality-per-bit ratio when applied correctly and selectively.

It will raise the bit rate of the regions of the spectrum that are being subtractively dithered.

I think you're missing something. The idea of subtractive dithering is to use the same dither signal on both sides (encoder and decoder). This has two implications: Since the noise is subtracted again at the decoder side "it is not part of" the overall error (--> higher SNR compared to additive dithering using the same linear quantizer). Also, it allows you to account for shifted PDFs the noiseless coding stage has to deal with. Actually, selecting code books (same quantizer step size, different offsets) in a pseudo-random fashion is equivalent to subtractive dithering.

Cheers!
SG


Yes, I understand subtractive dither. HOWEVER, it will mean that there are more quantized values that are non-zero, and in doing so, will raise the BIT RATE. Not the noise level, the BIT RATE, and ergo raise the noise level elsewhere, because you'll have to back down on bits elsewhere.

The problem is not that it adds noise, but that it adds BIT RATE. Consider: if I have something that is 7/8 zeros and I dither, I'm going to be back to 1 bit per line vs. 0.2 bits per line. At high frequencies, that is a bleepin' lot of bits to lose at low rates.
-----
J. D. (jj) Johnston

Lossy encoding NOT based on perceptual encoding?

Reply #32
Yes, I understand subtractive dither. HOWEVER, it will mean that there are more quantized values that are non-zero, and in doing so, will raise the BIT RATE. Not the noise level, the BIT RATE, and ergo raise the noise level elsewhere, because you'll have to back down on bits elsewhere.

The problem is not that it adds noise, but that it adds BIT RATE. Consider: if I have something that is 7/8 zeros and I dither, I'm going to be back to 1 bit per line vs. 0.2 bits per line. At high frequencies, that is a bleepin' lot of bits to lose at low rates.

Agreed. I recently took a look at the number of MDCT lines which are not quantized to zero in AAC, and even at relatively high bit rates (in terms of bits per sample), the number is surprisingly low. I would say that for certain bit rates, almost 90% of the high-frequency part of the spectrum is quantized to zero. Please correct me if I'm wrong, but I think that, depending on the PDF of the dither, dithering would lower this to 50%.
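Out of curiosity, a crude synthetic illustration of that effect (this is not real AAC: no scalefactor bands, no rate loop, just a power-law-companded quantizer on a made-up 1/f-ish spectrum), showing how readily the upper half of the lines collapses to zero:

import numpy as np

rng = np.random.default_rng(3)
n = 1024                                          # toy "MDCT" block
f = np.arange(1, n + 1)
spectrum = rng.normal(size=n) * (200.0 / f)       # crude 1/f-ish magnitude rolloff

step = 2.0                                        # hypothetical global quantizer step
q = np.sign(spectrum) * np.round(np.abs(spectrum) ** 0.75 / step)   # power-law companding

upper = q[n // 2:]                                # upper half of the spectrum
print(f"upper-half lines quantized to zero: {100.0 * np.mean(upper == 0):.0f}%")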

Going back to earlier posts: I never warmed to the (apparently common) notion that transform coders add noise which is inaudible. I prefer calling it "removal of spectral and temporal acoustic information and trying to conceal this removal from the human listener". Time-domain coders, on the other hand, usually introduce wide-band noise which they then try to hide psychoacoustically by means of spectral (and temporal) noise shaping. That's also what LossyWAV does.

Chris
If I don't reply to your reply, it means I agree with you.

Lossy encoding NOT based on perceptual encoding?

Reply #33
Well, in that many lines in the spectrum are reduced to zero bits, and thereby zeroed in the spectrum, this is a way of removing parts of the signal. What you're really doing is removing INFORMATION that isn't audible, by injecting noise through the process of using large quantizer step sizes.


I still don't really understand the difference between noise shaping and dither, which I think is related to what you are saying, but is "adding noise" a function/goal, or is it just a by-product of removing information? It seems counter-productive to add bit rate in order to mask some destructive operation you have done on the audio data.

Lossy encoding NOT based on perceptual encoding?

Reply #34
The last page seems to be focused on nitty gritty details.

To reply to the thread starter:
Isn't it kind of obvious that if you have a lossy encoding, meaning that errors are introduced to the signal in order to save bits, then any implementation will have some error characteristic, and that characteristic is going to be the result of some implicit or explicit model of the application that is to use the decoded data (such as a human listener)?

Now, the implicit/explicit model may be good or bad, leading to better or worse compromises between rate and perceptual distortion. Even an analog tape recorder with no Dolby noise reduction makes some perceptually sensible use of the physics of the magnetic medium.

-k

Lossy encoding NOT based on perceptual encoding?

Reply #35
Well, in that many lines in the spectrum are reduced to zero bits, and thereby zeroed in the spectrum, this is a way of removing parts of the signal. What you're really doing is removing INFORMATION that isn't audible, by injecting noise through the process of using large quantizer step sizes.


I still don't really understand the difference between noise shaping and dither, which I think is related to what you are saying, but is "adding noise" a function/goal, or is it just a by-product of removing information? It seems counter-productive to add bit rate in order to mask some destructive operation you have done on the audio data.


Well, in the classical sense I'm talking about neither. You could think of what a perceptual coder does as signal-dependent noise shaping, but that's a gross oversimplification.
-----
J. D. (jj) Johnston

Lossy encoding NOT based on perceptual encoding?

Reply #36
The last page seems to be focused on nitty gritty details.

To reply to the thread starter:
Isn't it kind of obvious that if you have a lossy encoding, meaning that errors are introduced to the signal in order to save bits, then any implementation will have some error characteristic, and that characteristic is going to be the result of some implicit or explicit model of the application that is to use the decoded data (such as a human listener)?

Now, the implicit/explicit model may be good or bad, leading to better or worse compromises between rate and perceptual distortion. Even an analog tape recorder with no Dolby noise reduction makes some perceptually sensible use of the physics of the magnetic medium.

-k


Well, as I said, I tend to reserve the term "perceptual encoder" for encoders that have a specific, explicit perceptual model.

This is to separate them from LMS coders, which may have some fixed or signal-dependent noise shaping based on a rule, or from homomorphic models.
-----
J. D. (jj) Johnston

Lossy encoding NOT based on perceptual encoding?

Reply #37
To reply to the thread starter:
Isn't it kind of obvious that if you have a lossy encoding, meaning that errors are introduced to the signal in order to save bits, then any implementation will have some error characteristic, and that characteristic is going to be the result of some implicit or explicit model of the application that is to use the decoded data (such as a human listener)?


For example's sake, say you are using a microphone to measure/record air pressure and atmospheric qualities. That analog-to-digital data must be stored, and to be stored it must be encoded. A human ear may never even listen to that data, as it may go into some processing application. So if a lossy encoding scheme is employed in this case, it would be difficult in my mind to describe it as perceptual.

I didn't really want to get knee-deep in examples, because I don't feel I'm knowledgeable enough on digital audio to properly explain enough examples to encompass my original query, but hopefully you get the gist.

Lossy encoding NOT based on perceptual encoding?

Reply #38
What you are talking about makes perfect sense to me. There are many types of sampled analog data that have nothing to do with human audio perception. There is your example of sounds which are basically too low in frequency to hear, or seismic data (which are basically sounds in the ground), or even non-audio data like EKGs. I'm sure there are also cases where perfectly audible sounds are being used for some purpose other than listening (like scientific study or process control) where regular perceptual encoders might destroy important (but likely inaudible) details.

For data like this an obvious first choice would be lossless encoders, and I have been approached about using WavPack for a whole host of non-audio applications (especially because WavPack can handle floating point data). But I'm sure there are also situations in which non-perceptual lossy encoding would be appropriate, but the characteristics of the signal and the relevant information would have to be taken into account before knowing what would work.

Lossy encoding NOT based on perceptual encoding?

Reply #39
I think that last line hits the point.

Any lossy encoder works on the basis that a smaller difference is better - the question is what domain you measure that difference in - or how you calculate it.

The calculation may have a little perceptual relevance, a lot, or none at all. In truth it's quite hard to have none at all - once you're looking in a log, power, or floating point domain, you could argue it's got some basis in perception - though you could also argue that the world is logarithmic, and linear calculations are somewhat artificial / unnatural. (That argument could get quite philosophical!)

So on this basis, the line between perceptual and non-perceptual becomes quite fuzzy.


Much clearer is to look at the various limitations of human hearing, and to say in what way, if at all, a given codec exploits them. If there is a cut off, it's between the codecs that (at least) dynamically spectrally shape the coding noise related to some estimate of short-term spectral masking, and those that don't.

That's what I'd call a perceptual codec, but it's probably abusing the term. I can't think of a simple term that properly nails this distinction though!


mp2, mp3, aac, vorbis, musepack etc are "codecs that dynamically spectrally shape the coding noise related to some estimate of short-term spectral masking".

lossyWAV (at present), NICAM, WavPack (most / ?all? lossy modes), ADPCM, a-law, u-law etc are not "codecs that dynamically spectrally shape the coding noise related to some estimate of short-term spectral masking"

Some of those codecs in the second list have things like log or floating point representation of sample values, fixed noise shaping related to the hearing threshold, and even something approaching a spectral masking calculation - these are perceptually related tricks - but the codecs don't dynamically spectrally shape the coding noise.

It's one useful distinction.

Cheers,
David.

Lossy encoding NOT based on perceptual encoding?

Reply #40

I think you're missing something. The idea of subtractive dithering is to use the same dither signal on both sides (encoder and decoder). This has two implications: Since the noise is subtracted again at the decoder side "it is not part of" the overall error (--> higher SNR compared to additive dithering using the same linear quantizer). Also, it allows you to account for shifted PDFs the noiseless coding stage has to deal with. Actually, selecting code books (same quantizer step size, different offsets) in a pseudo-random fashion is equivalent to subtractive dithering.

Yes, I understand subtractive dither. HOWEVER, it will mean that there are more quantized values that are non-zero, and in doing so, will raise the BIT RATE.

I think you missed my point. I'm not considering an entropy coder that ignores the known dither signal. The entropy coder I would use accounts for the known dither signal.

The problem is not that it adds noise, but that it adds BIT RATE. Consider: if I have something that is 7/8 zeros and I dither, I'm going to be back to 1 bit per line vs. 0.2 bits per line. At high frequencies, that is a bleepin' lot of bits to lose at low rates.

Let's assume we have a memoryless source which emits symbols from a three-symbol alphabet, "zero", "plus-one", and "minus-one", with probabilities 7/8, 1/16, 1/16. The entropy is 0.669 bits. That's much more than 0.2 bits.
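For what it's worth, that figure is easy to verify (a quick Python check using only the probabilities above):

import numpy as np

p = np.array([7/8, 1/16, 1/16])                       # "zero", "plus-one", "minus-one"
print(f"{-np.sum(p * np.log2(p)):.3f} bits/symbol")   # prints 0.669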

Also, this is a good example of what I was talking about w.r.t. the sound of the artefacts and "linearity". If it is supposed to sound noisy and we don't have "enough bit rate", the undithered mid-tread scalar quantizer will sound awful.

Cheers!
SG

Lossy encoding NOT based on perceptual encoding?

Reply #41
Let's assume we have a memoryless source which emits symbols from a three-symbol alphabet, "zero", "plus-one", and "minus-one", with probabilities 7/8, 1/16, 1/16. The entropy is 0.669 bits. That's much more than 0.2 bits.

Also, this is a good example of what I was talking about w.r.t. the sound of the artefacts and "linearity". If it is supposed to sound noisy and we don't have "enough bit rate", the undithered mid-tread scalar quantizer will sound awful.

Cheers!
SG


The problem with your point is that if you capture more of the signal, you have more entropy to code, no matter how you cut it.

As to the mid-tread quantizer sounding awful, that's what all the best coders use.
-----
J. D. (jj) Johnston

Lossy encoding NOT based on perceptual encoding?

Reply #42
The problem with your point is that if you capture more of the signal, you have more entropy to code, no matter how you cut it.

What do you mean by "capture more of the signal"? Please explain.

As to the mid-tread quantizer sounding awful, that's what all the best coders use.

OK. So, that's proof that you can't do better, right?

Check out JPEG2000 part 2, specifically the trellis-coded quantization part. Check out "CELT" (Jean-Marc Valin's new low-delay full-bandwidth speech coder).

Cheers!
SG

Lossy encoding NOT based on perceptual encoding?

Reply #43
The problem with your point is that if you capture more of the signal, you have more entropy to code, no matter how you cut it.

What do you mean by "capture more of the signal"? Please explain.


Consider: when you capture the information below half a step size, you are adding information to the bitstream.

Yes. Really.
As to the mid-tread quantizer sounding awful, that's what all the best coders use.

OK. So, that's proof that you can't do better, right?

Check out JPEG2000 part 2, specifically the trellis-coded quantization part. Check out "CELT" (Jean-Marc Valin's new low-delay full-bandwidth speech coder).

Cheers!
SG


Well, the issue really is "how compressible" (in the noiseless sense) is the data. There is a tradeoff, and the results to date seem pretty clear for audio.
-----
J. D. (jj) Johnston


Lossy encoding NOT based on perceptual encoding?

Reply #44
> The point of lossy encoding is to throw out the parts that humans cannot hear.

Not always. Speech coding (in particular at low bit rates) doesn't have perceptual components.
In general, if you are at a low bit rate, the best approach to lossy compression is to model the source (in the speech coding case, the vocal tract).
At high bit rates, it's simpler to model the sink (the listener).
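To make the "model the source" idea concrete, here is a minimal LPC sketch (the textbook autocorrelation method in Python/numpy/scipy, not any particular speech standard): a low-rate speech coder fits an all-pole vocal-tract filter per frame and spends its bits on those few coefficients plus a coarse excitation, rather than on the waveform itself.

import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order=10):
    """Fit an all-pole "source" (vocal tract) model to one frame via the
    autocorrelation method: solve the Toeplitz normal equations R a = r."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))             # A(z) = 1 - sum_k a_k z^-k

# A low-rate speech coder would transmit roughly these ten coefficients (plus a
# coarse excitation) per frame instead of the waveform itself.
rng = np.random.default_rng(5)
frame = np.sin(2 * np.pi * 0.05 * np.arange(240)) + 0.1 * rng.normal(size=240)
print(lpc(frame, order=10))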