
Psychoacoustics used by Vorbis

Hello,

I'm new to this forum and hope to get some help. Does anybody know of a paper or other source of information about the psychoacoustics used in Vorbis? I have found plenty of information about how Vorbis encodes and decodes, but nothing about the masking thresholds given by the psychoacoustic model, the number of critical bands, etc. I am currently trying to get this information from the source code, but that is proving quite difficult. Can anybody help me?

Thank you and regards,
Olli

 


Reply #1



Hello. The psychoacoustic model used by Vorbis is very similar to the one in AAC. The encoder splits the data into short or long blocks; the default sizes are 256 and 2048 samples for 44.1 kHz input material. It then applies an MDCT (a lapped transform built on the type-IV DCT) to move the data into the frequency domain, so Vorbis is a transform coder. The window is a custom-designed function, similar to a sine window, with good sidelobe rejection; it was designed to avoid any patents.

Psychoacoustic masking starts from an absolute threshold of hearing (ATH) model. The masking curves are unusual: they are not derived from Fletcher-Munson curves, but from an "empirically adjusted" set of "Ehmer curves". Ehmer was a naval research officer who published a paper for the Acoustical Society of America in 1958. The lead developer, "Monty", decided to use these for whatever purpose he had in mind.

The noise floor (floor1.c) is fitted to the data with a piecewise linear approximation. The old floor 0 is not used any more; it was an LPC model intended for speech and low-bitrate material. A technique called "noise normalization", created specifically for Vorbis, is similar to PNS in AAC. It compensates for the energy lost in frequency bands through quantization by reviving some of the residue noise that is left over. It does this using some sort of "apsort" algorithm defined in psy.c.

The critical band structure used by Vorbis is 1/3-octave bands. Vorbis uses its own stereo model, which is tied directly to the residue. It is not called "joint stereo"; it is a little more sophisticated and is called point/phase stereo. It quantizes the stereo image into point and diffuse components, meaning amplitude and reverberation.

In terms of entropy coding, Vorbis differs from AAC in that it uses multistage vector quantization on the noise residue that is left over. What remains of the signal is finally encoded with Huffman codes, which AAC also uses (except that there are no error-recovery books; those are covered by AAC patents).

I hope that helps to answer some of your questions. If you are looking for anything else, you can find most of it in psy.c in the source code. Everything else is pretty self-explanatory. As for optimizing the low-level libraries or playing around with them, I don't have the slightest clue how to do that.
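For anyone who wants to look at the window itself: the Vorbis I specification gives it as w(i) = sin(pi/2 * sin^2(pi * (i + 0.5) / n)). Below is a minimal Python sketch of that formula for the two default block sizes mentioned above. The function names are my own, not anything from libvorbis, and the sketch ignores how the encoder reshapes the window when a long block sits next to a short one. It also checks the power-complementary property (w[i]^2 + w[i + n/2]^2 = 1) that overlapping MDCT blocks need in order to reconstruct the signal.

```python
import math


def vorbis_window(n):
    """n-point window from the Vorbis I spec: sin(pi/2 * sin^2(pi*(i+0.5)/n))."""
    return [
        math.sin(0.5 * math.pi * math.sin(math.pi * (i + 0.5) / n) ** 2)
        for i in range(n)
    ]


def is_power_complementary(w, tol=1e-12):
    """True if w[i]^2 + w[i + n/2]^2 == 1 (within tol) for every i in the first half."""
    half = len(w) // 2
    return all(abs(w[i] ** 2 + w[i + half] ** 2 - 1.0) < tol for i in range(half))


if __name__ == "__main__":
    # Default short and long block sizes quoted in this thread for 44.1 kHz input.
    for n in (256, 2048):
        w = vorbis_window(n)
        print(n, is_power_complementary(w))
```

The second half of the window is the mirror image of the first, so only n/2 values are actually distinct.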


I wrote this page a while back when I was digging through the source code:

http://wiki.hydrogenaudio.org/index.php?ti...cal_Information
budding I.T professional


Reply #2
Thank you very much for the answer! It was quite helpful, and I now know a lot more about the theory behind Vorbis. But I still have a question about the source code. I expect the encoder to analyze each block of transformed audio. That means there must be an individual ATH for each block, based on Ehmer's psychoacoustic data. My goal is to write this per-block ATH to a file, but I'm not sure where these ATHs are generated. Does anybody know?

Thank you in advance and kind regards,
Olli


Reply #3



Is there a particular reason why you need to do this, such as research? It's in the source code. I think this is where the Ehmer curves are calculated for each frequency band. Monty is extremely stingy with comments in the source code, so you really can't tell what these numbers mean. I am assuming the top part is an array of ATH values in decibels. Most of the masking is done here.

http://svn.xiph.org/trunk/vorbis/lib/masking.h


Reply #4
Indeed, this is for research. The ATH array in masking.h seems to be a basic ATH curve without any masking; at any rate, it looks like a typical ATH curve when you plot the values. I think the tonemasks array represents the influence of a tone on the ATH. It is separated into 17 pitch bands, each spanning half an octave. For each of these bands there are 6 sound pressure levels (dB), each with 56 "influence values" for nearby frequencies. As I understand Ehmer's paper, this influence is defined only for frequencies above the masking frequency, that is, for the next 56 frequencies in the ATH array. But all these values are only used directly during initialization (_vp_psy_init in psy.c).
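To make the layout described above concrete, here is a rough Python sketch of a table with the same shape (17 half-octave bands x 6 levels x 56 values) and a lookup into it. The dimensions come from this post; everything else is an assumption for illustration only: the 62.5 Hz starting frequency for band 0, the 40 to 90 dB level grid, the zero-filled placeholder data, and all of the names. The real numbers live in masking.h, and psy.c may well map frequencies and levels to indices differently.

```python
import math

P_BANDS = 17     # half-octave bands, as counted in this thread
P_LEVELS = 6     # masker levels per band, as counted in this thread
EHMER_MAX = 56   # "influence values" per curve, as counted in this thread

BASE_HZ = 62.5                        # ASSUMPTION: centre frequency of band 0
LEVELS_DB = [40, 50, 60, 70, 80, 90]  # ASSUMPTION: placeholder level grid

# Placeholder table with the right shape only; it holds no real masking data.
tonemasks = [[[0.0] * EHMER_MAX for _ in range(P_LEVELS)] for _ in range(P_BANDS)]


def band_index(freq_hz):
    """Nearest half-octave band for a masker frequency, clamped to the table."""
    half_octaves = 2.0 * math.log2(freq_hz / BASE_HZ)
    return min(P_BANDS - 1, max(0, int(round(half_octaves))))


def level_index(level_db):
    """Index of the tabulated level closest to the masker level."""
    return min(range(P_LEVELS), key=lambda i: abs(LEVELS_DB[i] - level_db))


def lookup_curve(freq_hz, level_db):
    """Return the 56-value curve for the nearest band and level."""
    return tonemasks[band_index(freq_hz)][level_index(level_db)]
```

With these assumed constants, a 1 kHz masker lands in band 8, since 1000 Hz is four octaves (eight half-octaves) above 62.5 Hz.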

I assumed the block-wise analysis to be located in the function vorbis_analysis_buffer in block.c. I think this has something to do with the vi->pcm... values (pcm = pulse-code modulation = quantization?), but I'm not at all sure about that. Are you, or do you have another idea?

Thank you and regards,
Olli


Reply #5


That's pretty interesting about the Ehmer curves. I was not aware of that, even though I did look at Ehmer's papers. I don't have any other ideas about how this works; I wish I could help you more. There is a lot of math going on here, and that's not one of my strong suits. I understand everything visually and conceptually. I studied C briefly in college, but I honestly cannot remember much of it. I will keep looking through the source code, and if I find anything else I will bring it to your attention. In the meantime, if you are hard-pressed to find what you are looking for, you might want to try the Vorbis-Dev mailing list. Best of luck to you.