Topic: Research on Mp3 compression technique (Read 3905 times)

Research on Mp3 compression technique

Hello everyone,

I am currently trying to find documentation on the techniques and algorithms used in MP3 for my research, which focuses on MP3 and CELP: comparing the algorithms and techniques they use and how they behave on a variety of audio input, including music and speech. I have found some material at www.mp3-tech.org, but I would like to find more documents that explain the algorithms and mathematical formulas involved in the MP3 encoding process. I am a beginner in this area and am still trying my best to understand the technology. By the way, I also tried to post this in the Mp3-tech section, which was perhaps the wrong forum; in any case, I apologize for any inconvenience.

Based on what I have read, this is what I currently have in mind for MP3:
MP3 uses a psychoacoustic model, but it does not use the linear predictive coding (LPC) used in CELP. (If I am not mistaken, one of the documents said that newer designs such as "TwinVQ" combine a psychoacoustic model with LPC. So would I be right to say that MP3, as an "older technology", uses only the psychoacoustic model?)

I am also not sure whether MP3 uses vector or scalar quantisation. The documents I have read mention that most audio coding techniques use vector quantisation (and I know that CELP speech coding uses it), but they do not say whether MP3 uses VQ as well. The papers I have found also give only brief explanations of Huffman coding and the MDCT. I am currently reading a paper by Painter & Spanias on perceptual audio coding, which gives me more insight, but I am not sure whether it applies to MP3. If anyone has answers to my questions, or can refer me to good reading material, would you please advise me? Thank you very much for your time and advice.

Reply #1
In a very abstract sense, they (MP3 and CELP) don't differ that much. They both exploit inter-sample correlations (CELP via LPC, MP3 via its filterbank), quantize the samples (non-uniform scalar for MP3, usually VQ for CELP) and then code them somehow (Huffman and so on).

You're right, MP3 does not make use of LPC. It doesn't have to, since it transforms the signal blockwise into the frequency domain, which also more or less removes inter-sample correlations (as LPC does).
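To make the "transform removes correlations" point concrete, here is a minimal sketch assuming a plain MDCT with a sine window. This is not MP3's actual hybrid filterbank, and the block size, tone frequency and helper names (mdct, energy_in_largest) are made up for illustration. It shows that a steady tone, whose energy is spread over every time-domain sample, ends up concentrated in a handful of transform coefficients.

```python
import math

def mdct(block):
    """Sine-windowed MDCT of 2N samples, returning N coefficients (illustrative only)."""
    two_n = len(block)
    n_half = two_n // 2
    coeffs = []
    for k in range(n_half):
        acc = 0.0
        for n in range(two_n):
            window = math.sin(math.pi * (n + 0.5) / two_n)
            phase = math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5)
            acc += window * block[n] * math.cos(phase)
        coeffs.append(acc)
    return coeffs

def energy_in_largest(values, count):
    """Fraction of the total energy held by the `count` largest-magnitude values."""
    energies = sorted((v * v for v in values), reverse=True)
    return sum(energies[:count]) / sum(energies)

N = 32                                   # hypothetical block size: 64 samples in, 32 coefficients out
omega = math.pi / N * (10 + 0.5)         # tone placed at the centre of coefficient 10
tone = [math.cos(omega * n + 0.3) for n in range(2 * N)]

time_share = energy_in_largest(tone, 4)        # energy is spread across all samples
freq_share = energy_in_largest(mdct(tone), 4)  # energy is packed into a few coefficients
print(f"4 largest samples:      {time_share:.1%} of the energy")
print(f"4 largest coefficients: {freq_share:.1%} of the energy")
```

Once the energy sits in a few coefficients, the remaining ones can be quantized very coarsely or dropped, which is the practical payoff of the decorrelation.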

Decorrelation in CELP is usually done via LPC (short-term prediction) plus long-term prediction, also known as pitch prediction. The latter only works well if the signal is the output of a single source playing a single note (not polyphonic), like one human voice for example.

Exploiting correlations is a good idea if you want to compress something. (Decorrelated data usually leads to a more compact representation.)
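As a rough illustration of short-term prediction, the sketch below fits a first-order linear predictor to a slowly varying test signal and compares the energy of the signal with the energy of the prediction residual. Real CELP coders use higher-order LPC and add a pitch predictor; the order, the test signal and the function names here are assumptions chosen to keep the example short.

```python
import math

def first_order_predictor(signal):
    """Least-squares coefficient a for the predictor x_hat[n] = a * x[n-1]."""
    r0 = sum(s * s for s in signal)
    r1 = sum(signal[n] * signal[n - 1] for n in range(1, len(signal)))
    return r1 / r0

def residual(signal, a):
    """Prediction error; the first sample has no predecessor and is passed through."""
    return [signal[0]] + [signal[n] - a * signal[n - 1] for n in range(1, len(signal))]

def energy(values):
    return sum(v * v for v in values)

# Hypothetical test signal: a slow sine, so neighbouring samples are highly correlated.
signal = [math.sin(2 * math.pi * 0.01 * n) for n in range(1000)]

a = first_order_predictor(signal)
err = residual(signal, a)
gain = energy(signal) / energy(err)
print(f"predictor coefficient: {a:.4f}")
print(f"prediction gain:       {gain:.1f}x less energy in the residual")
```

The residual carries far less energy than the signal, so it needs fewer bits at the same distortion. That is the sense in which decorrelated data is more compact.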

Don't get me started on TwinVQ. I don't like its architecture. (I consider the idea of coding LSP parameters of the LPC synthesis filter for a transform codec to be a pretty stupid one.)

MP3 uses a non-uniform scalar quantizer with f(x)=sign(x)*pow(abs(x),4.0/3.0) as its expander function, i.e. the decoder applies f to the integer quantized values to get back the spectral values.
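A minimal sketch of such a power-law quantizer follows. The expander is the f(x) quoted above; the matching 3/4-power compressor, the plain rounding and the single `step` parameter are simplifying assumptions for illustration, not a description of any particular encoder.

```python
import math

def quantize(x, step=1.0):
    """Compress |x|/step with a 3/4 power law, then round to the nearest integer."""
    magnitude = round((abs(x) / step) ** 0.75)
    return int(math.copysign(magnitude, x))

def dequantize(q, step=1.0):
    """Expander from the post: sign(q) * |q|^(4/3), scaled by the step size."""
    return math.copysign(abs(q) ** (4.0 / 3.0), q) * step

for x in (0.4, 3.0, -16.0, 500.0):
    q = quantize(x)
    print(f"x={x:8.2f}  q={q:4d}  reconstructed={dequantize(q):8.2f}")

# Reconstruction levels sit further apart for large values than for small ones.
print(dequantize(2) - dequantize(1), dequantize(101) - dequantize(100))
```

Because the levels spread out as the magnitude grows, large coefficients are quantized more coarsely in absolute terms than small ones, which is what makes the quantizer non-uniform.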

The reason encoders like LAME perform so well is that they know how to quantize the samples properly. Quantization is adaptive in both time and frequency. This is where the psychoacoustic model comes in: it tries to determine how much quantization distortion we can tolerate in a given situation and in a given frequency/time region. Most of the data reduction is due to the quantization. Then a lossless coding stage tries to squeeze the remaining information together.

If you haven't already read this one, try it:
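The idea of adapting the quantization to a tolerated distortion can be sketched as a simple search: start with a coarse step and refine it until the quantization noise in a band falls below an allowed level. The allowed noise values below are made-up numbers standing in for the output of a psychoacoustic model, and the search loop, shrink factor and function names are assumptions for illustration, not how LAME or any other encoder actually does it.

```python
import math

def quantize(x, step):
    """3/4-power-law compressor followed by rounding (same simplification as before)."""
    return int(math.copysign(round((abs(x) / step) ** 0.75), x))

def dequantize(q, step):
    """Expander sign(q) * |q|^(4/3), scaled by the step size."""
    return math.copysign(abs(q) ** (4.0 / 3.0), q) * step

def noise_energy(band, step):
    """Energy of the quantization error over one band of spectral values."""
    return sum((x - dequantize(quantize(x, step), step)) ** 2 for x in band)

def choose_step(band, allowed_noise, start_step=64.0, shrink=2 ** -0.25, min_step=1e-6):
    """Refine the step until the noise fits under the allowed level (hypothetical search)."""
    step = start_step
    while step > min_step and noise_energy(band, step) > allowed_noise:
        step *= shrink
    return step

band = [10.0, -7.5, 3.2, 0.4]            # hypothetical spectral values in one band
loose_step = choose_step(band, 5.0)      # band where more distortion is tolerated
tight_step = choose_step(band, 0.05)     # band where distortion would be audible
print(f"loose band: step={loose_step:.4f}  noise={noise_energy(band, loose_step):.4f}")
print(f"tight band: step={tight_step:.4f}  noise={noise_energy(band, tight_step):.4f}")
```

A band that tolerates more noise ends up with a larger step, so its quantized integers are smaller and cheaper to code in the lossless stage.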
http://www.eas.asu.edu/~spanias/papers/pap...dspanias-00.pdf

Sebi

Reply #2
Quote
If you haven't already read this one, try it:
http://www.eas.asu.edu/~spanias/papers/pap...dspanias-00.pdf

Sebi

http://www.ece.rochester.edu/~gsharma/SPS_Rochester/presentations/JohnstonPerceptualAudioCoding.pdf

This might be useful as well, but of course these are presentation slides, so there's little text to go with them.
-----
J. D. (jj) Johnston

 

Reply #3
Hi SebastianG and Woodinville,

Thank you very much for your help; I really appreciate your advice. I will read the documents that both of you have suggested. You guys really brightened my day. If anyone has any other advice, it is welcome. Once again, thank you very much for your help.


tech_noobie