The Knowledgebase/Wiki article MP3, section "Model 2 technical details" (http://wiki.hydrogenaudio.org/index.php?title=MP3#Model_2_technical_details), suddenly introduces the abbreviation "BM", yet BM is mentioned nowhere earlier in the article, nor are the words 'basilar' or 'membrane', which I assume is what it refers to. Step "3. Grouping of spectral values into threshold calculation partitions" includes the first mention of BM.
I did notice that elsewhere, much of the Wiki entry was commented out, as it probably came from another source with diagrams that weren't present in the HA Knowledgebase/Wiki.
I'm going to go ahead and edit that section (presumably copied from elsewhere, or affected by an edit to a previous section that dispensed with the definition of BM), but would of course be happy to see a revert or correction should anyone think it means something else.
As that Wiki page has so rarely seen edits, I wouldn't presume anyone would spot comments in the Discussion or Talk section, hence my post here. Hopefully it will catch the eye of as many codec gurus here as it would if I'd posted under MP3 Technical.
Below is quoted text before my edit, without original formatting:
Model 2 technical details
The psychoacoustic model calculates just-noticeable distortion (JND) profiles for each band in the filterbank. This noise level is used to determine the actual quantizers and quantizer levels. There are two psychoacoustic models defined by the standard. They can be applied to any layer of the MPEG/Audio algorithm. In practice, however, Model 1 has been used for Layers I and II and Model 2 for Layer III. Both models compute a signal-to-mask ratio (SMR) for each band (Layers I and II) or group of bands (Layer III).
The more sophisticated of the two, Model 2, will be discussed. The steps leading to the computation of the JND profiles are outlined below.
1. Time-align audio data
The psychoacoustic model must estimate the masking thresholds for the audio data that are to be quantized. So, it must account for both the delay through the filterbank and a data offset so that the relevant data is centered within the psychoacoustic analysis window. For the Layer III algorithm, time-aligning the psychoacoustic model with the filterbank demands that the data fed to the model be delayed by 768 samples.
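As a rough illustration of step 1 (my sketch, not encoder source: the slicing and centering convention here is an assumption, only the 768-sample figure comes from the text):

```python
# Illustrative sketch of the 768-sample time alignment described above.
PSY_DELAY = 768  # samples by which the model's input lags the filterbank input

def psy_model_input(pcm, frame_start, window_len=1024):
    """Slice of PCM the model analyzes for the frame that the
    filterbank begins coding at index frame_start."""
    start = max(frame_start - PSY_DELAY, 0)  # clamp for the very first frames
    return pcm[start:start + window_len]
```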
2. Spectral analysis and normalization.
A high-resolution spectral estimate of the time-aligned data is essential for an accurate estimation of the masking thresholds in the critical bands. The low frequency resolution of the filterbank leaves no option but to compute an independent time-to-frequency mapping via a fast Fourier Transform (FFT). A Hanning window is applied to the data to reduce the edge effects of the transform window.
Layer III operates on 1152-sample data frames. Model 2 uses a 1024-point window for spectral estimation. Ideally, the analysis window should completely cover the samples to be coded. The model computes two 1024-point psychoacoustic calculations. On the first pass, the first 576 samples are centered in the analysis window. The second pass centers the remaining samples. The model combines the results of the two calculations by using the more stringent of the two JND estimates for bit or noise allocation in each subband.
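The two-pass windowed spectral estimate can be sketched as follows (my illustration: the pass offsets are assumptions, and a plain O(n^2) DFT stands in for the FFT; a real encoder would use an FFT):

```python
import cmath, math

def hann(n):
    # Hanning window, as applied before the transform in the text above.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft_mag(x):
    # Plain DFT magnitude: a slow stand-in for the FFT, fine for a sketch.
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def two_pass_estimates(frame, nfft=1024):
    """Two Hann-windowed spectral estimates per data frame; the two
    window offsets below are illustrative, not from the standard."""
    win = hann(nfft)
    out = []
    for offset in (0, len(frame) - nfft):
        seg = frame[offset:offset + nfft]
        out.append(dft_mag([s * w for s, w in zip(seg, win)]))
    return out

def more_stringent(jnd_a, jnd_b):
    # Combine the two JND estimates by taking the lower (stricter) threshold.
    return [min(a, b) for a, b in zip(jnd_a, jnd_b)]
```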
Since playback levels are unknown, the sound-pressure level (SPL) needs to be normalized. This implies clamping the lowest point in the absolute threshold of hearing curves to +/- 1-bit amplitude.
3. Grouping of spectral values into threshold calculation partitions.
The uniform frequency decomposition and poor selectivity of the filterbank do not reflect the response of the BM. To accurately model the masking phenomenon characteristic of the BM, the spectral values are grouped into a large number of partitions. The exact number of threshold partitions depends on the choice of sampling rate. This transformation provides a resolution of approximately either 1 FFT line or 1/3 critical band, whichever is smaller. At low frequencies, a single line of the FFT will constitute a partition, while at high frequencies many lines are grouped into one.
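One way to picture the grouping (a sketch under my own assumptions: the Bark formula is a common analytic approximation, and the boundary rule below is a guess at the "1 line or 1/3 critical band" criterion, not taken from the standard's tables):

```python
import math

def bark(freq_hz):
    # Widely used analytic approximation of the Bark (critical-band) scale.
    return 13.0 * math.atan(0.00076 * freq_hz) \
         + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def partition_map(n_lines=513, fs=44100, nfft=1024):
    """Assign each FFT line to a threshold partition: open a new partition
    once the current one spans more than 1/3 Bark."""
    parts, current, start_bark = [], 0, bark(0.0)
    for line in range(n_lines):
        f = line * fs / nfft
        if bark(f) - start_bark > 1.0 / 3.0:
            current += 1
            start_bark = bark(f)
        parts.append(current)
    return parts
```

With these numbers, each low-frequency line spans more than 1/3 Bark and so becomes its own partition, while many high-frequency lines share one, matching the behavior the text describes.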
4. Estimation of tonality indices.
It is necessary to identify tonal and non-tonal (noise-like) components because the masking abilities of the two types of signals differ. Model 2 does not explicitly separate tonal and non-tonal components. Instead, it computes a tonality index as a function of frequency. This is an indicator of the tone-like or noise-like nature of the spectral component. The tonality index is based on a measure of predictability. Linear extrapolation is used to predict the component values of the current window from the previous two analysis windows. Model 2 uses this index to interpolate between pure tone-masking-noise and noise-masking-tone values. Tonal components are more predictable and thus have a higher tonality index. As this process has memory, it discriminates better between tonal and non-tonal components than psychoacoustic Model 1 does.
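The predictability measure might be sketched like this (my illustration of the idea: per spectral line, extrapolate magnitude and phase linearly from the two previous windows and score how far the actual value lands from the prediction; the exact normalization is an assumption):

```python
import math

def predicted(prev2, prev1):
    """Linear extrapolation of one spectral line's (magnitude, phase)
    from the two previous analysis windows."""
    (r2, p2), (r1, p1) = prev2, prev1
    return (2 * r1 - r2, 2 * p1 - p2)

def unpredictability(actual, pred):
    """Distance between actual and predicted complex values, normalized
    to roughly [0, 1]: near 0 = well predicted (tonal), near 1 = noisy."""
    (r, p), (rp, pp) = actual, pred
    dist = math.hypot(r * math.cos(p) - rp * math.cos(pp),
                      r * math.sin(p) - rp * math.sin(pp))
    denom = r + abs(rp)
    return dist / denom if denom > 0 else 0.0
```

A steady tone extrapolates almost perfectly and scores near 0; a noise-like line jumps around and scores near 1, which is what the tonality index then maps onto the tone-masking-noise vs noise-masking-tone interpolation.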
5. Simulation of the spread of masking on the BM.
A strong signal component affects the audibility of weaker components in the same critical band and the adjacent bands. Model 2 simulates this phenomenon by applying a Spreading function to spread the energy of any critical band into its surrounding bands. On the Bark scale, the spreading function has a constant shape as a function of partition number, with slopes of +25 and –10 dB per Bark.
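In code, the two-slope spreading might look like this (my sketch: only the +25 and -10 dB/Bark slopes come from the text; for simplicity it assumes one partition per Bark, coarser than the real partition grid):

```python
def spreading_db(dz):
    """Attenuation in dB at Bark distance dz = maskee_bark - masker_bark:
    +25 dB/Bark slope below the masker, -10 dB/Bark above it."""
    return 25.0 * dz if dz < 0 else -10.0 * dz

def spread_energy(part_energy):
    """Convolve partition energies with the spreading function (in the
    linear power domain)."""
    n = len(part_energy)
    out = [0.0] * n
    for j, e in enumerate(part_energy):      # masker partition j
        for i in range(n):                   # maskee partition i
            out[i] += e * 10.0 ** (spreading_db(i - j) / 10.0)
    return out
```

Note the asymmetry: masking spreads farther upward in frequency (the shallower -10 dB/Bark slope) than downward.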
6. Set a lower bound for the threshold values.
An empirically determined absolute masking threshold, the threshold in quiet, is used as a lower bound on the audibility of sound.
7. Determination of masking threshold per subband.
At low frequencies, the minimum of the masking thresholds within a subband is chosen as the threshold value. At higher frequencies, the average of the thresholds within the subband is selected as the masking threshold. Model 2 has the same accuracy for the higher subbands as for low-frequency ones because it does not concentrate non-tonal components.
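The rule itself is simple enough to sketch (my illustration; the crossover between "low" and "high" subbands is not specified in the quoted text, so it is passed in as a flag):

```python
def subband_threshold(line_thresholds, low_frequency):
    """Per-subband masking threshold from the per-line thresholds."""
    if low_frequency:
        return min(line_thresholds)                      # minimum at low frequencies
    return sum(line_thresholds) / len(line_thresholds)   # average at high frequencies
```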
8. Pre-echo detection and window switching decision.
9. Calculation of the signal-to-mask ratio (SMR).
SMR is calculated as a ratio of signal energy within the subband (for Layers I and II) or a group of subbands (Layer III) to the minimum threshold for that subband. This is the final output of the psychoacoustic model.
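Expressed in dB, that final ratio is just (my sketch of the calculation as described; variable names are mine):

```python
import math

def smr_db(signal_energy, min_threshold):
    """Signal-to-mask ratio in dB for a subband (Layers I and II) or a
    group of subbands (Layer III): signal energy over the minimum
    masking threshold for that subband."""
    return 10.0 * math.log10(signal_energy / min_threshold)
```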
The masking threshold is computed from the spread energy and the tonality index.