Although I cannot answer your question, may I ask why you wouldn't just use TooLAME (http://en.wikipedia.org/wiki/TooLAME) instead?
TooLAME hasn't been developed for a long time now, and development of its successor TwoLAME also seems to have stopped some years ago. Both encoders don't provide the quality of MusePack. MusePack has a more advanced encoder, allowing much higher quality. Of course the MP2 bitstream format does not allow all MusePack features (M/S stereo, Huffman coding, PNS, true VBR, ...), but you should be able to encode MP2 files of about the same quality as MusePack at a considerably higher bitrate (e.g. 256-320 kbps being comparable to MusePack standard).
I need the encoder to author DVDs containing music. Unfortunately LPCM causes problems with several players, and there is no free, high-quality AC-3 encoder around either. MusePack-based MP2 seems to be a good choice to me.
I found out that the problem is the SMRs returned by "Psychoakustisches_Modell": from the second call of that function onward, they are much too low. Unfortunately I haven't found out why. Any ideas?
I was able to trace the problem inside "Psychoakustisches_Modell" to the function "PreechoControl", which changes a global array. After commenting out the array-modifying code the bitrate drop is gone, but the bit allocation is awful, so most likely this is not the only problem. I can encode a decent-sounding 448 kbps MP1 file using a fixed allocation table (not using psychoacoustics at all).
What am I missing, such that "Psychoakustisches_Modell" returns no usable values?
No offence, but I think you're wasting your time. I seriously doubt that there could be any audible difference between TooLAME at 384 kbps (maybe even lower) and the thing you're trying to make.
In any case, claims like
Both encoders don't provide the quality of MusePack. MusePack has a more advanced encoder, allowing much higher quality.
certainly demand proof (especially considering we're talking about bitrates of ~400 kbps).
I understand what you mean. I remember getting quite bad results with TooLAME; I just checked TwoLAME, and the first sample I tried was easily ABXable (8/8) at 256 kbps (with noticeable artifacts at 192 kbps), and my ears are not "tuned" to hear encoding artifacts. By the way, I'm not talking about ~400 kbps bitrates: 384 kbps is the maximum for MP2, and it would be great if it could already sound transparent at 256 kbps. High bitrate doesn't imply high quality. Try BladeEnc at 320 kbps for MP3 and you will find a lot of killer samples that won't sound transparent. You also cannot compare the bitrates of an old subband coder without entropy coding to those of modern MDCT-based codecs (AAC, Vorbis or even MP3).
MusePack achieves this high quality because of a highly tuned encoder. Because MP2 is basically MusePack with several features missing, it is possible to create an MP2 bitstream of similar quality at higher bitrates using the MusePack encoder.
As for the MusePack source:
I take it all back! I was completely wrong. It had nothing to do with "Psychoakustisches_Modell". M/S needed to be disabled in two places, and I noticed I was coding one quantizer wrong, making the sound quite awful (and I didn't use that one in my fixed allocation table). So I basically got it working; now I need to adapt the code: a more sophisticated CBR allocation, all the MP2 allocation tables, etc.
At the moment there is just one question left: Combine Penalty. It determines how many scalefactors are coded for each band in a frame (1, 2 or 3). For that it uses this magic table:
static const unsigned char Penalty [256] = {
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
0, 2, 5, 9, 15, 23, 36, 54, 79,116,169,246,255,255,255,255,
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
};
#define P(new,old) Penalty [128 + (old) - (new)]
P is called with the new and old scalefactor indices, and the result is compared to the combine penalty value of the profile (6 by default for all profiles). MP2 uses different scalefactors, so this table probably needs to be updated with new values. Unfortunately I have no idea what these values represent or how they were calculated in the first place.
Apparently this table provides a penalty for replacing "old" with "new", depending on their difference. The penalty is at its maximum if old < new, or if the difference exceeds 11. Since "old" and "new" are actually indices into the scalefactor table, and, if I get it right, the scalefactors are defined as scf = 10**(-0.1*index/1.26), the difference of indices maps to a certain ratio of scalefactors. If we plot the penalty values Penalty[128] through Penalty[139] vs. the corresponding scalefactor ratios, we get the following graph:
(http://www.i.com.ua/~alexeysp/p.png)
Now, don't quote me on this, but it seems that in these coordinates the penalty values are remarkably well described by the following function:
p(x) = 4.5*((1/x**2)-1)
where x is the scalefactor ratio (or, equivalently, by a simple quadratic function of the inverse ratio). The proportionality constant probably comes from some power-of-two to power-of-ten conversion, but that's just a guess.
For what it's worth, hope this helps.
Thank you very much! That makes sense and helped a lot. I calculated new values for the MP2 scalefactors.
I hope I'm able to finish the encoder so it can be released at the beginning of next year.
Interesting. You say the reason this is done is that very small (denormal) numbers slow down the processor?
Given that the Musepack encoder operates on floating-point numbers in the range -32767 ... +32767, the smallest nonzero magnitude you get with 16-bit input is 1. With 24-bit input it is 1/256. But the constant added is 1/512, which doesn't make much sense, because -1/256 is also possible, and adding 1/512 to it gives -1/512, an even smaller number in absolute value.
I read that the smallest positive 32-bit float is about 1.401*10^(-45) (a denormal; the smallest normal number is about 1.175*10^(-38)). That is a lot smaller than 1/512. In fact, to reach a number that small you would need a 165-bit integer audio source.
Are you sure that's the reason? I haven't implemented this in my encoder yet, and it is very fast (faster than Musepack).
I also read that SSE2 provides a flag in a control register that tells the processor to treat all denormal numbers as 0. Wouldn't that make more sense than adding a constant?
I can't remember all the details of Layer II, but if there are any IIR filters in the signal path, they will eventually create denormals, regardless of the value range, when there is digital silence at the input.
Yes, if you use the SSE2 extension you can disable special denormal processing. But it only applies to SSE2 instructions (e.g. addpd), not to x87 instructions (e.g. fadd).
Yes, denormals are super slow.
I've tested different files, including digital silence, and the encoder never slows down (since denormals are about 700 times slower than normal numbers, I think that would be noticeable). Of course there may be an audio file causing denormals somewhere in the processing, but I don't see how adding 1/512 helps (except for preventing 0 at the input, and only for 16/24-bit integer sources). I also could not find anything like this in the TwoLAME code.
I think I've already finished most of the work: the encoder should support all MP2 features except intensity stereo (dual channel is just like stereo; only a flag in the header differs).
Now I have to test whether my modified allocation code causes any problems. That's the biggest difference between Musepack and MP2: Musepack can allocate exactly as many bits to every subband as the encoder thinks best. In MP2 there are not only fixed frame sizes, but also allocation tables that don't allow all quantizers in all subbands.
The way I did it: Musepack allocation increases the resolution of each subband until the Mask-To-Noise Ratio (MNR) is smaller than 1.
For MP2 I added:
-Increase all resolutions until they are codeable by MP2 (new MNRs for changed subbands)
-If VBR: find the minimum frame size that allows coding these resolutions
-Decrease the resolution of all subbands by one (codeable) step until the audio fits into the frame (for VBR that only happens if the maximum bitrate is reached)
-Calculate new MNRs
-Increase the resolution of the subband with the highest MNR and calculate its new MNR, in a loop until the frame is completely filled.
Any better ideas?
When lowering resolution, can you lower critical band(s) last or not at all?
The Musepack Psy-Model decides which bands are critical.
Instead of reducing one single band, I reduce all bands. The idea is that band X might, for example, have the lowest MNR of all, so I would decrease it, but after decreasing, its MNR increases dramatically, whereas decreasing the resolution of band Y, which has a higher MNR, would have increased that band's MNR only a little.
Therefore I decrease all bands and then always increase the one with the highest MNR until the frame is filled. The idea is that the highest MNR of all bands should be as low as possible, rather than the average MNR. Of course, if the frame is not overfull in the first place, the decreasing step is skipped. To get the highest quality, the PsyModel should be set to the value that comes closest to the MP2 bitrate. I need to do some further testing, but Musepack standard (5.0) should go with about 256 kbps. That means most frames will only be increased, but about 1/3 still have to be decreased (depending of course on the sample; this is the average for some music I tested). Of course the encoder will offer the same parameters as Musepack to control the PsyModel.
The allocation algorithm treats all bands the same, but the PsyModel calculates the values used to get the MNR. The MNR is also calculated differently for transient and non-transient bands (the PsyModel also determines which bands are transient).
Something completely non-technical: any idea how I should name the encoder? mp2enc or similar doesn't sound like a completely new MP2 encoder. I asked the Musepack team; they do not want the encoder named in a way that resembles Musepack.