What you see is to be expected. AFAIK, the only way to decode the LC part is at the half sampling rate, because the inverse transform will only produce the number of samples that went into the original MDCT. (Out of interest, the old MP3Pro files used MP3 at half the sampling rate, i.e. MPEG-2 Layer 3, plus SBR for the upper half; a normal MP3 decoder without MP3Pro features would only produce the lower half.) I believe you need to resample (and I think that's what an AAC+ decoder does), though you'll probably find an app like foobar2000 (Windows) or SoX (command line, multi-platform) does a good job of resampling, and much faster than MATLAB, last I heard.
I guess we need to inspect the source code of an open-source decoder to find out. I can't see how, mathematically, they could generate the extra samples with an iMDCT, given that it contains twice as many samples as 'frequency' bins (positive and negative 'frequencies' both count towards the total) provided in the transform domain. Therefore, I'd assume it must be done after converting to the time domain (i.e. by upsampling with appropriate anti-alias filtering). I'd like to be enlightened by anyone who knows for sure.
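To make the "upsample in the time domain" idea concrete, here's a minimal sketch (my own illustration, not taken from any decoder's source) of 2x upsampling by zero-stuffing followed by an anti-image low-pass filter. The tap count and Hamming window are arbitrary choices for the example:

```python
import numpy as np

def upsample2x(x, taps=63):
    # Step 1: zero-stuff -- x[0], 0, x[1], 0, ... (doubles the rate,
    # but creates a mirror image of the spectrum above the old Nyquist).
    y = np.zeros(2 * len(x))
    y[::2] = x
    # Step 2: low-pass at the old Nyquist (0.25 of the new rate) to
    # remove that image.  Windowed sinc; the factor 2 that compensates
    # for the inserted zeros is already folded in (sinc(0.5*n) equals
    # 2 * 2*fc * sinc(2*fc*n) with fc = 0.25).
    n = np.arange(taps) - (taps - 1) // 2
    h = np.sinc(0.5 * n) * np.hamming(taps)
    return np.convolve(y, h, mode='same')

x = np.sin(2 * np.pi * 0.05 * np.arange(200))
y = upsample2x(x)   # twice as many samples, original samples preserved
```

Because the filter is a half-band design (all even-offset taps except the centre are zero), the original samples pass through unchanged and only the in-between samples are interpolated.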
SoX is a free, open-source project on SourceForge with very good resampling, so you could use it externally or adapt its code, provided you comply with its license. Tools like FFmpeg might also have useful source code.
libspeex also contains a good resampler that is used in various codecs (including Opus): http://svn.xiph.org/trunk/speex/libspeex/resample.c
That's from the 3GPP code, isn't it? I think the intention of spline_resampler.c is to allow handsets whose DACs don't support the sample rate of a received file to downsample to a supported rate. As you want to upsample, I'm not sure this code is directly applicable, but I haven't looked into it.

With downsampling, you need to filter first to remove frequencies above the new Nyquist limit, then interpolate to the new, lower sampling rate. With upsampling, you need to interpolate to the new, higher sampling rate first, then filter afterwards to remove any frequencies that have been introduced above the Nyquist limit of the lower rate. In theory, whichever way you're going between the same pair of sampling rates (up or down), the cut-off frequency should be the same, assuming an ideal filter.

The Speex resampler code looks useful: it has a liberal license, works for arbitrary rates, and implements upsampling intelligently. It recognises that the filter design for downsampling must ensure good attenuation at and above the Nyquist limit, but that for upsampling the content is already low very close to the Nyquist limit and zero above it (until you introduce aliasing by your chosen method of interpolation). So it can be more relaxed about the attenuation close to the limit and preserve audio frequencies better, by choosing a slightly higher cut-off frequency when upsampling than when downsampling.
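For contrast with the upsampling case, here is a sketch of the downsampling order of operations (again my own illustration, with arbitrary filter parameters): the low-pass must come first, so that everything above the new Nyquist is removed before decimation can alias it into the audible band.

```python
import numpy as np

def downsample2x(x, taps=63):
    # Filter FIRST: windowed-sinc low-pass with cutoff at 0.25
    # cycles/sample, i.e. the Nyquist limit of the NEW (halved) rate.
    n = np.arange(taps) - (taps - 1) // 2
    h = 0.5 * np.sinc(0.5 * n) * np.hamming(taps)
    # ...THEN decimate: keep every second sample.
    return np.convolve(x, h, mode='same')[::2]

t = np.arange(1000)
low  = downsample2x(np.sin(2 * np.pi * 0.05 * t))  # survives (below cutoff)
high = downsample2x(np.sin(2 * np.pi * 0.40 * t))  # removed (above cutoff)
```

If you swapped the two steps, the 0.40-cycles/sample tone would fold down to 0.20 of the new rate before the filter ever saw it, and no amount of filtering afterwards could separate it from genuine content.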
It also offers FIXED POINT and FLOATING POINT versions, which you can choose between depending on your hardware, and I believe it has been tested when compiled for numerous popular platforms (certainly the Opus source code, which includes the same resampler, was tested very widely prior to IETF standardisation).

The Speex one calculates the sinc function on the fly and the cut-off mathematically, has a number of Kaiser window functions pre-calculated in the source code, and includes some values for adjusting the filter cut-off frequency for upsampling versus downsampling. It can essentially be treated as a black box that just does the job, without your having to understand how.

(P.S. That's the right SoX project you linked to a few posts above, and its resampling code has been implemented in the fb2k plugin you mentioned in the other link. Some people get very picky about inaudible differences that can show up on graphs, where the SoX resampler performs very well. I doubt there's an audible difference from fb2k's PPHS resampler or Speex's at normal sampling rates. I guess there's a modest chance of slight audibility when upsampling from very low sample rates such as 8 kHz.)
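A Kaiser-windowed sinc is easy to sketch, which shows what the resampler is computing under the hood. The cut-off adjustment below is a hypothetical illustration of the up-versus-down asymmetry described above; the specific numbers are mine, not Speex's actual tables:

```python
import numpy as np

def lowpass_kaiser(taps, cutoff, beta=8.0):
    # Ideal low-pass response is a sinc; truncating it with a Kaiser
    # window trades transition-band width against stop-band attenuation
    # via `beta`.  `cutoff` is in cycles/sample (0 .. 0.5).
    n = np.arange(taps) - (taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.kaiser(taps, beta)
    return h / h.sum()                      # exact unity gain at DC

# Illustrative only: the downsampling filter must attenuate hard at the
# new Nyquist, while the upsampling filter can sit slightly higher to
# preserve more of the top of the audio band.
h_down = lowpass_kaiser(129, cutoff=0.46 / 2)
h_up   = lowpass_kaiser(129, cutoff=0.495 / 2)
```

Both filters are symmetric, so they're linear-phase, which matters for audio: all frequencies are delayed equally and the waveform shape isn't smeared.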
How about the 'mono' question: do you know how to get the mono decoder output file? I set the CT mono debug mode, but it seems that every time the program reaches 'interleaveSamples(&TimeDataFloat, &TimeDataFloat[frameSize], pTimeDataPcm, frameSize, &numChannels);' the value of 'numChannels' changes from 1 to 2. If I force numChannels to stay 1, I do get a mono output, but it sounds totally wrong... I chose the 96 kbps mono setting. I am confused. Looking forward to your reply.
Quote from: wind on 05 November, 2012, 06:02:03 PM
How about the 'mono' question: do you know how to get the mono decoder output file? I set the CT mono debug mode, but it seems that every time the program reaches 'interleaveSamples(&TimeDataFloat, &TimeDataFloat[frameSize], pTimeDataPcm, frameSize, &numChannels);' the value of 'numChannels' changes from 1 to 2. If I force numChannels to stay 1, I do get a mono output, but it sounds totally wrong... I chose the 96 kbps mono setting. I am confused. Looking forward to your reply.

I thought the whole idea of the special mono mode was this: if the original encode was stereo, the AAC-LC part (low frequencies) must be decoded as stereo and downmixed to mono; however, you can save computational resources in the SBR layer by downmixing the components before conducting the band replication.

Pure speculation, but I wonder whether, by forcing numChannels to 1, you're overriding the initial stereo decode of the LC layer (whose stereo information could be encoded in various ways, such as L/R or M/S stereo, for example), so that you occasionally get the M and occasionally the L, say, depending on what was chosen for each frame.

I'm not familiar enough with the 3GPP code to tell what the problem is. If you've already stripped away the SBR layer, it might be totally useless to specify mono decoding, as it only does anything different in the SBR layer (which you've discarded); you're best off simply decoding as stereo and downmixing using any of the usual formulae, such as mono = (L+R)/2 or mono = (L+R)/sqrt(2).
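The downmix formulae at the end are trivial to apply to the decoder's interleaved PCM output. A minimal sketch (the function name and layout assumption are mine, not the 3GPP code's):

```python
import numpy as np

def downmix_to_mono(interleaved, gain=0.5):
    # Assumes interleaved stereo float samples: [L0, R0, L1, R1, ...]
    left = interleaved[0::2]
    right = interleaved[1::2]
    return gain * (left + right)   # gain=0.5 gives (L+R)/2

stereo = np.array([1.0, 3.0, 2.0, 4.0])   # 2 frames of L,R
mono = downmix_to_mono(stereo)            # -> [2.0, 3.0]
```

Use gain=1/np.sqrt(2) for the (L+R)/sqrt(2) variant; it preserves power for uncorrelated channels but can clip on correlated (centred) material, which is why (L+R)/2 is the safer default.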
So that's an interleave function? Why would you need to interleave mono data? You should only need to interleave multiple channels. Are you perhaps forcing it to interleave mono data as if it were stereo, thus ending up with a stereo file of half the duration, perhaps?
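That failure mode is easy to demonstrate. The call quoted above passes &TimeDataFloat and &TimeDataFloat[frameSize], i.e. two halves of one buffer; if a mono signal occupies that buffer contiguously, interleaving the halves as "L" and "R" weaves the second half of the audio into the first. A sketch (my own illustration of the speculation, not the actual 3GPP routine):

```python
import numpy as np

def interleave(left, right):
    # Pack two equal-length channel buffers as [L0, R0, L1, R1, ...]
    out = np.empty(2 * len(left), dtype=left.dtype)
    out[0::2] = left
    out[1::2] = right
    return out

# 8 consecutive mono samples, wrongly treated as 4+4 stereo halves:
mono = np.arange(8.0)
wrong = interleave(mono[:4], mono[4:])
# wrong is [0, 4, 1, 5, 2, 6, 3, 7] -- only 4 "stereo" frames, so a
# player sees half the duration, with samples from half a frame later
# mixed into the other channel: exactly the "totally wrong" sound.
```

If that's what's happening, the fix isn't to force numChannels back to 1 at this point, but to skip the interleave entirely for genuinely mono output.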