HydrogenAudio

Lossy Audio Compression => AAC => AAC - Tech => Topic started by: Makaveli7184 on 2015-01-03 08:27:52

Title: HE-AAC/SBR Decoder Delay?
Post by: Makaveli7184 on 2015-01-03 08:27:52
So I've been trying to understand how gapless playback of AAC via the iTunSMPB tag actually works by reading some technical papers, PDFs and wikis. The following are the ones pertaining to this topic:

http://www.hydrogenaud.io/forums/index.php?showtopic=87847
http://www.hydrogenaud.io/forums/index.php?showtopic=98450
http://wiki.multimedia.cx/index.php?title=AAC

In particular, the thread titled "HE-AAC gapless playback" was very helpful in that regard. Here's a recap of what I came away with:

However, I still wasn't quite clear on what decoder delay actually is or how it translates into silent samples in the encoded audio. To gain more insight, I took a lossless FLAC track from a gapless album (sample rate = 44.1 kHz) and encoded it to both FhG HE-AAC and Apple HE-AAC. For better accuracy, I used qaac with smart padding enabled to work around Apple's faulty HE encoding, which cuts the output off one frame short at the end.

Next, I took the FhG-encoded HE-AAC file, noted its iTunSMPB tag value (00000000 00001084 0000000F 0000000000F6E76D etc.), stripped all tags from it and decoded it to PCM. I then opened the decoded file in a sound editor, removed the first 4228 (#1084) samples (i.e. the total encoder + decoder delay for both HE and LC), removed the last 15 (#F) samples, and ended up with 16181101 (#F6E76D) audio samples whose offset perfectly matches that of the original lossless FLAC source. This was all expected. So far, so good.
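A minimal sketch of the trimming step above, assuming the common iTunSMPB layout in which field 0 is reserved, field 1 is the delay, field 2 is the padding and field 3 is the original sample count, all in hex (the trailing "etc." fields are ignored). The names parse_itunsmpb and trim are illustrative helpers, not part of any tool mentioned in this thread.

```python
def parse_itunsmpb(value: str) -> tuple[int, int, int]:
    """Return (delay, padding, original_length) parsed from an iTunSMPB string.

    Assumes field 0 is reserved and fields 1..3 hold delay, padding and the
    original sample count in hex; any further fields are ignored.
    """
    fields = value.split()
    delay = int(fields[1], 16)
    padding = int(fields[2], 16)
    length = int(fields[3], 16)
    return delay, padding, length


def trim(samples: list, delay: int, padding: int, length: int) -> list:
    """Drop `delay` samples from the front and `padding` samples from the end."""
    if len(samples) != delay + length + padding:
        raise ValueError("decoded length does not match delay + length + padding")
    return samples[delay:delay + length]


# The FhG tag values quoted above (trailing fields omitted).
delay, padding, length = parse_itunsmpb("00000000 00001084 0000000F 0000000000F6E76D")
print(delay, padding, length)       # 4228 15 16181101
print(delay + length + padding)     # 16185344 samples expected in the decoded stream
```

Note that the FhG values already describe the 44.1 kHz output directly, so no scaling is applied here.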

I then repeated the whole process with the Apple-encoded HE-AAC file. Its iTunSMPB reads (00000000 00000840 00000409 00000000007B73B7 etc.). Of course, I doubled those values before noting them down and proceeding. To my surprise, after cutting off the delay and padding in the sound editor, I ended up with the original sample count. I wasn't expecting that, since I hadn't taken the implicit HE decoder delay into account. Also, the audio's offset was off by 962 samples (exactly the HE decoder delay). Finally, this time cutting off the padding samples at the end of the stream truncated valid audio samples.
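The doubling step can be sketched as follows. The assumption here (not stated in the post) is that Apple records HE-AAC iTunSMPB values at the AAC-LC core rate, i.e. 22.05 kHz for this 44.1 kHz source, so each field is scaled by 2. SBR_FACTOR and itunsmpb_fields are illustrative names. As a sanity check, the scaled delay + length + padding should come out to a whole number of 2048-sample HE-AAC frames (each 1024-sample core frame yields 2048 output samples after SBR).

```python
SBR_FACTOR = 2  # assumption: output rate (44.1 kHz) is twice the AAC-LC core rate


def itunsmpb_fields(value: str) -> tuple[int, int, int]:
    """Return (delay, padding, length) from an iTunSMPB string, hex fields 1..3."""
    f = value.split()
    return int(f[1], 16), int(f[2], 16), int(f[3], 16)


# Apple's tag values quoted above, at the core rate (trailing fields omitted).
core = itunsmpb_fields("00000000 00000840 00000409 00000000007B73B7")
delay, padding, length = (x * SBR_FACTOR for x in core)

print(core)                     # (2112, 1033, 8090551)
print(delay, padding, length)   # 4224 2066 16181102

total = delay + length + padding
print(total, total // 2048, total % 2048)  # 16187392 7904 0
```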

So to sum up, here's what I expected the Apple-encoded file structure to look like:

EXPLICIT-DELAY-VALUE = #840 x 2 = 4224 samples (i.e. Encoder-Delay + LC-Decoder-Delay)
IMPLICIT-HE-DECODER-DELAY = 481 x 2 = 962 samples
ORIGINAL-SAMPLE-COUNT = #7B73B7 x 2 = 16181102 samples
PADDING = #409 x 2 = 2066 samples

And here is what it actually looked like:

EXPLICIT-DELAY-VALUE = #840 x 2 = 4224 samples (i.e. Encoder-Delay + LC-Decoder-Delay)
IMPLICIT-HE-DECODER-DELAY = 481 x 2 = 962 samples
ORIGINAL-SAMPLE-COUNT = #7B73B7 x 2 = 16181102 samples
!!!PADDING = #409 x 2 minus IMPLICIT-HE-DECODER-DELAY = 2066 - 962 = 1104 samples!!!
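The two layouts above can be checked with plain arithmetic. Since trimming 4224 samples from the front and 2066 from the end left exactly 16181102 samples, the decoded stream must be 4224 + 16181102 + 2066 samples long. The sketch below (variable names are illustrative) shows that the expected layout needs 962 more samples than the decoder actually produced, while the observed layout fits exactly.

```python
DECODED_TOTAL = 4224 + 16181102 + 2066  # length implied by the trimming result
HE_DELAY = 962                          # implicit SBR decoder delay at 44.1 kHz

# Expected: HE delay added in front, padding unchanged.
expected = {"delay": 4224 + HE_DELAY, "audio": 16181102, "padding": 2066}
# Observed: HE delay added in front, padding shrunk by the same amount.
actual = {"delay": 4224 + HE_DELAY, "audio": 16181102, "padding": 2066 - HE_DELAY}

print(sum(expected.values()) - DECODED_TOTAL)  # 962: more samples than were decoded
print(sum(actual.values()) - DECODED_TOTAL)    # 0: matches the decoded length
print(actual["padding"])                       # 1104
```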


So what gives? Why does the Apple-encoded file have the wrong padding sample count? Doesn't this mean that the Apple HE-AAC iTunSMPB tag actually includes the HE decoder delay of 962 samples, albeit lumped in with the padding value???
Post by: nu774 on 2015-01-03 23:51:48
You just have to slide the window like this, so that the number of valid samples and the total number of samples remain the same.

Before:
|--------------|-------------------------------------------|------------|
After:                                                                       
|--------------------|--------------------------------------------|-----|

You increase the delay by 481 samples and decrease the padding by 481 samples.
This means that the padding for HE-AAC has to be greater than 481 samples.
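The sliding window described above can be sketched as a small function working in AAC-LC core-rate samples, where the SBR delay is 481. slide_for_sbr is an illustrative name; it rejects any padding smaller than the SBR delay, since shifting the window would then push valid audio past the end of the decoded stream.

```python
def slide_for_sbr(delay: int, padding: int, sbr_delay: int = 481) -> tuple[int, int]:
    """Shift the valid-audio window right by the SBR decoder delay.

    Values are core-rate samples. The total length and the number of valid
    samples are unchanged; only the split between leading delay and trailing
    padding moves.
    """
    if padding < sbr_delay:
        # Not enough padding: the shifted window would run past the end of the
        # decoded stream, so the last samples of real audio would be lost.
        raise ValueError(f"padding {padding} < SBR delay {sbr_delay}: output truncated")
    return delay + sbr_delay, padding - sbr_delay


# Apple's tag values from the first post, at the core rate.
print(slide_for_sbr(0x840, 0x409))  # (2593, 552), i.e. 5186 and 1104 at 44.1 kHz
```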

HOWEVER, Apple's encoder often fails to add enough padding, which results in truncated output on decoding. qaac addresses this issue by feeding silence to Apple's encoder at the end.
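A rough sketch of the idea of feeding silence at the end, under stated assumptions: how much silence qaac actually appends is not given here, so `extra` is a hypothetical caller-chosen amount, and pad_input_with_silence is an illustrative helper. The key point is that the gapless tag should still record the original length, so the appended silence ends up in the padding region and is trimmed on playback.

```python
def pad_input_with_silence(pcm: list[float], extra: int) -> tuple[list[float], int]:
    """Append `extra` zero samples before encoding (hypothetical helper).

    Returns the padded buffer and the ORIGINAL length, which is what the
    gapless tag should keep recording, so the added silence counts as padding.
    """
    return pcm + [0.0] * extra, len(pcm)


padded, original_length = pad_input_with_silence([0.1, 0.2, 0.3], 4)
print(len(padded), original_length)  # 7 3
```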