Topic: History: where did the "1152" originate?
History: where did the "1152" originate?

There might be a good technical reason for 8*8*18 which I couldn't find in two minutes of www searching.

I did actually wonder why they didn't go for (two times) 588 rather than (two times) 576. Did they fear the RIAA lawyers going "this is specifically created to pirate CDs"?

Re: History: where did the "1152" originate?

Reply #1
What is this about? ^^

Edit: Thanks @ktf

Re: History: where did the "1152" originate?

Reply #2
The frame size of MP3 is 1152 samples. The question is: why this specific number?

Re: History: where did the "1152" originate?

Reply #3
1152 samples is twice the length of the hybrid filterbank for long blocks. Since the filter bank uses 50% overlap, you get twice as many samples back. This is actually really common in transform codecs: usually the frame size is twice the transform size due to how the MDCT works. MP3 just uses a weird size rather than the more common 1024-point MDCT -> 2048-sample frame size you see in pure MDCT codecs.

See here:  https://wiki.hydrogenaud.io/index.php?title=MP3#The_hybrid_polyphase_filterbank


Re: History: where did the "1152" originate?

Reply #5
Quote
MP3 just uses a weird size

Yes, and the question of the topic is: why did they choose that weird size?

The link above explains it in more detail, but you have 32 subbands, each with 18 spectral points. The 50% overlap then doubles that length to 1152. You can't have 2*588 samples because that wouldn't divide evenly into 32 subbands. The actual number of subbands and points per subband were chosen to maintain compatibility with layer 2, itself based on MUSICAM.
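To make the arithmetic in this reply easy to check, here is a minimal sketch. It follows the reply's own description (32 subbands, 18 spectral points each, doubled by the 50% overlap); the constant and function names are made up for illustration and do not come from any codec source.

```python
# Illustrative arithmetic only; names are hypothetical, not from any MP3 library.
SUBBANDS = 32           # subbands in the polyphase filterbank
LINES_PER_SUBBAND = 18  # spectral points per subband (long blocks)


def frame_length(subbands: int = SUBBANDS, lines: int = LINES_PER_SUBBAND) -> int:
    """Frame length as described in the reply: subbands * lines, doubled."""
    return 2 * subbands * lines


def divides_into_subbands(samples: int, subbands: int = SUBBANDS) -> bool:
    """True if `samples` splits evenly across the subbands."""
    return samples % subbands == 0


if __name__ == "__main__":
    print(frame_length())                  # 1152
    print(8 * 8 * 18)                      # 1152, the factorisation from the first post
    print(divides_into_subbands(2 * 576))  # True
    print(divides_into_subbands(2 * 588))  # False: 1176 leaves a remainder of 24
```

So 2*588 = 1176 fails the divisibility test, while 2*576 = 1152 passes it.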

Re: History: where did the "1152" originate?

Reply #6
Quote
32 subbands each with 18 spectral points

That kind of explains the "8*8*18", now the only number that's left to explain is the 18 itself.

Re: History: where did the "1152" originate?

Reply #7
Yeah, even if compatibility with MUSICAM "only pushes the question one generation back", you don't really need an explanation of why someone went for "32". Can the fly on the wall when the big question was popped ("We are designing this thing for radio; should we consider scrapping the four whole bytes for "28" or "42", in case that makes it easier to pyr8 CDs when whoever makes a file format out of this ignores gaplessness?") please step forward and confirm?

Re: History: where did the "1152" originate?

Reply #8
Quote
32 subbands each with 18 spectral points

That kind of explains the "8*8*18", now the only number that's left to explain is the 18 itself.

It is a common value for subband codecs (even those that don't retain compatibility with MPEG/MUSICAM), so presumably it is reasonably optimal, and MPEG then got stuck with it even when they moved beyond pure subband coding.

Re: History: where did the "1152" originate?

Reply #9
I assume the "same number of subbands and points per subband as mp2" is so they could create mp2/mp3 decoders more easily back when both were in HW?

Re: History: where did the "1152" originate?

Reply #10
Quote
I assume the "same number of subbands and points per subband as mp2" is so they could create mp2/mp3 decoders more easily back when both were in HW?

It isn't just that the numbers are the same; it is actually the same filterbank in both codecs. MP3 decoding is essentially a superset of MP2 decoding, so an MP3 decoder is usually also an MP2 decoder just by not doing the extra layer 3 steps. This was a design goal of the 3 "layers" in MPEG-1 audio. They actually did the same thing again with AAC, where you have different profiles that build on top of LC (HE, SSR, etc).

Re: History: where did the "1152" originate?

Reply #11
From "High quality audio bit-rate reduction system family for different applications" by G. Stoll and Y.F. Dehery:
Quote
The sub-band signals are divided into digital frames of 12 successive audio samples with a duration of 8 ms. This was found to be a good compromise to consider the worst case of the temporal masking effects, i.e. the pre-masking of quantizing noise which may exist in front of the audio signal.

Also, from "A MUSICAM source codec for digital audio broadcasting and storage" by Y.F. Dehery, M. Lever, and P. Urcun:
Quote
The bit allocation is transmitted as side information at the rythms of the spectral analysis (frames of 24 ms at 48 KHz). It determines the amount of binary information used to quantize each subband signal. This quantization is performed over frames of 36 subband samples. ... The scaling block size (e.g. 8 ms at 48 KHz) was chosen in relation to the energetic statistics of high slew-rate audio signals.

8 ms of a signal with a 48 kHz sampling rate is 384 samples (384 = 32 * 12), which is also the number of audio samples in a single MP1 frame. MP2 groups 3 MP1 frames together and exploits redundancies between them. That gives 384 * 3 = 1152.
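As a quick cross-check of the figures in the two quotes, here is a small sketch that converts subband samples per frame into audio samples and milliseconds at 48 kHz. The function names are invented for this example; the only inputs are the 32 subbands and the 12 and 36 subband samples mentioned in the quotes.

```python
from fractions import Fraction

# Illustrative arithmetic only; names are hypothetical.
SAMPLE_RATE_HZ = 48_000
SUBBANDS = 32


def samples_per_frame(subband_samples: int, subbands: int = SUBBANDS) -> int:
    """Audio samples covered by a frame of `subband_samples` per subband."""
    return subbands * subband_samples


def duration_ms(samples: int, rate_hz: int = SAMPLE_RATE_HZ) -> Fraction:
    """Exact duration of `samples` audio samples in milliseconds."""
    return Fraction(samples * 1000, rate_hz)


if __name__ == "__main__":
    layer1 = samples_per_frame(12)  # one scaling block / one MP1 frame
    layer2 = samples_per_frame(36)  # three such blocks grouped together
    print(layer1, duration_ms(layer1))  # 384 8
    print(layer2, duration_ms(layer2))  # 1152 24
```

The 8 ms scaling block and the 24 ms frame from the MUSICAM paper line up with 384 and 1152 samples respectively.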

On the other hand, predecessors of ASPEC (which is itself the predecessor to MP3), such as OCF, PXFM, and an unnamed hybrid encoder, all used "a power of 2 number" of audio samples for a single block.

It seems like using 1152 audio samples for a single frame originated from the engineering decisions made by developers and backers of MP2.