MP3 Frame length conundrum

2010-11-21 19:26:54

Hi,

In my app, I need to determine MP3 frame length (in bytes) quickly. Most of the time it works great, but I have some MP3 files whose frames are 1 byte longer or shorter than what I compute. That is, when I skip to the next frame, I don't see the sync header (FF byte); it's sometimes 1 byte before or after the position I calculated. Here's an example of 2 such frames. They differ only in the padding bit, so one frame should be 1 byte longer than the other, right? But in my MP3 stream, both frames have exactly the same length, 261 bytes.

First frame is FF F2 50 C0
Next frame is FF F2 52 C0 -- the same, but the padding bit is set

Those are MPEG-2 Layer III, 40 kbps, 22,050 Hz mono frames, one with and one without padding. Frame length should be 144*40000/22050 + padding = 261.2 + padding. The second one should be 262 bytes, but it's 261 in my stream!
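For reference, here is a small C sketch (a hypothetical helper, not code from anyone in this thread) that pulls the length-related fields out of a 4-byte header using the standard MPEG audio header bit layout. For both FF F2 50 C0 and FF F2 52 C0 it should report version index 2 (MPEG-2), layer index 1 (Layer III), bitrate index 5, sample-rate index 0 and channel mode 3 (mono); only the padding bit differs.

```c
#include <stdint.h>

/* Hypothetical helper: the header fields that matter for frame length.
 * Byte 1: sync(3) version(2) layer(2) protection(1)
 * Byte 2: bitrate(4) samplerate(2) padding(1) private(1)
 * Byte 3: channel mode(2) ... */
typedef struct {
    unsigned version_idx;  /* 0 = MPEG 2.5, 1 = reserved, 2 = MPEG 2, 3 = MPEG 1 */
    unsigned layer_idx;    /* 0 = reserved, 1 = Layer III, 2 = Layer II, 3 = Layer I */
    unsigned bitrate_idx;  /* 0..15, needs a lookup table to become kbps */
    unsigned srate_idx;    /* 0..3, needs a lookup table to become Hz */
    unsigned padding;      /* 1 if one extra slot is appended to the frame */
    unsigned channel_mode; /* 3 = single channel (mono) */
} mpa_fields;

static mpa_fields mpa_decode_fields(const uint8_t h[4])
{
    mpa_fields f;
    f.version_idx  = (h[1] >> 3) & 0x03;
    f.layer_idx    = (h[1] >> 1) & 0x03;
    f.bitrate_idx  = (h[2] >> 4) & 0x0F;
    f.srate_idx    = (h[2] >> 2) & 0x03;
    f.padding      = (h[2] >> 1) & 0x01;
    f.channel_mode = (h[3] >> 6) & 0x03;
    return f;
}
```

Note that the version field is what decides how many samples a frame holds, which turns out to matter for the length formula.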

Both frames have the same byte length, 261 bytes, judging by the occurrence of valid MP3 frame headers in the stream. How is this possible? Does it mean there's a bug in the encoder? But all players play this back fine. Or do the lengths of MP3 frames depend on previous frames? Or do I always need to scan the stream for frame headers (1 byte both forward and backward), and is that normal in decoders?

I'm new to MP3 decoding, so I don't know the standard well, but everything I've read so far suggests I should be able to determine the length of an MP3 frame precisely using only the 4-byte frame header. I have a bunch of MP3s whose frame lengths are 1 byte off.

I'd appreciate any suggestions at all

Reply #1 – 2011-03-10 00:02:27

Quote from: migdalskiy on 2010-11-21 19:26:54

Those are MPEG-2 Layer III, 40 kbps, 22,050 Hz mono frames, one with and one without padding. Frame length should be 144*40000/22050 + padding = 261.2 + padding. The second one should be 262 bytes, but it's 261 in my stream!

This is an older thread, but I ran across it while researching something and thought I'd answer it for Google fodder, or in case the OP is still listening.

The problem is in the formula used for calculating frame size. The "144" is not a magic number; it's Bits_Per_Sample, which = (Samples_Per_Frame / 8).

For MPEG-1 Layer III (the most common kind of .MP3 file), the value 144 would be correct: a frame holds 1152 samples, and 1152 divided by 8 gives 144.

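However, the sample header describes MPEG-2 Layer III, which uses 576 samples per frame. That yields 72 for Bits_Per_Sample, so the frame length is 72*40000/22050 = 130.6, i.e. 130 bytes without padding and 131 bytes with padding. The 261 bytes you measured is the size of two frames (130 + 131), not one.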
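A minimal sketch of the corrected arithmetic for Layers II and III, where a padding slot is 1 byte. The function name is hypothetical, and Layer I (which counts in 4-byte slots) is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical helper for Layer II/III only (slot size = 1 byte):
 * bytes = (samples_per_frame / 8) * bitrate / sample_rate, truncated,
 * plus one byte when the padding bit is set. */
static uint32_t l23_frame_bytes(uint32_t samples_per_frame, uint32_t bitrate_bps,
                                uint32_t sample_rate_hz, uint32_t padding)
{
    /* Multiply before dividing so integer truncation happens only once. */
    return (samples_per_frame / 8) * bitrate_bps / sample_rate_hz + padding;
}
```

With the OP's numbers, 1152 samples (the MPEG-1 value behind the 144 coefficient) gives 261 bytes, while the MPEG-2 value of 576 samples gives 130 bytes unpadded and 131 padded, which together add up to the 261 bytes the OP observed. A typical MPEG-1 frame at 128 kbps / 44.1 kHz comes out at 417 bytes.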

In any case, your code should never assume that a frame immediately follows another frame. That will be true most of the time, but tags, broken frames, or other corruption will take your parsing routine down hard if you blindly assume the next byte is a valid frame header.
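One way to act on this advice, sketched under the assumption that version, layer and sample rate stay constant within a single stream (the usual case for one MP3 file): trust a computed frame length only if the bytes where the next frame should start carry a sync word with matching fields, and fall back to scanning otherwise. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: accept frame_len only if another header with the same version,
 * layer and sample-rate bits sits exactly at offset + frame_len.
 * A false result means "resync by scanning" (tag, junk, or corruption). */
static bool next_frame_confirmed(const uint8_t *buf, size_t len,
                                 size_t offset, size_t frame_len)
{
    size_t next = offset + frame_len;

    /* Need a full 4-byte header at the next position, and no wraparound. */
    if (frame_len == 0 || next < offset || len < 4 || next > len - 4)
        return false;

    const uint8_t *cur = buf + offset;
    const uint8_t *nxt = buf + next;

    if (nxt[0] != 0xFF || (nxt[1] & 0xE0) != 0xE0)
        return false;                        /* no 11-bit sync word there */
    if ((nxt[1] & 0x1E) != (cur[1] & 0x1E))
        return false;                        /* version or layer changed */
    if ((nxt[2] & 0x0C) != (cur[2] & 0x0C))
        return false;                        /* sample rate changed */
    return true;
}
```

Comparing fields against the current header, rather than only looking for 0xFF, also cuts down on false matches inside frame data.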

My solution was to write and use mpg_find_frame(), which takes a buffer, a pointer to the buffer's size (*sizeof(buffer)), and a pointer to an offset (*offset) within that buffer. It starts looking for 0xFF at the provided offset, and if it finds it, checks whether the next byte satisfies (buf[*offset + 1] & 0xE0) == 0xE0. If so, it further checks that the version, layer, and bitrate fields are not "Reserved", and then updates *offset to the location it found. If it doesn't find the sync, it moves up one byte and checks again. It keeps doing this until it runs out of buffer, then returns an error. At that point, it's your (or your user's) call whether to keep searching for a sync pattern in a fresh buffer or give up.
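A sketch of such a scanner. It is a reconstruction from the description above, not the code the poster actually wrote: passing the length by value and returning 0 on success / -1 on failure are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of a sync scanner modeled on the mpg_find_frame() description.
 * On success, *offset is moved to the first candidate header and 0 is
 * returned; otherwise *offset is left alone and -1 is returned. */
static int mpg_find_frame(const uint8_t *buf, size_t len, size_t *offset)
{
    /* A header is 4 bytes, so a candidate must fit entirely in the buffer. */
    for (size_t i = *offset; i + 4 <= len; i++) {
        if (buf[i] != 0xFF)
            continue;                       /* first 8 sync bits */
        if ((buf[i + 1] & 0xE0) != 0xE0)
            continue;                       /* last 3 sync bits */
        if ((buf[i + 1] & 0x18) == 0x08)
            continue;                       /* version reserved */
        if ((buf[i + 1] & 0x06) == 0x00)
            continue;                       /* layer reserved */
        if ((buf[i + 2] & 0xF0) == 0xF0)
            continue;                       /* bitrate reserved */
        *offset = i;                        /* found a candidate */
        return 0;
    }
    return -1;  /* caller decides: refill the buffer and retry, or give up */
}
```

Because frame data can contain bytes that merely look like a sync word, a candidate found this way is best confirmed (for example, by checking that another header follows at the computed frame length) before it is trusted.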

I've also added code to recognize ID3/ID3v2 tags and return their size so the file reader can skip them intelligently. ID3v2 tags can be large, so you can save a lot of time by fseek'ing instead of parsing them one byte at a time.
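A sketch of the ID3v2 part, based on the published ID3v2 header layout: the tag starts with "ID3", two version bytes and a flags byte, followed by a 4-byte size in which only the low 7 bits of each byte are used (a "syncsafe" integer). The stored size excludes the 10-byte header and, in ID3v2.4, an optional 10-byte footer signalled by flag 0x10. ID3v1 is not covered; it is a fixed 128-byte block starting with "TAG" at the end of the file. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: if buf starts with an ID3v2 tag, return its total size in bytes
 * (header + body + optional v2.4 footer); otherwise return 0. */
static size_t id3v2_tag_size(const uint8_t *buf, size_t len)
{
    if (len < 10 || buf[0] != 'I' || buf[1] != 'D' || buf[2] != '3')
        return 0;
    /* Syncsafe: each size byte must have its top bit clear. */
    if ((buf[6] | buf[7] | buf[8] | buf[9]) & 0x80)
        return 0;

    size_t body = ((size_t)buf[6] << 21) | ((size_t)buf[7] << 14)
                | ((size_t)buf[8] << 7)  |  (size_t)buf[9];
    /* Footer flag (0x10) only exists in ID3v2.4 (major version byte 4). */
    size_t footer = (buf[3] >= 4 && (buf[5] & 0x10)) ? 10 : 0;
    return 10 + body + footer;
}
```

When the tag sits at the start of the file, the returned value can go straight to fseek(fp, (long)size, SEEK_SET) to land on the first byte after the tag.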

Here's part of the code I wrote for handling MPEG audio, specifically the part needed for calculating frame sizes. Hope it helps. (I'd appreciate comments if anyone spots an error.)

Code:

#include <stdint.h>

// MPEG versions - use [version]
const uint8_t mpeg_versions[4] = { 25, 0, 2, 1 };

// Layers - use [layer]
const uint8_t mpeg_layers[4] = { 0, 3, 2, 1 };

// Bitrates - use [version][layer][bitrate]
const uint16_t mpeg_bitrates[4][4][16] = {
  { // Version 2.5
    { 0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0 }, // Reserved
    { 0,   8,  16,  24,  32,  40,  48,  56,  64,  80,  96, 112, 128, 144, 160, 0 }, // Layer 3
    { 0,   8,  16,  24,  32,  40,  48,  56,  64,  80,  96, 112, 128, 144, 160, 0 }, // Layer 2
    { 0,  32,  48,  56,  64,  80,  96, 112, 128, 144, 160, 176, 192, 224, 256, 0 }  // Layer 1
  },
  { // Reserved
    { 0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0 }, // Invalid
    { 0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0 }, // Invalid
    { 0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0 }, // Invalid
    { 0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0 }  // Invalid
  },
  { // Version 2
    { 0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0 }, // Reserved
    { 0,   8,  16,  24,  32,  40,  48,  56,  64,  80,  96, 112, 128, 144, 160, 0 }, // Layer 3
    { 0,   8,  16,  24,  32,  40,  48,  56,  64,  80,  96, 112, 128, 144, 160, 0 }, // Layer 2
    { 0,  32,  48,  56,  64,  80,  96, 112, 128, 144, 160, 176, 192, 224, 256, 0 }  // Layer 1
  },
  { // Version 1
    { 0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0 }, // Reserved
    { 0,  32,  40,  48,  56,  64,  80,  96, 112, 128, 160, 192, 224, 256, 320, 0 }, // Layer 3
    { 0,  32,  48,  56,  64,  80,  96, 112, 128, 160, 192, 224, 256, 320, 384, 0 }, // Layer 2
    { 0,  32,  64,  96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448, 0 }, // Layer 1
  }
};

// Sample rates - use [version][srate]
const uint16_t mpeg_srates[4][4] = {
    { 11025, 12000,  8000, 0 }, // MPEG 2.5
    {     0,     0,     0, 0 }, // Reserved
    { 22050, 24000, 16000, 0 }, // MPEG 2
    { 44100, 48000, 32000, 0 }  // MPEG 1
};

// Samples per frame - use [version][layer]
const uint16_t mpeg_frame_samples[4][4] = {
//    Rsvd     3     2     1  < Layer  v Version
    {    0,  576, 1152,  384 }, //       2.5
    {    0,    0,    0,    0 }, //       Reserved
    {    0,  576, 1152,  384 }, //       2
    {    0, 1152, 1152,  384 }  //       1
};

// Slot size (MPEG unit of measurement) - use [layer]
const uint8_t mpeg_slot_size[4] = { 0, 1, 1, 4 }; // Rsvd, 3, 2, 1


uint16_t mpg_get_frame_size (char *hdr) {
    
    // Work on unsigned bytes so the masks and shifts below are well defined
    const unsigned char *h = (const unsigned char *)hdr;
    
    // Quick validity check
    if (   ( h[0]         != 0xFF)   // First 8 sync bits
        || ((h[1] & 0xE0) != 0xE0)   // Last 3 sync bits
        || ((h[1] & 0x18) == 0x08)   // Version rsvd
        || ((h[1] & 0x06) == 0x00)   // Layer rsvd
        || ((h[2] & 0xF0) == 0xF0)   // Bitrate rsvd
        || ((h[2] & 0xF0) == 0x00)   // Free format: size can't be derived from the header
        || ((h[2] & 0x0C) == 0x0C)   // SampRate rsvd (a rate of 0 would divide by zero)
    ) return 0;
    
    // Data to be extracted from the header
    uint8_t   ver = (h[1] & 0x18) >> 3;   // Version index
    uint8_t   lyr = (h[1] & 0x06) >> 1;   // Layer index
    uint8_t   pad = (h[2] & 0x02) >> 1;   // Padding? 0/1
    uint8_t   brx = (h[2] & 0xF0) >> 4;   // Bitrate index
    uint8_t   srx = (h[2] & 0x0C) >> 2;   // SampRate index
    
    // Lookup real values of these fields
    uint32_t  bitrate   = mpeg_bitrates[ver][lyr][brx] * 1000;
    uint32_t  samprate  = mpeg_srates[ver][srx];
    uint16_t  samples   = mpeg_frame_samples[ver][lyr];
    uint8_t   slot_size = mpeg_slot_size[lyr];
    
    // Whole slots per frame: (samples / 8) bytes scaled by bitrate / samprate,
    // counted in slots (4 bytes for Layer I, 1 byte for II/III) and truncated.
    // Truncating in slots rather than bytes is what Layer I requires, and
    // integer math avoids float rounding surprises.
    uint32_t  slots     = (samples / 8 / slot_size) * bitrate / samprate;
    
    // The padding bit adds exactly one slot
    return (uint16_t)((slots + pad) * slot_size);
}

Reply #2 – 2014-06-18 06:42:14

This is indeed an older post, but years later it provided the answer to my question. Thank you, SirNickity, for writing up an answer to a post that was already a year old. Three years later, I had the exact same question as the OP, and your answer was exactly what I needed.

The problem is that the most popular Internet search results for "MP3 frame header" point to a page that suggests 144 is the de facto constant for computing frame length. And all the other pages and write-ups on MP3 file structure blatantly copy each other, so the misinformation spreads badly.

Thanks again.

Reply #3 – 2020-03-31 20:25:01

Just wanted to say thanks to SirNickity, 9 years later.

I'm working on HTTP Live Streaming (my own variation) and splitting my MP3 stream into 5-second chunks. The code you posted was a perfect drop-in replacement for quickly grabbing 192 frames (5.0155 seconds at 44.1 kHz) of MP3 data at a time, regardless of the bitrate. It saved me the headache of writing that from scratch, and it also let me add a VBR option very easily without any other code changes.
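The arithmetic behind the 192-frame figure, as a small sketch with hypothetical helper names: each MPEG-1 Layer III frame carries 1152 samples, so a chunk's duration depends only on the frame count and sample rate, never on the bitrate. That is also why counting frames keeps working for VBR streams.

```c
#include <stdint.h>

/* Duration in seconds of a run of frames (bitrate plays no part). */
static double frames_to_seconds(uint32_t frames, uint32_t samples_per_frame,
                                uint32_t sample_rate_hz)
{
    return (double)frames * samples_per_frame / sample_rate_hz;
}

/* Smallest frame count whose duration is at least target_seconds. */
static uint32_t frames_for_duration(double target_seconds,
                                    uint32_t samples_per_frame,
                                    uint32_t sample_rate_hz)
{
    double exact = target_seconds * sample_rate_hz / samples_per_frame;
    uint32_t n = (uint32_t)exact;   /* truncate toward zero */
    return (n < exact) ? n + 1 : n; /* round up without <math.h> */
}
```

frames_for_duration(5.0, 1152, 44100) returns 192, and frames_to_seconds(192, 1152, 44100) is about 5.0155 seconds. For MPEG-2 Layer III streams, pass 576 instead of 1152.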

I have basic knowledge of the MP3 file structure, but this saved me a lot of time and debugging.

Hit me up privately if you would like to know the software project involved... I don't want to shamelessly plug my product...

Reply #4 – 2021-01-04 04:49:33

Hello,
I'm doing some research on MP3 music files and have run into a problem.
As I understand it, a frame header can be found by its "sync word". If the frame data also happens to contain a bit pattern that looks like the sync word, can an MP3 player still find the correct frame header? If not, how does it navigate to the right frame to play?