Finding Zeros Sample in Lame MP3 Frame Header

2009-02-15 20:14:16

Hi All,

My apologies if this is a redundant topic.

I am writing a script for my own use to batch-convert WAV to MP3. The script calls the lame command line to do the encoding. As described in the LAME Tech FAQ (http://lame.sourceforge.net/tech-FAQ.txt), there will be silence at the start and the end of each song.

After digging around the net, I found that LAME includes a tag inside the first MP3 frame. I have also read Gabriel's MP3 Info Tag page (http://gabriel.mp3-tech.org/mp3infotag.html#delays) about delays and padding.

Gabriel says that the delay and padding information is stored in 3 bytes (12 bits for the delay and 12 bits for the padding), and that these bits give the number of zero samples that cause the silence. What I still don't understand is where to find the bytes that hold the delay and padding.

I tried to hexdump an MP3 file from LAME:

Code:

000000: FF FB 50 C4  00 00 00 00  00 00 00 00  00 00 00 00  ..P.............
000010: 00 00 00 00  00 49 6E 66  6F 00 00 00  0F 00 00 00  .....Info.......
000020: CE 00 00 A8  F9 00 03 06  08 0B 0D 10  12 15 17 1A  ................
000030: 1C 1F 21 24  26 29 2C 2F  31 34 36 39  3B 3E 40 43  ..!$&),/1469;>@C
000040: 45 48 4A 4D  4F 52 54 58  5A 5D 5F 62  64 67 69 6C  EHJMORTXZ]_bdgil
000050: 6E 71 73 76  78 7B 7D 80  83 86 88 8B  8D 90 92 95  nqsvx{}.........
000060: 97 9A 9C 9F  A1 A4 A6 A9  AC AF B1 B4  B6 B9 BB BE  ................
000070: C0 C3 C5 C8  CA CD CF D2  D4 D8 DA DD  DF E2 E4 E7  ................
000080: E9 EC EE F1  F3 F6 F8 FB  FD 00 00 00  3A 4C 41 4D  ............:LAM
000090: 45 33 2E 39  37 20 01 A5  00 00 00 00  2D FE 00 00  E3.97 ......-...
0000A0: 14 40 24 08  0A 42 00 00  40 00 00 A8  F9 E9 A2 3B  .@$..B..@......;
0000B0: 6E 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  n...............
0000C0: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
0000D0: FF FB 50 C4  00 03 C0 00  01 A4 00 00  00 20 00 00  ..P.......... ..
0000E0: 34 80 00 00  04 4C 41 4D  45 33 2E 39  37 55 55 55  4....LAME3.97UUU
0000F0: 55 55 55 55  55 55 55 55  55 55 55 55  55 55 55 55  UUUUUUUUUUUUUUUU
000100: 55 55 55 55  55 55 55 55  55 55 55 55  55 55 55 55  UUUUUUUUUUUUUUUU
000110: 55 55 55 55  55 55 55 55  55 4C 41 4D  45 33 2E 39  UUUUUUUUULAME3.9
000120: 37 55 55 55  55 55 55 55  55 55 55 55  55 55 55 55  7UUUUUUUUUUUUUUU
000130: 55 55 55 55  55 55 55 55  55 55 55 55  55 55 55 55  UUUUUUUUUUUUUUUU
000140: 55 55 55 55  55 55 55 55  55 55 55 55  55 55 55 55  UUUUUUUUUUUUUUUU
000150: 55 55 55 55  55 55 55 55  55 55 55 55  55 55 55 55  UUUUUUUUUUUUUUUU
000160: 55 55 55 55  55 55 55 55  55 55 55 55  55 55 55 55  UUUUUUUUUUUUUUUU
000170: 55 55 55 55  55 55 55 55  55 55 55 55  55 55 55 55  UUUUUUUUUUUUUUUU

Could somebody do me a favor and show me how to read this hex information?

Thank you for your response!

Reply #1 – 2009-02-15 20:20:26

I am sorry for the mistake. This topic should have been posted in the MP3 forum. I can't find a way to move it there myself.

Reply #2 – 2009-02-15 21:43:03

I believe you got that wrong.

The silence at the beginning and the end of the file is not a consequence of the header, but of the way MP3 is encoded and decoded.

The header data helps a compatible decoder skip the silent parts.
(Yet this only guarantees an accurate sample count. With continuous tracks, small glitches are still possible, but they are less annoying than several milliseconds of silence.)

Reply #3 – 2009-02-15 22:54:58

Something like this...
1 Find "Info" or "Xing"
2 Skip 120 bytes from the start of that tag and make sure it says "LAME"
3 Let the pointer p point to that "LAME" then:

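Code:

/* p is a const uint8_t * (unsigned bytes) pointing at the 'L' of "LAME" */
uint32_t tmp24bits =
      ( (uint32_t)p[0x15] << 16 )
    | ( (uint32_t)p[0x16] << 8 )
    | ( (uint32_t)p[0x17] );
/* upper 12 bits = delay, lower 12 bits = padding */
uint16_t Delay   = (uint16_t)( tmp24bits >> 12 );
uint16_t Padding = (uint16_t)( tmp24bits & 0x0FFF );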
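
The three steps above could be sketched as one self-contained C function working on a buffer that has already been read from the file. This is only a sketch: the name lame_delay_padding is hypothetical (it is not a LAME API), and the fixed 120-byte skip assumes the Info/Xing tag carries all four optional fields (frame count, byte count, 100-byte TOC, quality), as the dump in the first post does (its flags word is 00 00 00 0F).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of LAME: scan buf[0..len) for an
 * "Info"/"Xing" tag, check for "LAME" 120 bytes after its start, and
 * split the 3 bytes at "LAME"+0x15 into two 12-bit values.
 * Returns 0 on success, -1 if no usable tag was found. */
int lame_delay_padding(const uint8_t *buf, size_t len,
                       uint16_t *delay, uint16_t *padding)
{
    const size_t lame_off = 120;          /* "Info"/"Xing" -> "LAME" */
    const size_t need = lame_off + 0x18;  /* bytes needed from the tag start */

    assert(buf != NULL && delay != NULL && padding != NULL);

    for (size_t i = 0; i + need <= len; i++) {
        if (memcmp(buf + i, "Info", 4) != 0 &&
            memcmp(buf + i, "Xing", 4) != 0)
            continue;

        const uint8_t *p = buf + i + lame_off;
        if (memcmp(p, "LAME", 4) != 0)
            return -1;                    /* tag present, but no LAME part */

        uint32_t v = ((uint32_t)p[0x15] << 16)
                   | ((uint32_t)p[0x16] << 8)
                   |  (uint32_t)p[0x17];
        *delay   = (uint16_t)(v >> 12);     /* upper 12 bits */
        *padding = (uint16_t)(v & 0x0FFF);  /* lower 12 bits */
        return 0;
    }
    return -1;
}
```

With the bytes of the dump from the first post ("Info" at 0x15, "LAME" at 0x8D, 24 08 0A at 0xA2), this yields a delay of 576 and a padding of 2058.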

Quote from: [JAZ] on 2009-02-15 21:43:03

The header data helps a compatible decoder to skip the silent parts.

But non-LAME-aware decoders don't, hence the 1-frame delay. This is especially true of almost every DirectShow decoder.

Reply #4 – 2009-02-16 18:20:00

Hello Liisachan,
Thank you for responding. The script is written in Perl, because I don't know C.

Following your suggestions:
1. Did you mean finding "Info" or "Xing" as ASCII text? If so, in my hexdump that means jumping to the "49" byte. Is that correct?
2. Then we skip 120 bytes and land on "4C", which is "L" in ASCII.
3. Would you explain what the code does?

Again, thanks!

Reply #5 – 2009-02-16 19:53:27

@chipset
I don't speak Perl very well, so I know this code sucks, but I hope you'll get the general idea from this skeleton.

Code:

use strict;
use warnings;

my $filename = $ARGV[0];
print "Trying to parse: \"$filename\"\n\n";
open my $in, '<', $filename or die "Cannot open $filename: $!";
binmode $in;
my $buf;
# Slide a 4-byte window over the file, one byte at a time.
# In reality you should quit once you reach the end of the 1st frame.
while( 1 ) {
    # A short read means the end of the file was reached without finding a tag
    read( $in, $buf, 4 ) == 4 or die "No Info/Xing tag found\n";
    if( $buf eq "Info" or $buf eq "Xing" ) {
        # We are 4 bytes past the tag start; "LAME" is at tag start + 120
        seek( $in, 120 - 4, 1 ) or die $!;
        read( $in, $buf, 4 ) == 4 or die "File too short\n";
        if( $buf eq "LAME" ) {
            printf "LAME Tag found at 0x%X\n", tell( $in ) - 4;
            # Delay/padding sit at 0x15..0x17 from "LAME"; we are at offset 4
            seek( $in, 0x15 - 4, 1 ) or die $!;
            read( $in, $buf, 3 ) == 3 or die "File too short\n";
            my( $b0, $b1, $b2 ) = unpack "C3", $buf;
            printf "The 3 bytes are: %02X %02X %02X\n", $b0, $b1, $b2;
            my $tmp24bits = $b0 * 256 * 256 + $b1 * 256 + $b2;
            printf "Delay=%u, Padding=%u\n", int( $tmp24bits / 4096 ), $tmp24bits % 4096;
        } else {
            print "LAME Tag not found\n";
        }
        last;
    }
    # Step back so the next window starts 1 byte after the previous one
    seek( $in, -3, 1 ) or die $!;
}
close $in;

"Info" or "Xing" as ASCII characters, yes.

Generally, you'll have to find the first byte of the MP3 frame by looking for the frame sync (11 consecutive 1 bits), but if you know you're handling the very 1st frame, you can skip that part.
In your dump, you have "Info"; skip 120 bytes (0x78) from there, and you'll find "LAME".
Starting from that "LAME", go to offsets 0x15-0x17, which hold 24 08 0A in your sample, so 0x240 (576) and 0x80A (2058) are what you want.
Basically, you can simply "cut" this 0x24080A into 2 halves, "0x240" and "0x80A".
In many languages that is a bit-shift operation, perhaps in Perl too, but like I said I don't know Perl very well ^^;
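
For the general case mentioned above, where you do start from the frame sync, here is a rough C sketch of how the position of "Info"/"Xing" could be derived from the 4 frame-header bytes instead of being searched for. The function name xing_offset is made up for illustration. It assumes a Layer III frame with no CRC after the header, and relies on the Layer III side-info sizes (32 bytes for MPEG-1 stereo, 17 for MPEG-1 mono or MPEG-2/2.5 stereo, 9 for MPEG-2/2.5 mono).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the 4 header bytes of an MPEG audio
 * Layer III frame, return the offset from the frame start at which an
 * "Info"/"Xing" tag would begin, or 0 if this is not such a header.
 * Assumption: no CRC follows the header. */
size_t xing_offset(const uint8_t h[4])
{
    assert(h != NULL);

    /* Frame sync: 11 set bits (0xFF plus the top 3 bits of byte 1) */
    if (h[0] != 0xFF || (h[1] & 0xE0) != 0xE0)
        return 0;

    int version = (h[1] >> 3) & 0x03; /* 3 = MPEG-1, 2 = MPEG-2, 0 = MPEG-2.5 */
    int layer   = (h[1] >> 1) & 0x03; /* 1 = Layer III */
    int mode    = (h[3] >> 6) & 0x03; /* 3 = mono, anything else = 2 channels */

    if (version == 1 || layer != 1)   /* reserved version, or not Layer III */
        return 0;

    size_t side_info;
    if (version == 3)
        side_info = (mode == 3) ? 17 : 32;
    else
        side_info = (mode == 3) ? 9 : 17;

    return 4 + side_info;             /* 4 header bytes, then side info */
}
```

For the header FF FB 50 C4 in the dump (MPEG-1, Layer III, mono) this gives 4 + 17 = 21 = 0x15, which is exactly where "Info" shows up.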

Reply #6 – 2009-02-17 17:49:29

Superb! This is exactly what I was looking for. Now I have a script that converts a huge number of WAV files to MP3 and at the same time generates the start and end time info! Thanks!

Reply #7 – 2009-02-17 18:52:09

I still think you didn't get it right.

If you encode the WAV files with LAME (as is necessary for the script to actually find a LAME tag...), it will already have put the correct values in.

Don't you understand?

Reply #8 – 2009-02-17 22:22:28

You're right if you're talking about standalone MP3 files to be played by audio players such as fb2k.
But maybe you don't understand the full implications of this hack. The LAME tag tries to make the results sample-accurate, and it's a nice hack while it works, but this very hack can have very annoying side effects in other situations. One example is when you want to concatenate 2 MP3 files or otherwise edit MP3 files, but a much more common trap is using MP3 as the audio of a movie file.

Let me clarify a few things.

1. This so-called LAME tag is not LAME's hack. AFAIK, it's originally Xing's. Not only LAME but also some other encoders use the very 1st frame to store meta-info. It is also called a VBR tag, but LAME writes it by default even for CBR or ABR.

2. Like [JAZ] said, LAME by default stores some useful info there so that the MP3 can be decoded sample-accurately, but that works ONLY IF the decoder knows this non-standard hack. Other innocent decoders decode a LAME MP3 into WAV with 2 kinds of unwanted delay:
(a) The 1st frame is parsed as a normal MP3 frame, hence a 1152-sample delay, that is 24 ms at 48000 Hz or 26.1 ms at 44100 Hz.
(b) Encoder delay (the very thing the LAME tag tries to correct), typically ~12 ms. <-- This itself is not anyone's fault. It's simply how MP3 encoders work.

Simply put, while the LAME tag helps some decoders greatly, it confuses many others: among them Windows DirectShow and VfW/ACM, but also any non-LAME-aware player on Linux. You might say "I don't use such things. I use fb2k", but the fact is that you are using DirectShow or ACM when you play a movie on Windows, for example through ffdshow, MPC, WMP, or Windows' default MP3 decoder (Fraunhofer). Almost anything. Almost none of them know this hack, so they simply delay the audio, giving you bad AV sync.

3a: If you're not sure whether the decoder can handle the LAME tag, you can (and probably should) easily avoid (a) at encoding time: use LAME's -t switch to disable the LAME tag. Note that if you read what lame --help says, you might think -t is only for VBR, which is wrong. You must use it for CBR too.
3b: Correcting (b) manually is a bit tricky. Before encoding to MP3, edit the original WAV and delete as many samples from the very beginning as the encoder delay adds (only possible if they are silent samples, but usually they are). For example, 1105 samples if the delay is 1105. You can't tell the right value unless you test-encode once, but the typical value for me is 1105.
Note that, because of pre-echo, you cannot detect this delay simply by counting the silent samples at the start of the decoded MP3 and comparing that with the original number of silent samples.

4. You don't need to do (3) when you're creating standalone audio files to be played by an "ordinary" player such as fb2k, because such players can handle the LAME tag perfectly on their own. Undisputed.

5. On the other hand, you should do (3), at least (3a), if you're making an MP3 file to be the audio part of a movie (AVI/MP4/MKV/OGM etc.). Otherwise, you don't get correct AV sync. I think most of you have a few, or maybe a lot of, movie files where the audio is LAME MP3, VBR or CBR. Unless that MP3 was encoded with -t, your audio is late against the video (AV not in sync) by about 1 video frame. Most (or maybe all) of your AVI files are probably in bad sync because of this. I wish Xing had decided to store this info before the 1st MP3 frame as some kind of metadata, not as a dummy MP3 frame that looks like a normal MP3 frame to other innocent decoders. But it's too late. Most multimedia players don't support this non-standard extension (problem 2a above), and there is also 2b: 2a + 2b = typically 36 ms, while 1 video frame is typically 40-42 ms. So, for example, audio that should sound at video frame 100 would actually sound at video frame 101, delayed.
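
To make the arithmetic in 2a/2b concrete, here is a small C sketch of the conversions. The two function names are invented for this example. The 576-sample encoder delay used in the note below is the value read from the dump earlier in this thread, and 25 fps is just an assumed frame rate.

```c
#include <assert.h>

/* Convert a sample count to milliseconds at the given sample rate. */
double samples_to_ms(unsigned samples, unsigned sample_rate_hz)
{
    assert(sample_rate_hz != 0);
    return 1000.0 * samples / sample_rate_hz;
}

/* Express an audio delay as a fraction of one video frame. */
double delay_in_video_frames(double delay_ms, double fps)
{
    assert(fps > 0.0);
    return delay_ms * fps / 1000.0;
}
```

At 48000 Hz, the dummy frame (1152 samples) is 24 ms and a 576-sample encoder delay is 12 ms, 36 ms in total. At 25 fps (40 ms per video frame) that is 0.9 of a frame, which is why the audio ends up about 1 video frame late.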

So anyway... there are some valid reasons to be interested in LAME's Delay/Padding parameters. For example, when you're writing a multimedia player that will understand the LAME tag and do the sample-accurate corrections.

You may think the LAME tag makes the results sample-accurate, but for movie files it's quite the opposite. Using the LAME tag makes things worse, and you can make the results sample-accurate when NOT using it. You can easily reduce the problem just by disabling the LAME tag with -t and, if you wish, make the AV sync sample-accurate by using the 3a+3b tricks above.
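
For the player-writing case, the correction a LAME-aware decoder applies could look roughly like the C sketch below. It is not taken from any real player. trim_range is a hypothetical name, "decoded" is assumed to count only samples decoded from the real audio frames (the tag frame already skipped), and decoder_delay is whatever latency your particular decoder adds. 529 samples is the figure commonly quoted for MP3 decoders, and 576 + 529 matches the 1105 mentioned in 3b, but treat that as an assumption to verify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for a LAME-aware decoder: work out which part of
 * the decoded PCM (samples per channel) belongs to the original audio.
 * Stores the index of the first sample to keep in *first and returns
 * how many samples to keep from there. */
size_t trim_range(size_t decoded, unsigned delay, unsigned padding,
                  unsigned decoder_delay, size_t *first)
{
    assert(first != NULL);

    size_t skip = (size_t)delay + decoder_delay; /* junk at the start */
    size_t junk = (size_t)delay + padding;       /* samples the encoder added */

    *first = skip < decoded ? skip : decoded;

    /* Original length = encoded length minus delay and padding */
    size_t keep = decoded > junk ? decoded - junk : 0;
    if (keep > decoded - *first)                 /* never run past the end */
        keep = decoded - *first;
    return keep;
}
```

For example, with 10 frames (11520 samples), delay 576, padding 2058 and a 529-sample decoder latency, the original audio would start at sample 1105 and be 8886 samples long.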

Reply #9 – 2009-02-18 22:10:21

Very detailed reply, and very informative.

But..... doesn't it imply that you need to remove the frame rather than modify the values?

The enc-delay for LAME is always the same (for the same mpeg mode), and the enc-padding depends on the sample length (and doesn't matter except for joining files or sample accuracy).

Reply #10 – 2009-02-18 22:55:20

Quote from: [JAZ] on 2009-02-18 22:10:21

But..... doesn't it imply that you need to remove the frame rather than modify the values?

True, you may want to remove an existing dummy frame in some cases. For instance, the one from the 2nd MP3 when you concatenate 2 MP3 files.
If for some reason you don't want this dummy frame at all, you can use LAME -t to disable it when encoding. LAME then doesn't use the first frame to store that info, so you don't need to remove anything (it's not there in the first place).

You shouldn't disable the VBR tag when you're making a standalone MP3. Some players, including fb2k, are "addicted" to this hack, and without it they get confused: not only are they no longer sample-accurate, they can't even show the average bit rate, the duration, etc. of the MP3, so seeking doesn't work correctly either. In general, audio players are confused if it does NOT exist.
On the other hand, if for example you want to use MP3 as the audio track of your AVI, you don't need this dummy frame. Most multimedia (video) players are confused if it DOES exist. Also, AVI has its own seek table called idx1.

So this frame is helpful or harmful depending on what you want to do, and you may want to use -t if it shouldn't be there.

EDIT: fix: addictive->addicted (meaning, they really need it, depending on it)
