Header identification issues

I found several files where the MP3 header appears to be identified incorrectly. In at least two completely different CD rips, two or three tracks are identified as MPEG 1 layer I, based on the first frame header found. All the other files of those rips are identified as MPEG 1 layer III, as they should be.

The library I'm using is TagLib#. Other applications like mp3check and foobar2000 identify the files as MPEG 1 layer III, as they should.
TagLib# bases all its information on just the first header found. I know this isn't a safe method, but I'd like to find out what is going wrong with the first header. Two to three tracks having this issue on multiple discs is too much of a coincidence, IMHO.

I've analysed two different files and found the following:
  • Only the first header is marked "layer I"; all others are "layer III"
  • The first headers of the two files differ from each other in their other flags
  • The first header is located at an offset after what should be the end of the ID3v2 tag
  • Most other headers, though not all, have the same bitrate
  • Only one of the two first headers has the CRC bit set

Some more details of one of the files:
First header: FF FE 08 50 (11111111 11111110 00001000 01010000)
Second header: FF FB E2 04 (11111111 11111011 11100010 00000100)

Most important aspects of the headers compared (first header - second header):
  • layer I - layer III
  • Protected by CRC - Not protected
  • bitrate free - bitrate 320
  • 32000 Hz - 44100 Hz
  • Joint stereo (Stereo) - Stereo
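The comparison above can be reproduced by decoding the four header bytes field by field. Below is a minimal Python sketch, assuming MPEG-1 headers only (version bits 11) and the standard MPEG-1 bitrate and sample rate tables. The function name `decode_mpeg1_header` and the dictionary keys are made up for illustration; this is not how TagLib# does it.

```python
# Lookup tables for MPEG-1 only; MPEG-2 and MPEG-2.5 use different tables.
BITRATES_KBPS = {
    1: [0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448],
    2: [0, 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384],
    3: [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
}
SAMPLE_RATES_HZ = [44100, 48000, 32000]
LAYERS = {0b11: 1, 0b10: 2, 0b01: 3}
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def decode_mpeg1_header(raw: bytes) -> dict:
    """Decode a 4-byte MPEG-1 audio frame header (hypothetical helper)."""
    if len(raw) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    word = int.from_bytes(raw, "big")
    if word >> 21 != 0x7FF:
        raise ValueError("no frame sync (11 set bits) found")
    if (word >> 19) & 0b11 != 0b11:
        raise ValueError("not MPEG-1; this sketch handles MPEG-1 only")
    layer_bits = (word >> 17) & 0b11
    bitrate_index = (word >> 12) & 0xF
    rate_index = (word >> 10) & 0b11
    if layer_bits == 0 or bitrate_index == 0xF or rate_index == 0b11:
        raise ValueError("reserved value in header")
    layer = LAYERS[layer_bits]
    return {
        "layer": layer,
        "crc_protected": ((word >> 16) & 1) == 0,  # bit cleared means CRC present
        "bitrate_kbps": BITRATES_KBPS[layer][bitrate_index],  # 0 means "free"
        "sample_rate_hz": SAMPLE_RATES_HZ[rate_index],
        "padding": (word >> 9) & 1,
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0b11],
    }


first = decode_mpeg1_header(bytes.fromhex("FFFE0850"))
second = decode_mpeg1_header(bytes.fromhex("FFFBE204"))
for name, header in (("first", first), ("second", second)):
    print(name, header)
```

Run on the two headers quoted above, this gives layer I, CRC protected, free bitrate, 32000 Hz, joint stereo for the first one, and layer III, not protected, 320 kbps, 44100 Hz, stereo for the second.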

Below is the data from the first header found up to the second one (both headers are in uppercase, on the first and last line).
Code:
...
000103c1 | ff ff FF FE 08 50 6b 40 82 46 29 22 a8 c3 08 8c c7 82 10 13 1f 3c 4c 3d 62 3c e1 08 08
000103de | 42 94 0d 94 a1 22 0f 20 6c 65 2b 5d f9 75 d0 a8 6f d2 08 24 da 41 32 89 02 37 39 8c 4a
000103fb | 2c 48 84 2d e5 e8 2d fe 97 e7 6d f6 4f a8 cc 15 1f 8e d2 d1 5d 94 ab 05 04 43 29 53 d1
00010418 | 2a c6 0d a3 a4 99 9a 31 c0 fc 8e a8 25 36 60 8c 0b 99 03 45 89 38 54 c0 a8 88 a3 d0 3a
00010435 | 2d 20 5d 68 c5 1b 8a 20 49 ca a3 3c 2b 42 44 4e 4f 6a 1b 43 c9 58 8c a6 dd b3 54 99 2b
00010452 | 04 11 46 44 55 a4 ca 30 42 8f 5e 88 a9 11 e6 d1 2a 65 19 fc 9a 5c 98 fa 6d 62 27 a4 42
0001046f | d2 f5 64 88 4e 22 58 48 81 d2 41 18 19 c7 e3 d0 37 a4 09 6d 56 c1 63 6b e3 32 9c 18 b2
0001048c | c4 08 50 d2 8c d2 03 04 ef 5d 54 2c 0e 29 34 97 56 9e b5 be 90 11 7c 62 d1 b2 e4 d8 64
000104a9 | 56 a2 86 60 ac 58 41 7a a4 26 4b 35 49 e0 b4 96 22 37 8c 42 0e 18 f3 27 89 7b 58 30 83
000104c6 | 1a b9 7b 87 27 11 4f b2 29 23 d3 c9 de a6 b0 3d c1 11 da e0 5d 13 ca 19 3b 1b a5 80 69
000104e3 | 2e aa aa 2c d3 49 73 7f dc c7 4e 71 ad 40 94 b1 68 8a 58 4b dd c9 2b c0 cd e2 17 a4 6f
00010500 | ed 2c cc cc fd 3d ee d9 d5 cb 72 bf 86 aa 4b 56 19 0a 2a bc 4b 99 14 a8 2b 39 23 85 04
0001051d | e8 d9 03 1c 2c c1 8b 89 19 83 72 25 d6 c3 68 1e 6b 12 55 a2 26 66 b2 a2 34 5e e7 46 95
0001053a | e8 57 55 82 ed ac 74 e8 98 b2 32 70 f1 31 56 26 85 31 03 31 c5 a2 89 99 a5 a9 8a da 45
00010557 | 6f 83 45 85 28 3b 97 15 97 8c fb 7a 88 52 e4 28 d0 49 eb aa 7a ce 35 48 54 66 d3 c4 5e
00010574 | 05 06 51 21 82 dd 03 48 3b 4e a5 64 4e c2 9d 32 b3 59 b9 3c ea a2 86 36 42 0d 80 92 07
00010591 | 1c d3 cd 63 58 d4 92 96 26 b2 98 b4 22 2c 24 53 20 76 d6 56 f2 4b 4e 2b b8 27 0e 97 4f
000105ae | 56 ab 30 ec a8 b5 4b 0a 8f 16 11 38 a4 a6 71 a1 d6 37 08 25 85 0e 06 f2 b1 d7 a1 90 c3
000105cb | 80 51 49 c7 4d 02 5a f4 02 39 51 fa 32 22 43 34 d1 51 08 69 5b 8c bd 23 99 6c ab 18 2d
000105e8 | 3e 2b d3 cd 33 ce ce c8 62 36 ae d2 d1 61 1f 9d a3 c2 7e bd 1c 47 54 17 6b 49 e6 05 09
00010605 | ad a4 a8 ce 18 3c de 98 65 b9 94 38 24 10 c0 b8 ae 42 93 66 13 92 c2 8c 20 7b e6 b5 89
00010622 | 10 a9 ee e0 aa 02 3b 8d ae 4a cc 99 60 69 26 98 7e 3c 3c 7a 2a 1b 63 b0 e9 1d 30 6c 9d
0001063f | 98 9c 73 9c b9 14 65 17 c9 54 0b a2 42 61 46 e1 64 8b a2 61 41 5b f5 02 90 f0 75 69 db
0001065c | b8 e9 0b 06 86 b1 27 d2 cb 29 24 d5 8a f7 71 62 0f 27 9b 83 ec 12 c9 8c 36 50 d4 12 b6
00010679 | 94 33 3c 2b 06 57 c6 3a d0 d4 4c 96 4d cc 6a cc 9f 67 2f 4a 75 e6 c2 da 86 53 96 cd 58
00010696 | f8 d6 e7 74 d3 d0 db 26 FF FB E2 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...

What is this first found "frame" about?

Can it be that a possible VBR header frame does not have the same crucial flags as the other "normal" MP3 frames?

Is it a valid MP3 frame after all, or is it something else that could safely be discarded?

If I change the code to only retrieve information from the first header that is followed by an identical one (not taking fields like the bitrate into account), the frame above would be discarded. Wouldn't I then risk discarding a VBR header, since that is supposed to be located in the first frame?
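The "only trust a header that is followed by a matching one" idea could look roughly like the Python sketch below. It assumes MPEG-1 only, computes the frame length from the header using the usual formulas, and skips free-format headers because their length cannot be derived from the header alone; that last point is a simplification of mine. `parse_header` and `find_confirmed_header` are hypothetical names, not TagLib# functions.

```python
BITRATES_KBPS = {
    1: [0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448],
    2: [0, 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384],
    3: [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
}
SAMPLE_RATES_HZ = [44100, 48000, 32000]


def parse_header(data: bytes, pos: int):
    """Return (layer, sample_rate_hz, frame_length) for an MPEG-1 header, else None."""
    if pos < 0 or pos + 4 > len(data):
        return None
    word = int.from_bytes(data[pos:pos + 4], "big")
    if word >> 19 != 0x1FFF:  # 11 sync bits followed by version bits 11 (MPEG-1)
        return None
    layer_bits = (word >> 17) & 0b11
    bitrate_index = (word >> 12) & 0xF
    rate_index = (word >> 10) & 0b11
    # Reject reserved values. Free format (index 0) is also skipped here
    # because its frame length is not stored in the header (simplification).
    if layer_bits == 0 or bitrate_index in (0, 0xF) or rate_index == 0b11:
        return None
    layer = 4 - layer_bits  # bits 11 -> layer I, 10 -> layer II, 01 -> layer III
    bitrate = BITRATES_KBPS[layer][bitrate_index] * 1000
    sample_rate = SAMPLE_RATES_HZ[rate_index]
    padding = (word >> 9) & 1
    if layer == 1:
        length = (12 * bitrate // sample_rate + padding) * 4
    else:
        length = 144 * bitrate // sample_rate + padding
    return layer, sample_rate, length


def find_confirmed_header(data: bytes, start: int = 0):
    """Offset of the first header whose successor has the same layer and sample rate."""
    for pos in range(start, len(data) - 3):
        first = parse_header(data, pos)
        if first is None:
            continue
        second = parse_header(data, pos + first[2])
        # Bitrate and padding may differ between frames (VBR), so they are ignored.
        if second is not None and second[:2] == first[:2]:
            return pos
    return None


# Synthetic example: a stray "layer I" header, then two real layer III frames.
stray = bytes.fromhex("FFFE0850") + bytes(20)
frame = bytes.fromhex("FFFBE204") + bytes(1041)  # 1045 bytes in total
print(find_confirmed_header(stray + frame + frame))
```

As far as I know, a Xing/Info frame is itself a regular frame with the same layer and sample rate as the audio frames that follow it, so a check like this would be expected to keep it rather than throw it away. It only compares fields that should stay constant across the whole stream.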

Reply #1
Do you have an ID3v2 tag with album art in that file? I am guessing that what you see is not a header, but something that fools the tagger. Switching between MPEG layers within a stream is not valid.
I don't think it's the VBR table, because the VBR table is inside a frame.

Reply #2
Do you have an ID3v2 tag with album art in that file?

Yes, it has an embedded jpg, the same as the other ripped files which do get recognized correctly.
Also, the complete ID3v2 size is set to:
Code:
00 04 05 4D

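This decodes to a size of 0x102CD bytes, so the tag ends around that offset (the size field does not count the 10-byte ID3v2 header itself). The first frame is found after this offset, as you can see from the hex dump in my initial post, so it lies past the ID3v2 tag, including the embedded picture.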
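The four size bytes are a "synchsafe" integer: only the lower 7 bits of each byte are used. A minimal Python sketch of the decoding follows; the helper name `synchsafe_to_int` is just for illustration.

```python
def synchsafe_to_int(raw: bytes) -> int:
    """Decode a 4-byte ID3v2 synchsafe integer (7 significant bits per byte)."""
    if len(raw) != 4 or any(b & 0x80 for b in raw):
        raise ValueError("not a valid 4-byte synchsafe integer")
    value = 0
    for b in raw:
        value = (value << 7) | b
    return value


size = synchsafe_to_int(bytes.fromhex("0004054D"))
print(hex(size))       # tag size without the 10-byte ID3v2 header: 0x102cd
print(hex(size + 10))  # offset right after the tag, assuming no footer: 0x102d7
```

Going by the offset column of the hex dump in the first post, the suspicious header sits at 0x103C3, which is after the end of the tag with or without the 10 header bytes.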

I have removed the picture to see if it made any difference, but unfortunately it did not. The same result can be reproduced after re-adding the picture.

I don't think it's the VBR table, because the VBR table is inside a frame.

Do you mean that you can see that it is not a valid frame?

Reply #3
No. My last sentence was meant to say that the VBR header is designed to be inside a frame, and in this case, the parser should not see its data as another frame.

I am unsure what the data you've pasted is. FF FB does look valid (I guess the track starts with silence, which is why most of the data after it is zero). The data before it does not look like a VBR header, since the VBR header contains a map of 100 points into the stream, and I see no increasing values in there.
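A quicker check than looking for the 100-point table is to look for the tag marker itself. The Python sketch below assumes an MPEG-1 frame without CRC: the "Xing" or "Info" marker then follows the 4-byte header and the side information (32 bytes, or 17 for mono), and the Fraunhofer "VBRI" marker sits 32 bytes after the header. `find_vbr_tag` is a hypothetical helper name.

```python
def find_vbr_tag(frame: bytes):
    """Return 'Xing', 'Info' or 'VBRI' if the MPEG-1 frame carries that marker, else None."""
    if len(frame) < 4:
        return None
    mono = (frame[3] >> 6) == 0b11            # channel mode bits 11 mean mono
    xing_offset = 4 + (17 if mono else 32)    # header + MPEG-1 side information
    tag = frame[xing_offset:xing_offset + 4]
    if tag in (b"Xing", b"Info"):
        return tag.decode("ascii")
    if frame[36:40] == b"VBRI":               # header + 32 bytes
        return "VBRI"
    return None


# First 40 bytes of the suspicious "frame", copied from the hex dump in the first post.
suspect = bytes.fromhex(
    "fffe08506b4082462922a8c3088cc78210131f3c4c3d623ce10808"
    "42940d94a1220f206c652b5df9"
)
print(find_vbr_tag(suspect))
```

For the pasted data none of the markers is found at the expected offsets, which fits the impression that this is not a VBR header.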

if you could do some more tests maybe directly with the commandline instead of using a frontend and not put any id3 data, that might help to identify the strange data.

Reply #4

if you could do some more tests maybe directly with the commandline instead of using a frontend and not put any id3 data, that might help to identify the strange data.

Sure, which tests do you have in mind? Also, what do you mean by "not put any id3 data"? Removing the ID3 tag(s) completely first?

Reply #5
No, “directly with the commandline instead of using a frontend and not put any id3 data” means to encode some other MP3s from the command-line without adding tags.

 

Reply #6
No, “directly with the commandline instead of using a frontend and not put any id3 data” means to encode some other MP3s from the command-line without adding tags.

I think it will be nearly impossible to produce new MP3s that cause the same issue.
The MP3s that have the issue were encoded a long time ago. One release was encoded with a Xing encoder, while another used e.g. an FhG encoder. The issue is therefore not limited to a specific encoder, nor do I have a clue which version of a specific encoder I used at the time.

What I could do is rip e.g. two of the discs again to WAV, and see if the latest LAME encoder produces the same issue. If it does not, it'll be like looking for a needle in a haystack, not to mention the weeks of work re-encoding my complete collection.

If I were to use the LAME encoder from the command line, are there any specific flags, besides the default encoding parameters, that I should use to produce more data to actually have something to analyse?