bug in foo_id3v2 1.10 in foobar 0.77a

2004-01-06 11:35:00

foo_id3v2 1.10 doesn't read "é è ë ï ê à" etc... letters correctly

- Load an MP3 in foobar (using id3v2+id3v1 tag support)
- Tag it with TITLE = é è ë ï ê
- Click "update file"
- Click "reload info from file"; it displays garbage like ￩￨ (shown as squares in Unicode) instead of éè

But if you load the file in Winamp, the tag is read correctly.

The bug occurs only when reading tags, not when writing them.
There is no problem when you use id3v1 support only.

(using foobar 0.77a foo_id3v2 1.10 and windows2000 sp4 french version)
in foo_id3v2 config everything is unchecked (default options)

Reply #1 – 2004-01-06 12:09:08

I tried tagging a file as you suggested, but it works fine here.

And doesn't this belong in the 3rd party plugins forum?

Reply #2 – 2004-01-06 15:21:05

That's strange. The problem is reproducible 100% of the time on my system; maybe it's a Windows 2000 problem only.

Also, when I enable "Write ISO-8859-1 tags instead of UTF-16" in the id3v2 plugin configuration, the problem no longer occurs when tagging files.
It seems the bug occurs only when reading id3v2 tags in UTF-16 on my system.

Reply #3 – 2004-01-11 22:44:44

Quote

That's strange. The problem is reproducible 100% of the time on my system; maybe it's a Windows 2000 problem only.

Also, when I enable "Write ISO-8859-1 tags instead of UTF-16" in the id3v2 plugin configuration, the problem no longer occurs when tagging files.
It seems the bug occurs only when reading id3v2 tags in UTF-16 on my system.

I agree - it's a "bug". The problem is that Windows NT, 2000, 98, ME and 95 don't support Unicode letters. For some reason, if you haven't checked the ISO... checkbox, foobar writes the tag as UTF. Then characters like é ä etc. don't show properly when the tag is read from the file.
I don't understand how they can be saved as ISO (so they are obviously included in the character table for the OS), yet the computer is too stupid to guess the correct symbol for the UTF representation of letters like é ä when these are stored in the UTF tag.
I agree with your solution - as long as you're not planning to upgrade to a newer version of Windows, you'll want to keep all your tags ISO-....

Reply #4 – 2004-01-12 03:29:39

2000 supports Unicode:

http://support.microsoft.com/default.aspx?...&NoWebContent=1

Reply #5 – 2004-01-13 03:28:03

Same problem here. Running foobar2k 0.7.7a and foo_id3v2 1.10 on WinXP SP1 English version.

Remember to enable ID3v2 in Preferences -> Playback -> Standard Inputs. Change tags to write to ID3v2.

Looking at the file with a hex editor reveals the problem:

00000000 49 44 33 03 00 00 00 00 0f 76 54 49 54 32 00 00 ID3......vTIT2..
00000010 00 13 00 00 01 e9 ff 20 00 e8 ff 20 00 eb ff 20 .....éÿ .èÿ .ëÿ
00000020 00 ef ff 20 00 ea ff 00 00 00 00 00 00 00 00 00 .ïÿ .êÿ.........
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

é is U+00E9 but becomes U+FFE9 here, because it is treated as a signed char instead of unsigned!

0xE9 is -23 as a signed value, and when converting to a larger signed type (like short or int) the sign bit is extended. So it becomes 0xFFE9, which is correct for signed values but incorrect for unsigned.
All chars above 0x7F have this problem.

There are more bugs when "Write byte order marker (BOM) in all strings" is enabled:
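
The sign extension described above can be reproduced in a few lines. This is only a minimal sketch, not code from foo_id3v2 or id3lib: the helper names widen_signed and widen_unsigned are made up for illustration, and signed char is used explicitly because whether plain char is signed depends on the compiler.

```cpp
#include <cstdint>

// Hypothetical helper: widens a byte the buggy way, through a signed type.
// The sign bit of 0xE9 is copied into the upper byte.
std::uint16_t widen_signed(unsigned char byte) {
    signed char s = static_cast<signed char>(byte);  // 0xE9 becomes -23
    return static_cast<std::uint16_t>(s);            // -23 becomes 0xFFE9
}

// Hypothetical helper: widens through an unsigned type, so the upper byte stays zero.
std::uint16_t widen_unsigned(unsigned char byte) {
    return static_cast<std::uint16_t>(byte);         // 0xE9 stays 0x00E9
}
```

Bytes up to 0x7F come out the same from both helpers, which would explain why plain ASCII titles never showed the problem.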

00000000 49 44 33 03 00 00 00 00 0f 76 54 49 54 32 00 00 ID3......vTIT2..
00000010 00 15 00 00 01 fe ff e9 ff 20 00 e8 ff 20 00 eb .....þÿéÿ .èÿ .ë
00000020 ff 20 00 ef ff 20 00 ea ff 00 00 00 00 00 00 00 ÿ .ïÿ .êÿ.......
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

The BOM (fe ff) suggests that the text is encoded as big-endian, but the text is still encoded in little-endian!

Code:

FE FF  UTF-16, big-endian
FF FE  UTF-16, little-endian

http://www.unicode.org/faq/utf_bom.html#25

Furthermore, ID3v2 states that if no BOM is used, the default byte order should be big-endian. So that's another bug...
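
To make the byte order rules above concrete, here is a rough sketch of a reader that honours the BOM and falls back to big-endian when there is none, as this post reads the ID3v2 rules. The function name decode_utf16 is hypothetical and is not taken from id3lib or foo_id3v2.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoder: turns raw tag bytes into UTF-16 code units.
std::vector<std::uint16_t> decode_utf16(const std::vector<unsigned char>& bytes) {
    bool big_endian = true;  // assumed default when no BOM is present
    std::size_t pos = 0;
    if (bytes.size() >= 2) {
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) {         // FE FF: big-endian
            big_endian = true;
            pos = 2;
        } else if (bytes[0] == 0xFF && bytes[1] == 0xFE) {  // FF FE: little-endian
            big_endian = false;
            pos = 2;
        }
    }
    std::vector<std::uint16_t> units;
    // Work on unsigned bytes throughout, so no sign extension can occur.
    for (; pos + 1 < bytes.size(); pos += 2) {
        unsigned hi = big_endian ? bytes[pos] : bytes[pos + 1];
        unsigned lo = big_endian ? bytes[pos + 1] : bytes[pos];
        units.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
    }
    return units;
}
```

Fed the start of the second dump (fe ff e9 ff), a reader like this yields 0xE9FF rather than U+00E9, which shows why a BOM that disagrees with the actual byte order is harmful.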

Reply #6 – 2004-01-16 02:57:26

Sign extension is a fault of the Unicode text writer converting from the input String type without recasting.

As for the byte order marker, id3lib is broken.

ID3lib handles the BOM backwards, or it was designed with a big-endian platform in mind only, as it reads the characters into a big-endian format in memory, then later uses functions such as ucslen() on them. It also reads the characters in their native order if there is no BOM present.

This backwards handling led me to write a backwards writer. Now that I have corrected both reading and writing with BOM, I see that Explorer, Windows Media Player, and Winamp all handle the tags properly.
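
For anyone curious, a writer that keeps the BOM and the text in the same byte order regardless of the host CPU could look roughly like the sketch below. It is not the actual v1.11 code: the function name encode_utf16le_with_bom is made up, and choosing little-endian output is only an assumption for the example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical writer: emits FF FE followed by each code unit, low byte first.
// Bytes are produced with shifts and masks, so host endianness does not matter.
std::vector<unsigned char> encode_utf16le_with_bom(const std::vector<std::uint16_t>& units) {
    std::vector<unsigned char> out;
    out.reserve(2 + units.size() * 2);
    out.push_back(0xFF);  // BOM for little-endian: FF FE
    out.push_back(0xFE);
    for (std::uint16_t u : units) {
        out.push_back(static_cast<unsigned char>(u & 0xFF));  // low byte
        out.push_back(static_cast<unsigned char>(u >> 8));    // high byte
    }
    return out;
}
```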

Stupid miscoded library misleading me... There is a lack of documentation on how that should be handled. ID3lib is just a huge mess.

If anybody would like to write a proper ID3v2 handling library from scratch... something that can handle multiple frames of each type, and also write the same... Be my guest! :B

New v1.11 uploaded, should fix all your UTF-16 reading AND writing problems. BOM writing option has been renamed internally and defaults to ON now, let me know if newly written tags stop working in any particular software...

Reply #7 – 2004-01-16 03:51:56

@kode54:
Confirmed with Japanese characters in UTF-16, no problem here.
Thanks for your work.

Reply #8 – 2004-03-11 19:03:11

Quote

Same problem here. Running foobar2k 0.7.7a and foo_id3v2 1.10 on WinXP SP1 English version.

Remember to enable ID3v2 in Preferences -> Playback -> Standard Inputs. Change tags to write to ID3v2.

Looking at the file with a hex-editor reveals the problem:

00000000 49 44 33 03 00 00 00 00 0f 76 54 49 54 32 00 00 ID3......vTIT2..
00000010 00 13 00 00 01 e9 ff 20 00 e8 ff 20 00 eb ff 20 .....éÿ .èÿ .ëÿ
00000020 00 ef ff 20 00 ea ff 00 00 00 00 00 00 00 00 00 .ïÿ .êÿ.........
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

é is U+00E9 but becomes U+FFE9 here, because it is treated as a signed char instead of unsigned!

0xE9 is -23 as a signed value, and when converting to a larger signed type (like short or int) the sign bit is extended. So it becomes 0xFFE9, which is correct for signed values but incorrect for unsigned.
All chars above 0x7F have this problem.

There are more bugs when "Write byte order marker (BOM) in all strings" is enabled:

00000000 49 44 33 03 00 00 00 00 0f 76 54 49 54 32 00 00 ID3......vTIT2..
00000010 00 15 00 00 01 fe ff e9 ff 20 00 e8 ff 20 00 eb .....þÿéÿ .èÿ .ë
00000020 ff 20 00 ef ff 20 00 ea ff 00 00 00 00 00 00 00 ÿ .ïÿ .êÿ.......
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

The BOM (fe ff) suggests that the text is encoded as big-endian, but the text is still encoded in little-endian!

Code:
FE FF  UTF-16, big-endian
FF FE  UTF-16, little-endian
http://www.unicode.org/faq/utf_bom.html#25

Furthermore, ID3v2 states that if no BOM is used, the default byte order should be big-endian. So that's another bug...

This problem should have been addressed already in the latest id3lib. After receiving the tip from the above analysis, the culprit was identified in id3lib 3.8.3, io_helpers.cpp:

(String is defined elsewhere as)
typedef std::basic_string<char> String;

size_t io::writeUnicodeText(ID3_Writer& writer, String data, bool bom)
{
...
unicode_t ch = (data[i] << 8) | data[i+1];
...
}

The problem in the above statement is that data[i+1] is SIGNED! So when it is converted to unicode_t (unsigned short), it gets sign-extended as explained in the above quote. It could be fixed simply by casting data[i+1] to unsigned char.
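
A minimal sketch of that fix follows, as a standalone helper rather than a patch against id3lib. The name make_code_unit is hypothetical, unicode_t is assumed here to be a 16-bit unsigned type, and casting the high byte as well goes slightly beyond the suggestion above (it avoids shifting a negative value).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

typedef std::uint16_t unicode_t;  // assumption: stands in for id3lib's unicode_t

// Hypothetical helper: combines data[i] (high byte) and data[i+1] (low byte)
// into one code unit, going through unsigned char so no sign extension occurs.
unicode_t make_code_unit(const std::string& data, std::size_t i) {
    unsigned char hi = static_cast<unsigned char>(data[i]);
    unsigned char lo = static_cast<unsigned char>(data[i + 1]);
    return static_cast<unicode_t>((hi << 8) | lo);
}
```

With the bytes 00 E9 as input this gives 0x00E9, whereas the uncast version gives 0xFFE9 on compilers where char is signed.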

Reply #9 – 2004-03-11 23:01:08

Yes, it might have been addressed in the latest id3lib, but I've made so many of my own changes, it will probably be a pain in the ass to upgrade. (Of course, I could just extract the two versions, diff, then patch my code, but there will probably be some rejects to fix, and I might have already implemented some of their fixes.)
