HydrogenAudio

Lossy Audio Compression => MPC => Topic started by: tico-tico on 2014-12-18 20:25:46

Title: Error in conversion to utf-8 in tags.
Post by: tico-tico on 2014-12-18 20:25:46
I don't know if anyone is still maintaining the musepack sources; anyway, here's my bug report for anyone to whom it matters.

In tags.c the authors use a char as the array index for the codepage conversion, and char is signed, at least in MSVC. I think the official build doesn't work with non-Latin tags (Cyrillic in my case) because of this error.

Code:
int
addtag ( const char*           key,             // the item key
         size_t                keylen,          // length of item key, or 0 for auto-determine
         const char*           value,           // the item value
         size_t                valuelen,        // the length of the item value (before any possible translation)
         int                   converttoutf8,   // convert flags of item value
         int                   flags )          // item flags proposal
{
    unsigned char*  p;
    unsigned char*  q;
    char            ch;              // <-- here
    size_t          i;
Title: Error in conversion to utf-8 in tags.
Post by: r2d on 2014-12-20 14:31:56
Hello,

Quote from tico-tico: "In tags.c the authors use a char as the array index for the codepage conversion, and char is signed, at least in MSVC. I think the official build doesn't work with non-Latin tags (Cyrillic in my case) because of this error."

My understanding is that the variable "ch" is used to store the character that will be translated, not the codepage. The only value assigned to it comes from value, which is a char, so no problem here.
Do you have a case where this function fails? If so, please provide the inputs, the current result and the expected result.
Thanks,

Nicolas
Title: Error in conversion to utf-8 in tags.
Post by: tico-tico on 2014-12-20 17:44:05
Hello, Nicolas.

ch is used as an index into the array that performs the codepage conversion (from local to UTF-8), on line 776:
Code:
q = utf8char ( q, CP_ptr [ch] );


For the command mpcenc --title Песня 01.flac, the unaltered sources give me:

(http://i60.tinypic.com/25tzxut.jpg)

If I change the type of ch to unsigned char:

(http://i61.tinypic.com/21lj03p.jpg)
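
The effect can be reproduced outside mpcenc. Below is a minimal sketch, assuming the title arrives in Windows-1251, where П is the byte 0xCF (that is -49 as a signed char). demo_table, signed_index and lookup are hypothetical names for illustration only; the real CP_ptr table from tags.c is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the conversion table; the real CP_ptr in tags.c
   is not reproduced here. One entry per possible byte value. */
static unsigned short demo_table[256];

/* What a signed char produces when used directly as an index:
   bytes 0x80..0xFF come out negative and point before the table. */
static int signed_index(signed char ch)
{
    return ch;
}

/* Safe lookup: go through unsigned char first, so the index is 0..255
   whether plain char is signed or not. */
static unsigned short lookup(char ch)
{
    size_t idx = (unsigned char)ch;
    assert(idx < sizeof demo_table / sizeof demo_table[0]);
    return demo_table[idx];
}
```

With a signed char, every byte from 0x80 upwards indexes memory in front of the table, which would explain why plain ASCII titles are unaffected while Cyrillic ones come out wrong.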

Anyway, I ran into a much more significant problem when I tried to convert files with German diaereses/umlauts in the name: mpcenc just can't open them, because it uses ANSI args (argc, argv). Therefore I think the only real fix is to use a Unicode argv, use wchar_t in the file-handling functions, and use WideCharToMultiByte in tags.c for converting to UTF-8. I did some (very) dirty hacks and this actually works.
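
To illustrate only the tag-conversion part of that idea, here is a rough sketch, not the actual hack. wide_to_utf8 is a hypothetical helper name that does not exist in the musepack sources. On Windows it calls WideCharToMultiByte with CP_UTF8; the non-Windows branch is only there so the sketch compiles elsewhere, and it assumes a 32-bit wchar_t holding whole code points.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: converts a NUL-terminated wide string to UTF-8.
   Returns the number of bytes written, not counting the terminating NUL,
   or 0 if dst is too small or the input is invalid (an empty input also
   yields 0). */
static size_t wide_to_utf8(const wchar_t *src, char *dst, size_t dstsize)
{
    assert(src != NULL && dst != NULL);
    if (dstsize == 0)
        return 0;
#ifdef _WIN32
    /* Length -1 means "src is NUL-terminated"; the count returned on
       success then includes the terminating NUL. */
    int n = WideCharToMultiByte(CP_UTF8, 0, src, -1, dst, (int)dstsize, NULL, NULL);
    return n > 0 ? (size_t)(n - 1) : 0;
#else
    /* Fallback for platforms where wchar_t holds a full code point. */
    size_t n = 0;
    for (; *src != 0; ++src) {
        unsigned long c = (unsigned long)*src;
        unsigned char buf[4];
        size_t len;
        if (c >= 0xD800 && c <= 0xDFFF)
            return 0;                       /* lone surrogate: invalid */
        if (c < 0x80) {
            buf[0] = (unsigned char)c;
            len = 1;
        } else if (c < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (c >> 6));
            buf[1] = (unsigned char)(0x80 | (c & 0x3F));
            len = 2;
        } else if (c < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (c >> 12));
            buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (c & 0x3F));
            len = 3;
        } else if (c < 0x110000) {
            buf[0] = (unsigned char)(0xF0 | (c >> 18));
            buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (c & 0x3F));
            len = 4;
        } else {
            return 0;                       /* beyond the Unicode range */
        }
        if (n + len >= dstsize)
            return 0;                       /* no room left for the NUL */
        memcpy(dst + n, buf, len);
        n += len;
    }
    dst[n] = '\0';
    return n;
#endif
}
```

Because the input is already Unicode, no codepage table is involved at all, so the signed char index problem disappears along with the dependency on the local ANSI codepage.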