Hydrogenaudio Forums

Hosted Forums => foobar2000 => 3rd Party Plugins - (fb2k) => Topic started by: Yirkha on 2008-10-05 16:08:08

Title: foo_chacon
Post by: Yirkha on 2008-10-05 16:08:08
I needed to fix broken tags of a bunch of files yesterday, so I've made myself this component to do that efficiently and I thought perhaps someone else might find it useful as well, so here it is.

The offered functionality is essentially similar to what the "Override charset" option in foo_infobox did, though it's accessed directly from the context menu and for any number of tracks at once.

It can be generally used to fix ID3v1 tags or cue sheets saved in a codepage different from that of your system. (And no, nobody wants to hear about those infelicitous files from shabby sources ;)

documentation & tutorial (http://wiki.hydrogenaudio.org/index.php?title=Foobar2000:Components_0.9/Chacon_(foo_chacon))
foo_chacon-v3.zip (http://yirkha.fud.cz/progs/foobar2000/foo_chacon-v3.zip) (62 kB, v3, 2010/04/07)
mirror (http://www.foobar2000.org/components/view/foo_chacon)
Title: foo_chacon
Post by: Borisz on 2008-12-10 19:59:08
I am surprised that there are no replies to this, maybe because up till now there was no reason to switch from foo_infobox.

Thanks for this plugin. It performs a seldom-needed function, but when it's needed, it is immeasurably helpful. It does exactly the thing I kept infobox in foobar for, and it does it so much better.
Title: foo_chacon
Post by: nevets1219 on 2008-12-11 10:01:29
Just wanted to say thanks for such a great utility. This alone definitely made the switch from v0.8.3 all that much easier.

Might I ask you to explain the "convert to local codepage first" feature in more detail?

Also, might I suggest that a filter be created so that it's easier to select all Chinese codepages or all Japanese codepages.
Title: foo_chacon
Post by: Yirkha on 2008-12-11 10:57:35
Oh noes, people have found this and started asking questions... 


The "Convert to local codepage first" feature is necessary because foobar2000 reads files with an unspecified character set as if they were in the local system codepage. That text is then converted to UTF-8, like everything else in fb2k. Because we want to re-read the tags in another charset, it's usually necessary to first convert them back from UTF-8 to that local system codepage and then reparse the result from whatever charset you have selected into UTF-8 again. The checkbox enables the first part of this process.

Note that this is also why this component is inherently unsafe: there is no guarantee that the round trip "CP_target read as CP_system => CP_UTF8 => CP_system" reproduces the original bytes exactly. The proper way would be to read the tags from the various file formats directly, without using the standard input modules. But it seems to work quite well so far, so let's hope this won't be needed.
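
The round trip described above can be sketched in Python. This is only an illustration under stated assumptions, not the component's code: the system codepage is assumed to be cp1252, the real tag encoding cp1251, `fix_charset` is a hypothetical helper name, and Python's codecs stand in for the Windows conversion routines.

```python
def fix_charset(displayed: str, system_cp: str, real_cp: str) -> str:
    """Undo a wrong-codepage read: text -> system codepage bytes -> real charset."""
    # Step 1 ("convert to local codepage first"): recover the raw tag bytes.
    raw = displayed.encode(system_cp)
    # Step 2: reparse those bytes in the charset the tag was really written in.
    return raw.decode(real_cp)


# A Cyrillic tag stored as cp1251 bytes, but read as cp1252 (assumed system codepage).
original = "Русский"
mangled = original.encode("cp1251").decode("cp1252")
print(mangled)                                   # what the player would display
print(fix_charset(mangled, "cp1252", "cp1251"))  # the repaired text

# Why the approach is unsafe: byte 0x8F is a letter in cp1251 but has no
# character assigned in Python's cp1252 codec, so the first read already
# destroys it and no later conversion can bring it back.
lossy = "Џ".encode("cp1251").decode("cp1252", errors="replace")
print(repr(lossy))
```

Windows routines may map such unassigned bytes differently than Python does, so the exact failure cases on a real system can differ from this sketch.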


Regarding the filter -
Yes, something like that could be added, though it would need some additional configuration and/or custom-drawn groups. When I used Chacon, I usually chose one particular charset and processed many subsequent files with it easily, because the setting was remembered. When something different came up, even mindlessly skimming through the whole list was not much of a hassle. I'm inclined to leave it as simple and stupid as it is, thank you.
Title: foo_chacon
Post by: nevets1219 on 2008-12-11 21:22:16
Thanks for the info on "convert to local codepage first".

Regarding the filter feature, yea it's a bit more situational and probably wouldn't save all that much effort.
Title: foo_chacon
Post by: 2E7AH on 2009-01-23 21:34:01
Would it be possible to extend the component with custom character remapping?

Then we could easily transform Latin to Cyrillic or something similar.
Title: foo_chacon
Post by: Yirkha on 2009-02-02 09:22:53
That would be possible. This component currently simply uses Windows routines to convert between different encodings, but adding a custom converter wouldn't be hard and it would provide additional flexibility.

However, I'm thinking about how to store such remapping tables. To stay within the scope of "character set remapping", it would need to allow mapping arbitrary binary sequences to Unicode codepoints. Because it's not possible to use two different charsets in one user-editable text file, the data must be formatted in hex, for instance, and I'd use the same format as iconv (http://www.gnu.org/software/libiconv/) for great compatibility. For example, mapping A/B/C to a/b/c:
Code:
0x41 0x61
0x42 0x62
0x43 0x63

But when you speak about transliteration, I'm not sure if that format would be as suitable for it. Some kind of list of replacements, both already in Unicode, seems much better for such usage to me. And then I'm not sure if it has much to do with character set remapping...
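
A minimal sketch of how a table in the hex format shown above could be read and applied, written in Python purely for illustration. The function names, the `#` comment handling, and the Latin-1 fallback for unmapped bytes are all assumptions, and only single-byte source values are handled.

```python
def parse_table(text: str) -> dict[int, str]:
    """Parse lines of the form '0xBYTE 0xCODEPOINT' into a byte -> character map."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        src, dst = line.split()[:2]
        table[int(src, 16)] = chr(int(dst, 16))
    return table


def remap(data: bytes, table: dict[int, str]) -> str:
    # Bytes without a table entry fall back to Latin-1 (arbitrary choice for this sketch).
    return "".join(table.get(b, chr(b)) for b in data)


TABLE = """\
0x41 0x61
0x42 0x62
0x43 0x63
"""

print(remap(b"ABC-D", parse_table(TABLE)))
```

A transliteration list, where both sides are already Unicode, would need a different input format, which is the distinction the post draws.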
Title: foo_chacon
Post by: 2E7AH on 2009-02-02 17:41:57
OK, probably using $replace() is the easy way.

I was thinking about simple remappings within the same code page, and you are thinking more globally.
I wouldn't have anything to suggest because the subject is beyond me.
Title: foo_chacon
Post by: Yirkha on 2009-02-14 02:59:14
v0.0.2 is up and features one simple addition: it is possible to copy text from selected fields in the preview pane using the context menu or the Ctrl+C keyboard shortcut. This helps when you don't have a clue how the tags should really look. You can, for instance, paste them into Google and see if it yields plausible results.
Title: foo_chacon
Post by: deviantus on 2009-02-17 12:01:51
What about UTF-16 support?
Title: foo_chacon
Post by: 2E7AH on 2009-02-17 12:40:54
deviantus wrote: "What about UTF-16 support?"

Preferences → Advanced → Tagging → MP3 → ID3v2 writer compatibility mode

[edit] That is for writing; foobar has no problem with reading UTF-16.
Title: foo_chacon
Post by: Yirkha on 2009-02-17 14:55:00
If your tags are stored in UTF-16, but read as UTF-8 or other charset, this component can't help you. It doesn't access the tags directly and such texts would get truncated or otherwise mangled before they even get there. (And that's an inherent limitation of how it works, not limited to UTF-16.)
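
One way such truncation can happen, shown as a small Python sketch. The NUL-terminated read is an assumed mechanism chosen for illustration; the post only says the text gets "truncated or otherwise mangled" before the component sees it.

```python
title = "Abc"
utf16 = title.encode("utf-16-le")
print(utf16)  # ASCII-range characters carry a 0x00 byte each in UTF-16

# A reader that treats the field as a NUL-terminated 8-bit string stops at
# the first 0x00 byte, so everything after the first character is gone.
as_c_string = utf16.split(b"\x00", 1)[0]
seen = as_c_string.decode("cp1252")
print(seen)
```

Once only the truncated text reaches the component, no choice of codepage can restore the missing characters.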
Title: foo_chacon
Post by: neothe0ne on 2009-04-26 22:40:17
I was going to say once that Acropolis's masstagger addons component added this functionality to the masstagger in foobar (so there were components doing this before), but with v1.8, around foobar version 0.9.6.x, Peter blocked third parties from attaching to it (doh). It's not exactly the same, but if it's possible (I don't know much about codepages), it would be nice to have a function specific to converting Traditional Chinese to Simplified and vice versa, since Acropolis's now-deprecated plugin could do that. I'd understand if you aren't able or willing to, though.
Title: foo_chacon
Post by: Yirkha on 2009-04-26 23:46:11
That's a bit more related to what 2E7AH suggested, and again not so much about character conversion. I might add another interface for this kind of conversion or for custom transliterations; basically it's not a bad idea.
Title: foo_chacon
Post by: 2E7AH on 2009-07-06 07:46:20
Yirkha, can you look here (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=73224&view=findpost&p=645158):

I tested one track by converting the tags to Latin-1 (ISO 8859-1) with Mp3tag, then using foo_chacon to convert it correctly in foobar, but without success. I tried with "Convert to local codepage" checked and unchecked, but got the same result. My system codepage is CP1251.
It worked OK in the past, but I don't know if I was converting from this code page.

Title: foo_chacon
Post by: Yirkha on 2010-04-20 22:17:04
Just FYI, version 3 of the component was released a few days ago, after I was confronted with a case of files with charset problems that were fixable using silly $replace(), but not automatically using Chacon. It might actually have been something similar to the post above from last year, which I didn't read or respond to for reasons I don't really know.

The UI has changed a bit and the plugin is generally more capable, but everything might be even more confusing if you don't know what you are doing now. You'll note the "
Title: foo_chacon
Post by: sailorh on 2010-10-06 16:28:18
I absolutely love this plugin. I needed something just like it. I haven't figured out what app is corrupting my tags, but this fixes them.

One little request, could you add the ability to fix only certain fields? Sometimes I have a garbled Artist tag but the Album tag is correctly encoded. So applying a different charset to the whole file ends up messing up the Album field.

But I love the plugin. Keep up the good work.
Title: foo_chacon
Post by: Yirkha on 2010-10-06 21:55:16
sailorh wrote: "I haven't figured out what app is corrupting my tags, but this fixes them."
Huh, you have some app on your computer which randomly mangles your tags and just live with it?

sailorh wrote: "One little request, could you add the ability to fix only certain fields? Sometimes I have a garbled Artist tag but the Album tag is correctly encoded. So applying a different charset to the whole file ends up messing up the Album field."
OK, even though all the tags are written in the same codepage most of the time, I see it might be useful for the kind of problems you have.
For the time being, you can leave a Properties window open on the affected tracks and combine the values afterwards, or use the clipboard, etc.
Title: foo_chacon
Post by: sailorh on 2010-10-07 15:18:54
Yeah, I'm not sure how these files got mangled. But it is only certain fields in seemingly random files.

Thanks for the suggestion about leaving a properties page open. I'll try that out.
Title: foo_chacon
Post by: neothe0ne on 2010-11-29 05:56:59
I'd bet the offending app is WMP11/12.

Yirkha, I think there's a long-standing bug with this component. If you import a CUE sheet encoded in a local (ANSI) codepage and try to fix it, the component doesn't rewrite the CUE sheet in Unicode/UTF-8, which results in foobar2000 showing the correct characters but the actual text file having ? marks (which means permanent loss of text if you delete the tracks from your foobar playlist). I've worked around this by pre-saving offending CUE sheets in UTF-8, then running "fix metadata charset", but it'd be nice if the component could do this automatically.
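
The pre-saving workaround described above can be scripted. Below is a minimal Python sketch, assuming the sheet's real codepage is known (cp1251 in the demo); `resave_cue_as_utf8` and the file name are hypothetical, and this is not something the component itself does.

```python
import os
import tempfile


def resave_cue_as_utf8(path: str, source_cp: str) -> None:
    """Rewrite a cue sheet from a legacy codepage to UTF-8 with a BOM."""
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=source_cp, newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


# Why the '?' marks appear: the corrected text cannot be represented in a
# codepage that lacks those characters, so each one is replaced on write.
print("Русский".encode("cp1252", errors="replace"))

# Demo on a temporary file standing in for a real cue sheet.
with tempfile.TemporaryDirectory() as tmp:
    cue = os.path.join(tmp, "album.cue")
    with open(cue, "wb") as f:
        f.write('TITLE "Русский"\r\n'.encode("cp1251"))
    resave_cue_as_utf8(cue, "cp1251")
    with open(cue, "rb") as f:
        result = f.read()

print(result[:3])  # the UTF-8 byte order mark
```

Running this on real files overwrites them in place, so keeping a backup copy first would be prudent.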
Title: foo_chacon
Post by: rend3r on 2010-12-15 17:24:36
Could you add the ability to disable conversion of certain fields (Artist name, Track Title, Album title) in mp3 tags? By adding checkboxes to the "Fix Metadata Charset" window, for example.
Title: foo_chacon
Post by: amrok on 2011-01-07 20:03:20
I'm sure it is cp866, because I get «Русский» → «.Р.......к.и.й». How can I restore the original tag? What am I doing wrong?
(http://dl.dropbox.com/u/633801/2011-01-07_225912.png)
Title: foo_chacon
Post by: lvqcl on 2011-01-07 20:47:29
Copy tags and paste them here.
Title: foo_chacon
Post by: tksh on 2011-03-08 23:32:41
Feature request: allow removal of certain code page entries -- my non-Unicode tags cover four languages and even then I only use a small handful of the total combinations I can choose from.

Or alternatively (and as a much more difficult request), do something similar to the language auto-detection logic in web browsers, which guesses the correct encoding.
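
A very rough Python sketch of what such guessing could look like, assuming the user already knows which script the tags should be in. The function names and the scoring rule are made up for this illustration; real detectors rely on character-frequency statistics and are considerably more robust.

```python
import unicodedata


def script_score(text: str, scripts: tuple[str, ...]) -> float:
    """Share of non-ASCII characters whose Unicode name starts with an expected script."""
    non_ascii = [ch for ch in text if ord(ch) > 127]
    if not non_ascii:
        return 0.0
    hits = sum(1 for ch in non_ascii if unicodedata.name(ch, "").startswith(scripts))
    return hits / len(non_ascii)


def rank_codepages(raw: bytes, candidates: list[str], scripts: tuple[str, ...]):
    """Return (score, codepage) pairs, best guess first."""
    ranked = []
    for cp in candidates:
        try:
            text = raw.decode(cp)
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this codepage at all
        ranked.append((script_score(text, scripts), cp))
    return sorted(ranked, reverse=True)


raw = "Русский".encode("cp1251")
ranking = rank_codepages(raw, ["cp1252", "cp1251", "cp866"], ("CYRILLIC",))
print(ranking[0][1])
```

The heuristic is easy to fool: a codepage such as KOI8-R decodes the same bytes to different Cyrillic letters and would score just as well, which is part of why this request is the "much more difficult" one.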
Title: foo_chacon
Post by: Yoshi8765 on 2011-03-18 10:16:14
Just going to say, this is such an awesome component! Thank you thank you thank you! I'm an avid fan of Jpop and Jrock and was bummed when my songs appeared in gibberish in foobar2k, but with this component, they are fixed in literally 2 seconds! It's so easy and straightforward to use. 
Title: foo_chacon
Post by: Kovensky on 2012-01-13 12:47:31
Oh well, having issues converting tags here... (Windows is in cp932 (Shift-JIS) locale)

EDIT: Turns out one of the combinations I forgot to try worked, but the oddity of ISO-8859-1/15 producing Japanese glyphs still remains. It's also very weird since CP1252 *is* ISO-8859-1-compatible (https://en.wikipedia.org/wiki/CP1252), so both should have produced the same output.
(http://puu.sh/dfRi.png)
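
For reference, the compatibility between the two charsets is not complete, which a short Python check can show. This compares Python's own codecs only; the Windows routines the component uses may treat some of these bytes differently, and nothing here explains the Japanese glyphs in the screenshot.

```python
differences = {}
for byte in range(256):
    raw = bytes([byte])
    latin1 = raw.decode("latin-1")
    # Bytes with no character assigned in cp1252 become U+FFFD here.
    cp1252 = raw.decode("cp1252", errors="replace")
    if latin1 != cp1252:
        differences[byte] = (latin1, cp1252)

print(len(differences), "byte values decode differently")
print(hex(min(differences)), "to", hex(max(differences)))
print(differences[0x80][1])  # a printable character in cp1252, a control code in Latin-1
```

The differing range overlaps with the lead bytes of two-byte Shift-JIS characters, so the two choices can plausibly give different results on Japanese tags.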
EDIT part2:  In case the error was because I didn't put "<system code page>" in preconversion for the ISO-8859-1 try, it actually doesn't matter what I pick for preconversion. The output is *always* the same, even if I pick completely nonsense preconversions; it's as if picking ISO-8859-1/15 tells it to "disable any and all charset conversion":

----------------------------------
Original post:

(http://puu.sh/dfLi.png)
The above image doesn't even make sense; even if ISO-8859-1 was the wrong charset (and/or I was lacking a preconversion) it'd produce random European characters, not Japanese glyphs (which is the problem I'm trying to fix!). Picking ISO-8859-15 also produces the same results (well, it *is* 8859-1 + euro). Picking 1252 gives me completely different mojibake, but at least it's European mojibake:
(http://puu.sh/dfOr.png)

Someone on IRC told me I was doing it wrong and that the "right" way was this:
(http://puu.sh/dfLv.png)
But the results are the same.

For reference, here are the correct tags (fixed using `mkdir fixed; for i in *mp3; do avconv -i "$i" -f ffmetadata - | avconv -i - -i "$i" -c copy fixed/"$i"; done` (avconv assumes ISO-8859-1 on id3v1 and converts to UTF-8; I'd just have to pass through iconv if it was the wrong charset)):
(http://puu.sh/dfNh.png)

I've had similar issues when trying to fix Shift-JIS tags when my locale is cp1252 (Western); picking 932 as the original charset would still produce typical Shift-JIS-as-ISO-8859-1 artifacts, and no charset combination would help me.
Title: foo_chacon
Post by: Sandrine on 2012-01-14 09:07:58
Kovensky wrote: "Oh well, having issues converting tags here... (Windows is in cp932 (Shift-JIS) locale)"


Undoubtedly foobar and its add-ons offer some very powerful capabilities, but oftentimes the names and strings used are pretty counterintuitive, as they are chosen by programmers. Such strings are technically correct, but not very helpful for the end user. foo_chacon is such an example, IMHO. There should be two fields for the character conversion, one named "Open as..." and the other "Save as...", so that for example if you have an ANSI 1252 codepage with the typical "â™ª" type of string, you would "Open as" UTF-8 to get "♪" and then "Save as" either UTF-8 or your local codepage. In the first case, only 1 conversion is done in total, in the latter case 2 conversions. "Preconversion" doesn't really nail it in case 1 because there is only one conversion in total, and not 2 as the "Pre-" would imply.
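
The proposed two-field model could be expressed roughly as follows in Python. This is a sketch of the suggestion only, not of how the component works: `convert` is a hypothetical name, and the sample assumes the garbled string is the usual cp1252 rendering of a UTF-8 encoded musical note.

```python
def convert(raw: bytes, open_as: str, save_as: str) -> bytes:
    text = raw.decode(open_as)   # "Open as..."
    return text.encode(save_as)  # "Save as..."


# Bytes of a UTF-8 note character that a cp1252 system would show as "â™ª".
raw = "â™ª".encode("cp1252")

opened = raw.decode("utf-8")
print(opened)

# Saving as UTF-8 again leaves the bytes as they were: one real conversion in total.
same = convert(raw, "utf-8", "utf-8")
# Saving in a different encoding is the second conversion.
other = convert(raw, "utf-8", "utf-16-le")
print(same, other)
```

Saving to a local codepage only works when that codepage contains the characters in question; otherwise the encode step fails or has to substitute them.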
Title: foo_chacon
Post by: sanitysama on 2012-01-27 16:50:19
Does this plugin not affect filenames? I've got a bunch of broken filenames made from splitting improperly coded cuesheets. I'd split them in foobar and use Winamp to auto-correct the tags because the freedb query in foobar has never EVER worked prior to cuesheet splitting, or after, or on any file in general.

The result is I have proper FLAC tags, but the filename remains unaffected. I've done this to about a thousand files so I'd really prefer not having to copy/paste and rename each individual file.

(http://i.imgur.com/xTzvy.png)
Title: foo_chacon
Post by: qazwsxedcs on 2015-07-18 18:50:59
I'm sorry to necropost but I just wanted to say that this is fantastic.