Topic: foo_chacon
  • Yirkha
  • [*][*][*][*][*]
  • Moderator
foo_chacon
I needed to fix broken tags of a bunch of files yesterday, so I've made myself this component to do that efficiently and I thought perhaps someone else might find it useful as well, so here it is.

The offered functionality is essentially similar to what the "Override charset" option in foo_infobox did, though it's accessed directly from the context menu and for any number of tracks at once.

It can be generally used to fix ID3v1 tags or cue sheets saved in a codepage different from that of your system. (And no, nobody wants to hear about those infelicitous files from shabby sources ;)

documentation & tutorial
foo_chacon-v3.zip (62 kB, v3, 2010/04/07)
mirror
  • Last Edit: 23 August, 2010, 12:23:00 AM by Yirkha
Full-quoting makes you scroll past the same junk over and over.

  • Borisz
  • [*][*][*][*]
foo_chacon
Reply #1
I am surprised that there are no replies to this, maybe because up till now there was no reason to switch from foo_infobox.

Thanks for this plugin. It serves a seldom-needed function, but when it's needed, it is immeasurably helpful. It does the exact thing I kept infobox in foobar for, but it does it so much better.
  • Last Edit: 10 December, 2008, 02:59:56 PM by Borisz

  • nevets1219
  • [*]
foo_chacon
Reply #2
Just wanted to say thanks for such a great utility. This alone definitely made the switch from v0.8.3 that much easier.

Might I ask you to explain more in regards to "convert to local codepage first" feature?

Also, might I suggest that a filter be created so that it's easier to select all Chinese codepages or all Japanese codepages.

  • Yirkha
  • [*][*][*][*][*]
  • Moderator
foo_chacon
Reply #3
Oh noes, people have found this and started asking questions... 


The "Convert to local codepage first" feature is necessary because foobar2000 reads files with an unspecified character set as if they were in the local system codepage. That text is then converted to UTF-8, like everything else in fb2k. Because we want to re-read the tags in another charset, it's usually necessary to first convert them back from UTF-8 to that local system codepage, then reparse the result from whatever charset you have selected into UTF-8 again. The checkbox enables the first part of this process.
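For readers who want to see the two steps spelled out, here is a minimal Python sketch. It is not the component's code (which uses Windows conversion routines); the function name `fix_charset` and the choice of CP1252 as the "system" codepage and CP1251 as the real one are assumptions made for illustration only.

```python
def fix_charset(garbled: str, preconversion_cp: str, target_cp: str) -> str:
    """Undo a wrong decode: text -> stored bytes -> text in the right charset."""
    # Step 1 ("convert to local codepage first"): recover the bytes from the file.
    raw = garbled.encode(preconversion_cp)
    # Step 2: reparse those bytes using the charset the user selected.
    return raw.decode(target_cp)


# Simulate a CP1251 tag being read on a system whose codepage is CP1252.
stored_bytes = "Русский".encode("cp1251")
garbled = stored_bytes.decode("cp1252")   # what the player would display
fixed = fix_charset(garbled, "cp1252", "cp1251")
print(garbled, "->", fixed)
```

Skipping step 1 corresponds to leaving the checkbox unticked, which only makes sense when the text was never run through the system codepage in the first place.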

Note that this is also why this component is inherently unsafe - there is no guarantee that the conversion "CP_target read as CP_system => CP_UTF8 => CP_system" is fully equivalent. The proper way would be to read the tags from various file formats directly, not using the standard input modules. But it seems to work quite well so far, so let's hope this won't be needed.
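The round-trip problem can be checked in a few lines of Python. This is only a sketch: Python's codec tables are not identical to the Windows routines the component relies on, so which bytes survive may differ on a real system, and the helper name `roundtrip_is_lossless` is made up for this example.

```python
def roundtrip_is_lossless(stored: bytes, system_cp: str) -> bool:
    """Check whether bytes survive 'read as system_cp' followed by 'convert back'."""
    # Reading: bytes with no mapping in system_cp become U+FFFD here.
    text = stored.decode(system_cp, errors="replace")
    # Converting back: characters with no mapping become '?'.
    restored = text.encode(system_cp, errors="replace")
    return restored == stored


cyrillic = "Русский".encode("cp1251")  # every byte is defined in Python's cp1252
lead_byte = b"\x81\x41"                # 0x81 is a common Shift-JIS lead byte

print(roundtrip_is_lossless(cyrillic, "cp1252"))
print(roundtrip_is_lossless(lead_byte, "cp1252"))
```

In Python's CP1252 table the byte 0x81 is undefined, so the second check fails: the original byte is gone before any reparsing can happen.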


Regarding the filter -
Yes, something like that could be added, though it would need some additional configuration and/or custom-drawn groups. When I used Chacon, I usually chose one particular charset and processed many subsequent files with it easily, because the setting was remembered. When something different came up, even mindlessly skimming through the whole list was not much of a hassle. I'm inclined to leave it as simple and stupid as it is, thank you.

  • nevets1219
  • [*]
foo_chacon
Reply #4
Thanks for the info on the "convert to local codepage first" feature.

Regarding the filter feature, yea it's a bit more situational and probably wouldn't save all that much effort.

  • 2E7AH
  • [*][*][*][*][*]
foo_chacon
Reply #5
would it be possible to extend the component with custom character remapping?

then we could easily transform latin to cyrillic or something similar

  • Yirkha
  • [*][*][*][*][*]
  • Moderator
foo_chacon
Reply #6
That would be possible. This component currently just uses Windows routines to convert between different encodings, but adding a custom converter wouldn't be hard and it would provide additional flexibility.

However I'm thinking about the way to store such remapping tables. To stay within the scope of "character set remapping", it would need to allow mapping arbitrary binary sequences to Unicode codepoints. Because it's not possible to use two different charsets in one user-editable text file, the data must be formatted for instance in hex - and I'd use the same format as iconv for great compatibility. For example, mapping A/B/C to a/b/c:
0x41 0x61
0x42 0x62
0x43 0x63

But when you speak about transliteration, I'm not sure if that format would be as suitable for it. Some kind of list of replacements, both already in Unicode, seems much better for such usage to me. And then I'm not sure if it has much to do with character set remapping...
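As an illustration of the hex table format shown above, here is a small Python sketch of how such a file could be parsed and applied. It is hypothetical, not something the component implements: it handles only single-byte sources, it treats `#` as a comment marker, and it passes unmapped bytes through as the code point with the same value. All three are assumptions made for the example.

```python
def parse_remap_table(text: str) -> dict[int, str]:
    """Parse lines like '0x41 0x61' into {source byte: target character}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumed comment syntax
        if not line:
            continue
        source, target = line.split()[:2]
        table[int(source, 16)] = chr(int(target, 16))
    return table


def remap(raw: bytes, table: dict[int, str]) -> str:
    """Map each byte through the table; unmapped bytes pass through unchanged."""
    return "".join(table.get(byte, chr(byte)) for byte in raw)


table = parse_remap_table("0x41 0x61\n0x42 0x62\n0x43 0x63\n")
print(remap(b"ABCD", table))
```

A transliteration list, where both sides are already Unicode text, would need a different structure (string to string), which is the distinction drawn in the post above.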

  • 2E7AH
  • [*][*][*][*][*]
foo_chacon
Reply #7
ok, probably using $replace() is the easy way

i was thinking about simple remappings, in the same code page, and you think more globally
i wouldn't have anything to suggest because the subject is beyond me

  • Yirkha
  • [*][*][*][*][*]
  • Moderator
foo_chacon
Reply #8
v0.0.2 is up, featuring one simple addition: you can copy text from selected fields in the preview pane using the context menu or the keyboard shortcut Ctrl+C. This helps when you don't have a clue how the tags should really look. You can, for instance, paste them into Google and see if it yields plausible results.

  • deviantus
  • [*]
foo_chacon
Reply #9
What about UTF-16 support?

  • 2E7AH
  • [*][*][*][*][*]
foo_chacon
Reply #10
What about UTF-16 support?

Preferences → Advanced → Tagging → MP3 → ID3v2 writer compatibility mode

[edit] that is for writing, foobar has no problem with reading UTF-16
  • Last Edit: 17 February, 2009, 07:57:34 AM by 2E7AH

  • Yirkha
  • [*][*][*][*][*]
  • Moderator
foo_chacon
Reply #11
If your tags are stored in UTF-16, but read as UTF-8 or other charset, this component can't help you. It doesn't access the tags directly and such texts would get truncated or otherwise mangled before they even get there. (And that's an inherent limitation of how it works, not limited to UTF-16.)
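To see why such text is already damaged by the time a component like this receives it, consider this Python sketch. It assumes the misread string is cut at the first NUL character, as C-style string handling would do; whether foobar2000 truncates in exactly this way is an assumption, not something stated in the thread.

```python
# "Test" stored as UTF-16 (little endian): every other byte is 0x00.
utf16_bytes = "Test".encode("utf-16-le")

# Misread as a single-byte charset, the NUL bytes become NUL characters.
misread = utf16_bytes.decode("cp1252")

# Assumed behaviour: a NUL-terminated string ends at the first NUL.
truncated = misread.split("\x00", 1)[0]

print(utf16_bytes)
print(repr(truncated))
```

Only the first character survives, so there is nothing left to convert back.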

  • neothe0ne
  • [*][*][*][*]
foo_chacon
Reply #12
I was going to say at one point that Acropolis's masstagger addons component added this functionality to the masstagger in foobar (so there were components doing this before), but with v1.8, around foobar version 0.9.6.x, Peter blocked third parties from attaching to it (doh). It's not exactly the same, but if it's possible (I don't know much about codepages), it would be nice to have a function specifically for converting Traditional Chinese to Simplified and vice versa, since Acropolis's now deprecated plugin could do that. I'd understand if you aren't able or willing to, though.

  • Yirkha
  • [*][*][*][*][*]
  • Moderator
foo_chacon
Reply #13
That's a bit more related to what 2E7AH suggested, and again not so much about character conversion. I might add another interface for these kinds of conversions or custom transliterations; basically it's not a bad idea.

  • 2E7AH
  • [*][*][*][*][*]
foo_chacon
Reply #14
Yirkha, can you look here:

I tested one track by converting the tags to latin-1 (ISO 8859-1) with Mp3tag, then using foo_chacon to convert it correctly in foobar, but without success. I tried with "Convert to local code page" checked and unchecked, but got the same result. I'm in CP1251.
It worked OK in the past, but I don't know if I was converting from this code page.


  • Yirkha
  • [*][*][*][*][*]
  • Moderator
foo_chacon
Reply #15
Just FYI, version 3 of the component was released a few days ago, after I was confronted with a case of files with charset problems that could be fixed using a silly $replace(), but not automatically using Chacon. It might actually have been something similar to the post above from last year, which I didn't read or respond to for reasons I don't really know.

The UI has changed a bit and the plugin is generally more capable, but everything might be even more confusing now if you don't know what you are doing. You'll note the "Convert to local code page first" checkbox is gone. If you have a known configuration with it enabled or disabled, that was equivalent to selecting the preconversion charset "<system code page>" or "<disabled>" respectively.

Have fun.

  • sailorh
  • [*]
foo_chacon
Reply #16
I absolutely love this plugin. I needed something just like it. I haven't figured out what app is corrupting my tags, but this fixes them.

One little request, could you add the ability to fix only certain fields? Sometimes I have a garbled Artist tag but the Album tag is correctly encoded. So applying a different charset to the whole file ends up messing up the Album field.

But I love the plugin. Keep up the good work.

  • Yirkha
  • [*][*][*][*][*]
  • Moderator
foo_chacon
Reply #17
I haven't figured out what app is corrupting my tags, but this fixes them.
Huh, you have some app on your computer which randomly mangles your tags and just live with it?

One little request, could you add the ability to fix only certain fields? Sometimes I have a garbled Artist tag but the Album tag is correctly encoded. So applying a different charset to the whole file ends up messing up the Album field.
OK, even though all the tags are written in the same codepage most of the time, I see it might be useful for the kind of problems you have.
For the time being, you can leave a Properties window open on the affected tracks and combine the values afterwards, or use the clipboard etc.

  • sailorh
  • [*]
foo_chacon
Reply #18
Yeah, I'm not sure how these files got mangled. But it is only certain fields in seemingly random files.

Thanks for the suggestion about leaving a properties page open. I'll try that out.

  • neothe0ne
  • [*][*][*][*]
foo_chacon
Reply #19
I'd bet the offending app is WMP11/12.

Yirkha, I think there's a long-standing bug with this component.  If you import a CUE sheet with a local codepage encoded in ANSI and try to fix it, the component doesn't rewrite the CUE sheet in Unicode/UTF-8, which results in foobar2000 showing the correct characters but the actual text file having ? marks (which means permanent loss of text if you delete the tracks from your foobar playlist).  I've worked around this by pre-saving offending CUE sheets in UTF-8, then running "fix metadata charset", but it'd be nice if the component can do this automatically.
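The manual workaround described here, re-saving the CUE sheet as UTF-8 before fixing the metadata, could be scripted along these lines. This is a sketch, not part of the component: the function names are made up, the source codepage has to be supplied by the user, and writing a byte order mark via `utf-8-sig` is an assumption about what helps the sheet be recognized as UTF-8.

```python
from pathlib import Path


def reencode_cue_bytes(raw: bytes, source_cp: str) -> bytes:
    """Decode CUE sheet bytes from source_cp and re-encode as UTF-8 with a BOM."""
    return raw.decode(source_cp).encode("utf-8-sig")


def resave_cue_as_utf8(path: str, source_cp: str) -> None:
    """Rewrite a CUE sheet in place; keep a backup copy before trying this."""
    cue = Path(path)
    cue.write_bytes(reencode_cue_bytes(cue.read_bytes(), source_cp))


sample = 'TITLE "Русский"'.encode("cp1251")
converted = reencode_cue_bytes(sample, "cp1251")
print(converted)
```

If the wrong `source_cp` is given, the file is rewritten with the wrong characters, so the conversion is only as good as the guess about the original codepage.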

  • rend3r
  • [*]
foo_chacon
Reply #20
Could you add the ability to disable conversion of certain fields (Artist name, Track Title, Album title) in mp3 tags? For example, by adding checkboxes to the "Fix Metadata Charset" window.

  • amrok
  • [*]
foo_chacon
Reply #21
I'm sure it is cp866, because I get «Русский» → «.Р.......к.и.й». How can I restore the original tag? What am I doing wrong?

  • lvqcl
  • [*][*][*][*][*]
  • Developer
foo_chacon
Reply #22
Copy tags and paste them here.

  • tksh
  • [*]
foo_chacon
Reply #23
Feature request: allow removal of certain code page entries -- my non-unicode tags cover four languages and even then I only use a small handful of the total combinations I can choose from.

Or alternatively (and as a much more difficult request), do something similar to the language auto-detection logic in web browsers that guesses the correct encoding.

  • Yoshi8765
  • [*]
foo_chacon
Reply #24
Just going to say, this is such an awesome component! Thank you thank you thank you! I'm an avid fan of Jpop and Jrock and was bummed when my songs appeared in gibberish in foobar2k, but with this component, they are fixed in literally 2 seconds! It's so easy and straightforward to use.