
Identifying Case Variations in Tags

Hi.

I have a very large music library indexed using Minimserver. Minimserver distinguishes between upper and lower case, so if there is a variation, Minimserver will create multiple artists. Viz, 'The Beatles' and 'the beatles' are represented as different artists.

I'm trying to reconcile these by editing the tags in Foobar, but my filter does not recognise this difference. Is there any way to get Foobar to display all my artists so that variations are listed separately and adjacently? This will make it easier to fix variations and create uniformity for Minimserver.

Re: Identifying Case Variations in Tags

Reply #1
Instead of hunting for the ones that need fixing, I would just run a masstagger script using $caps or $caps2 on the entire library. Values that are already fine will stay the same; those that need changing will have the first letter of each word changed to uppercase. Note that $caps also lowercases all the other letters, whereas $caps2 leaves them as they are.

Re: Identifying Case Variations in Tags

Reply #2
Thanks. However, I have over 100,000 files and there are bound to be a number of false positives. Maybe that's still the way to go, as it would be easier to revert the false positives since they will be rare.

Re: Identifying Case Variations in Tags

Reply #3
Do it in batches of 20,000; you'll be done much faster than you would be checking all artists manually. I did the same thing for 500,000 files.

Re: Identifying Case Variations in Tags

Reply #4
The most reliable way would be to install something like Facets (a case-sensitive library viewer) and show a list of artists. Then scroll through them, spot mistakes and fix them. Case variants will show up as two or more adjacent entries, so it shouldn't be that hard.
https://www.foobar2000.org/components/view/foo_facets

You can also experiment with filters like these to spot *some* of the mistakes automatically and take action if they weren't false positives (a lot of them will be):
Code: [Select]
$if($strcmp(%artist%,$lower(%artist%)),all lowercase,not all lowercase)
Code: [Select]
$if($strcmp(%artist%,$caps(%artist%)),already capitalized,not capitalized)
Code: [Select]
$if($strcmp(%artist%,$caps2(%artist%)),already capitalized,not capitalized)

If you want to make queries out of these you can do something like:
Code: [Select]
"$if($strcmp(%artist%,$caps(%artist%)),1,0)" IS 0

I personally would avoid batch renaming with $caps or the equivalent Capitalize function in the properties window. I found them to be largely destructive for the way I like to capitalize things (e.g. don't want every instance Of 'Of' capitalized in the middle Of names). Lots of artists have names that will trip capitalization in weird ways as well.
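
For readers who can get their artist values out of the player as plain text, the "list the variants next to each other" idea can be sketched outside foobar2000. This is only an illustration in Python: the sample list and the function name `find_case_variants` are made up, and how you export the ARTIST values is left open.

```python
from collections import defaultdict

def find_case_variants(artists):
    """Group artist strings that differ only by letter case."""
    groups = defaultdict(set)
    for name in artists:
        # casefold() gives a case-insensitive key; the set keeps each spelling once.
        groups[name.casefold()].add(name)
    # Keep only the keys that have more than one distinct spelling.
    return {key: sorted(spellings)
            for key, spellings in groups.items()
            if len(spellings) > 1}

# Hypothetical sample data standing in for an exported list of ARTIST values.
sample = ["The Beatles", "the beatles", "Pink Floyd",
          "The Beatles", "PINK FLOYD", "Wire"]

variants = find_case_variants(sample)
for key in sorted(variants):
    # Print the competing spellings side by side.
    print(" | ".join(variants[key]))
```

Only names with at least two spellings are reported, so a clean library produces no output at all.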

Re: Identifying Case Variations in Tags

Reply #5
Yes, that occurred to me too. My 'house style' is lower-case for short prepositions, though I expect I can reverse unwanted changes with $replace. I'll fiddle about with those filters, although they do mostly bring up false positives. Thank you both very much - that's enough to make this an easier job than going through the lot manually.

Re: Identifying Case Variations in Tags

Reply #6
I personally would avoid batch renaming with $caps or the equivalent Capitalize function in the properties window. I found them to be largely destructive for the way I like to capitalize things (e.g. don't want every instance Of 'Of' capitalized in the middle Of names). Lots of artists have names that will trip capitalization in weird ways as well.

There are ways around that. If you set masstagger to first perform a caps action and as a second action use something like:
Code: [Select]
$replace(%artist%, Of , of )
This corrects the instances where you don't want certain words within an artist name capitalized. The space before (and after) "of" ensures that artists starting with "Of" won't get changed. Do the same for all words you don't want capitalized, such as in, and, the, and Roman numerals, always ensuring there is a space before and/or after the item you want to change, so that only words in the middle or at the end of an artist name get changed.
I use this tactic to standardize titles. Here's the script I use for that (probably way overkill for artists):

Code: [Select]
$replace(%TITLE%, Iii, III, Iv, IV, Ii, II, Viii, VIII, Vii, VII, Ix, IX, Vi., VI., Of , of , The , the , In , in , A , a , An , an , And , and ,.a,.A,.b,.B,.c,.C,.d,.D,.e,.E,.f,.F,.g,.G,.h,.H,.i,.I,.j,.J,.k,.K,.l,.L,.m,.M,.n,.N,.o,.O,.p,.P,.q,.Q,.r,.R,.s,.S,.t,.T,.u,.U,.v,.V,.w,.W,.x,.X,.y,.Y,.z,.Z,-a,-A,-b,-B,-c,-C,-d,-D,-e,-E,-f,-F,-g,-G,-h,-H,-i,-I,-j,-J,-k,-K,-l,-L,-m,-M,-n,-N,-o,-O,-p,-P,-q,-Q,-r,-R,-s,-S,-t,-T,-u,-U,-v,-V,-w,-W,-x,-X,-y,-Y,-z,-Z, $char(39)a, $char(39)A, $char(39)b, $char(39)B, $char(39)c, $char(39)C, $char(39)d, $char(39)D, $char(39)e, $char(39)E, $char(39)f, $char(39)F, $char(39)g, $char(39)G, $char(39)h, $char(39)H, $char(39)i, $char(39)I, $char(39)j, $char(39)J, $char(39)k, $char(39)K, $char(39)l, $char(39)L, $char(39)m, $char(39)M, $char(39)n, $char(39)N, $char(39)o, $char(39)O, $char(39)p, $char(39)P, $char(39)q, $char(39)Q, $char(39)r, $char(39)R, $char(39)s, $char(39)S, $char(39)t, $char(39)T, $char(39)u, $char(39)U, $char(39)v, $char(39)V, $char(39)w, $char(39)W, $char(39)x, $char(39)X, $char(39)y, $char(39)Y, $char(39)z, $char(39)Z)
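
The two-step idea (capitalize everything, then lowercase selected words only when they sit between spaces) can be modelled outside foobar2000 to see what it does to a given name. The following is a rough Python sketch, not the masstagger implementation: `caps` is only an approximation of a caps action that splits on spaces, and `SMALL_WORDS`, `lower_small_words` and `normalize` are names invented for the illustration.

```python
# Words to lowercase when they appear in the middle of a name (assumed list).
SMALL_WORDS = ("Of", "The", "In", "A", "An", "And")

def caps(text):
    """Rough stand-in for a caps action: per space-separated word,
    first character upper case, remaining characters lower case."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in text.split(" "))

def lower_small_words(text):
    """Second pass: lowercase small words only when surrounded by spaces,
    so a word at the very start of the name is left alone."""
    for word in SMALL_WORDS:
        text = text.replace(f" {word} ", f" {word.lower()} ")
    return text

def normalize(text):
    """Apply both passes in the order described above."""
    return lower_small_words(caps(text))

for name in ["queens of the stone age", "of montreal", "t.A.T.u."]:
    print(f"{name} -- {normalize(name)}")
```

In this model a leading "of" stays capitalized because there is no space in front of it, while a deliberately stylized name such as t.A.T.u. still gets flattened by the first pass.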

Re: Identifying Case Variations in Tags

Reply #7
Thank you! I've used that formulation for prepositions, but hadn't thought of Roman numerals or names with apostrophes. There's no such thing as overkill when it comes to automating that many files. I'll try and use some combination of all these filters to get something largely automated.

Re: Identifying Case Variations in Tags

Reply #8
There are ways around that. If you set masstagger to first perform a caps action and as a second action use something like...
A couple of examples off the top of my head that will trip even that process:
k-waves LAB
a-TTTempo
dBu music
t.A.T.u.
A.m.u.

Now granted, most artists tend to use sane capitalization, and I'm probably in the minority of those affected by this or caring about it. For most people your script is probably suitable and can be a major time saver. I, however, don't want to hit 1-5% of my library with this. One could maintain a list of exceptions, but then you might as well just maintain the library itself. I'd rather have procedures that safeguard against badly tagged stuff getting into the library than try to retroactively fix it with a blanket command. Your mileage may vary.

Re: Identifying Case Variations in Tags

Reply #9
It's not meant to be fully fail-safe, but if I can get 99% right with 10% of the work as opposed to 100% right with 10 times as much work, I know which one to choose. Even so, there are very few special cases like the ones you mentioned, so it's probably not hard to know them and include them as exceptions in the script. Still way less work than going through all artists and comparing them manually, which is probably a fun job... if you're a monk.

Re: Identifying Case Variations in Tags

Reply #10
If he has 100k tracks that would mean (according to my not very scientific estimates based on my library) ~2000-2500 artists. I just ran through slightly more than that, looking for mistakes, in maybe 5 minutes, but let's say 10, and most certainly not more than half an hour. It's fairly quick to spot duplicates when they largely look the same with tiny bits changed (you might not even have to read the text half the time, just squint for things that look similar). So I think manual checking in this case is well within the realms of feasibility. Choosing between that and a batch script largely depends on whether the person cares about that 1-5%, and that is something he showed interest in.

In either case he could also do some preliminary testing by gathering the tracks that would be affected by your script and comparing the results. This is something I always do if there's mass tagging involved. '<Right click on tracks>/Properties/Tools/Automatically fill values' and I'm guessing masstagger both provide similar previews, but I prefer to do it this way when trying to weed out false positives.

Create an autoplaylist such as this:
Code: [Select]
"$if($strcmp(<ORIGINAL STRING>,<CHANGES TO BE MADE>),1,0)" IS 0
For example:
Code: [Select]
"$if($strcmp(%artist%,$replace(%artist%, Iii, III, Iv, IV, Ii, II, Viii, VIII, Vii, VII, Ix, IX, Vi., VI., Of , of , The , the , In , in , A , a , An , an , And , and ,.a,.A,.b,.B,.c,.C,.d,.D,.e,.E,.f,.F,.g,.G,.h,.H,.i,.I,.j,.J,.k,.K,.l,.L,.m,.M,.n,.N,.o,.O,.p,.P,.q,.Q,.r,.R,.s,.S,.t,.T,.u,.U,.v,.V,.w,.W,.x,.X,.y,.Y,.z,.Z,-a,-A,-b,-B,-c,-C,-d,-D,-e,-E,-f,-F,-g,-G,-h,-H,-i,-I,-j,-J,-k,-K,-l,-L,-m,-M,-n,-N,-o,-O,-p,-P,-q,-Q,-r,-R,-s,-S,-t,-T,-u,-U,-v,-V,-w,-W,-x,-X,-y,-Y,-z,-Z, $char(39)a, $char(39)A, $char(39)b, $char(39)B, $char(39)c, $char(39)C, $char(39)d, $char(39)D, $char(39)e, $char(39)E, $char(39)f, $char(39)F, $char(39)g, $char(39)G, $char(39)h, $char(39)H, $char(39)i, $char(39)I, $char(39)j, $char(39)J, $char(39)k, $char(39)K, $char(39)l, $char(39)L, $char(39)m, $char(39)M, $char(39)n, $char(39)N, $char(39)o, $char(39)O, $char(39)p, $char(39)P, $char(39)q, $char(39)Q, $char(39)r, $char(39)R, $char(39)s, $char(39)S, $char(39)t, $char(39)T, $char(39)u, $char(39)U, $char(39)v, $char(39)V, $char(39)w, $char(39)W, $char(39)x, $char(39)X, $char(39)y, $char(39)Y, $char(39)z, $char(39)Z)),1,0)" IS 0

Then temporarily add a column to your playlist viewer to compare the results:
Code: [Select]
<ORIGINAL STRING> -- <CHANGES TO BE MADE>
For example:
Code: [Select]
%artist% -- $replace(%artist%, Iii, III, Iv, IV, Ii, II, Viii, VIII, Vii, VII, Ix, IX, Vi., VI., Of , of , The , the , In , in , A , a , An , an , And , and ,.a,.A,.b,.B,.c,.C,.d,.D,.e,.E,.f,.F,.g,.G,.h,.H,.i,.I,.j,.J,.k,.K,.l,.L,.m,.M,.n,.N,.o,.O,.p,.P,.q,.Q,.r,.R,.s,.S,.t,.T,.u,.U,.v,.V,.w,.W,.x,.X,.y,.Y,.z,.Z,-a,-A,-b,-B,-c,-C,-d,-D,-e,-E,-f,-F,-g,-G,-h,-H,-i,-I,-j,-J,-k,-K,-l,-L,-m,-M,-n,-N,-o,-O,-p,-P,-q,-Q,-r,-R,-s,-S,-t,-T,-u,-U,-v,-V,-w,-W,-x,-X,-y,-Y,-z,-Z, $char(39)a, $char(39)A, $char(39)b, $char(39)B, $char(39)c, $char(39)C, $char(39)d, $char(39)D, $char(39)e, $char(39)E, $char(39)f, $char(39)F, $char(39)g, $char(39)G, $char(39)h, $char(39)H, $char(39)i, $char(39)I, $char(39)j, $char(39)J, $char(39)k, $char(39)K, $char(39)l, $char(39)L, $char(39)m, $char(39)M, $char(39)n, $char(39)N, $char(39)o, $char(39)O, $char(39)p, $char(39)P, $char(39)q, $char(39)Q, $char(39)r, $char(39)R, $char(39)s, $char(39)S, $char(39)t, $char(39)T, $char(39)u, $char(39)U, $char(39)v, $char(39)V, $char(39)w, $char(39)W, $char(39)x, $char(39)X, $char(39)y, $char(39)Y, $char(39)z, $char(39)Z)

This will give you a nice preview of all tracks affected and in what way. You can also copy the contents of the autoplaylist to a regular playlist so that you can remove false positives as you look through them. When you feel only the relevant tracks remain, you can apply the actual operation.

(If someone is wondering, in my case 2% of tracks would be affected.)
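
The same preview idea (show only the values a change would touch, next to what they would become) can be sketched in Python for anyone testing a rule outside the player. The transform `lower_mid_of` and the sample list are hypothetical; any other function from string to string could be dropped in.

```python
def preview_changes(values, transform):
    """Return (original, changed) pairs only for values the transform would alter."""
    pairs = []
    for value in values:
        changed = transform(value)
        if changed != value:  # same test as comparing original and result in the query
            pairs.append((value, changed))
    return pairs

def lower_mid_of(text):
    """Hypothetical transform: lowercase 'Of' only when it has a space on both sides."""
    return text.replace(" Of ", " of ")

# Made-up sample values standing in for ARTIST tags.
artists = ["Queens Of The Stone Age", "Of Montreal", "Wire", "Boards of Canada"]

for original, changed in preview_changes(artists, lower_mid_of):
    # Mirrors the "<ORIGINAL STRING> -- <CHANGES TO BE MADE>" column.
    print(f"{original} -- {changed}")
```

Unaffected values never appear in the result, which keeps the list to review as short as the rule allows.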

Re: Identifying Case Variations in Tags

Reply #11
There is another, non-destructive way, also using masstagger:

Code: [Select]
$crc32($lower($ascii($replace(%artist%,$char(32),,$char(33),,$char(34),,$char(39),,$char(40),,$char(41),,$char(44),,$char(46),,$char(47),,$char(58),,$char(59),,$char(63),,$char(96),,$char(145),,$char(146),,$char(147),,$char(45),,$char(145),,$char(146),,$char(148),,$char(180),,’,,$char(96),,The ,,And ,&,the ,,and ,&,n$char(39) ,ng) [%DIFFERENTIATE%])))

This won't touch any of the artist tags (or title tags if applied to titles). Basically it converts artist strings to 32-bit integers. It converts everything to lower case and strips spaces (handy for hard-to-catch mistaggings where there's an extra space) and other small details that often get mixed up (for example Guns N' Roses, Guns 'N' Roses, Guns N Roses, Guns 'N Roses, or Nick Cave & The Bad Seeds, Nick Cave And The Bad Seeds). The script can then assign the result to an Artist ID field. When you group on this Artist ID field, even artist names that aren't written exactly the same but are essentially the same artist are grouped together.
The %differentiate% tag is there to fill in a value to distinguish between different artists with the same name.

I basically use this for my SQLite database and create queries based on integer ID fields rather than string values. For titles especially this is a lifesaver when I want to build queries grouped by title.
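
To make the ID idea concrete, here is a simplified Python sketch of the same kind of key. It is not a port of the script above: the rules are reduced to lowercasing, dropping "the", treating "and" as "&", and removing spaces and punctuation. The function name `artist_key` and its `differentiate` parameter are invented for the example, and the integers it produces should not be expected to match foobar2000's output.

```python
import zlib

def artist_key(name, differentiate=""):
    """Map an artist string to a 32-bit integer that ignores case,
    spacing, punctuation, 'the' and the and/& distinction."""
    words = []
    for word in name.lower().split():
        if word == "the":
            continue                      # drop the article wherever it appears
        words.append("&" if word == "and" else word)
    # Keep letters, digits and "&"; this removes spaces, quotes, dots, etc.
    squashed = "".join(ch for ch in "".join(words) if ch.isalnum() or ch == "&")
    # The optional suffix separates different artists that share a name.
    return zlib.crc32((squashed + differentiate).encode("utf-8"))

spellings = ["Guns N' Roses", "Guns 'N' Roses", "Guns N Roses", "Guns 'N Roses"]
print({artist_key(s) for s in spellings})   # a single key for all four spellings
print(artist_key("Nick Cave & The Bad Seeds") == artist_key("Nick Cave And The Bad Seeds"))
```

One caveat of this simplified version: a name made up only of dropped words (for example "The The") collapses to the key of the empty string, so a real rule set would need an exception for such cases.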

Re: Identifying Case Variations in Tags

Reply #12
If you're not married to tagging with fb2k, Mp3tag is handy for that sort of thing. It works with most formats, not just mp3.
You can configure the tag panel to display any field you like, and if you load and select a bunch of files, the appropriate fields in the tag panel give you a drop down list containing all the variations for the selected files. You can pick one of them, or type in something new, right click on the selected files and select "save tag" and they'll be updated accordingly.

See the attached screenshot. The loaded files are sorted by Album name. There's nothing wrong with the Album names in this case, but it shows how fields appear in the tag panel (which is configurable). You can sort loaded files by any of the standard columns/fields, or by creating custom columns/fields.

 