Wouldn't it make sense to put all the entries into a single database with MySQL or something similar? Then you could just use a simple SQL query to find duplicates, or even clean up the entire table.
Mmm, I suppose I could use some case-insensitive LIKE matching in MySQL, but why?
Perhaps you don't know what a regular expression is.
Say that you have the name of an artist in two formats:
A. "Mike Oldfield"
B. "Oldfield, Mike"
To treat these two as the same artist, you need a regular expression, for instance this one, which turns "Oldfield, Mike" into "Mike Oldfield":
s/([^,]*),\s(.*)/$2 $1/
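To make that concrete, here is a small, self-contained Perl sketch that runs the substitution above on both formats and then groups spellings that end up identical. The helper name `normalize_artist` and the lowercasing step (standing in for a case-insensitive comparison) are illustrative additions, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite "Last, First" as "First Last" using the
# substitution from the post. Names without a comma come back unchanged.
sub normalize_artist {
    my ($name) = @_;
    $name =~ s/([^,]*),\s(.*)/$2 $1/;
    return $name;
}

my @names = ("Mike Oldfield", "Oldfield, Mike");

# Group the original spellings by their normalized, lowercased form.
my %groups;
for my $name (@names) {
    push @{ $groups{ lc(normalize_artist($name)) } }, $name;
}

# Report every normalized key that more than one spelling maps to.
for my $key (sort keys %groups) {
    my @variants = @{ $groups{$key} };
    print "$key: ", join(" | ", @variants), "\n" if @variants > 1;
}
```

Both spellings land under the key `mike oldfield`, which is exactly the kind of match a plain `LIKE` comparison would miss, since the word order differs.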
Also, for various erroneous entries in albumtitle, I have come up with some more patterns, which I read from a file like this:
while (<album_patterns>) {
    chomp;
    next if /^\s*$/ || /^#/;    # skip blank lines and comments
    my ($pattern, $replacement, $modifier) = split /\t/;
    # strip the surrounding single quotes; $modifier may be missing
    for ($pattern, $replacement, $modifier) {
        s/^'(.*)'$/$1/ if defined;
    }
    # $modifier is parsed but not stored yet
    $albumPatterns{$pattern} = $replacement;
}
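The loop only builds `%albumPatterns`; it doesn't show how a stored replacement such as `'[VOLUMENUMBER $2]'` gets applied, and `$2` inside a string read from a file is just text. One possible way, sketched below assuming Perl 5.26 or later (for `@{^CAPTURE}`), is to run each pattern with `s///ge` and expand the `$N` placeholders by hand. The helpers `expand_template` and `apply_patterns`, the in-memory pattern data and the sample title are made up for illustration, and the optional modifier column is ignored here.

```perl
use 5.026;    # @{^CAPTURE} needs Perl 5.26+; this also enables strict
use warnings;

# Made-up pattern data in the format the loop reads: a quoted pattern and a
# quoted replacement separated by a tab (VOLUMENUMBER pattern from the post).
my $pattern_data = "# comments and blank lines are skipped\n\n"
    . q{'[Vv]ol(ume|\.)[\W\s]?(\d*|[a-zA-Z]*)'} . "\t" . q{'[VOLUMENUMBER $2]'} . "\n";

# Same loading logic as the loop above, reading from an in-memory handle.
my %albumPatterns;
open my $fh, '<', \$pattern_data or die "cannot open pattern data: $!";
while (<$fh>) {
    chomp;
    next if /^\s*$/ || /^#/;
    my ($pattern, $replacement) = split /\t/;
    s/^'(.*)'$/$1/ for $pattern, $replacement;    # strip the surrounding quotes
    $albumPatterns{$pattern} = $replacement;
}
close $fh;

# Hypothetical helper: replace $1, $2, ... in a template with the captures
# passed in ($caps[0] holds $1). Unmatched or missing groups become ''.
sub expand_template {
    my ($template, @caps) = @_;
    $template =~ s/\$(\d+)/$1 >= 1 && defined $caps[$1 - 1] ? $caps[$1 - 1] : ''/ge;
    return $template;
}

# Hypothetical helper: run every loaded pattern over a title. Keys are sorted
# only so the order is repeatable when several patterns could overlap.
sub apply_patterns {
    my ($title, $patterns) = @_;
    for my $pattern (sort keys %$patterns) {
        my $template = $patterns->{$pattern};
        $title =~ s/$pattern/expand_template($template, @{^CAPTURE})/ge;
    }
    return $title;
}

print apply_patterns("Greatest Hits Vol. 2", \%albumPatterns), "\n";
```

Expanding the placeholders by hand avoids running string `eval` (as `s///ee` would) on text read from a file.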
These are the patterns, though they are not finished yet (they are still commented out, so the loop skips them). Also, the Unicode setup on my box is broken, so I have had to write the patterns in a somewhat roundabout way:
# year
#'(\D('[1-9]\d|[1-9]\d{3}))' '[YEAR: $1]'
# yearspan
#'(\D('[1-9]\d|[1-9]\d{3}))(\s*.?\s*)(('[1-9]\d|[1-9]\d{3})\D?)' '[YEARSPAN: $2$3$5]'
# volumenumber
#'[Vv]ol(ume|\.)[\W\s]?(\d*|[a-zA-Z]*)' '[VOLUMENUMBER $2]'
# volumespan
#'[Vv]ol(\.|ume)?s?[\W\s]+(\w+)(.*[Vv]ol(\.|ume)?s?)?(\W+(\w+))' '[VOLUMESPAN: $2_$6]'
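One thing worth checking in the year patterns: because they start with `\D`, the match consumes the character in front of the year (so `$1` in the YEAR pattern includes it, and the YEARSPAN replacement swallows it), and a span at the very start of a title cannot match at all. Zero-width lookarounds avoid both. The sketch below compares the YEARSPAN pattern from the list with a lookaround variant on two made-up titles; the variant and the helper names are suggestions, not from the original post, and it also swaps the trailing `\D?` for `(?!\d)`, so it is not a drop-in equivalent.

```perl
use strict;
use warnings;

# YEARSPAN pattern from the list above, with the post's group numbers.
sub yearspan_original {
    my ($title) = @_;
    $title =~ s/(\D('[1-9]\d|[1-9]\d{3}))(\s*.?\s*)(('[1-9]\d|[1-9]\d{3})\D?)/[YEARSPAN: $2$3$5]/;
    return $title;
}

# Hypothetical variant: (?<!\d) and (?!\d) assert "no digit here" without
# consuming a character, so surrounding text is kept and a span at the
# start of the title can match too.
sub yearspan_lookaround {
    my ($title) = @_;
    $title =~ s/(?<!\d)('[1-9]\d|[1-9]\d{3})(\s*.?\s*)('[1-9]\d|[1-9]\d{3})(?!\d)/[YEARSPAN: $1$2$3]/;
    return $title;
}

for my $title ("Collection 1973-1979", "1973-1979 Live") {
    print "$title\n";
    print "  original:   ", yearspan_original($title), "\n";
    print "  lookaround: ", yearspan_lookaround($title), "\n";
}
```

With the original pattern, "Collection 1973-1979" loses the space before the span and "1973-1979 Live" is left untouched; the lookaround variant tags both and keeps the surrounding text.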