Hi, Alex,
thank you for all your fast responses!
I've been trying to come up with a more efficient method for testing the album examples for the stripping/no stripping scripts.
My internet connection is not very good, and I was losing a lot of time before, just downloading images that I wouldn't even keep in the end.
So I went on to read the command line reference again to see if there was a way to disable in batch (all at once) the "Search First" option for each source. I couldn't find it, so I just wrote a command line to enable "Search First" on all but ArkivMusik (as I intended) and then another to re-enable just the ones I usually use (for when I was finished testing).
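A batch toggle like this can also be generated by a small script instead of being typed by hand. The following is only a minimal sketch: it assumes the "F:" prefix and comma-separated /s syntax shown later in this post, and the helper name build_sources_arg and the short source list are hypothetical, for illustration only.

```python
# Sketch only: builds the value for the /s parameter.
# The source list is a small illustrative subset, not the full list.
ALL_SOURCES = ["Album Art Exchange", "Coveralia", "GoogleImage", "ArkivMusik"]


def build_sources_arg(sources, search_first):
    """Return a comma-separated source list, prefixing "F:" to every
    source that should be marked "Search First"."""
    parts = []
    for name in sources:
        prefix = "F:" if name in search_first else ""
        parts.append(prefix + name)
    return ",".join(parts)


# "Search First" on everything except ArkivMusik.
arg = build_sources_arg(ALL_SOURCES, set(ALL_SOURCES) - {"ArkivMusik"})
print('AlbumArt.exe /s "' + arg + '"')
```

Changing the set passed as search_first is then enough to switch between the testing setup and the everyday one.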
I only found the command line reference on the help menu, which is fine for me, but it got me thinking:
Why not include a txt file with this reference in the program folder?
(maybe even a .rtf or .pdf file, like these --> rapidshare.com/files/304880632/Command_Line_Reference.zip )
I also thought it could be good to include the source names in the reference, since users will be typing them quite often.
The other thing I was searching for was a way to disable all the "Always download full size" options I had previously marked on some of the individual sources. I couldn't find it, so I did it by opening (and closing) each source and unchecking the option there (quite a few clicks, I must say...). The positive side of this was that I also needed to reset the "Limit to xx results" option, so this was a good opportunity.
It seems to me it would make sense to have both a "08:" ("Limit to xx results") and an "A:" ("Always download full size") parameter for those options on each source (in the same fashion as the "F:" for the "Search First" option), e.g. AlbumArt.exe /s "A:F:Album Art Exchange,08:F:Coveralia,20:GoogleImage"
It could be good also to have an abbreviation system for the various sources (it could even help prevent some errors due to misspelling on the command line). e.g.: Album Art Exchange = AAE | AllCdCover = ACD | Amazon (.ca) = ACA | Amazon (.co.uk) = AUK | Amazon (.com) = ACO | Amazon (.de) = ADE | Amazon (.fr) = AFR | Amazon (.jp) = AJP | Archambault = ARC | ArkivMusik = AMS | Buy.com = BUY | CD Baby = CBY | CD Universe = CUN | Cover-Paradies = CPA | Coveralia = COV | CoverIsland = CIS | Darktown = DTW | Discogs = DIS | Encyclopaedia Metallum = EME | Freecovers API = FRA | GoogleImage = GOO | hitparade.ch = HIT | HMV Canada = HCA | Juno Records = JUN | LastFM Artist = LAR | LastFM Cover = LCO | maniadb = MDB | Metal Library = MEL | MusicMight = MUM | Psyshop = PSY | RevHQ = RHQ | Yes24 = Y24 | YesAsia = YES
This way, we could have an absolutely complete (but less massive) command line: AlbumArt.exe /ac off /mn 200 /mx 2100 /o s /o a- /s "12:A:F:AAE,15:F:ACD,12:A:F:ACA,12:A:F:AUK,12:A:F:ACO,12:A:F:ADE,12:A:F:AFR,12:A:F:AJP,4:ARC,4:AMS,4:BUY,12:A:F:CBY,12:A:F:CUN,12:CPA,8:F:COV,8:CIS,8:DTW,8:A:F:DIS,4:EME,8:FRA,20:F:GOO,4:A:F:HIT,4:HCA,12:A:F:JUN,6:F:LAR,4:A:F:LCO,4:MDB,4:MEL,4:MUM,4:PSY,4:RHQ,12:A:Y24,12:A:YES" /fileBrowser
instead of this (incomplete and more massive): AlbumArt.exe /ac off /mn 200 /mx 2100 /o s /o a- /s "F:Album Art Exchange,F:AllCdCover,F:Amazon (.ca),F:Amazon (.co.uk),F:Amazon (.com),F:Amazon (.de),F:Amazon (.fr),F:Amazon (.jp),Archambault,Buy.com,F:CD Baby,F:CD Universe,Cover-Paradies,F:Coveralia,CoverIsland,Darktown,F:Discogs,Encyclopaedia Metallum,Freecovers API,F:GoogleImage,F:hitparade.ch,HMV Canada,F:Juno Records,F:LastFM Artist,F:LastFM Cover,maniadb,Metal Library,MusicMight,Psyshop,RevHQ,Yes24,YesAsia" /fileBrowser
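To illustrate how the proposed short form could map back onto the long one, here is a rough sketch of a parser. Everything in it is an assumption based on the suggestion above, not on anything the program currently supports: the numeric "limit" prefix, the "A:" prefix, the abbreviation table (only a subset is shown) and the function names parse_source and to_current_syntax are all hypothetical.

```python
# Hypothetical abbreviation table (subset of the list proposed above).
ABBREVIATIONS = {
    "AAE": "Album Art Exchange",
    "ACD": "AllCdCover",
    "COV": "Coveralia",
    "GOO": "GoogleImage",
}


def parse_source(spec):
    """Split one entry such as "12:A:F:AAE" into its options and name."""
    *options, name = spec.split(":")
    parsed = {"limit": None, "full_size": False, "search_first": False}
    for opt in options:
        if opt.isdigit():
            parsed["limit"] = int(opt)      # proposed "Limit to xx results"
        elif opt == "A":
            parsed["full_size"] = True      # proposed "Always download full size"
        elif opt == "F":
            parsed["search_first"] = True   # existing "Search First" prefix
        else:
            raise ValueError("unknown option: " + opt)
    # Unknown abbreviations are passed through as full source names.
    parsed["name"] = ABBREVIATIONS.get(name, name)
    return parsed


def to_current_syntax(sources_arg):
    """Expand the proposed short form to the "F:Full Name" form,
    dropping the options that only exist in the proposal."""
    parts = []
    for spec in sources_arg.split(","):
        src = parse_source(spec.strip())
        parts.append(("F:" if src["search_first"] else "") + src["name"])
    return ",".join(parts)


print(to_current_syntax("12:A:F:AAE,15:F:ACD,8:COV,20:F:GOO"))
```

Unknown abbreviations fall through unchanged, so full names and short names could be mixed on the same command line.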
I know it's very uncharacteristic to have such a complete command line, but when you start to fiddle and change your sources and the specifics of each source as often as I've been changing, you never know what kind of settings were left the last time. This way, it's always best to re-set everything again (just to be safe).
There's always the argument against all these changes I brought up above: very few people will probably make use of these (command line reference file on folder | abbreviation for sources | "limit result" and "always downl. full size" options for sources on Comd.Line). I just thought it was worth bringing it up anyway, just in case you might agree with any of those remarks.
...AllCDCovers... Just a reminder, I investigated this problem back in 2008. It's a leech prevention system which seems to use session hashes (or similar) which expire after about 15 minutes.
Hi, Akkurat,
I must admit that I did not think of searching the thread for this problem before I posted. Thanks for the reminder!
It's nice knowing all that. It will be much easier to avoid this problem using the 15 minute rule now.
...Maybe even some characters should be spaces and some just removed? I was thinking of things like Alisha's Attic, which would almost certainly have better results as Alishas Attic than Alisha s Attic! On the other hand, as you say Vol 1 is probably better than Vol1.
Indeed! I hadn't thought of that. Maybe we could execute the function 2 times then, one time replacing some of the characters by a single space and another just stripping the rest of them. (I thought it could be easier to pass on the replacement value in the function, rather than creating another function for different char replacements)
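The two-pass idea could look roughly like the sketch below. It is written in Python purely for illustration (the real scripts are Boo), the function names are hypothetical, and which characters get removed and which become spaces is only my assumption for the sake of the example.

```python
# Assumed character groups, for illustration only.
REMOVE_CHARS = "'"        # dropped entirely: Alisha's -> Alishas
SPACE_CHARS = ".&?()"     # replaced by a space: Vol.1 -> Vol 1


def replace_chars(text, chars, replacement):
    """Replace every character of `chars` found in `text` with `replacement`."""
    return text.translate({ord(c): replacement for c in chars})


def clean_query(text):
    """Run the same function twice with different replacement values,
    then collapse any runs of whitespace left behind."""
    text = replace_chars(text, REMOVE_CHARS, "")
    text = replace_chars(text, SPACE_CHARS, " ")
    return " ".join(text.split())


print(clean_query("Alisha's Attic"))
print(clean_query("Vol.1"))
```

Passing the replacement value as a parameter keeps it to a single function, with the two character groups as the only things to tune per source.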
...arkivmusic consistently gives only false positive...
Arkivmusik is a classical music site. I'm not surprised it does pretty poorly with those searches! If they collect statistics on what people search for on their site, I imagine "Agnes / Dance Love Pop : Love Love Love" is likely to have them scratching their heads a little. :-)
That's very funny indeed! It didn't even occur to me to go and check the ArkivMusik site... Now I get the reason for all those false positives! Still, I think it's advisable to always disable this source by default. It's usually (for most users, probably) a waste of bandwidth.
...HTML glitch...
All that information is really useful - I'll try and work up a fix to the Amazon scraper script to fix it...
...I've looked into Amazon, and believe I have an update that should fix the HTML problem...
Wow! That was lightning-fast!
...I've checked out why amazon wasn't working with the & character, and have fixed that too. I've removed the stripping from Amazon and Freecovers, the rest are undecided. With the updated util.boo, though, results for & and ? should be discarded and re-tested, as they may have been fixed.
Ok, I started a whole new test battery (and included more albums to widen the sample, including also the characters "(" and ")").
I made a slight modification to the scripts you sent, just for testing purposes. It seems to me it would be more beneficial if we could compare the effects of stripping/no-stripping: (a) including the "(" and ")" characters and (b) testing the effects of stripping different characters on all sources.
Let me explain why I decided to reintroduce stripping to Amazon and Freecovers: I still believe that Amazon will give equal results with both stripping and no-stripping, and I would very much like to see if that's true. I also believe it would be beneficial to have Amazon tested for stripping against a much wider sample than those initial 10 albums I did. And regarding Freecovers, if you carefully inspect the earlier results, you'll see that it gives worse results with stripping in 3 cases, and better results in 2 cases. Given that, I would like to see the results it provides for a wider test sample.
...AllCDCovers...
...protection...irritating, but I'm not going to spend too long trying to work around it and get into some sort of arms race.
I completely agree with you. We (users) can still wait out the 15 minute grace period (as Akkurat pointed out) or just follow the link and type the captcha if we can't wait at all to get the image. Trying to implement something to get around it could be pointless, mainly because it would probably be outdated very quickly, as you said.
I'm still testing and as soon as I'm done, I'll post the results.
Thanks again for everything!