I realize that this is an old thread and a somewhat dead component but I want to ask anyway.
About a year ago maybe (?) I started to get the following added to all lyrics that I downloaded from LyricWiki using Lyricgrabber2:
[...]
My guess is that something about the LyricWiki site changed enough that some of the punctuation in the LyricWiki script the grabber uses is now wrong. Since then, whenever I download lyrics from there I have to remove the above manually. I know, tough cookies.
I tried to figure out where the script was to see if I could modify it myself. I failed.
Anyone run into this?
Yeah, LyricWiki changed something on their site a while ago, and the built-in scraper doesn't work anymore. The great thing about this component is that it's compatible with external "scripts" so that it's easy to make/edit your own when sites change their code. Here's a replacement LyricWiki script that works for me:
# -*- coding: utf-8 -*-
import encodings.utf_8
import sys
import traceback
import urllib
from xml.dom import minidom
from LevenshteinDistance import LevenshteinDistance
from unescape import unescape
from grabber import LyricProviderBase


class LyricWiki(LyricProviderBase):
    def GetName(self):
        return "LyricWiki"

    def GetVersion(self):
        return "1.0"

    def GetURL(self):
        return "http://lyrics.wikia.com/"

    def Query(self, handles, status, abort):
        result = []
        for handle in handles:
            status.Advance()
            if abort.Aborting():
                return result
            artist = handle.Format("[%artist%]")
            song = handle.Format("[%title%]")
            try:
                # Ask the API for the lyrics page URL plus the artist/title it matched.
                api_url = "http://lyrics.wikia.com/api.php?artist=%s&song=%s&fmt=xml" % (
                    urllib.quote(artist), urllib.quote(song))
                string = urllib.urlopen(api_url).read()
                doc = minidom.parseString(string)
                child = doc.getElementsByTagName("LyricsResult")[0]
                url = child.getElementsByTagName("url")[0]
                url = url.childNodes[0].data.encode('utf_8')
                found_artist = child.getElementsByTagName("artist")[0]
                found_song = child.getElementsByTagName("song")[0]
                found_artist = found_artist.childNodes[0].data.encode('utf_8')
                found_song = found_song.childNodes[0].data.encode('utf_8')
                # Compare case-insensitively and accept only close matches.
                artist = artist.lower()
                song = song.lower()
                found_artist = found_artist.lower()
                found_song = found_song.lower()
                if (LevenshteinDistance(artist, found_artist) < 5) and (LevenshteinDistance(song, found_song) < 7):
                    # Scrape the lyrics page: the text sits between the end of the
                    # <script> block inside the lyricbox div and the next HTML comment.
                    string2 = urllib.urlopen(url).read()
                    start = string2.find("<div class=\'lyricbox") + 22
                    start = string2.find("</script>", start) + 9
                    end = string2.find("<!--", start)
                    # The tag in the first replace() was swallowed by the forum
                    # renderer; "<br />" is assumed here.
                    lyric = string2[start:end].replace("<br />", "\r\n").replace("<i>", "").replace("</i>", "")
                    lyric = unescape(lyric).encode('utf8')
                    if (lyric.find("<script>") == -1 and lyric.find("</noscript>") == -1 and lyric.find("Unfortunately, we are not licensed to display the full lyrics for this song at the moment") == -1):
                        result.append(lyric)
                    else:
                        result.append('')
                else:
                    result.append('')
            except:
                # Log the failure and return an empty lyric for this track.
                traceback.print_exc(file=sys.stdout)
                result.append('')
                continue
        return result


if __name__ == "__main__":
    LyricProviderInstance = LyricWiki()
You should just be able to save those contents into a *.py file and place it in your pygrabber\scripts directory. You'll now have to access LyricWiki using the "scripts" submenu.
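If the site changes its markup again, the part you'll most likely need to touch is the bit that cuts the lyrics out of the page. Here's a rough standalone sketch of that step so you can test it against saved HTML without running the component. The function name `extract_lyrics` and the sample page are made up for illustration, the markers are only what this script currently looks for, and unlike the script above it returns an empty string when a marker is missing instead of slicing from a bogus offset.

```python
def extract_lyrics(page_html):
    """Return the lyric text found in page_html, or '' if a marker is missing."""
    # Hypothetical helper, not part of the component.
    box = page_html.find("<div class='lyricbox")
    if box == -1:
        return ""
    # Skip the <script> block that sits at the top of the lyric box.
    script_end = page_html.find("</script>", box)
    if script_end == -1:
        return ""
    start = script_end + len("</script>")
    # The lyrics run until the first HTML comment that follows them.
    end = page_html.find("<!--", start)
    if end == -1:
        return ""
    lyric = page_html[start:end]
    # Turn line-break tags into CRLF line endings, then drop italics tags.
    for tag in ("<br />", "<br/>", "<br>"):
        lyric = lyric.replace(tag, "\r\n")
    return lyric.replace("<i>", "").replace("</i>", "")


# Made-up page fragment, only to exercise the function.
sample = (
    "<html><div class='lyricbox'><script>ads()</script>"
    "First line<br />Second <i>line</i><!-- end --></div></html>"
)
print(repr(extract_lyrics(sample)))
```

When the markers stop matching, you get an empty result instead of junk pasted onto the lyrics, which makes it easier to tell that the script needs updating.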
---
While typing up this response, I just noticed that copying and pasting from a codebox on this forum doesn't seem to retain the indentations, which will break the script. I've uploaded the script code here:
http://codeviewer.org/view/code:5143
Just use the dropdown in the upper-right to download the code with indentations intact. Again, rename it to *.py and place it in the appropriate directory.
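If you're not sure whether your saved copy kept its indentation, a quick syntax check will tell you before you drop it into the scripts directory. This is just a sketch: `check_script_source` and the file name are hypothetical, and it only compiles the text without running it, so it won't catch a missing module or a change on the site.

```python
def check_script_source(source, filename="lyricwiki.py"):
    """Return None if source compiles, else a short description of the first error."""
    try:
        # compile() only parses the code; nothing is imported or executed.
        compile(source, filename, "exec")
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return "%s at line %s: %s" % (type(err).__name__, err.lineno, err.msg)
    return None


# What a flattened paste looks like versus a properly indented one.
flattened = 'class LyricWiki:\ndef GetName(self):\nreturn "LyricWiki"\n'
indented = 'class LyricWiki:\n    def GetName(self):\n        return "LyricWiki"\n'

print(check_script_source(flattened))
print(check_script_source(indented))
```

To check a real file, read it with `open(path).read()` and pass the text in. Anything other than `None` means the paste got mangled and you should grab the download instead.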