import urllib
from HTMLParser import HTMLParser
from htmlentitydefs import name2codepoint


class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print "Start tag:", tag
        for attr in attrs:
            print " attr:", attr

    def handle_endtag(self, tag):
        print "End tag :", tag

    def handle_data(self, data):
        print "Data :", data


parser = MyHTMLParser()
file = urllib.urlopen("http://www.clever-tanken.de/tankstelle_liste?spritsorte=5&r=20&ort=97816&lat=&lon=")
content = file.read()
parser.feed(content)
Traceback (most recent call last):
  File "tanken.py", line 20, in <module>
    parser.feed(content)
  File "/usr/lib/python2.7/HTMLParser.py", line 114, in feed
    self.goahead(0)
  File "/usr/lib/python2.7/HTMLParser.py", line 158, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.7/HTMLParser.py", line 305, in parse_starttag
    attrvalue = self.unescape(attrvalue)
  File "/usr/lib/python2.7/HTMLParser.py", line 472, in unescape
    return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
  File "/usr/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128)
The cause is the umlauts in:
<div id="tankstelle-46157" class="row price_entry" ng-init="addPoi('50,25168', '9,290797','','Globus Handelshof GmbH & Co. KG Betriebsstätte Wächtersbach',null,2209,2210,60,'1,569')">
How can I solve this?
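One possible approach, offered only as a sketch: the traceback suggests that `unescape` ends up mixing the raw byte string with unicode replacement text, which would trigger an implicit ASCII decode. Decoding the downloaded bytes to unicode before calling `feed` should avoid that. The example below assumes the page is UTF-8 encoded (the byte 0xc3 points in that direction, but it is not confirmed). `AttrCollector` and the shortened sample tag are hypothetical stand-ins for `MyHTMLParser` and the real page content, and the import fallback is only there so the snippet also runs on Python 3.

```python
try:
    from HTMLParser import HTMLParser      # Python 2, as in the post
except ImportError:
    from html.parser import HTMLParser     # Python 3 fallback


class AttrCollector(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs)))


# Stand-in for file.read(): raw bytes, assumed to be UTF-8 encoded.
raw = (u"<div class=\"row price_entry\" "
       u"ng-init=\"addPoi('Betriebsst\u00e4tte W\u00e4chtersbach &amp; Co.')\">"
       ).encode("utf-8")

parser = AttrCollector()
parser.feed(raw.decode("utf-8"))  # feed unicode text, not the raw bytes
parser.close()

ng_init = parser.seen[0][1]["ng-init"]
```

Applied to the original script, this would amount to changing the last line to `parser.feed(content.decode("utf-8"))`. If the server declares a different charset in its headers or in a meta tag, that encoding would have to be used instead.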