
pd.read_html

Posted: Tuesday, 28 February 2023, 16:22
by Monjy
Hello everyone,

I have the following code:

Code:

sp500 = pd.read_html('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0].Symbol.to_list()
This works fine in Colaboratory and on Replit. It only fails locally in PyCharm and on my home server.
There I get the following error message:

Traceback (most recent call last):
File "/Users/.../lib/python3.10/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/Users/.../lib/python3.10/http/client.py", line 1282, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/Users/.../lib/python3.10/http/client.py", line 1328, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/Users/.../lib/python3.10/http/client.py", line 1277, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/Users/.../lib/python3.10/http/client.py", line 1037, in _send_output
self.send(msg)
File "/Users/.../lib/python3.10/http/client.py", line 975, in send
self.connect()
File "/Users/.../lib/python3.10/http/client.py", line 941, in connect
self.sock = self._create_connection(
File "/Users/.../lib/python3.10/socket.py", line 845, in create_connection
raise err
File "/Users/.../lib/python3.10/socket.py", line 833, in create_connection
sock.connect(sa)
TimeoutError: [Errno 60] Operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/.../PycharmProjects/test2/main.py", line 3, in <module>
sp500 = pd.read_html('http://en.wikipedia.org/wiki/List_of_S% ... _companies')[0].Symbol.to_list()
File "/Users/...lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/Users/.../lib/python3.10/site-packages/pandas/io/html.py", line 1205, in read_html
return _parse(
File "/Users/.../lib/python3.10/site-packages/pandas/io/html.py", line 986, in _parse
tables = p.parse_tables()
File "/Users...lib/python3.10/site-packages/pandas/io/html.py", line 262, in parse_tables
tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
File "/Users/.../lib/python3.10/site-packages/pandas/io/html.py", line 821, in _build_doc
raise e
File "/Users/.../lib/python3.10/site-packages/pandas/io/html.py", line 802, in _build_doc
with urlopen(self.io) as f:
File "/Users/.../lib/python3.10/site-packages/pandas/io/common.py", line 265, in urlopen
return urllib.request.urlopen(*args, **kwargs)
File "/Users/.../lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "/Users/.../lib/python3.10/urllib/request.py", line 519, in open
response = self._open(req, data)
File "/Users/.../lib/python3.10/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/Users/.../lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/Users/.../lib/python3.10/urllib/request.py", line 1377, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/Users/.../lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 60] Operation timed out>

However, when I fetch the list of Dow Jones components with the following code:

Code:

DOW = pd.read_html('https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average')[1].Symbol.to_list()
everything works, even though it is the same operation. What puzzles me most is that both calls work on Colaboratory.

Does anyone have an idea what could be causing this?

Re: pd.read_html

Posted: Tuesday, 28 February 2023, 16:39
by __blackjack__
@Monjy: Are you sure you tried exactly the same URL, and not one with http*s*?

Re: pd.read_html

Posted: Tuesday, 28 February 2023, 17:30
by Monjy
Yes, I copied it one to one.

Re: pd.read_html

Posted: Wednesday, 1 March 2023, 19:09
by sparrow
@Monjy: Because I think you may not have understood the hint:

You are using two different protocols: "https" works, "http" evidently does not.
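
To make the difference concrete, here is a minimal sketch that rewrites an "http" URL to "https" before handing it to pd.read_html. The helper names force_https and load_symbols are made up for this illustration and are not part of pandas. It also assumes the table still has a "Symbol" column, as in the original snippet. Why plain "http" times out on one network but not another is not settled in this thread, so treat this as a workaround rather than a diagnosis.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Return url with the scheme switched from http to https.

    Hypothetical helper for illustration; any other scheme is left untouched.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def load_symbols(url: str, table_index: int = 0) -> list[str]:
    """Read the ticker symbols from one HTML table on the page at url.

    Hypothetical helper; assumes the chosen table has a "Symbol" column.
    """
    import pandas as pd  # imported here so force_https can be used without pandas

    tables = pd.read_html(force_https(url))
    return tables[table_index]["Symbol"].to_list()


if __name__ == "__main__":
    # Same URL as in the question, only the scheme gets upgraded.
    url = "http://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
    print(force_https(url))
```

With this, load_symbols(url) would correspond to the first call from the question and load_symbols(dow_url, table_index=1) to the second one, both going over https.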

Re: pd.read_html

Posted: Wednesday, 1 March 2023, 21:10
by Monjy
That's it, thank you both. https works ...