Reading a CSV file while skipping rows, and an error message

Kahnbein.Kai · Friday, 29 March 2019, 11:40

Hello,
I would like to read in a CSV file and skip the first two lines while doing so. The CSV file looks like this:
[Image: screenshot of the CSV file]

This is my code for it:

Code:

import matplotlib.pyplot as plt
import pandas as pd

Tab = pd.read_csv('C:\\Users\\Kai\\Downloads\\Datenuebergabe\\mst01.csv', skiprows=[1] , delimiter=';')

# pd.set_option("display.max_columns", 999)  # would raise the number of displayed columns to 999

print(Tab.head(5))  # shows the first 5 rows
print(list(Tab))

print(Tab.info())
print(Tab.dtypes)

ax = plt.gca()

Tab.plot(kind='line',x='Nr. ',y='cm', label='cm', ax=ax)
Tab.plot(kind='line',x='Nr. ',y='l/s', color='red', ax=ax)

plt.xlabel('Nr.')
plt.ylabel('cm')
plt.title('Mst01')
plt.legend()

plt.show()

And this is the error message I get:

Code:

Traceback (most recent call last):
  File "pandas\_libs\parsers.pyx", line 1134, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1240, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1256, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1494, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 2: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Kai\Documents\FH Bochum\Master\6.Semester\Informatik\Workspace\Python\Auswertung_Matplotlib.py", line 4, in <module>
    Tab = pd.read_csv('C:\\Users\\Kai\\Downloads\\Datenuebergabe\\mst01.csv', skiprows=[1] , delimiter=';')
  File "C:\Program Files\Python37\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Program Files\Python37\lib\site-packages\pandas\io\parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "C:\Program Files\Python37\lib\site-packages\pandas\io\parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "C:\Program Files\Python37\lib\site-packages\pandas\io\parsers.py", line 1848, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 1094, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas\_libs\parsers.pyx", line 1141, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1240, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1256, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1494, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 2: invalid start byte

It apparently has something to do with converting strings to bytes, which I don't understand. What exactly is failing? Does "in position 2" refer to the date column?

Have a nice weekend
Kai

__blackjack__ · Friday, 29 March 2019, 12:34

@Kahnbein.Kai: No, position 2 refers to the third byte of some byte sequence inside some cell, and that byte does not form a valid UTF-8 byte sequence. So the CSV file is evidently not encoded in UTF-8, and you have to pass the correct encoding to `read_csv()` as an argument so that the bytes can be decoded into strings.
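
To illustrate what the position means and how the encoding argument is passed, here is a minimal sketch. The cell text "Grüße", the column name "Ort" and the cp1252 encoding are assumptions made up for this example; the thread shows neither the file contents nor the file's real encoding. What is certain is that in cp1252 and Latin-1 the byte 0xfc stands for "ü".

```python
import io

import pandas as pd

# Hypothetical cell content; cp1252 is an assumed encoding, not a known fact
# about mst01.csv.
cell = "Grüße".encode("cp1252")  # the bytes G, r, 0xfc, 0xdf, e

try:
    cell.decode("utf-8")
except UnicodeDecodeError as error:
    # error.start is a byte offset inside this byte sequence,
    # it is not a column number of the table.
    failing_offset = error.start
    failing_byte = cell[error.start]
    print(error)

# Hypothetical two-column file written in the same assumed encoding.
raw_csv = "Nr. ;Ort\n1;Grüße\n".encode("cp1252")

# Passing the matching encoding lets pandas decode the bytes into strings.
table = pd.read_csv(io.BytesIO(raw_csv), delimiter=";", encoding="cp1252")
print(table)
```

If the real file uses a different encoding, the same `encoding` argument applies with that encoding's name instead.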

ThomasL · Friday, 29 March 2019, 13:15

Code:

Tab = pd.read_csv('C:\\Users\\Kai\\Downloads\\Datenuebergabe\\mst01.csv', skiprows=[1] , delimiter=';')

You want to skip the first two lines of the file when reading it?
Then call read_csv with the parameter skiprows=2 or skiprows=[0,1].
As written above, only the second line is skipped.
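
A small sketch of the difference, using made-up data. The two leading lines and the values are hypothetical, since the real file is only shown as an image; only the column names "Nr. " and "cm" are taken from the code above.

```python
import io

import pandas as pd

# Hypothetical file: two lines to skip, then the header line, then the data.
text = "title;mst01\nunit;cm\nNr. ;cm\n1;10\n2;12\n"

# A list holds 0-based line indices, so [1] skips only the second line
# and the first line ends up as the header.
only_second_skipped = pd.read_csv(io.StringIO(text), skiprows=[1], delimiter=";")

# An integer skips that many lines from the top of the file.
first_two_skipped = pd.read_csv(io.StringIO(text), skiprows=2, delimiter=";")

# The equivalent list form names both line indices explicitly.
same_with_list = pd.read_csv(io.StringIO(text), skiprows=[0, 1], delimiter=";")

print(list(only_second_skipped.columns))
print(list(first_two_skipped.columns))
```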

Kahnbein.Kai · Wednesday, 3 April 2019, 11:23

Thanks, __blackjack__ and ThomasL,
I saved the file in UTF-8 format, and now reading it in works.
I would not have thought of skiprows=[0,1] either, so thanks for that as well!

Regards, Kai