Plotting a CSV file in Python

ayguel1234 · Sunday, 26 June 2022, 13:28

Hello everyone,

I'm new here and urgently need help: my submission is due tomorrow, and I have been trying the whole time to get this working on my own.

I have a CSV file that I would like to display as a graph in Python.

My code:

import fileNames as fileNames
import pandas as pd
# to work with structured, tabulated data (.csv, .xlsx,..)
import numpy as np
# to work fast with matrices, arrays and much more
from matplotlib import pyplot as plt
# to visualize/plot your data

PATH = 'C:\\Users\\aysebn\\Desktop\\DosePatient.csv\\'
# read in the csv-file
fileNames = [file for file in fileNames if '.csv' in file]

counter = 0

for file in fileNames:
    df = pd.read_csv(PATH + file, usecols=[0, 1, 2, 3], sep=',', header=None, skiprows=8)

# read in the csv-file
outputfile_topas = 'DosePatient.csv'

# looking at the original csv-file, we know that the TOPAS simulation details
# are in first few lines (header) of the data sheet. We read in the first 8
# lines using pandas to read the csv-file and setting nrows=7, since Python
# starts counting with 0 instead of 1.
header = pd.read_csv(outputfile_topas, nrows = 7)

# Use the pandas package to read in the CSVfile as DataFrame. Skip all header
# lines, which all begin with #. Now, there is only data in the file,
# therefore header=None.
#df = pd.read_csv(outputfile_topas, comment='#', header=None)
df = pd.read_csv(outputfile_topas, names=["x_Bin", "y_Bin", "z_Bin", "Dosis"], comment='#', header=None)
df.Dosis=df.Dosis/np.max(df.Dosis)
Z=df.z_Bin*0.25+0.125 # in cm
plt.plot(Z, df.Dosis, "r.")
plt.savefig('Vorlage.pdf')

#import pandas as pd # to work with structured, tabulated data (.csv, .xlsx,..)
#import numpy as np # to work fast with matrices, arrays and much more
#from matplotlib import pyplot as plt # to visualize/plot your data

#PATH = 'C:\\Users\\Acer\\Desktop\\BAAusführung\\B2\\'
# read in the csv-file
#fileNames = [file for file in fileNames if '.csv' in file]

#counter = 0

#for file in fileNames:
# df = pd.read_csv(PATH + file, usecols=[0, 1, 2, 3], sep=',', header=None, skiprows=8)

# looking at the original csv-file, we know that the TOPAS simulation details
# are in first few lines (header) of the data sheet. We read in the first 8
# lines using pandas to read the csv-file and setting nrows=7, since Python
# starts counting with 0 instead of 1.
#header = pd.read_csv(outputfile_topas, nrows = 7)

# Use the pandas package to read in the CSVfile as DataFrame. Skip all header
# lines, which all begin with #. Now, there is only data in the file,
# therefore header=None.
#df = pd.read_csv(outputfile_topas, names=["x_Bin", "y_Bin", "z_Bin", "Dosis"], comment='#', header=None)
#df.Dosis=df.Dosis/np.max(df.Dosis)
#Z=df.z_Bin*0.25+0.125 # in cm
#plt.plot(Z, df.Dosis, "r.")
#plt.savefig('Vorlage.pdf')

I'm using Python 3.9, and although I installed the fileNames package, Python does not accept it.

I'd appreciate any help.

Best regards, Gül

Sirius3 · Sunday, 26 June 2022, 14:05

The module `fileNames` and the variable `fileNames` do not follow the naming convention.
It is a warning sign when a module has the same name as a variable. Only constants should exist at module level, so that would be FILENAMES.

You then overwrite the imported fileNames with your own, which is a potential source of errors. Since there should be no global variables anyway, this would not even be possible if the code were in functions.
Paths are not joined with +; use pathlib.Path instead.
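
To illustrate the pathlib point, here is a minimal sketch. The directory `data/topas` and the helper `build_csv_path` are made up for this example; only the file name `DosePatient.csv` comes from the thread.

```python
from pathlib import Path


def build_csv_path(directory, filename):
    """Join a directory and a file name with the / operator of pathlib."""
    return Path(directory) / filename


# Hypothetical directory, used only for illustration.
csv_path = build_csv_path(Path("data") / "topas", "DosePatient.csv")

print(csv_path.name)    # the file name without its directory
print(csv_path.suffix)  # the extension, including the dot
print(csv_path.parent)  # the directory part
```

The / operator inserts the correct separator for the platform, so no trailing backslash has to be kept in the base path.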

You name a variable outputfile and then read from it??

For a number of columns it makes no difference whether Python counts from 0 or from 1, so I don't understand the comment.
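
The comment in question is about `nrows`, which is a count and not an index. A small sketch with made-up data (ten lines of two numbers each, not the real TOPAS file) shows how `nrows=7` behaves with and without a header line:

```python
import io

import pandas as pd

# Made-up stand-in for a data file: ten lines of two comma-separated numbers.
text = "\n".join(f"{i},{i + 100}" for i in range(10))

# header=None: no line is used for column names, so nrows=7 gives 7 data rows.
without_header = pd.read_csv(io.StringIO(text), header=None, nrows=7)

# Default header handling: the first line becomes the column names and the
# following 7 lines become data rows, so 8 lines of the file are consumed.
with_header = pd.read_csv(io.StringIO(text), nrows=7)

print(len(without_header), len(with_header))
print(without_header.iloc[-1, 0], with_header.iloc[-1, 0])
```

In both cases the DataFrame has exactly 7 rows; the extra line in the second call is consumed as the header, which has nothing to do with zero-based counting.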

What is the exact error message, including the traceback?

ayguel1234 · Sunday, 26 June 2022, 14:16

Hello, first of all thank you very much for your reply.

- The error message is:

Traceback (most recent call last):
  File "/Users/aysebn/PycharmProjects/Bachelor/CAP02_Beispielcode.py", line 7, in <module>
    import fileNames as fileNames
ModuleNotFoundError: No module named 'fileNames'

- The CSV file does not contain several "columns"; it has only a single column holding values such as 1, 2, 3, 4.
The intention is to plot values 3 and 4 from it in a graph.

Best regards, Gül

__deets__ · Sunday, 26 June 2022, 14:29

Well, the message is very clear: there is no module named fileNames. Where is it located, and what is in it?

On top of that, the import is more complicated than necessary: renaming fileNames with `as` to ... fileNames again is obviously superfluous.

ayguel1234 · Sunday, 26 June 2022, 15:22

I was sent the code like this. As I said, I tried to manage it myself. But it doesn't work and, as you noticed, I have zero knowledge of this.
How would you plot the file, in your opinion?
My submission is due tomorrow and I absolutely need the plot from this file. Could you help me?

Best regards, Gül

Sirius3 · Sunday, 26 June 2022, 15:32

What is this `fileNames`? Where did you get it? Where is it located now?

ayguel1234 · Sunday, 26 June 2022, 15:40

Um, best to start over:

import pandas as pd # to work with structured, tabulated data (.csv, .xlsx,..)
import numpy as np # to work fast with matrices, arrays and much more
from matplotlib import pyplot as plt # to visualize/plot your data

PATH = 'C:\\Users\\Acer\\Desktop\\BAAusführung\\B2\\'
# read in the csv-file
fileNames = [file for file in fileNames if '.csv' in file]

counter = 0

for file in fileNames:
    df = pd.read_csv(PATH + file, usecols=[0, 1, 2, 3], sep=',', header=None, skiprows=8)

# looking at the original csv-file, we know that the TOPAS simulation details
# are in first few lines (header) of the data sheet. We read in the first 8
# lines using pandas to read the csv-file and setting nrows=7, since Python
# starts counting with 0 instead of 1.
header = pd.read_csv(outputfile_topas, nrows = 7)

# Use the pandas package to read in the CSVfile as DataFrame. Skip all header
# lines, which all begin with #. Now, there is only data in the file,
# therefore header=None.
df = pd.read_csv(outputfile_topas, names=["x_Bin", "y_Bin", "z_Bin", "Dosis"], comment='#', header=None)
df.Dosis=df.Dosis/np.max(df.Dosis)
Z=df.z_Bin*0.25+0.125 # in cm
plt.plot(Z, df.Dosis, "r.")
plt.savefig('Vorlage.pdf')

This is the official code. I ran it, and the error message told me that fileNames is not defined and said I should download the package. So I downloaded that package, and this was added as the first line: import fileNames as fileNames
Now it still gives me this error message:
Traceback (most recent call last):
  File "/Users/aysebn/PycharmProjects/Bachelor/CAP02_Beispielcode.py", line 7, in <module>
    import fileNames as fileNames
ModuleNotFoundError: No module named 'fileNames'

----------

But if I run the code without PATH and fileNames, which looks like this:

import pandas as pd
# to work with structured, tabulated data (.csv, .xlsx,..)
import numpy as np
# to work fast with matrices, arrays and much more
from matplotlib import pyplot as plt
# to visualize/plot your data

#PATH = 'C:\\Users\\aysebn\\Desktop\\DosePatient.csv\\'
# read in the csv-file
#fileNames = [file for file in fileNames if '.csv' in file]

#counter = 0

#for file in fileNames:
# df = pd.read_csv(PATH + file, usecols=[0, 1, 2, 3], sep=',', header=None, skiprows=8)

# read in the csv-file
outputfile_topas = 'DosePatient.csv'

# looking at the original csv-file, we know that the TOPAS simulation details
# are in first few lines (header) of the data sheet. We read in the first 8
# lines using pandas to read the csv-file and setting nrows=7, since Python
# starts counting with 0 instead of 1.
header = pd.read_csv(outputfile_topas, nrows = 7)

# Use the pandas package to read in the CSVfile as DataFrame. Skip all header
# lines, which all begin with #. Now, there is only data in the file,
# therefore header=None.
#df = pd.read_csv(outputfile_topas, comment='#', header=None)
df = pd.read_csv(outputfile_topas, names=["x_Bin", "y_Bin", "z_Bin", "Dosis"], comment='#', header=None)
df.Dosis=df.Dosis/np.max(df.Dosis)
Z=df.z_Bin*0.25+0.125 # in cm
plt.plot(Z, df.Dosis, "r.")
plt.savefig('Vorlage.pdf')

I get an empty PDF file as the result.

-----------------

My idea is basically to display this CSV file as a graph. I have 4 values, and I want to plot the 3rd and 4th values.
Apparently it worked for others, but nobody can help me or tell me why it doesn't work for me :/.

__deets__ · Sunday, 26 June 2022, 15:54

fileNames has to be a list containing the names of the files to plot. You have to supply that yourself; nobody else can know them. And the instruction to install or import something is due to the IDE being confused; it is not the right solution here.

__deets__ · Sunday, 26 June 2022, 16:02

Addendum: since the file names are relative to PATH, I assume they are the CSV files in that directory. You then simply have to put them into a list fileNames:


fileNames = ["test.csv", "egal.csv"]

And that has to go at the beginning of your file.
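
Instead of typing the list by hand, it could also be built from the directory contents. This is only a sketch: the helper `list_csv_filenames` and the temporary directory with its sample files are made up for illustration and stand in for the real data folder.

```python
import tempfile
from pathlib import Path


def list_csv_filenames(directory):
    """Return the sorted names of all regular files ending in .csv."""
    return sorted(
        path.name for path in Path(directory).glob("*.csv") if path.is_file()
    )


# Temporary directory with made-up files, standing in for the real folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["test.csv", "egal.csv", "notes.txt"]:
        (Path(tmp) / name).write_text("1,2,3,4\n")
    filenames = list_csv_filenames(tmp)

print(filenames)  # ['egal.csv', 'test.csv']
```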

__blackjack__ · Sunday, 26 June 2022, 16:20

@ayguel1234: Python most certainly did not "tell" you to download some package because of an undefined name. And what is "this package"? And why did you download it? Did you think for a second about what this name stands for? These are the file names of CSV files that have to exist on your computer; how is some stranger supposed to know what these names are and then provide a package for them on the internet?

The test whether ".csv" occurs *in* the file name is of course true if it is at the end, but also if it appears somewhere in the middle rather than at the end. For example, it also matches "toller_name.csv.bak" if someone has created a backup copy of such a file. Some editors do that by default.
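
The difference between the substring test and a test on the actual extension can be sketched like this (the file names are examples only):

```python
from pathlib import Path

names = ["dose.csv", "toller_name.csv.bak", "README.txt"]

# Substring test: also matches the backup file.
substring_matches = [name for name in names if ".csv" in name]

# Suffix test: only the last extension of each name is compared.
suffix_matches = [name for name in names if Path(name).suffix == ".csv"]

print(substring_matches)  # ['dose.csv', 'toller_name.csv.bak']
print(suffix_matches)     # ['dose.csv']
```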

As Sirius3 already noted, the code should work with `pathlib.Path` objects instead of strings. And depending on how `fileNames` actually comes about, the filtering by file extension may already have happened beforehand.

A single element of `fileNames` is not a `file` but a `filename`. A reader would expect objects called `file` to have methods such as `read()` and/or `write()` and `close()`. Files, that is.

`counter` is defined but never used. The same goes for `header`.

Then all CSV files from the list are loaded in a loop. But afterwards only the data from the last one is bound to the name `df`. The others were all read in for nothing. (Unless the purpose was to check whether they can be loaded at all; in that case it should be stated in a comment, and the results would not need to be bound to a name.)

The most recently loaded data is not used anywhere either, so loading all of them can be skipped entirely.
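
If the data from every file were actually needed, one option would be to keep each DataFrame, for example in a dict keyed by file name. A minimal sketch, assuming CSV files without a header line; `load_all` and the two sample files are made up for illustration:

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_all(directory, filenames):
    """Read every listed CSV file and return a dict mapping name to DataFrame."""
    base = Path(directory)
    return {
        filename: pd.read_csv(base / filename, header=None)
        for filename in filenames
    }


# Temporary directory with invented sample data.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.csv").write_text("0,0,0,1.0\n0,0,1,2.0\n")
    (Path(tmp) / "b.csv").write_text("0,0,0,3.0\n")
    frames = load_all(tmp, ["a.csv", "b.csv"])

print({name: len(frame) for name, frame in frames.items()})
```

This way nothing is overwritten on the next loop iteration, and each file's data can be looked up by name afterwards.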

And once you do that, the `PATH` constant is no longer needed either.

`outputfile_topas` is not defined.

If you specify the column names when reading the file, you should either follow the Python naming conventions or not access the columns as attributes.
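
Apart from the naming convention, attribute access has a practical limit: a column whose name collides with a DataFrame method is hidden by that method. A small sketch with an invented DataFrame (the column name "max" is chosen only to provoke the collision):

```python
import pandas as pd

df = pd.DataFrame({"z_bin": [0, 1, 2], "max": [1.0, 4.0, 2.0]})

# Attribute access works for a column whose name is a plain identifier ...
print(df.z_bin.tolist())

# ... but df.max is the DataFrame method, not the column named "max".
print(callable(df.max))

# Item access always refers to the column.
print(df["max"].tolist())
```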

`numpy.max()` is superfluous here because `Series` objects sensibly have a `max()` method.

Spaces around binary operators, and around equals signs in assignments outside of argument lists, make the code easier to read.

What remains of the original code is this:


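#!/usr/bin/env python3
import pandas as pd
from matplotlib import pyplot as plt

# File name as used in the earlier posts; adjust it to where the file lives.
DOSE_FILENAME = "DosePatient.csv"


def main():
    # Lines starting with "#" are skipped; the rest is pure data without a
    # header line, so the column names are given explicitly.
    df = pd.read_csv(
        DOSE_FILENAME,
        names=["x_Bin", "y_Bin", "z_Bin", "Dosis"],
        comment="#",
        header=None,
    )
    # Normalise the dose to its maximum value.
    df["Dosis"] /= df["Dosis"].max()
    z_bin = df["z_Bin"] * 0.25 + 0.125  # in cm
    plt.plot(z_bin, df["Dosis"], "r.")
    plt.savefig("Vorlage.pdf")


if __name__ == "__main__":
    main()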
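
To check the reading and scaling steps without the real file, the same logic can be run on a few invented lines. This is only a sketch under assumptions: the sample data is made up, `load_relative_dose` is a hypothetical helper, and reading 0.125 as half of the 0.25 cm bin width is an interpretation of the formula in the thread, not something the thread states.

```python
import io

import pandas as pd

# Invented sample: two comment lines, then x bin, y bin, z bin, dose per line.
SAMPLE = """\
# header line of the simulation output (made up for this example)
# x, y, z, dose
0,0,0,2.0
0,0,1,4.0
0,0,2,1.0
"""


def load_relative_dose(source, bin_width_cm=0.25):
    """Read the data, scale the dose to its maximum and add z in cm."""
    df = pd.read_csv(
        source,
        names=["x_bin", "y_bin", "z_bin", "dose"],
        comment="#",
        header=None,
    )
    df["dose"] /= df["dose"].max()
    # Assumption: the offset is half a bin width, i.e. the bin centre.
    df["z_cm"] = df["z_bin"] * bin_width_cm + bin_width_cm / 2
    return df


df = load_relative_dose(io.StringIO(SAMPLE))
print(df[["z_cm", "dose"]])
```

Printing the DataFrame before plotting is a quick way to see whether the four columns were parsed as expected.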