Output does not stop

Brando · Thursday, 15 July 2021, 14:54

Hello, I isolated one column of a CSV file and then worked out what share of the rows each of the four values 1, 2, 3 and 4 accounts for. But my code does not stop at one output; it prints the dictionary again and again! Here is my code:

Code:

import pandas as pd
def proportion_of_education():
    df = pd.read_csv("assets/NISPUF17.csv", sep=",", usecols=["EDUC1"])
    # print (df)
    gesamt = df.apply( lambda x : True if x[0] != 0 else False, axis = 1)
    gesamt_rows = len(gesamt[gesamt == True].index)
    # print (gesamt_rows)
    counterFunc = df.apply(
    lambda x: True if x[0] == 1 else False , axis=1)
    numOfRows_less = len(counterFunc[counterFunc == True].index)
    counterFunc = df.apply(
    lambda x: True if x[0] == 2 else False , axis=1)
    numOfRows_high = len(counterFunc[counterFunc == True].index)
    counterFunc = df.apply(
    lambda x: True if x[0] == 3 else False , axis=1)
    numOfRows_more = len(counterFunc[counterFunc == True].index)
    counterFunc = df.apply(
    lambda x: True if x[0] == 4 else False , axis=1)
    numOfRows_college = len(counterFunc[counterFunc == True].index)
    dict = {"less than high school": numOfRows_less/gesamt_rows, "high school": numOfRows_high/gesamt_rows, 
            "more than high school but not college": numOfRows_high/gesamt_rows, "college": numOfRows_college/gesamt_rows}
    print (dict)
    # your code goes here
    # YOUR CODE HERE
    return dict
    raise NotImplementedError()

The output repeats:

{'less than high school': 0.10202002459160373, 'high school': 0.172352011241876, 'more than high school but not college': 0.172352011241876, 'college': 0.47974705779026877}
{'less than high school': 0.10202002459160373, 'high school': 0.172352011241876, 'more than high school but not college': 0.172352011241876, 'college': 0.47974705779026877}
{'less than high school': 0.10202002459160373, 'high school': 0.172352011241876, 'more than high school but not college': 0.172352011241876, 'college': 0.47974705779026877}
{'less than high school': 0.10202002459160373, 'high school': 0.172352011241876, 'more than high school but not college': 0.172352011241876, 'college': 0.47974705779026877}
{'less than high school': 0.10202002459160373, 'high school': 0.172352011241876, 'more than high school but not college': 0.172352011241876, 'college': 0.47974705779026877}
{'less than high school': 0.10202002459160373, 'high school': 0.172352011241876, 'more than high school but not college': 0.172352011241876, 'college': 0.47974705779026877}
etc.

Why?

sparrow · Thursday, 15 July 2021, 15:03

Because you are calling the function in a loop somewhere?

Brando · Thursday, 15 July 2021, 15:12

I am not calling it in a loop. The code sits in a cell of a Jupyter notebook, and there is no other code besides it. The cell is checked by asserts:

assert type(proportion_of_education())==type({}), "You must return a dictionary."
assert len(proportion_of_education()) == 4, "You have not returned a dictionary with four items in it."
assert "less than high school" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."
assert "high school" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."
assert "more than high school but not college" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."
assert "college" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."

Brando · Thursday, 15 July 2021, 15:15

I just noticed that the number of outputs is exactly the number of asserts! Is that how it works?

einfachTobi · Thursday, 15 July 2021, 16:31

You call proportion_of_education in every line, and it contains a print(dict). So the dictionary gets printed every time. Your counting code is also rather roundabout overall. Try to use the pandas functions instead of building something yourself. Since I don't know the layout of your data and can't quite work it out from the code either, I'm guessing something like this:

Code:

df = pd.read_csv("assets/NISPUF17.csv", sep=",", usecols=["EDUC1"])
gesamt = df[df.EDUC1 != 0].count()
# etc.
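
One way to lean on pandas here might be `value_counts(normalize=True)`, which returns relative frequencies directly. This is only a sketch: the sample values are made up, and it assumes the column contains nothing but the codes 1 to 4 (rows with other values, such as 0, would need to be filtered out first). The mapping from codes to labels follows the order used in the thread.

```python
import pandas as pd

# Made-up sample data; the real EDUC1 column comes from NISPUF17.csv.
df = pd.DataFrame({"EDUC1": [1, 2, 2, 3, 4, 4, 4, 4]})

# Code-to-label mapping as used in the thread.
labels = {
    1: "less than high school",
    2: "high school",
    3: "more than high school but not college",
    4: "college",
}

# normalize=True yields shares instead of absolute counts.
shares = df["EDUC1"].value_counts(normalize=True)

# Look up each code; a code that never occurs gets a share of 0.0.
result = {label: float(shares.get(code, 0.0)) for code, label in labels.items()}
print(result)
```

With this approach the shares add up to 1 by construction, which makes a mix-up between two categories easier to spot.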

Sirius3 · Thursday, 15 July 2021, 16:36

Every `assert` calls the function proportion_of_education, and every function call includes a print call.
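
The effect can be reproduced in isolation. In this small sketch, `noisy` is a made-up stand-in for any function that prints before it returns; the output is captured so the printed lines can be counted.

```python
import contextlib
import io


def noisy():
    """Hypothetical stand-in for a function that prints before returning."""
    result = {"a": 0.5, "b": 0.5}
    print(result)
    return result


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    # Three asserts, three separate calls, hence three printed lines.
    assert isinstance(noisy(), dict)
    assert len(noisy()) == 2
    assert "a" in noisy()

printed_lines = buffer.getvalue().splitlines()
print(len(printed_lines))
```

The count printed at the end matches the number of asserts, which is the same pattern as in the notebook cell.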

The odd line breaks without indentation make the code practically unreadable. The `raise NotImplementedError()` is never reached because it comes after the `return`.
`dict` is the name of a built-in class and should not be overwritten.
Variable names are written entirely in lowercase. counter_func is not a function at all!
Why do you filter the DataFrame and then take the index just to find the number of matches?
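
As for why overwriting `dict` is risky, here is a minimal sketch. The function name `shadowing` is made up for illustration; inside it, the local name `dict` hides the built-in class, so a later attempt to use `dict(...)` as a constructor fails.

```python
def shadowing():
    """Hypothetical example: a local variable named dict hides the built-in."""
    dict = {"a": 1}  # from here on, dict refers to this dictionary instance
    try:
        dict(b=2)  # tries to call the instance, not the built-in class
    except TypeError as error:
        return str(error)
    return None


message = shadowing()
print(message)
```

Outside the function the built-in is untouched, but within the same scope every later use of `dict` is affected.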

An expression like

Code:

True if x[0] != 0 else False

can be simplified to

Code:

x[0] != 0

An apply is the wrong operation here:

Code:

gesamt = df.EDUC1 != 0

To get the count, you simply sum over the boolean values.
That leaves

Code:

import pandas as pd


def proportion_of_education():
    df = pd.read_csv("assets/NISPUF17.csv", sep=",", usecols=["EDUC1"])
    gesamt_rows = (df.EDUC1 != 0).sum()
    number_of_rows_less = (df.EDUC1 == 1).sum()
    number_of_rows_high = (df.EDUC1 == 2).sum()
    number_of_rows_more = (df.EDUC1 == 3).sum()
    number_of_rows_college = (df.EDUC1 == 4).sum()
    result = {
        "less than high school": number_of_rows_less/gesamt_rows,
        "high school": number_of_rows_high/gesamt_rows, 
        "more than high school but not college": number_of_rows_more/gesamt_rows,
        "college": number_of_rows_college/gesamt_rows
    }
    print(result)
    return result
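
Why summing works: `True` counts as 1 and `False` as 0, so the sum of a boolean Series is the number of matches. A small sketch with made-up values (not taken from the real file) that compares it with the roundabout index-based count from the question:

```python
import pandas as pd

# Made-up values for illustration only.
educ = pd.Series([1, 2, 2, 3, 4, 4, 0, 4])

mask = educ == 4                         # boolean Series, True where the value is 4
count_via_sum = mask.sum()               # True is counted as 1, False as 0
count_via_index = len(mask[mask].index)  # filter first, then measure the index

# Share of value 4 among all non-zero entries.
share = count_via_sum / (educ != 0).sum()
print(count_via_sum, count_via_index, share)
```

Both counts agree; the sum just gets there without building a filtered copy first.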

Instead of calling the function every time, you call it just once! The keys() calls are superfluous.

Code:

result = proportion_of_education()
assert type(result)==type({}), "You must return a dictionary."
assert len(result) == 4, "You have not returned a dictionary with four items in it."
assert "less than high school" in result, "You have not returned a dictionary with the correct keys."
assert "high school" in result, "You have not returned a dictionary with the correct keys."
assert "more than high school but not college" in result, "You have not returned a dictionary with the correct keys."
assert "college" in result, "You have not returned a dictionary with the correct keys."

__blackjack__ · Thursday, 15 July 2021, 21:53

The `assert`s can still be improved a bit, though. For type checks one should use `isinstance()`, if at all, and all the remaining assertions can very easily be expressed in *a single* condition:

Code:

result = proportion_of_education()
assert isinstance(result, dict)
assert result.keys() == {
    "less than high school",
    "high school",
    "more than high school but not college",
    "college",
}

I left out the explanatory string because the tests/conditions here are IMHO self-explanatory enough.
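
For anyone who wants to try the single-condition check without the CSV file, here is a rough sketch. The helper `check` and the sample numbers are made up; the point is only that comparing `result.keys()` with a set covers both the key names and the item count in one go.

```python
EXPECTED_KEYS = {
    "less than high school",
    "high school",
    "more than high school but not college",
    "college",
}


def check(result):
    """Hypothetical helper bundling the two assertions."""
    assert isinstance(result, dict)
    assert result.keys() == EXPECTED_KEYS


# Made-up shares, just to have something to check.
good = {
    "college": 0.4,
    "high school": 0.3,
    "less than high school": 0.1,
    "more than high school but not college": 0.2,
}
check(good)  # key order does not matter for the comparison

# An extra key makes the comparison fail, so no separate len() test is needed.
bad = dict(good, extra=0.0)
try:
    check(bad)
except AssertionError:
    rejected = True
else:
    rejected = False
print(rejected)
```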