I have another question about this:
The truncnorm() function expects the mean and the standard deviation as parameters for the interval. How do I compute these for a specific interval in Python? The numpy.mean() function doesn't seem to work: I get strange results, and a large share of the values truncnorm() returns lie outside the interval. This happens especially with very small intervals, where computing the mean and standard deviation correctly presumably matters most.
For wider intervals it actually works quite well. Is there a limit on how small an interval may be?
Beispiel:
[0.12; 0.17] --> value 0.0937818650369 (out of range)
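The symptom can be reproduced with a minimal sketch. The assumption here (not confirmed by the code below) is that the raw interval bounds are being passed straight through as truncnorm's first two arguments while loc and scale are also set; the loc/scale values used are hypothetical:

```python
from scipy.stats import truncnorm

# truncnorm's first two arguments a and b are z-scores relative to
# loc and scale, so the actual support becomes
# [loc + a*scale, loc + b*scale], not [low, upp].
low, upp = 0.12, 0.17          # intended interval from the example
mean, sd = 0.1, 0.05           # hypothetical loc/scale values
X = truncnorm(low, upp, loc=mean, scale=sd)

# the real support: [0.1 + 0.12*0.05, 0.1 + 0.17*0.05] = [0.106, 0.1085]
s = X.rvs(1000)
print((s < low).all())  # True: every sample falls below the intended lower bound
```

With loc and scale set, raw bounds land the support in the wrong place entirely, which would produce out-of-range values like the 0.0937... above.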
I'll also copy/paste my forum post from Stack Overflow below:
What I want to do is: given an interval, sample one value that lies between the boundaries of that interval, drawn from a truncated normal distribution, and write the sampled value into a separate column. For example: interval [0.2; 0.6] --> sampled value 0.343433. I think I found a solution:
truncnorm().stats()
But I don't know why: with the parameters I pass to the
truncnorm()
function, almost 50% of the values I get are outside the boundaries. What am I doing wrong?
Here is the code (a short part of it):
convert_cat=(name_convert_column,name_convert_column,_tabelle,name_convert_column,_tabelle,_tabelle,name_convert_column)
drop_view=(name_convert_column)
calculate=(name_convert_column,name_convert_column,name_convert_column,name_convert_column,name_convert_column,_tabelle,name_convert_column,name_convert_column)
cur.execute("CREATE VIEW convert_cat_%s (quotient, %s, rnum) AS SELECT (COUNT(*)/(SELECT COUNT(*) FROM %s ) ) as quotient, %s, row_number() over ( order by (COUNT(*)/(SELECT COUNT(*) FROM %s ) ) desc ) as rnum FROM %s GROUP BY %s ORDER BY quotient desc" %convert_cat)
cur.execute("Select b.ID,a.unten,a.oben, a.mean, a.sd FROM( SELECT t3.RNUM, t3.%s, lag(t3.com_Pr,1,0) OVER (order by rnum asc) as unten , t3.com_PR as oben, ((t3.com_PR +(lag(t3.com_Pr,1,0) OVER (order by rnum asc)))/2) as MEAN, ((t3.com_PR-(lag(t3.com_Pr,1,0) OVER (order by rnum asc)))/6) AS SD FROM( SELECT t1.rnum, t1.%s , SUM(t2.quotient) as com_Pr FROM CONVERT_CAT_%s t1 INNER JOIN CONVERT_CAT_%s t2 ON t1.rnum >= t2.rnum group by t1.rnum, t1.%s, t1.quotient ORDER BY RNUM asc ) t3) a INNER JOIN %s b ON b.%s = a.%s order by ID asc" %calculate)
_content_category = cur.fetchall()
add_category_number_column = (_tabelle, name_convert_column)
cur.execute("ALTER TABLE %s ADD %s_category NUMBER(15,14)" % add_category_number_column)
x=0
for ID in _content_category:
    id = _content_category[0]
    id_category = [j[0] for j in _content_category]
    unten_category = [j[1] for j in _content_category]
    oben_category = [j[2] for j in _content_category]
    #mean_category = [j[3] for j in _content_category]
    sd_category = [j[4] for j in _content_category]
    mean, var = truncnorm.stats(unten_category[x], oben_category[x], moments='mv')
    # sd = np.sqrt(var)
    X = get_truncated_normal(mean=mean, sd=sd_category[x], low=unten_category[x], upp=oben_category[x])
    update_cells_value = float(X.rvs(1))
    category = (_tabelle, name_convert_column, update_cells_value, id_category[x])
    cur.execute("UPDATE %s SET %s_category = %s WHERE ID=%s" % category)
    x += 1
I tried to calculate mean and sd in the SQL query with
1) ((t3.com_PR +(lag(t3.com_Pr,1,0) OVER (order by rnum asc)))/2) as MEAN
2) ((t3.com_PR-(lag(t3.com_Pr,1,0) OVER (order by rnum asc)))/6) AS SD
and with the truncnorm().stats() function. It seems that with the stats function the results get even worse, and the values are even further out of range than before...
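For reference, the usual cause of out-of-range samples with scipy's truncnorm is that its first two arguments, a and b, are not the raw interval bounds but z-scores relative to loc and scale. A minimal sketch of a helper that standardizes the bounds first (the names get_truncated_normal, low, upp match the ones used above; choosing the midpoint and range/6 for mean and sd mirrors the two SQL expressions):

```python
from scipy.stats import truncnorm

def get_truncated_normal(mean=0, sd=1, low=0, upp=10):
    # truncnorm's a and b are z-scores relative to loc/scale,
    # so the raw bounds have to be standardized first
    a, b = (low - mean) / sd, (upp - mean) / sd
    return truncnorm(a, b, loc=mean, scale=sd)

# the narrow interval from the example above
low, upp = 0.12, 0.17
mean, sd = (low + upp) / 2, (upp - low) / 6  # midpoint and range/6, as in the SQL
X = get_truncated_normal(mean=mean, sd=sd, low=low, upp=upp)
samples = X.rvs(1000)
print(samples.min() >= low, samples.max() <= upp)  # True True
```

With this parameterization every sample is guaranteed to stay inside [low, upp], no matter how small the interval is, because the truncation itself defines the support of the distribution.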