Pandas group: breaking it down further into unique groups (df)

chris_adnap · Friday, 23 September 2022, 09:53

Hello everyone,

I'm really stuck at this point... (who would have thought)

I've created individual groups with pandas; so far, so good.
However, there is a "count" column in which the same value can occur several times. Combined with the "line" column, though, the values are unique.

Code:

As an example, a current reference group ...

group	count	calc	line
1	1	2,24155	9720	xxx	xxx
1	1	2,5508	9721	xxx	xxx
1	1	0,0903	9700	xxx	xxx
1	2	1,3351	9720	xxx	xxx
1	3	4,47845	9720	xxx	xxx


I now want to break this group down further so that I end up with truly unique groups:

1	1	2,24155	9720	xxx	xxx
1	2	1,3351	9720	xxx	xxx
1	3	4,47845	9720	xxx	xxx

and...
1	1	2,5508	9721	xxx	xxx
1	2	1,3351	9720	xxx	xxx
1	3	4,47845	9720	xxx	xxx

and...

1	1	0,0903	9700	xxx	xxx
1	2	1,3351	9720	xxx	xxx
1	3	4,47845	9720	xxx	xxx

--------------------------------------------------

What can of course also happen is that several "count" values are repeated within one group...

group	count	calc	line
1	1	2,24155	9720	xxx	xxx
1	1	2,5508	9721	xxx	xxx
1	1	0,0903	9700	xxx	xxx
1	2	1,3351	9720	xxx	xxx
1	3	4,47845	9720	xxx	xxx
1	3	1,11111	9720	xxx	xxx
1	3	2,22222	9721	xxx	xxx

That would already give 9 possible combinations... etc.
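The figure of 9 is the product of the number of rows per "count" value (3 × 1 × 3). A minimal sketch to check this, assuming the table above has been loaded into a DataFrame called `df` (a name chosen here for illustration) with "calc" kept as strings and the "xxx" columns left out:

```python
import pandas as pd

# Hypothetical reconstruction of the example table above.
df = pd.DataFrame({
    "group": [1, 1, 1, 1, 1, 1, 1],
    "count": [1, 1, 1, 2, 3, 3, 3],
    "calc": ["2,24155", "2,5508", "0,0903", "1,3351",
             "4,47845", "1,11111", "2,22222"],
    "line": [9720, 9721, 9700, 9720, 9720, 9720, 9721],
})

# Number of rows available for each "count" value: 3, 1 and 3.
rows_per_count = df.groupby("count").size()

# One row is picked per "count" value, so the sizes multiply.
n_combinations = int(rows_per_count.prod())
print(n_combinations)  # 9
```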

I've read a lot and tried a lot ...
expand, join, itertools.product, select, unique, groupby...
... but nothing has led me to a solution.

I "only" need to treat count & line as unique and "somehow" build/fill a new group/df from them, together with the values that are already unique.

I hope someone can point me in at least one possible direction.

Best regards
Chris

Sirius3 · Friday, 23 September 2022, 10:43

itertools.product is the right idea:

Code:

In [38]: df
Out[38]:
   group  count     calc  line
0      1      1  2,24155  9720
1      1      1   2,5508  9721
2      1      1   0,0903  9700
3      1      2   1,3351  9720
4      1      3  4,47845  9720

In [39]: list(itertools.product(*(group.itertuples() for _, group in df.groupby('count'))))
Out[39]:
[(Pandas(Index=0, group=1, count=1, calc='2,24155', line=9720),
  Pandas(Index=3, group=1, count=2, calc='1,3351', line=9720),
  Pandas(Index=4, group=1, count=3, calc='4,47845', line=9720)),
 (Pandas(Index=1, group=1, count=1, calc='2,5508', line=9721),
  Pandas(Index=3, group=1, count=2, calc='1,3351', line=9720),
  Pandas(Index=4, group=1, count=3, calc='4,47845', line=9720)),
 (Pandas(Index=2, group=1, count=1, calc='0,0903', line=9700),
  Pandas(Index=3, group=1, count=2, calc='1,3351', line=9720),
  Pandas(Index=4, group=1, count=3, calc='4,47845', line=9720))]
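If each combination is needed as a DataFrame rather than as a tuple of row tuples, one possible sketch is to feed the named tuples back into `pd.DataFrame`. The sample data is rebuilt here so the snippet runs on its own; `per_count_rows` and `combos` are names made up for this illustration.

```python
import itertools
import pandas as pd

# Same sample data as in the session above.
df = pd.DataFrame({
    "group": [1, 1, 1, 1, 1],
    "count": [1, 1, 1, 2, 3],
    "calc": ["2,24155", "2,5508", "0,0903", "1,3351", "4,47845"],
    "line": [9720, 9721, 9700, 9720, 9720],
})

# One list of row tuples per "count" value.
per_count_rows = [list(g.itertuples()) for _, g in df.groupby("count")]

combos = []
for rows in itertools.product(*per_count_rows):
    # The named tuples carry their field names, which become the columns;
    # the "Index" field is moved back into the index.
    combos.append(pd.DataFrame(list(rows)).set_index("Index"))

print(len(combos))  # 3
print(combos[1])
```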

Or do the whole thing via indices:

Code:

In [62]: for indices in itertools.product(*df.groupby('count').groups.values()):
    ...:     print(df.loc[list(indices)])
    ...:
   group  count     calc  line
0      1      1  2,24155  9720
3      1      2   1,3351  9720
4      1      3  4,47845  9720
   group  count     calc  line
1      1      1   2,5508  9721
3      1      2   1,3351  9720
4      1      3  4,47845  9720
   group  count     calc  line
2      1      1   0,0903  9700
3      1      2   1,3351  9720
4      1      3  4,47845  9720
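The index-based variant can be wrapped in a small generator. This is only a sketch: `iter_combinations` is a made-up helper name, and it assumes the index labels of `df` are unique. Since `groupby(...).groups` maps each key to index labels rather than positions, the rows are selected with the label-based `.loc`, which also works when the index is not 0..n-1.

```python
import itertools
import pandas as pd

def iter_combinations(df, key="count"):
    """Yield one sub-frame per way of picking a single row for each value of `key`."""
    # .groups maps each key value to the index labels of its rows.
    label_groups = df.groupby(key).groups.values()
    for labels in itertools.product(*label_groups):
        yield df.loc[list(labels)]

df = pd.DataFrame(
    {
        "group": [1, 1, 1, 1, 1],
        "count": [1, 1, 1, 2, 3],
        "calc": ["2,24155", "2,5508", "0,0903", "1,3351", "4,47845"],
        "line": [9720, 9721, 9700, 9720, 9720],
    },
    index=[10, 11, 12, 13, 14],  # deliberately not the default 0..n-1
)

frames = list(iter_combinations(df))
for frame in frames:
    print(frame)
```

If the data contains several values in "group", one option would be to call the helper once for each sub-frame produced by `df.groupby("group")`.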

chris_adnap · Friday, 23 September 2022, 11:17

Thank you very, very much.

I've just built it in and run it; it looks great.

Best regards
Chris