Restricting the time range of a data file with Pandas
Posted: Wednesday, 11 May 2022, 10:53
Hello,
I have files with the following line format:
...
2022-05-07 18:30:01 54.936426
...
There is a space between the date and the time, and a TAB between the timestamp and the value.
Reading the data evidently works.
print(dat.info()) prints:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1079 entries, 0 to 1078
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 zeit 1079 non-null datetime64[ns]
1 gewicht 1079 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 17.0 KB
None
----------------------------------------------------------------------------------------------------------------------------------------------------------
Only when I try to restrict the time range with:
dat.loc[(pd.Timestamp(dat['zeit']) > start_date) & (pd.Timestamp(dat['zeit']) < end_date)]
do I get a TypeError (see below).
How is this done correctly?
----------------------------------------------------------------------------------------------------------------------------------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
custom_date_parser = lambda x: datetime.strptime(x, "%Y-%m-%d %H:%M:%S")
dateCols = ['zeit']
dat = pd.read_csv("c:\\tmp\\data.dat", sep='\t', parse_dates=dateCols)
print(dat.info())
start_date = pd.to_datetime('6/5/2022 9:32')
end_date = pd.to_datetime('8/5/2018 9:32')
dat.loc[(pd.Timestamp(dat['zeit']) > start_date) & (pd.Timestamp(dat['zeit']) < end_date)]
----------------------------------------------------------------------------------------------------------------------------------------------------------
Output:
----------------------------------------------------------------------------------------------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Users\hwolf\PycharmProjects\datetime\main.py", line 13, in <module>
dat.loc[(pd.Timestamp(dat['zeit']) > start_date) & (pd.Timestamp(dat['zeit']) < end_date)]
File "pandas\_libs\tslibs\timestamps.pyx", line 1399, in pandas._libs.tslibs.timestamps.Timestamp.__new__
File "pandas\_libs\tslibs\conversion.pyx", line 446, in pandas._libs.tslibs.conversion.convert_to_tsobject
TypeError: Cannot convert input [0 2022-05-07 17:50:00
1 2022-05-07 17:55:00
2 2022-05-07 18:00:01
3 2022-05-07 18:05:01
4 2022-05-07 18:10:01
...
1074 2022-05-11 11:20:01
1075 2022-05-11 11:25:01
1076 2022-05-11 11:30:01
1077 2022-05-11 11:35:01
1078 2022-05-11 11:40:01
Name: zeit, Length: 1079, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp
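
A possible way around the error, as a minimal sketch: pd.Timestamp builds a single scalar timestamp, so it cannot take the whole 'zeit' Series, which is what the TypeError reports. Since dat.info() already shows 'zeit' as datetime64[ns], the column can be compared against the bounds directly. Two further points look like they would matter once the error is gone: pd.to_datetime('6/5/2022 9:32') is parsed month-first by default (5 June, not 6 May), and the end date uses the year 2018, which lies before the start date. The sample rows and the ISO-style bounds below are assumptions for illustration (presumably 6 May to 8 May 2022 was intended); only the column names and the 18:30:01 row come from the post.

```python
import io
import pandas as pd

# Hypothetical sample in the described format (date, space, time, TAB, value).
# The header names 'zeit' and 'gewicht' are taken from the dat.info() output.
sample = io.StringIO(
    "zeit\tgewicht\n"
    "2022-05-05 12:00:00\t54.10\n"
    "2022-05-06 10:00:00\t54.50\n"
    "2022-05-07 18:30:01\t54.936426\n"
    "2022-05-09 08:00:00\t55.20\n"
)
dat = pd.read_csv(sample, sep="\t", parse_dates=["zeit"])

# ISO-style strings (year-month-day) avoid the day/month ambiguity.
# Assumed bounds: 6 May 2022 09:32 to 8 May 2022 09:32.
start_date = pd.Timestamp("2022-05-06 09:32")
end_date = pd.Timestamp("2022-05-08 09:32")

# Compare the datetime64 column directly; each comparison yields a boolean Series.
mask = (dat["zeit"] > start_date) & (dat["zeit"] < end_date)
subset = dat.loc[mask]
print(subset)
```

With the real file, the read_csv call from the post can stay as it is; only the last three statements change. If the bounds have to be written day-first, pd.to_datetime(..., dayfirst=True) is one option.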