Context
When dealing with dates, I’ve sometimes problems because the source is not clean, or not all the rows have the same format. Additionally, dates can be simple (year-month-day) or really complicate like a timestamp with timezone.
Here are some code snippets for several use cases.
How to read dates with python and pandas
# import pandas
import pandas as pd
# df is a dataframe with a column 'column_with_date' with a date like this '19.01.2023 16:45:46 +01:00'
# convert date string to datetime with pd.to_datetime
pd.to_datetime(df['column_with_date'])
# sometimes the format is a bit weird and pandas cannot recognize it. In this case we give the date format as argument.
# for this format 19.01.2023 16:45:46 +01:00 we can use:
pd.to_datetime(df['column_with_date'], format='%Y-%m-%d %H:%M:%S%z')
# if your string has timezone, use utr=True
pd.to_datetime(df['column_with_date'], format='%Y-%m-%d %H:%M:%S%z', utc=True)
# sometimes your columns are as objects (strings) and numbers (floats, ints) and to_datetime cannot process it.
# You can force the type string to the whole column before giving it to to_datetime.
pd.to_datetime(df['column_with_date'].astype(str), format='%Y-%m-%d %H:%M:%S%z', utc=True)
# pd.to_datetime does not modify the column values in place, so you have to assign it to the same column.
df['column_with_date'] = pd.to_datetime(df['column_with_date'].astype(str), format='%Y-%m-%d %H:%M:%S%z', utc=True)
# or save it to another (new) column if you want to save the original value
df['column_with_date_2'] = pd.to_datetime(df['column_with_date'].astype(str), format='%Y-%m-%d %H:%M:%S%z', utc=True)
How to save dates with UTC (tiemzone) to an Excel-File with Pandas
Saving datetime
columns with timezones in pandas is not supported, and you will get the following error if you try:
ValueError: Excel does not support datetimes with timezones. Please ensure that datetimes are timezone unaware before writing to Excel.
To remove the timezone from a datefield
(column dtype
datetime64[ns, UTC]
) you can use .dt.date()
# remove timezones for excel
df['column_with_date'] = df['column_with_date'].apply(lambda d: pd.to_datetime(d).date())
# save te file as usual
df.to_excel('filename.xlsx')
References
- https://stackoverflow.com/a/63238008/624088
- https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes -> format codes for datetime strings
- try and error :)
Top comments (0)