EDS 217 Cheatsheet

Introduction

This cheatsheet covers essential datetime and time series operations in pandas needed for environmental data analysis.

Setup

Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set random seed for reproducible examples
np.random.seed(42)

# Create sample time series data
dates = pd.date_range('2020-01-01', '2023-12-31', freq='ME')
temperature = np.random.normal(15, 5, len(dates)) + 10 * np.sin(np.arange(len(dates)) * 2 * np.pi / 12)
sample_df = pd.DataFrame({
    'date': dates,
    'temperature': temperature,
    'co2': 400 + np.cumsum(np.random.normal(0.1, 0.5, len(dates)))
})

Loading Time Series Data

Reading CSV with Date Parsing

Code

# Parse dates while reading CSV (most efficient method)
# df = pd.read_csv('data.csv', parse_dates=['Date'])

# For demonstration, save and reload sample data
sample_df.to_csv('temp_data.csv', index=False)
df = pd.read_csv('temp_data.csv', parse_dates=['date'])

print(f"Date column dtype: {df['date'].dtype}")
print(f"Data shape: {df.shape}")

Date column dtype: datetime64[ns]
Data shape: (48, 3)

Converting Strings to Datetime

Code

# Convert year-only data to datetime (common in environmental datasets)
year_data = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'annual_temp': [14.5, 15.2, 14.8, 15.6]
})

year_data['date'] = pd.to_datetime(year_data['year'], format='%Y')
print(year_data.head())

   year  annual_temp       date
0  2020         14.5 2020-01-01
1  2021         15.2 2021-01-01
2  2022         14.8 2022-01-01
3  2023         15.6 2023-01-01

Setting DateTime Index

Code

# Set date column as index for time series operations
df_indexed = df.set_index('date')
print(f"Index is datetime: {pd.api.types.is_datetime64_any_dtype(df_indexed.index)}")
print(df_indexed.head())

Index is datetime: True
            temperature         co2
date                               
2020-01-31    17.483571  400.271809
2020-02-29    19.308678  399.490289
2020-03-31    26.898697  399.752331
2020-04-30    32.615149  399.659790
2020-05-31    22.489487  399.421329

Extracting Date Components

Code

# Extract date components using .dt accessor
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['quarter'] = df['date'].dt.quarter

# Create decade grouping (useful for climate analysis)
df['decade'] = (df['date'].dt.year // 10) * 10

print(df[['date', 'year', 'month', 'decade']].head())

        date  year  month  decade
0 2020-01-31  2020      1    2020
1 2020-02-29  2020      2    2020
2 2020-03-31  2020      3    2020
3 2020-04-30  2020      4    2020
4 2020-05-31  2020      5    2020

Creating Seasonal Groupings

Code

# Define seasonal groupings as constants
SEASON_MAP = {
    12: 'Winter', 1: 'Winter', 2: 'Winter',
    3: 'Spring', 4: 'Spring', 5: 'Spring', 
    6: 'Summer', 7: 'Summer', 8: 'Summer',
    9: 'Fall', 10: 'Fall', 11: 'Fall'
}

def get_season(date):
    """Assign season based on month (Northern Hemisphere)."""
    return SEASON_MAP[date.month]

# Apply season function
df['season'] = df['date'].apply(get_season)

print(df[['date', 'season']].head())

        date  season
0 2020-01-31  Winter
1 2020-02-29  Winter
2 2020-03-31  Spring
3 2020-04-30  Spring
4 2020-05-31  Spring

Time Series Resampling

Basic Resampling

Code

# Resample monthly data to annual averages
annual_avg = df_indexed.resample('YE').mean(numeric_only=True)
print("Annual averages:")
print(annual_avg.head())

# Resample to quarterly data
quarterly_data = df_indexed.resample('QE').mean(numeric_only=True)
print("Quarterly averages:")
print(quarterly_data.head())

Annual averages:
            temperature         co2
date                               
2020-12-31    16.479777  400.298642
2021-12-31    12.043841  401.834578
2022-12-31    14.033324  403.788055
2023-12-31    13.337112  404.590108
Quarterly averages:
            temperature         co2
date                               
2020-03-31    21.230315  399.838143
2020-06-30    24.644651  399.636095
2020-09-30    13.575204  400.713224
2020-12-31     6.468936  401.007104
2021-03-31    13.893025  401.201634

Handling Mixed Data Types

When your DataFrame contains both numeric and non-numeric columns, use numeric_only=True with aggregation functions to prevent errors.

Common Frequency Codes

Code	Description	Example
D	Daily	2023-01-01, 2023-01-02, …
ME	Month End	2023-01-31, 2023-02-28, …
QE	Quarter End	2023-03-31, 2023-06-30, …
YE	Year End	2023-12-31, 2024-12-31, …

Time Series Calculations

Year-over-Year Changes

Code

# Calculate year-over-year changes
annual_data = df_indexed.resample('YE').mean(numeric_only=True)
annual_data['temp_change'] = annual_data['temperature'].diff()
annual_data['co2_change'] = annual_data['co2'].diff()

print("Year-over-year changes:")
print(annual_data[['temperature', 'temp_change', 'co2', 'co2_change']])

Year-over-year changes:
            temperature  temp_change         co2  co2_change
date                                                        
2020-12-31    16.479777          NaN  400.298642         NaN
2021-12-31    12.043841    -4.435936  401.834578    1.535936
2022-12-31    14.033324     1.989483  403.788055    1.953477
2023-12-31    13.337112    -0.696212  404.590108    0.802053

Rolling Averages

Code

# Calculate 12-month rolling average
df_indexed['temp_rolling'] = df_indexed['temperature'].rolling(window=12).mean()

print("Original vs rolling average:")
print(df_indexed[['temperature', 'temp_rolling']].head(15))

Original vs rolling average:
            temperature  temp_rolling
date                                 
2020-01-31    17.483571           NaN
2020-02-29    19.308678           NaN
2020-03-31    26.898697           NaN
2020-04-30    32.615149           NaN
2020-05-31    22.489487           NaN
2020-06-30    18.829315           NaN
2020-07-31    22.896064           NaN
2020-08-31    13.837174           NaN
2020-09-30     3.992374           NaN
2020-10-31     7.712800           NaN
2020-11-30     4.022657           NaN
2020-12-31     7.671351     16.479777
2021-01-31    16.209811     16.373630
2021-02-28    10.433599     15.634040
2021-03-31    15.035665     14.645454

Time Series Filtering

Selecting Date Ranges

Code

# Select data from specific year
data_2022 = df_indexed[df_indexed.index.year == 2022]
print(f"Data from 2022: {len(data_2022)} records")

# Select data from date range
recent_data = df_indexed.loc['2023-01-01':'2023-06-30']
print(f"First half 2023: {len(recent_data)} records")

# Filter using datetime conditions
modern_data = df[df['date'] >= pd.to_datetime('2022-01-01')]
print(f"Modern data (2022+): {len(modern_data)} records")

Data from 2022: 12 records
First half 2023: 6 records
Modern data (2022+): 24 records

Filtering by Date Components

Code

# Filter by month (e.g., all January data)
january_data = df[df['date'].dt.month == 1]

# Filter by season
winter_data = df[df['season'] == 'Winter']

# Filter by decade
recent_decade = df[df['decade'] == 2020]

print(f"January records: {len(january_data)}")
print(f"Winter records: {len(winter_data)}")
print(f"2020s decade: {len(recent_decade)}")

January records: 4
Winter records: 12
2020s decade: 48

Merging Time Series Data

Code

# Create second dataset with different frequency
daily_dates = pd.date_range('2022-01-01', '2022-12-31', freq='D')
np.random.seed(123)
daily_data = pd.DataFrame({
    'precipitation': np.random.exponential(2, len(daily_dates))
}, index=daily_dates)

# Resample to monthly to match original data frequency
monthly_precip = daily_data.resample('ME').sum()

# Merge with existing data
merged_data = pd.merge(df_indexed, monthly_precip, 
                      left_index=True, right_index=True, how='inner')

print("Merged time series data:")
print(merged_data.head())

Merged time series data:
            temperature         co2  temp_rolling  precipitation
2022-01-31    12.278086  403.644014     11.716197      51.491886
2022-02-28    20.554613  404.526336     12.559615      56.312308
2022-03-31    17.905286  403.316464     12.798750      51.965009
2022-04-30    26.878490  403.827415     13.189577      61.195394
2022-05-31    20.657061  403.970938     13.361324      69.480568

Key Reminders

Always parse dates when loading data using parse_dates parameter
Set datetime as index for time series operations with set_index()
Use .dt accessor to extract date components from datetime columns
Use numeric_only=True when resampling mixed data types
Use consistent frequency codes for resampling operations