Week 0 Overview
Welcome to GEOG 288KC: Geospatial Foundation Models and Applications! This week focuses on defining our course goals, the architecture of the class and the code we will be working with, and getting everyone set up with a consistent computational environment.
Key Activities
Basics of Foundation Models and LLM/GFM Comparisons
GFM Architecture Cheatsheet
This is a quick-reference for how our Geospatial Foundation Model (GFM) is organized, what each part does, and where to look during the course. We implement simple, readable components first, then show how to swap in optimized library counterparts as needed.
Roadmap at a glance
graph TD
  A[Week 1: Data Foundations] --> B[Week 2: Attention & Blocks]
  B --> C[Week 3: Full Encoder]
  C --> D[Week 4: MAE Pretraining]
  D --> E[Week 5: Training Loop]
  E --> F[Week 6: Eval & Viz]
  F --> G[Week 7+: Interop & Scale]
Minimal structure you’ll use
geogfm/
  core/
    config.py  # Minimal typed configs for model, data, training
  data/
    loaders.py  # build_dataloader(...)
    datasets/
      stac_dataset.py  # Simple STAC-backed dataset
    transforms/
      normalization.py  # Per-channel normalization
      patchify.py  # Extract fixed-size patches
  modules/
    attention/
      multihead_attention.py  # Standard MHA (from scratch)
    embeddings/
      patch_embedding.py  # Conv patch embedding
      positional_encoding.py  # Simple positional encoding
    blocks/
      transformer_block.py  # PreNorm block (MHA + MLP)
    heads/
      reconstruction_head.py  # Lightweight decoder/readout
    losses/
      mae_loss.py  # Masked reconstruction loss
  models/
    gfm_vit.py  # GeoViT-style encoder
    gfm_mae.py  # MAE wrapper (masking + encoder + head)
  training/
    optimizer.py  # AdamW builder
    loop.py  # fit/train_step/eval_step with basic checkpointing
  evaluation/
    visualization.py  # Visualize inputs vs reconstructions

# Outside the package (repo root)
configs/  # Small YAML/JSON run configs
tests/  # Unit tests
data/  # Datasets, splits, stats, build scripts
What each part does (one-liners)
- core/config.py: Typed configs for model/data/training; keeps parameters organized.
- data/datasets/stac_dataset.py: Reads imagery + metadata (e.g., STAC), returns tensors.
- data/transforms/normalization.py: Normalizes channels using precomputed stats.
- data/transforms/patchify.py: Turns large images into uniform patches for ViT.
- data/loaders.py: Builds PyTorch DataLoaders for train/val.
- modules/embeddings/patch_embedding.py: Projects image patches into token vectors.
- modules/embeddings/positional_encoding.py: Adds position info to tokens.
- modules/attention/multihead_attention.py: Lets tokens attend to each other.
- modules/blocks/transformer_block.py: Core transformer layer (attention + MLP).
- modules/heads/reconstruction_head.py: Reconstructs pixels from encoded tokens.
- modules/losses/mae_loss.py: Computes masked reconstruction loss for MAE.
- models/gfm_vit.py: Assembles the encoder backbone from blocks.
- models/gfm_mae.py: Wraps encoder with masking + reconstruction for pretraining.
- training/optimizer.py: Creates AdamW with common defaults.
- training/loop.py: Runs epochs, backprop, validation, and simple checkpoints.
- evaluation/visualization.py: Plots sample inputs and reconstructions.
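To make the embedding and block one-liners concrete, here is a minimal PyTorch sketch of a conv patch embedding followed by a PreNorm block. It is an illustration under assumptions, not the course implementation: the class names, the 6-band input, and the use of torch.nn.MultiheadAttention (instead of the from-scratch attention module) are choices made here only to keep the example short.

```python
import torch
from torch import nn


class PatchEmbedding(nn.Module):
    """Conv patch embedding: one token per non-overlapping patch (hypothetical sketch)."""

    def __init__(self, in_channels: int = 6, patch_size: int = 16, embed_dim: int = 768):
        super().__init__()
        # kernel_size == stride means patches do not overlap.
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)                     # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (B, N, D), N = (H/P) * (W/P)


class PreNormBlock(nn.Module):
    """PreNorm transformer block: x + MHA(LN(x)), then x + MLP(LN(x))."""

    def __init__(self, embed_dim: int = 768, num_heads: int = 12, mlp_ratio: float = 4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        # Assumption: library attention stands in for the from-scratch module.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        hidden = int(embed_dim * mlp_ratio)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.GELU(), nn.Linear(hidden, embed_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                     # residual around attention
        return x + self.mlp(self.norm2(x))   # residual around MLP


images = torch.randn(2, 6, 224, 224)         # (batch, bands, height, width)
tokens = PatchEmbedding()(images)
out = PreNormBlock()(tokens)
print(tokens.shape, out.shape)
```

With 224-pixel inputs and 16-pixel patches, each image becomes 14 x 14 = 196 tokens, and the block preserves that shape, which is the "stable shapes" property the unit tests are meant to check.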
From-scratch vs library-backed
- Use PyTorch for Dataset/DataLoader, AdamW, schedulers, AMP, and checkpointing.
- Build core blocks from scratch first: PatchEmbedding, MHA, TransformerBlock, MAE loss/head.
- Later, swap in optimized options: torch.nn.MultiheadAttention, timm ViT blocks, FlashAttention, TorchGeo datasets, torchmetrics.
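As a small illustration of the "from scratch first, library later" idea, the sketch below writes scaled dot-product attention by hand and checks it against PyTorch's fused kernel. The choice of torch.nn.functional.scaled_dot_product_attention as the library counterpart is an assumption made for this example (it requires PyTorch 2.0 or newer); the tensor sizes are arbitrary.

```python
import math

import torch
import torch.nn.functional as F


def scaled_dot_product_attention(q, k, v):
    """softmax(q k^T / sqrt(d)) v, computed independently per head."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # (B, H, N, N)
    weights = scores.softmax(dim=-1)                 # each row sums to 1
    return weights @ v                               # (B, H, N, d)


torch.manual_seed(0)
# Shapes are (batch, heads, tokens, head_dim).
q, k, v = (torch.randn(2, 4, 16, 32) for _ in range(3))

ours = scaled_dot_product_attention(q, k, v)
fused = F.scaled_dot_product_attention(q, k, v)      # library counterpart
max_diff = (ours - fused).abs().max().item()
print(f"max abs difference: {max_diff:.2e}")
```

A check like this is a useful pattern for any swap: keep the readable version as the reference and confirm the optimized one agrees within floating-point tolerance before relying on it.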
Quick start (conceptual)
from geogfm.core.config import ModelConfig, DataConfig, TrainConfig
from geogfm.models.gfm_vit import GeoViTBackbone
from geogfm.models.gfm_mae import MaskedAutoencoder
from geogfm.data.loaders import build_dataloader
from geogfm.training.loop import fit
model_cfg = ModelConfig(architecture="gfm_vit", embed_dim=768, depth=12, image_size=224)
data_cfg = DataConfig(dataset="stac", patch_size=16, num_workers=8)
train_cfg = TrainConfig(epochs=1, batch_size=8, optimizer={"name": "adamw", "lr": 2e-4})
encoder = GeoViTBackbone(model_cfg)
model = MaskedAutoencoder(model_cfg, encoder)
train_dl, val_dl = build_dataloader(data_cfg)
fit(model, (train_dl, val_dl), train_cfg)
What to notice:
- The encoder and MAE wrapper are separate, so we can reuse the encoder for other tasks later.
- Data transforms (normalize/patchify) are decoupled from the model and driven by config.
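The quick start relies on typed configs, so a rough idea of what they might look like can help. The sketch below uses standard-library dataclasses with only the fields that appear in the quick start; the real core/config.py may differ. The num_tokens helper is hypothetical, and it assumes that patch_size is the ViT token patch size rather than the size of an image chip.

```python
from dataclasses import dataclass, field


@dataclass
class ModelConfig:
    architecture: str = "gfm_vit"
    embed_dim: int = 768
    depth: int = 12
    image_size: int = 224


@dataclass
class DataConfig:
    dataset: str = "stac"
    patch_size: int = 16
    num_workers: int = 8


@dataclass
class TrainConfig:
    epochs: int = 1
    batch_size: int = 8
    # Mutable defaults need default_factory so instances do not share one dict.
    optimizer: dict = field(default_factory=lambda: {"name": "adamw", "lr": 2e-4})


def num_tokens(model_cfg: ModelConfig, data_cfg: DataConfig) -> int:
    """Patch tokens per image for square images and patches (hypothetical helper)."""
    if model_cfg.image_size % data_cfg.patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = model_cfg.image_size // data_cfg.patch_size
    return per_side * per_side


model_cfg = ModelConfig()
data_cfg = DataConfig()
train_cfg = TrainConfig()
print(num_tokens(model_cfg, data_cfg))  # 196 tokens for 224-pixel images, 16-pixel patches
```

Keeping derived quantities such as the token count in one helper means the data pipeline and the model read them from the same place instead of recomputing them separately.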
MVP vs later phases
- MVP (Weeks 1–6): files shown above; single-node training; basic logging and visualization.
- Later (Weeks 7–10): interop with existing models (e.g., Prithvi), task heads (classification/segmentation), inference tiling, Hub integration.
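Inference tiling, listed among the later phases, mostly comes down to window arithmetic. The sketch below computes overlapping windows that cover a scene; the function names, the 224-pixel tile, and the 192-pixel stride are assumptions for illustration, not the course's inference/tiling.py.

```python
from typing import Iterator, List, Tuple


def tile_offsets(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that tiles cover [0, length)."""
    if tile >= length:
        return [0]
    offsets = list(range(0, length - tile + 1, stride))
    # Add a final tile flush with the edge if the stride left a gap.
    if offsets[-1] + tile < length:
        offsets.append(length - tile)
    return offsets


def sliding_windows(
    height: int, width: int, tile: int = 224, stride: int = 192
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (row, col, tile_height, tile_width) windows covering the scene."""
    for row in tile_offsets(height, tile, stride):
        for col in tile_offsets(width, tile, stride):
            # Tiles shrink only when the scene is smaller than one tile.
            yield row, col, min(tile, height), min(tile, width)


windows = list(sliding_windows(500, 700))
print(len(windows), windows[0], windows[-1])
```

Each tuple could then be turned into a rasterio.windows.Window(col_off, row_off, width, height) to read only that part of a raster. A stride smaller than the tile size gives overlapping predictions that can be averaged or cropped at the seams.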
Comprehensive course roadmap (stages, files, libraries)
This table maps each week to the broader stages (see the course index), the key geogfm files you’ll touch, and the primary deep learning tools you’ll rely on.
Week | Stage | Focus | You will build (geogfm) | Library tools | Outcome |
---|---|---|---|---|---|
1 | Stage 1: Build GFM Architecture | Data Foundations | core/config.py; data/datasets/stac_dataset.py; data/transforms/{normalization.py, patchify.py}; data/loaders.py | torch.utils.data.Dataset/DataLoader, rasterio, numpy | Config-driven dataloaders that yield normalized patches |
2 | Stage 1 | Attention & Blocks | modules/embeddings/{patch_embedding.py, positional_encoding.py}; modules/attention/multihead_attention.py; modules/blocks/transformer_block.py | torch.nn (compare with torch.nn.MultiheadAttention) | Blocks run forward with stable shapes; unit tests green |
3 | Stage 1 | Complete Architecture | models/gfm_vit.py; modules/heads/reconstruction_head.py | torch.nn (timm as reference) | Encoder assembled; end-to-end forward on dummy input |
4 | Stage 2: Train Foundation Model | MAE Pretraining | models/gfm_mae.py; modules/losses/mae_loss.py | torch masking utilities, numpy | Masking + reconstruction; loss decreases on toy batch |
5 | Stage 2 | Training Optimization | training/optimizer.py; training/loop.py | torch.optim.AdamW; schedulers, AMP optional | Single-epoch run; basic checkpoint save/restore |
6 | Stage 2 | Evaluation & Analysis | evaluation/visualization.py; (optional) evaluation/metrics.py | matplotlib; torchmetrics optional | Recon visuals; track validation loss/PSNR |
7 | Stage 2 | Integration w/ Pretrained | (light) core/registry.py; interoperability/huggingface.py stubs | huggingface_hub, transformers optional | Show mapping to Prithvi structure; plan switch |
8 | Stage 3: Apply & Deploy | Task Fine-tuning | tasks/classification.py or tasks/segmentation.py (light heads) | torch.nn.CrossEntropyLoss; timm optional | Head swap on frozen encoder; small dataset demo |
9 | Stage 3 | Deployment & Inference | inference/{tiling.py, sliding_window.py} | numpy, rasterio windows | Sliding-window inference on a small scene |
10 | Stage 3 | Presentations & Synthesis | Project deliverables (no new geogfm code required) | None | Present MVP builds, analysis, transition plan |
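For the Week 4 row, the core of MAE pretraining can be sketched in a few lines: hide a random subset of tokens and score the reconstruction only on the hidden ones. The function names and the 0.75 mask ratio (a commonly used MAE setting) are assumptions for illustration; the course's gfm_mae.py and mae_loss.py may be organized differently.

```python
import torch


def random_mask(batch: int, num_tokens: int, mask_ratio: float = 0.75) -> torch.Tensor:
    """Boolean mask of shape (batch, num_tokens); True marks a masked token."""
    num_masked = int(num_tokens * mask_ratio)
    # Rank random noise per sample; the lowest-ranked tokens are masked.
    noise = torch.rand(batch, num_tokens)
    ranks = noise.argsort(dim=1).argsort(dim=1)
    return ranks < num_masked


def masked_mse(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Mean squared error averaged over masked tokens only.

    pred, target: (batch, num_tokens, patch_dim); mask: (batch, num_tokens) bool.
    """
    per_token = ((pred - target) ** 2).mean(dim=-1)  # (batch, num_tokens)
    weights = mask.to(per_token.dtype)
    # clamp avoids division by zero if nothing is masked.
    return (per_token * weights).sum() / weights.sum().clamp(min=1)


torch.manual_seed(0)
target = torch.randn(2, 196, 16 * 16 * 3)  # flattened pixels per patch
pred = torch.zeros_like(target)            # stand-in for a decoder output
mask = random_mask(2, 196)
loss = masked_mse(pred, target, mask)
print(mask.sum(dim=1).tolist(), round(loss.item(), 3))
```

Because visible tokens get zero weight, errors on them do not affect the loss, so the model is rewarded only for inferring content it could not see.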
Next Week Preview
Week 1 will start building our model, beginning with the fundamental data loaders and transforms needed to use geospatial data in deep learning.