Data Engineering for Spatial Systems
Databases, pipelines, and platforms for earth and place data at scale
0.1 What this book is for
Spatial data does not sit still. Satellites produce imagery every day. Sensors stream readings every second. Weather models output grid fields every hour. Climate archives span decades. Urban sensor networks pulse continuously. The data exists. The question is whether you can move it, store it, query it, transform it, and serve it — reliably, at scale, under real conditions.
This is data engineering. And spatial data engineering has specific requirements that generic data engineering books don’t address: coordinate reference systems, spatial indexing, tiled raster formats, OGC standards, geographic query planning, and the particular challenges of time-series data that is also spatially structured.
This book is built for engineers who need to solve these problems in practice.
0.2 Who this book is for
Students in the Earth System Sciences and Data Engineering, and anyone building production pipelines. It assumes you know how to write Python and SQL. It also assumes you understand spatial data concepts: coordinate reference systems, vector and raster formats, and map projections. If those are unfamiliar, Computational Geography Part 0 and Part 3 cover them.
0.3 How this book is structured
Eight chapters. Each chapter ends with a “build this” section — a minimal, running implementation of the chapter’s central concept. The eight components together form a small but real spatial data platform. Build all eight and you have built something.
| Chapter | What you build |
|---|---|
| Ch 1: Architecture | A data catalogue for a multi-source spatial project |
| Ch 2: Spatial Databases | A PostGIS database with spatial queries and indexes |
| Ch 3: Pipelines and ETL | A transformation pipeline from raw GeoJSON to normalised tables |
| Ch 4: Cloud Infrastructure | A cloud-native GeoTIFF served from object storage with STAC metadata |
| Ch 5: Streaming | A sensor data ingestion pipeline with a 5-minute aggregation |
| Ch 6: APIs | A FastAPI service exposing a spatial query as an OGC-compatible endpoint |
| Ch 7: Data Quality | A dbt project with spatial integrity tests |
| Ch 8: ML Platform | A feature store and model serving endpoint for a spatial ML model |
0.4 What this book does not cover
GIS analysis. Spatial analysis methods — viewsheds, cost-distance, spatial joins for analytical purposes — are covered in Computational Geography.
Machine learning theory. The models that the platforms in Ch 8 serve are designed and trained using the frameworks in Mathematics for Data Science and AI.
Vendor-specific cloud services. The book uses open-source tooling throughout. Where AWS, GCP, or Azure are mentioned, open-source equivalents are always shown.