layout: article.njk

How to Clip Rasters to Irregular Polygon Boundaries in Python

To clip rasters to irregular polygon boundaries in Python, use rasterio.mask.mask() with crop=True and all_touched=True. This combination reads only the intersecting raster window, generates a precise boolean mask from your vector geometry, and outputs a georeferenced TIFF that exactly follows complex, non-rectangular edges without rectangular padding artifacts. Whether you’re processing multispectral satellite imagery, DEM derivatives, or LiDAR intensity grids, mastering how to clip rasters to irregular polygon boundaries eliminates manual preprocessing bottlenecks and preserves spectral or elevation integrity along jagged feature edges.

Environment & Compatibility Requirements

Component Requirement
Python 3.9–3.12
Core Libraries rasterio>=1.3.0, geopandas>=0.14.0, shapely>=2.0.0
GDAL Backend ≥3.4.0 (required for modern CRS transformations and all_touched rasterization)
Installation Prefer conda-forge on Windows/macOS to bypass C-extension compilation failures. pip install rasterio geopandas works on Linux but often requires libgdal-dev system packages.
CRS Rule Raster and polygon must share identical EPSG codes. Rasterio does not auto-project during masking; mismatched coordinate systems return empty arrays or raise ValueError.

Production-Ready Implementation

The following function handles single or multi-band rasters, validates coordinate systems, and writes a clean output with updated geospatial metadata.

import rasterio
from rasterio.mask import mask
import geopandas as gpd
from pathlib import Path

def clip_raster_to_polygon(raster_path: str, polygon_path: str, output_path: str) -> Path:
    # Load polygon and ensure single unified geometry
    gdf = gpd.read_file(polygon_path)
    if len(gdf) > 1:
        gdf = gdf.dissolve()
    polygon_geom = gdf.geometry.values[0]

    with rasterio.open(raster_path) as src:
        # Strict CRS validation
        if not src.crs.equals(gdf.crs):
            raise ValueError(
                f"CRS mismatch: Raster {src.crs} != Polygon {gdf.crs}. "
                "Reproject vector to raster CRS before clipping."
            )

        # Perform clipping
        out_image, out_transform = mask(
            src,
            [polygon_geom],
            crop=True,
            all_touched=True,
            nodata=src.nodata if src.nodata is not None else 0,
            filled=True
        )

        # Update metadata for output
        out_meta = src.meta.copy()
        out_meta.update({
            "driver": "GTiff",
            "height": out_image.shape[1],
            "width": out_image.shape[2],
            "transform": out_transform,
            "count": out_image.shape[0]
        })

        # Write clipped raster
        with rasterio.open(output_path, "w", **out_meta) as dest:
            dest.write(out_image)

    return Path(output_path)

Parameter Breakdown for Complex Geometries

Understanding rasterio.mask.mask() arguments is critical when working with non-convex or highly detailed boundaries. The official rasterio.mask documentation details the full signature, but these four parameters drive accurate irregular clipping:

  • crop=True: Extracts only the bounding box of the polygon, drastically reducing memory footprint and disk I/O. Without it, the output retains the original raster dimensions with masked-out pixels set to nodata.
  • all_touched=True: Essential for jagged, narrow, or highly detailed polygons. By default, rasterization only marks pixels whose centers fall inside the geometry. Enabling all_touched includes any pixel intersected by the polygon edge, preventing artificial gaps or stair-stepping artifacts along irregular boundaries.
  • filled=True: Ensures the output array is fully populated. Pixels outside the polygon are replaced with the nodata value rather than returning a sparse or masked array. This guarantees compatibility with downstream GIS tools that expect dense NumPy arrays.
  • nodata: Explicitly defines the background fill value. If your source raster lacks a defined nodata, defaulting to 0 or -9999 prevents downstream software from misinterpreting transparent areas as valid data.

Critical Workflow Considerations

Coordinate System Alignment

Rasterio’s masking routine operates in pixel space after projecting geometries to the raster’s CRS. If your vector uses a geographic coordinate system (e.g., EPSG:4326) and the raster uses a projected system (e.g., EPSG:32633), the mask will fail silently or produce distorted outputs. Always reproject vectors using gdf.to_crs(src.crs) before masking. Refer to the GeoPandas projections guide for robust transformation workflows.

Memory Management for Large Rasters

Clipping high-resolution imagery (e.g., 10m Sentinel-2 or sub-meter drone orthomosaics) can exhaust RAM if the bounding box is large. To mitigate this:

  1. Enable crop=True to limit the read window.
  2. Use rasterio.windows.Window if you need to process the raster in tiles before masking.
  3. Set pad=False (default) to avoid allocating extra border pixels that increase array size.

Nodata Preservation & Compression

The output TIFF inherits the source’s data type (dtype) and compression settings. If your source uses DEFLATE or LZW compression, copy src.compression into out_meta to maintain storage efficiency. Always verify the output nodata matches your analysis pipeline’s expectations; mismatched background values corrupt NDVI, slope, or classification calculations.

Integration into Automated Pipelines

When scaling this routine across hundreds of scenes, wrap the function in a multiprocessing pool or Dask array workflow. Pairing this masking logic with broader Satellite Processing Workflows & Index Pipelines ensures consistent CRS handling, metadata propagation, and error logging across distributed compute nodes. For teams standardizing batch operations, integrating this step into Automated Image Clipping and Cropping frameworks reduces manual QA overhead and guarantees reproducible spatial extents.

Validation & Quality Checks

After clipping, verify the output with:

with rasterio.open(output_path) as check:
    print(f"Dimensions: {check.width}x{check.height}")
    print(f"CRS: {check.crs}")
    print(f"Nodata: {check.nodata}")
    assert check.crs.equals(gdf.crs), "CRS drift detected"

This quick sanity check catches silent failures before downstream analytics. For production deployments, log the bounding box coordinates and pixel count to track data volume changes across clipping iterations.