---
layout: article.njk
---
Using pystac-client to filter Sentinel-2 imagery by date
To filter Sentinel-2 imagery by date with pystac-client, follow three steps:

1. Open a Client pointing at a public STAC API endpoint.
2. Pass a datetime interval such as YYYY-MM-DD/YYYY-MM-DD. pystac-client expands simple dates into RFC 3339 timestamps.
3. Specify the target collection ID: sentinel-2-l2a for surface reflectance, or sentinel-2-l1c for top-of-atmosphere reflectance where the catalog provides it.

client.search() returns an ItemSearch object. Its .items() method follows API pagination automatically and yields pystac Item objects, ready for metadata inspection or ingestion into raster I/O pipelines.
This workflow is the standard approach when Querying STAC Catalogs Programmatically for time-series analysis, change detection, or cloud-free compositing.
Environment & Setup
| Component | Requirement | Notes |
|---|---|---|
| Python | ≥3.9 | pystac-client drops support for EOL versions. Use 3.10+ for production. |
| pystac-client | ≥0.7.0 | Aligns with STAC API v1.0.0. Includes improved pagination and max_items support. |
| STAC Catalog | v1.0.0 compliant | Microsoft Planetary Computer, Earth Search on AWS, and ESA DIAS endpoints expose STAC API search. Collection IDs and asset keys differ between providers. |
Install dependencies via pip:
```bash
pip install "pystac-client>=0.7.0"
```
Complete Working Example
The following function queries the Microsoft Planetary Computer STAC API, applies a temporal filter and a server-side cloud-cover filter, and optionally constrains results by bounding box. It returns a list of pystac Item objects.
```python
from __future__ import annotations  # allows "list[float] | None" annotations on Python 3.9

import logging

import pystac
import pystac_client

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")


def fetch_sentinel2_by_date(
    start_date: str,
    end_date: str,
    bbox: list[float] | None = None,
    max_cloud_cover: float = 20.0,
    max_items: int = 100,
) -> list[pystac.Item]:
    """
    Query a public STAC API for Sentinel-2 L2A imagery within a date range.
    Filters by cloud cover and returns a list of STAC Items.
    """
    stac_url = "https://planetarycomputer.microsoft.com/api/stac/v1"
    try:
        client = pystac_client.Client.open(stac_url)
    except Exception as e:
        raise ConnectionError(f"Failed to connect to STAC API: {e}") from e

    # STAC datetime interval; pystac-client expands date-only strings to full timestamps
    dt_range = f"{start_date}/{end_date}"
    search_params = {
        "collections": ["sentinel-2-l2a"],
        "datetime": dt_range,
        "query": {"eo:cloud_cover": {"lt": max_cloud_cover}},
        "limit": 50,  # page size per API request
        "max_items": max_items,  # hard cap on total items across all pages
    }
    if bbox is not None:
        if len(bbox) != 4:
            raise ValueError(
                "bbox must contain exactly 4 values: [min_lon, min_lat, max_lon, max_lat]"
            )
        search_params["bbox"] = bbox

    try:
        search = client.search(**search_params)
        # .items() lazily follows "next" links and stops once max_items is reached
        items = list(search.items())
        logging.info("Retrieved %d Sentinel-2 items for %s", len(items), dt_range)
        return items
    except Exception as e:
        logging.error("Search failed: %s", e)
        return []


if __name__ == "__main__":
    results = fetch_sentinel2_by_date(
        start_date="2023-06-01",
        end_date="2023-06-30",
        bbox=[-122.5, 37.6, -122.0, 37.9],
        max_cloud_cover=15.0,
        max_items=50,
    )
    if results:
        first = results[0]
        print(f"Item ID: {first.id}")
        print(f"Acquisition: {first.properties['datetime']}")
        print(f"Cloud Cover: {first.properties['eo:cloud_cover']}%")
        # Planetary Computer keys bands by Sentinel-2 band code (B04 = red)
        red = first.assets.get("B04")
        print(f"Red Band: {red.href if red else 'not available'}")
```
Parameter Breakdown & STAC Compliance
1. Temporal Filtering (datetime)
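The STAC API expects RFC 3339 timestamps joined into an interval. Three forms are available:

- YYYY-MM-DD/YYYY-MM-DD for a closed range.
- YYYY-MM-DD/.. for everything from a start date onward.
- ../YYYY-MM-DD for everything up to an end date.

Interval bounds are inclusive. pystac-client also accepts simple date strings (YYYY, YYYY-MM, or YYYY-MM-DD) and expands them to cover the whole period. As a result, 2023-06-01/2023-06-30 runs through the end of 30 June.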
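pystac-client accepts date strings directly. If you assemble intervals yourself, for example for a raw HTTP request, a small helper keeps the format consistent.

This is a minimal stdlib sketch, not part of pystac-client. The helper names are hypothetical. Expanding a date-only end bound to 23:59:59 UTC is an assumption, chosen so the final day is included.

```python
from __future__ import annotations

from datetime import date, datetime, time, timezone


def to_rfc3339(day: date, end_of_day: bool = False) -> str:
    """Render a calendar date as a UTC RFC 3339 timestamp (hypothetical helper)."""
    # Assumption: a date-only end bound means "through the last second of that day".
    clock = time(23, 59, 59) if end_of_day else time(0, 0, 0)
    return datetime.combine(day, clock, tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_interval(start: date | None, end: date | None) -> str:
    """Build a STAC datetime interval; a missing bound becomes the open marker '..'."""
    if start is None and end is None:
        raise ValueError("at least one bound is required")
    if start is not None and end is not None and start > end:
        raise ValueError("start must not be after end")
    left = to_rfc3339(start) if start is not None else ".."
    right = to_rfc3339(end, end_of_day=True) if end is not None else ".."
    return f"{left}/{right}"


if __name__ == "__main__":
    print(build_interval(date(2023, 6, 1), date(2023, 6, 30)))
    # 2023-06-01T00:00:00Z/2023-06-30T23:59:59Z
    print(build_interval(date(2023, 6, 1), None))
    # 2023-06-01T00:00:00Z/..
    print(build_interval(None, date(2023, 6, 30)))
    # ../2023-06-30T23:59:59Z
```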
2. Collection Targeting
Sentinel-2 is typically split into two collections:
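- sentinel-2-l2a: Bottom-of-atmosphere surface reflectance (recommended for analysis)
- sentinel-2-l1c: Top-of-atmosphere reflectance (useful as input to custom atmospheric correction pipelines)

Collection availability and naming vary by catalog. List a catalog's collections (for example with client.get_collections()) before querying.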
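To see what the collection choice looks like on the wire, the sketch below assembles the JSON body of a STAC API item search (POST /search) using only the standard library.

The helper name and the level-to-collection mapping are illustrative assumptions. The body fields (collections, datetime, bbox, limit) are the core item-search parameters that client.search() builds for you.

```python
from __future__ import annotations

import json

# Collection IDs as used in this article; verify them against each catalog.
SENTINEL2_COLLECTIONS = {
    "l2a": "sentinel-2-l2a",  # bottom-of-atmosphere surface reflectance
    "l1c": "sentinel-2-l1c",  # top-of-atmosphere reflectance
}


def build_search_body(
    level: str,
    interval: str,
    bbox: list[float] | None = None,
    limit: int = 50,
) -> dict:
    """Assemble a JSON body for a STAC API POST /search request (hypothetical helper)."""
    try:
        collection = SENTINEL2_COLLECTIONS[level.lower()]
    except KeyError:
        raise ValueError(f"unknown processing level: {level!r}") from None
    body: dict = {"collections": [collection], "datetime": interval, "limit": limit}
    if bbox is not None:
        body["bbox"] = bbox
    return body


if __name__ == "__main__":
    body = build_search_body(
        "L2A",
        "2023-06-01T00:00:00Z/2023-06-30T23:59:59Z",
        bbox=[-122.5, 37.6, -122.0, 37.9],
    )
    print(json.dumps(body, indent=2))
```

Posting this body to a catalog's /search endpoint (for example with curl) is a quick way to check whether a given collection ID exists there before wiring it into pystac-client.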
3. Cloud Cover Filtering
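A datetime-only query returns every scene in the interval, including heavily clouded ones. The query parameter (from the STAC API Query Extension) filters server-side on the eo:cloud_cover property defined by the EO extension. This reduces the number of items transferred and processed. Not every API implements the Query Extension, so check the conformance classes advertised on the API's landing page. Some APIs support only the Filter Extension, which expresses the same constraint in CQL2.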
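The same cloud-cover constraint can be written in three ways, sketched below with hypothetical helper names:

- as a Query Extension dict;
- as a CQL2-JSON expression for APIs that implement the Filter Extension (passed, assuming the API supports it, via client.search(filter=..., filter_lang="cql2-json"));
- as a client-side fallback over raw item dicts when neither extension is available.

Dropping items that lack eo:cloud_cover in the fallback is a design assumption.

```python
from __future__ import annotations


def cloud_cover_query(max_cloud_cover: float) -> dict:
    """Query Extension form, as passed to client.search(query=...)."""
    return {"eo:cloud_cover": {"lt": max_cloud_cover}}


def cloud_cover_cql2(max_cloud_cover: float) -> dict:
    """CQL2-JSON form for APIs that implement the Filter Extension."""
    return {"op": "<", "args": [{"property": "eo:cloud_cover"}, max_cloud_cover]}


def passes_cloud_cover(item: dict, max_cloud_cover: float) -> bool:
    """Client-side fallback; assumption: items without eo:cloud_cover are dropped."""
    value = item.get("properties", {}).get("eo:cloud_cover")
    return value is not None and value < max_cloud_cover


if __name__ == "__main__":
    features = [
        {"id": "clear", "properties": {"eo:cloud_cover": 3.2}},
        {"id": "cloudy", "properties": {"eo:cloud_cover": 78.0}},
        {"id": "unknown", "properties": {}},
    ]
    kept = [f["id"] for f in features if passes_cloud_cover(f, 20.0)]
    print(kept)  # ['clear']
```

Server-side filtering remains preferable whenever the API supports it, since the fallback still downloads every item's metadata.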
4. Pagination & Memory Management
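pystac-client automatically follows next links across paginated API responses. Two parameters control this:

- limit sets the page size for each request.
- max_items (passed to client.search()) caps the total number of items the search yields.

Without a cap, large spatiotemporal queries can issue many requests and exhaust local memory once results are materialized with list(). Always set max_items, or consume the .items() iterator lazily.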
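To make "follow next links, but stop at a cap" concrete, here is a minimal stdlib sketch of the pattern that pystac-client handles for you.

fetch_page is a hypothetical callable standing in for an HTTP GET that returns a parsed ItemCollection page. Real STAC APIs may return POST-style next links that carry a request body, which this sketch does not handle.

```python
from __future__ import annotations

from typing import Callable, Iterator


def iter_items(
    fetch_page: Callable[[str], dict],
    first_url: str,
    max_items: int | None = None,
) -> Iterator[dict]:
    """Yield item dicts page by page, following rel="next" links until done or capped."""
    url: str | None = first_url
    yielded = 0
    # Stop before requesting another page once the cap has been reached.
    while url is not None and (max_items is None or yielded < max_items):
        page = fetch_page(url)
        for feature in page.get("features", []):
            if max_items is not None and yielded >= max_items:
                return
            yield feature
            yielded += 1
        # A page without a "next" link is the last page.
        url = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )


if __name__ == "__main__":
    # Two fake pages standing in for HTTP responses from a STAC API.
    pages = {
        "page-1": {
            "features": [{"id": "a"}, {"id": "b"}],
            "links": [{"rel": "next", "href": "page-2"}],
        },
        "page-2": {"features": [{"id": "c"}], "links": []},
    }
    print([f["id"] for f in iter_items(pages.__getitem__, "page-1")])  # ['a', 'b', 'c']
    print([f["id"] for f in iter_items(pages.__getitem__, "page-1", max_items=2)])  # ['a', 'b']
```

Because it is a generator, memory stays bounded until the caller materializes results. With a cap of 2, the second page is never requested.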
Production Best Practices
- Validate bounding boxes: Ensure coordinates follow [min_lon, min_lat, max_lon, max_lat] order. Crossing the antimeridian requires specialized handling or polygon queries.
- Sign asset URLs: Planetary Computer assets require short-lived SAS tokens. Sign items with the planetary-computer package to avoid authorization errors when reading data. Either call planetary_computer.sign(item), or pass modifier=planetary_computer.sign_inplace to pystac_client.Client.open().
- Defer asset loading: STAC Item objects are lightweight metadata containers. Only resolve item.assets[key].href when you are ready to stream data into rasterio, xarray, or odc-stac.
- Handle missing assets: Asset keys differ between catalogs. Planetary Computer uses band codes such as B04, while Earth Search uses names such as red. Not every item carries every asset, so check item.assets.get(key) before accessing it to avoid a KeyError.
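The bounding-box advice above can be sketched as a small validator.

This stdlib example assumes WGS84 longitude/latitude values, and normalize_bbox is a hypothetical helper. It treats min_lon > max_lon as an antimeridian crossing, mirroring the GeoJSON (RFC 7946) convention, rather than rejecting it as swapped input. It then splits such a box into two boxes that can be searched separately and merged by item ID.

```python
from __future__ import annotations

BBox = tuple[float, float, float, float]


def normalize_bbox(bbox: list[float]) -> list[BBox]:
    """Validate [min_lon, min_lat, max_lon, max_lat]; split at the antimeridian if needed."""
    if len(bbox) != 4:
        raise ValueError("bbox must contain exactly 4 values: [min_lon, min_lat, max_lon, max_lat]")
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in bbox)
    if not (-180.0 <= min_lon <= 180.0 and -180.0 <= max_lon <= 180.0):
        raise ValueError("longitudes must lie within [-180, 180]")
    if not (-90.0 <= min_lat <= max_lat <= 90.0):
        raise ValueError("latitudes must lie within [-90, 90] with min_lat <= max_lat")
    if min_lon <= max_lon:
        return [(min_lon, min_lat, max_lon, max_lat)]
    # Assumption: min_lon > max_lon means the box crosses the antimeridian,
    # not that the caller swapped the values by mistake.
    return [
        (min_lon, min_lat, 180.0, max_lat),
        (-180.0, min_lat, max_lon, max_lat),
    ]


if __name__ == "__main__":
    print(normalize_bbox([-122.5, 37.6, -122.0, 37.9]))
    # [(-122.5, 37.6, -122.0, 37.9)]
    print(normalize_bbox([170.0, -20.0, -170.0, -10.0]))
    # [(170.0, -20.0, 180.0, -10.0), (-180.0, -20.0, -170.0, -10.0)]
```

A single polygon or MultiPolygon passed through the search's intersects parameter is the alternative when you would rather not run two searches.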
Integrating with Raster Workflows
Once filtered, STAC items map directly to array operations. Understanding how spatial metadata, coordinate reference systems, and band ordering translate from JSON to NumPy arrays is essential for reproducible pipelines. Refer to Core Raster Fundamentals & STAC Mapping for detailed guidance on aligning STAC assets with GDAL/rasterio conventions.
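As one example of how STAC metadata maps to array indices, two Projection extension fields are enough to turn a bounding box in the item's CRS into a pixel window:

- proj:transform: six affine coefficients in the same order as rasterio's Affine;
- proj:shape: [rows, cols].

This stdlib sketch makes several assumptions. The raster is north-up (no rotation). The bbox has already been reprojected into the item's CRS. The transform and shape come from the item or asset properties. The function name and the grid values in the usage example are illustrative.

```python
from __future__ import annotations

import math


def bbox_to_window(
    transform: list[float],
    shape: list[int],
    bbox: tuple[float, float, float, float],
) -> tuple[int, int, int, int]:
    """Return (col_off, row_off, width, height) covering bbox, clipped to the raster."""
    a, b, c, d, e, f = transform[:6]  # x = a*col + b*row + c ; y = d*col + e*row + f
    if b != 0 or d != 0 or e >= 0:
        raise ValueError("this sketch only handles north-up rasters")
    height, width = shape  # proj:shape is [rows, cols]
    min_x, min_y, max_x, max_y = bbox
    col_start = math.floor((min_x - c) / a)
    col_stop = math.ceil((max_x - c) / a)
    row_start = math.floor((max_y - f) / e)  # top edge (max_y) maps to the smallest row
    row_stop = math.ceil((min_y - f) / e)
    col_start, col_stop = max(col_start, 0), min(col_stop, width)
    row_start, row_stop = max(row_start, 0), min(row_stop, height)
    if col_start >= col_stop or row_start >= row_stop:
        raise ValueError("bbox does not overlap the raster")
    return col_start, row_start, col_stop - col_start, row_stop - row_start


if __name__ == "__main__":
    # Illustrative 10 m UTM grid: origin (600000, 4200000), 10980 x 10980 pixels.
    transform = [10.0, 0.0, 600000.0, 0.0, -10.0, 4200000.0]
    shape = [10980, 10980]
    print(bbox_to_window(transform, shape, (601000.0, 4195000.0, 606120.0, 4199000.0)))
    # (100, 100, 512, 400)
```

The returned tuple follows the argument order of rasterio.windows.Window(col_off, row_off, width, height), so it can be unpacked into a windowed read.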
For bulk loading, combine pystac-client with stackstac or odc-stac:
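```python
# Requires: pip install stackstac rasterio planetary-computer
import planetary_computer
import rasterio
import stackstac
from rasterio.windows import Window

# results: the list returned by fetch_sentinel2_by_date() above.
# Planetary Computer asset URLs need short-lived SAS tokens, so sign copies first.
signed = [planetary_computer.sign(item) for item in results]

# Convert STAC Items to a lazy xarray DataArray (B04/B03/B02 = red/green/blue)
da = stackstac.stack(signed, assets=["B04", "B03", "B02"], epsg=32610)

# Or stream directly into rasterio for windowed reads
with rasterio.open(signed[0].assets["B04"].href) as src:
    profile = src.profile
    tile = src.read(1, window=Window(0, 0, 512, 512))
```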
The official pystac-client documentation provides additional examples for advanced query composition, authentication handling, and catalog crawling.