RasterStack Data Model¶
In titiler-openeo, the RasterStack data model serves as the foundational data structure for handling Earth Observation datasets. This document explains the enhanced RasterStack concept, its implementation with time dimension support, and the performance benefits it provides.
Overview¶
The RasterStack is a dictionary-like structure that organizes raster data along multiple dimensions, primarily time and spectral bands. Each entry in the RasterStack contains an ImageData object representing a multi-band image (2D or 3D) at a specific time point.
# Example of time-organized RasterStack structure
RasterStack = {
    "2023-01-01": ImageData(...),  # Multi-band image for first date
    "2023-01-15": ImageData(...),  # Multi-band image for second date
    "2023-02-01": ImageData(...),  # Multi-band image for third date
}
Dimensional Model¶
The RasterStack defines a clear dimensional hierarchy:
- Time Dimension (Primary): RasterStack keys represent temporal organization
- Spectral Dimension (Secondary): Each ImageData contains multiple bands
- Spatial Dimensions: Each band contains 2D spatial data (height, width)
import numpy
from rio_tiler.models import ImageData

# Time dimension: multiple temporal observations
temporal_stack = {
    "2023-01-01": ImageData(numpy.zeros((4, 512, 512))),  # 4 bands
    "2023-02-01": ImageData(numpy.zeros((4, 512, 512))),  # 4 bands
}
# Each ImageData represents multi-band observations at one time point
single_observation = temporal_stack["2023-01-01"]
# single_observation.array.shape = (bands, height, width) = (4, 512, 512)
ImageData vs RasterStack¶
- ImageData: Multi-band raster data for a single time point with spatial extent, CRS, and band metadata
- RasterStack: Time-organized collection of ImageData objects, enabling temporal analysis and processing
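The relationship between the two can be pictured with a minimal stand-in. The `Image` class and `blank` helper below are hypothetical placeholders (not the rio-tiler `ImageData` class) that hold bands as nested lists. The stack itself is an ordinary dict keyed by ISO date strings.

```python
from dataclasses import dataclass
from typing import Dict, List

Band = List[List[float]]  # 2D grid: rows of pixel values


@dataclass
class Image:
    """Hypothetical stand-in for ImageData: all bands for one time point."""

    bands: List[Band]
    band_names: List[str]

    @property
    def shape(self):
        # (bands, height, width), mirroring ImageData.array.shape
        return (len(self.bands), len(self.bands[0]), len(self.bands[0][0]))


def blank(height: int, width: int) -> Band:
    """Build a zero-filled band of the given size."""
    return [[0.0] * width for _ in range(height)]


# A stack maps a time key to one multi-band image.
stack: Dict[str, Image] = {
    "2023-01-15": Image([blank(2, 3), blank(2, 3)], ["B04", "B08"]),
    "2023-01-01": Image([blank(2, 3), blank(2, 3)], ["B04", "B08"]),
}

# ISO 8601 date strings sort chronologically, so sorted() gives time order.
dates = sorted(stack)
shapes = [stack[d].shape for d in dates]
```

Keeping the time axis in the mapping and the band axis inside each image is what lets temporal and spectral operations be addressed separately.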
LazyRasterStack with Temporal Intelligence¶
The LazyRasterStack extends the basic RasterStack concept with time-aware lazy loading and concurrent execution:
from datetime import datetime

# LazyRasterStack with timestamp support
raster_stack = LazyRasterStack(
    tasks=tasks,
    key_fn=lambda asset: asset["id"],
    timestamp_fn=lambda asset: asset["datetime"],  # Enable temporal features
    max_workers=5,  # Concurrent execution
)
# Temporal access and grouping
temporal_groups = raster_stack.groupby_timestamp()
single_date_data = raster_stack.get_by_timestamp(datetime(2023, 1, 1))
Key Features of Enhanced LazyRasterStack¶
- Temporal Organization: Automatic sorting and grouping by timestamps for time-series analysis
- Concurrent Execution: Parallel loading of data using ThreadPoolExecutor for improved performance
- Timestamp-based Access: Direct access to observations by time periods
- Intelligent Caching: Per-key caching to avoid redundant computations
- Lazy Evaluation: Data loaded only when accessed, reducing memory footprint
- Multi-band Support: Each temporal observation can contain multiple spectral bands
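Lazy evaluation and per-key caching can be illustrated with a small mapping that runs a loader only on first access. This is a simplified sketch: `LazyStack` and `make_loader` are hypothetical names, and the class is not the titiler-openeo implementation.

```python
from collections.abc import Mapping
from typing import Any, Callable, Dict, Iterator


class LazyStack(Mapping):
    """Hypothetical simplified lazy mapping; not the titiler-openeo class."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]):
        self._loaders = loaders            # key -> zero-argument loader
        self._cache: Dict[str, Any] = {}   # key -> already loaded value

    def __getitem__(self, key: str) -> Any:
        # Run the loader on first access only; later reads hit the cache.
        if key not in self._cache:
            self._cache[key] = self._loaders[key]()
        return self._cache[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._loaders)

    def __len__(self) -> int:
        return len(self._loaders)


calls = []


def make_loader(name: str) -> Callable[[], str]:
    def load() -> str:
        calls.append(name)  # record that real work happened
        return f"pixels for {name}"

    return load


lazy = LazyStack({"a": make_loader("a"), "b": make_loader("b")})
keys_before = list(lazy)  # listing keys loads nothing
first = lazy["a"]         # triggers the loader for "a"
again = lazy["a"]         # served from the cache
```

After these accesses `calls` contains only `"a"`: the entry for `"b"` was never loaded, and the second read of `"a"` reused the cached value.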
Temporal Processing Capabilities¶
# Access specific time periods
jan_data = raster_stack.get_by_timestamp(datetime(2023, 1, 1))
# Process data chronologically
for timestamp in raster_stack.timestamps():
    temporal_group = raster_stack.get_by_timestamp(timestamp)
    # Each temporal_group contains all bands for that time point
# Efficient time-series operations
keys = list(raster_stack.keys())  # list() because a dict keys view cannot be indexed
first_observation = raster_stack[keys[0]]  # Earliest
last_observation = raster_stack[keys[-1]]  # Latest
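The grouping behind timestamp-based access can be sketched in plain Python. The item keys and the `group_by_timestamp` helper below are hypothetical and only illustrate the idea of collecting keys per timestamp in chronological order.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical item metadata: (key, acquisition time), in arbitrary order.
items = [
    ("s2_tile_b_20230115", datetime(2023, 1, 15)),
    ("s2_tile_a_20230101", datetime(2023, 1, 1)),
    ("s2_tile_b_20230101", datetime(2023, 1, 1)),
]


def group_by_timestamp(pairs):
    """Return {timestamp: [keys]} with timestamps in chronological order."""
    groups = defaultdict(list)
    for key, ts in pairs:
        groups[ts].append(key)
    # dicts preserve insertion order, so inserting in sorted order
    # yields a chronologically ordered mapping
    return {ts: groups[ts] for ts in sorted(groups)}


groups = group_by_timestamp(items)
timestamps = list(groups)
earliest_keys = groups[timestamps[0]]
latest_keys = groups[timestamps[-1]]
```

Sorting once at construction time means later chronological iteration needs no further sorting.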
Advantages of the Enhanced RasterStack Model¶
- Temporal Consistency: Standardized time-first organization for all Earth Observation workflows
- Multi-dimensional Support: Explicit handling of time and spectral dimensions
- Concurrent Performance: Parallel data loading reduces processing time for large datasets
- Memory Efficiency: LazyRasterStack with intelligent caching minimizes memory usage
- Scalability: Efficient handling of time-series data with hundreds of temporal observations
- Predictability: Standardized multi-dimensional structure across all operations
Dimensional Processing Patterns¶
Temporal Dimension Operations¶
# Reduce across time (e.g., temporal mean)
from titiler.openeo.processes.implementations.reduce import reduce_dimension

# mean_reducer is a user-supplied callable (not defined here)
temporal_mean = reduce_dimension(
    data=raster_stack,
    reducer=mean_reducer,
    dimension="temporal",
)
# Result: Single ImageData with time-averaged bands
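What a temporal reduction computes can be shown without the library. The sketch below is not the `reduce_dimension` implementation: it uses nested lists in place of arrays and a hypothetical `temporal_mean` helper that averages each (band, row, column) position over all time points.

```python
from statistics import fmean

# Two time points, each with 2 bands of 1x2 pixels (nested lists).
stack = {
    "2023-01-01": [[[1.0, 2.0]], [[10.0, 20.0]]],
    "2023-02-01": [[[3.0, 4.0]], [[30.0, 40.0]]],
}


def temporal_mean(stack):
    """Average each (band, row, col) position over all time points."""
    images = list(stack.values())
    n_bands = len(images[0])
    n_rows = len(images[0][0])
    n_cols = len(images[0][0][0])
    return [
        [
            [fmean(img[b][r][c] for img in images) for c in range(n_cols)]
            for r in range(n_rows)
        ]
        for b in range(n_bands)
    ]


# The time axis is collapsed; bands and spatial dimensions are kept.
mean_image = temporal_mean(stack)
```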
Spectral Dimension Operations¶
# Reduce across bands (e.g., NDVI calculation)
# ndvi_calculator is a user-supplied callable (not defined here)
ndvi_stack = reduce_dimension(
    data=raster_stack,
    reducer=ndvi_calculator,
    dimension="spectral",
)
# Result: RasterStack with single-band NDVI for each time point
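A spectral reduction, by contrast, keeps one entry per time point and collapses the band axis. The following sketch assumes each time point stores its bands in the order [red, nir] as nested lists. The `ndvi` helper is hypothetical and returns `None` where the denominator is zero.

```python
def ndvi(red, nir):
    """Per-pixel (NIR - Red) / (NIR + Red); None where the sum is zero."""
    return [
        [
            (n - r) / (n + r) if (n + r) != 0 else None
            for r, n in zip(red_row, nir_row)
        ]
        for red_row, nir_row in zip(red, nir)
    ]


# Each time point holds two bands in the order [red, nir], 1x2 pixels each.
stack = {
    "2023-01-01": [[[1.0, 0.0]], [[3.0, 0.0]]],
    "2023-02-01": [[[2.0, 4.0]], [[6.0, 4.0]]],
}

# Reducing the band axis keeps one single-band result per time point.
ndvi_stack = {date: ndvi(bands[0], bands[1]) for date, bands in stack.items()}
```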
Combined Processing¶
# Apply pixel selection across temporal groups
from titiler.openeo.processes.implementations.reduce import apply_pixel_selection

# Mosaic overlapping observations at each time point
mosaicked_stack = apply_pixel_selection(
    data=raster_stack,
    pixel_selection="first",  # Uses temporal grouping automatically
)
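The "first" selection strategy fills each pixel from the first observation that has a valid value, which is why it can stop before reading every image. The sketch below illustrates that idea on single-band nested lists with `None` as the missing value. `mosaic_first` is a hypothetical helper, not the library's pixel selection code.

```python
def mosaic_first(images, nodata=None):
    """Fill each pixel from the first image that has a valid value.

    Returns the mosaic and the number of images actually consumed.
    """
    mosaic = None
    used = 0
    for img in images:
        used += 1
        if mosaic is None:
            mosaic = [row[:] for row in img]  # copy the first image
        else:
            for r, row in enumerate(img):
                for c, value in enumerate(row):
                    if mosaic[r][c] is nodata:
                        mosaic[r][c] = value
        # Early termination: stop as soon as no gaps remain, before
        # the next image is requested from the iterable.
        if all(v is not nodata for row in mosaic for v in row):
            break
    return mosaic, used


# Three overlapping single-band 1x3 observations; None marks missing pixels.
obs = [
    [[1, None, None]],
    [[9, 2, 3]],
    [[7, 7, 7]],  # never consumed: the mosaic is full after the second image
]
mosaic, used = mosaic_first(obs)
```

Because the completeness check runs before the next image is requested, a lazily loaded stack would never trigger the load for the third observation.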
How Multi-dimensional Processing Works¶
Load Phase - Temporal Organization¶
Data is loaded and organized temporally into a LazyRasterStack:
# Process graph example - loads multi-band time series
{
    "process_id": "load_collection",
    "arguments": {
        "id": "sentinel-2-l2a",
        "spatial_extent": {...},
        "temporal_extent": ["2023-01-01", "2023-03-01"],
        "bands": ["B02", "B03", "B04", "B08"]  # Blue, Green, Red, NIR
    }
}
# Results in LazyRasterStack with temporal keys, each containing 4-band ImageData
Process Phase - Dimension-aware Operations¶
Operations are applied respecting dimensional structure:
# Spectral processing within each time point
{
    "process_id": "normalized_difference",
    "arguments": {
        "x": {"from_node": "load_collection", "band": "B08"},  # NIR
        "y": {"from_node": "load_collection", "band": "B04"}   # Red
    }
}
# Produces single-band NDVI for each temporal observation
Temporal Analysis¶
# Time-series analysis across the temporal dimension
{
    "process_id": "reduce_dimension",
    "arguments": {
        "data": {"from_node": "ndvi_calculation"},
        "reducer": {"process_id": "mean"},
        "dimension": "temporal"
    }
}
# Produces temporal mean NDVI (collapses time dimension)
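The three phases connect through `from_node` references, so every referenced node id has to exist in the graph. As a rough sketch, the fragments can be assembled into one dict and checked for dangling references. The node id `temporal_mean`, the abbreviated arguments, and the `referenced_nodes` helper are illustrative assumptions, and this is not a complete or validated openEO process graph.

```python
# Node ids and the reduced argument set are illustrative only.
process_graph = {
    "load_collection": {
        "process_id": "load_collection",
        "arguments": {
            "id": "sentinel-2-l2a",
            "temporal_extent": ["2023-01-01", "2023-03-01"],
            "bands": ["B04", "B08"],
        },
    },
    "ndvi_calculation": {
        "process_id": "normalized_difference",
        "arguments": {
            "x": {"from_node": "load_collection", "band": "B08"},
            "y": {"from_node": "load_collection", "band": "B04"},
        },
    },
    "temporal_mean": {
        "process_id": "reduce_dimension",
        "arguments": {
            "data": {"from_node": "ndvi_calculation"},
            "reducer": {"process_id": "mean"},
            "dimension": "temporal",
        },
    },
}


def referenced_nodes(value):
    """Yield every node id referenced through a "from_node" key."""
    if isinstance(value, dict):
        for key, child in value.items():
            if key == "from_node":
                yield child
            else:
                yield from referenced_nodes(child)
    elif isinstance(value, list):
        for child in value:
            yield from referenced_nodes(child)


# Node ids that are referenced but not defined in the graph.
dangling = sorted(
    ref
    for node in process_graph.values()
    for ref in referenced_nodes(node["arguments"])
    if ref not in process_graph
)
```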
Code Examples¶
Working with Temporal RasterStacks¶
# Create a time-aware LazyRasterStack
from datetime import datetime

from titiler.openeo.processes.implementations.data_model import LazyRasterStack

# Tasks with temporal metadata (load_task is a placeholder for a real loading task)
tasks = [
    (load_task, {"id": "s2_20230101", "datetime": datetime(2023, 1, 1)}),
    (load_task, {"id": "s2_20230115", "datetime": datetime(2023, 1, 15)}),
]
raster_stack = LazyRasterStack(
    tasks=tasks,
    key_fn=lambda asset: asset["id"],
    timestamp_fn=lambda asset: asset["datetime"],  # Enable temporal features
)
# Access by time
january_data = raster_stack.get_by_timestamp(datetime(2023, 1, 1))
# Temporal iteration
for timestamp in raster_stack.timestamps():
    temporal_group = raster_stack.get_by_timestamp(timestamp)
    print(f"Time {timestamp}: {len(temporal_group)} observations")
Multi-band Processing¶
# Access spectral bands within temporal observations
observation = raster_stack["s2_20230101"]  # Multi-band ImageData
bands = observation.band_names  # ["B02", "B03", "B04", "B08"]
# Cast to float so unsigned integer pixel values do not wrap on subtraction
nir_band = observation.array[3].astype("float32")  # NIR band (B08)
red_band = observation.array[2].astype("float32")  # Red band (B04)
# Calculate NDVI for this time point
ndvi = (nir_band - red_band) / (nir_band + red_band)
Utility Functions¶
# get_first_item and get_last_item are assumed to live in the same module as to_raster_stack
from titiler.openeo.processes.implementations.data_model import (
    get_first_item,
    get_last_item,
    to_raster_stack,
)

# Convert single ImageData to temporal RasterStack format
img_data = ImageData(...)
raster_stack = to_raster_stack(img_data)  # {"data": img_data}
# Efficient access to temporal extremes
first_observation = get_first_item(raster_stack)  # Earliest in time
last_observation = get_last_item(raster_stack)  # Latest in time
Performance Benefits¶
The enhanced LazyRasterStack implementation improves performance in four areas:
Concurrent Execution¶
- Parallel Loading: ThreadPoolExecutor enables concurrent data loading within timestamp groups
- Configurable Workers: Adjustable max_workers parameter for optimal resource utilization
- Timestamp Grouping: Efficient parallel processing of observations at the same time point
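Concurrent loading within one timestamp group can be sketched with the standard library's `ThreadPoolExecutor`. The `load_item` and `load_group` functions and the item ids below are hypothetical. A real loader would read raster data, which is I/O-bound work that threads handle well.

```python
from concurrent.futures import ThreadPoolExecutor


def load_item(item_id: str) -> str:
    """Hypothetical loader; real code would read raster data here."""
    return f"data:{item_id}"


def load_group(item_ids, max_workers=5):
    """Load all items of one timestamp group concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, not completion order.
        results = list(pool.map(load_item, item_ids))
    return dict(zip(item_ids, results))


group = ["tile_a_20230101", "tile_b_20230101", "tile_c_20230101"]
loaded = load_group(group, max_workers=2)
```

Because `Executor.map` preserves input order, the resulting mapping keeps the same key order as the group regardless of which load finishes first.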
Memory Optimization¶
- Lazy Evaluation: Only loads data when explicitly accessed or processed
- Per-key Caching: Intelligent caching prevents redundant task execution
- Selective Loading: Timestamp-based access loads only relevant temporal subsets
Computational Efficiency¶
- Early Termination: Pixel selection and reduction operations can stop early when sufficient data is found
- Temporal Ordering: Pre-sorted temporal access eliminates runtime sorting overhead
- Exception Resilience: Graceful handling of failed tasks without blocking entire workflows
Scalability Improvements¶
- Large Time Series: Efficiently handles datasets with hundreds of temporal observations
- Multi-band Support: Optimized processing of high-dimensional spectral data
- Memory Footprint: Reduced memory usage for large Earth Observation collections
- Processing Speed: Concurrent execution significantly reduces wall-clock time
Best Practices¶
When working with the enhanced RasterStack data model:
Temporal Organization¶
- Use timestamp functions: Always provide timestamp_fn for time-series data to enable temporal features
- Leverage temporal grouping: Use get_by_timestamp() and groupby_timestamp() for time-based processing
- Respect temporal order: Take advantage of automatic temporal sorting for chronological processing
Performance Optimization¶
- Configure concurrency: Adjust max_workers based on your system resources and data characteristics
- Use dimension reduction: Apply reduce_dimension() to collapse unnecessary dimensions early in processing
- Employ early termination: Prefer pixel selection methods that can stop once every pixel is filled (such as "first") when possible; methods such as "mean" must read every observation
Multi-dimensional Processing¶
- Design dimension-aware workflows: Structure processes to operate on appropriate dimensions (temporal vs spectral)
- Maintain dimensional consistency: Ensure operations preserve or appropriately transform dimensional structure
- Use utility functions: Leverage get_first_item(), get_last_item(), and to_raster_stack() for consistent handling
Error Handling and Resilience¶
- Configure exception handling: Set appropriate allowed_exceptions for robust data loading
- Handle temporal gaps: Design workflows that gracefully handle missing temporal observations
- Test with diverse data: Validate performance with various temporal resolutions and band combinations
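The idea behind allowing certain exceptions is that an expected failure on one item, for example an item that does not cover the requested area, should not abort the whole workflow. The sketch below illustrates this with a hypothetical `run_tasks` helper and a hypothetical `ItemOutsideExtent` error. It does not reproduce how titiler-openeo applies `allowed_exceptions` internally.

```python
class ItemOutsideExtent(Exception):
    """Hypothetical error for an item that does not cover the request."""


def run_tasks(tasks, allowed_exceptions=()):
    """Run (key, callable) tasks, skipping those that raise allowed errors."""
    results, skipped = {}, []
    for key, task in tasks:
        try:
            results[key] = task()
        except allowed_exceptions:
            # Expected failure: remember the key and keep going.
            skipped.append(key)
    return results, skipped


def ok():
    return "pixels"


def outside():
    raise ItemOutsideExtent("no overlap with the requested extent")


tasks = [("a", ok), ("b", outside), ("c", ok)]
results, skipped = run_tasks(tasks, allowed_exceptions=(ItemOutsideExtent,))
```

With an empty `allowed_exceptions` tuple nothing is caught, so the same error would propagate and stop processing. This makes the tolerated failure modes an explicit choice.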