Core Concepts
This document explains the core concepts and data models used in openEO by TiTiler.
Data Model
In openEO, the data cube is a fundamental concept. Traditional openEO implementations represent data as multi-dimensional arrays; openEO by TiTiler simplifies this by focusing on raster image data that can be processed on the fly and served as tiles or as lightweight, dynamically generated raw data.
Raster Data Model
The backend uses three primary data structures for efficient processing:
- ImageData: Most processes use ImageData objects provided by rio-tiler for individual raster operations. This object was initially designed to create slippy map tiles from large raster data sources and render these tiles dynamically on a web map. Each ImageData object inherently has two spatial dimensions (height and width).
- RasterStack: A dictionary mapping names/dates to ImageData objects, allowing for consistent handling of multiple raster layers. This is our implementation of the openEO data cube concept, with some key characteristics:
  - An empty data cube is represented as an empty dictionary ({})
  - When there is at least one raster in the stack, it has a minimum of 2 dimensions (the spatial dimensions from the raster data)
  - Additional dimensions (like temporal or bands) can be added, but they must be compatible with the existing spatial dimensions
  - Spatial dimensions are inherent to the raster data and cannot be added separately
- LazyRasterStack: An optimized version of RasterStack that lazily loads data when accessed. This improves performance by only executing processing tasks when the data is actually needed.
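The difference between an eager dictionary and a lazily loaded one can be illustrated with a minimal sketch. It does not reproduce the backend's RasterStack or LazyRasterStack classes: Raster is a hypothetical stand-in for ImageData that only carries a spatial shape, and LazyStack and make_loader are names invented for this example.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Callable, Dict, Iterator


@dataclass
class Raster:
    """Hypothetical stand-in for rio-tiler's ImageData (spatial shape only)."""
    height: int
    width: int


class LazyStack(Mapping):
    """Dict-like stack that runs each loader only on first access."""

    def __init__(self, loaders: Dict[str, Callable[[], Raster]]):
        self._loaders = loaders
        self._cache: Dict[str, Raster] = {}

    def __getitem__(self, key: str) -> Raster:
        if key not in self._cache:
            # The deferred (potentially expensive) work happens here.
            self._cache[key] = self._loaders[key]()
        return self._cache[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._loaders)

    def __len__(self) -> int:
        return len(self._loaders)


calls = []  # records which loaders actually ran


def make_loader(name: str) -> Callable[[], Raster]:
    def load() -> Raster:
        calls.append(name)
        return Raster(height=256, width=256)
    return load


# Eager, RasterStack-style dictionary: the raster exists up front.
eager_stack = {"2021-01": Raster(256, 256)}

# Lazy variant: only the keys are known until an item is requested.
lazy_stack = LazyStack({k: make_loader(k) for k in ("2021-01", "2021-02")})

print(list(lazy_stack))       # listing keys loads nothing
print(calls)                  # still empty
print(lazy_stack["2021-02"])  # triggers exactly one load
print(calls)
```

Caching the loaded item means repeated access to the same key does not repeat the work, which is the property the lazy variant is meant to provide.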
Dimension Handling
The data cube implementation in openEO by TiTiler follows these principles for dimension handling:
- Spatial Dimensions: Every raster in the stack has two spatial dimensions (height and width) that are inherent to the data. These dimensions cannot be added or removed through processes, as they are fundamental to the raster data structure.
- Additional Dimensions: Non-spatial dimensions can be added to the data cube:
  - Temporal dimension: For time series data (e.g., "2021-01", "2021-02")
  - Bands dimension: For spectral bands (e.g., "red", "green", "blue")
  - Other dimensions: For any other type of categorization
- Dimension Compatibility: When adding dimensions to a non-empty data cube, the new dimension must be compatible with the existing spatial dimensions. This means any ImageData added to the stack must match the height and width of existing rasters.
- Empty Data Cubes: An empty data cube ({}) can receive any non-spatial dimension. The first raster data added to the cube will establish the spatial dimensions that all subsequent data must match.
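The compatibility rule can be sketched as a small guard around dictionary insertion. This is an illustration under stated assumptions rather than the backend's code: Raster is a hypothetical stand-in for ImageData, and add_raster is a helper invented for this example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Raster:
    """Hypothetical stand-in for ImageData; only the spatial shape matters here."""
    height: int
    width: int


def add_raster(stack: Dict[str, Raster], key: str, raster: Raster) -> None:
    """Add a raster under `key`, enforcing matching height and width."""
    for existing in stack.values():
        if (existing.height, existing.width) != (raster.height, raster.width):
            raise ValueError(
                f"shape {raster.height}x{raster.width} does not match "
                f"{existing.height}x{existing.width}"
            )
        break  # stored rasters already agree with each other, so one check suffices
    stack[key] = raster


cube: Dict[str, Raster] = {}                    # empty data cube
add_raster(cube, "2021-01", Raster(256, 256))   # first raster fixes the spatial shape
add_raster(cube, "2021-02", Raster(256, 256))   # same shape, accepted

try:
    add_raster(cube, "2021-03", Raster(512, 512))  # different shape
except ValueError as err:
    print("rejected:", err)
```

Because an empty dictionary has no values to compare against, the first insertion always succeeds and defines the shape that later rasters must match.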
Data Reduction
ImageData objects are obtained by reducing the data from the collections as early as possible. While the traditional load_collection process is implemented, it is recommended to use the load_collection_and_reduce process to obtain an imagedata object immediately.
The reduce process includes a parameter to choose the pixel selection method:
- first (default): selects the first pixel value
- highest, lowest: selects extreme values
- mean, median, stddev: statistical measures
- lastbandlow, lastbandhigh, lastbandavg: band-specific selections
- count: number of valid pixels
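To make the selection methods concrete, the sketch below reduces the values of a single pixel across several scenes. It is a simplified illustration, not the backend's or rio-tiler's implementation: reduce_pixel and REDUCERS are hypothetical names, None is used here to mark a masked value, the standard deviation is assumed to be the population form, and the lastband* methods are left out because they depend on the band layout.

```python
import statistics
from typing import Callable, Dict, List, Optional

# Each reducer receives the valid (non-masked) values of one pixel, in scene order.
REDUCERS: Dict[str, Callable[[List[float]], float]] = {
    "first": lambda values: values[0],
    "highest": max,
    "lowest": min,
    "mean": statistics.fmean,
    "median": statistics.median,
    "stddev": statistics.pstdev,  # population form: an assumption of this sketch
    "count": len,
}


def reduce_pixel(series: List[Optional[float]], method: str = "first") -> Optional[float]:
    """Collapse one pixel's values across scenes; None marks a masked value."""
    valid = [v for v in series if v is not None]
    if not valid and method != "count":
        return None  # no valid observation for this pixel
    return REDUCERS[method](valid)


# One pixel observed in four scenes; the first scene is masked.
series = [None, 2.0, 6.0, 4.0]
for name in REDUCERS:
    print(name, reduce_pixel(series, name))
```

In this sketch "first" skips masked values, so it returns the first valid observation rather than the value from the first scene.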
Collections and STAC Integration
openEO by TiTiler integrates with external STAC API services to provide collections. It uses pystac-client to proxy the STAC API, configured through the TITILER_OPENEO_SERVICE_STORE_URL environment variable.
openEO Process Graph to CQL2-JSON Conversion
The backend automatically converts openEO process graphs to CQL2-JSON format for STAC API filtering. Supported operators include:
- Comparison operators (eq, neq, lt, lte, gt, gte, between)
- Array operators (in, array_contains)
- Pattern matching operators (starts_with, ends_with, contains)
- Null checks (is_null)
- Logical operators (and, or, not)
Example conversion:
// openEO process graph
{
  "cloud_cover": {
    "process_graph": {
      "cc": {
        "process_id": "lt",
        "arguments": {"x": {"from_parameter": "value"}, "y": 20}
      }
    }
  }
}
// Converted to CQL2-JSON
{
  "op": "<",
  "args": [{"property": "properties.cloud_cover"}, 20]
}
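The mapping in the example above can be sketched in a few lines of Python. This is a hedged illustration, not the backend's converter: to_cql2 and COMPARISON_OPS are hypothetical names, only single-node comparison graphs are handled, the "x" argument is assumed to reference the property value, and joining several properties with "and" is an assumption of this sketch.

```python
from typing import Any, Dict

# openEO comparison process -> CQL2 operator (subset covered by this sketch)
COMPARISON_OPS = {
    "eq": "=",
    "neq": "<>",
    "lt": "<",
    "lte": "<=",
    "gt": ">",
    "gte": ">=",
}


def to_cql2(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Convert single-node comparison graphs, keyed by property name, to CQL2-JSON."""
    clauses = []
    for name, spec in properties.items():
        nodes = spec["process_graph"]
        if len(nodes) != 1:
            raise NotImplementedError("this sketch handles one node per property")
        (node,) = nodes.values()
        op = COMPARISON_OPS[node["process_id"]]
        # "x" is assumed to reference the property value; "y" is the literal operand.
        literal = node["arguments"]["y"]
        clauses.append({"op": op, "args": [{"property": f"properties.{name}"}, literal]})
    if not clauses:
        raise ValueError("no property filters given")
    if len(clauses) == 1:
        return clauses[0]
    return {"op": "and", "args": clauses}  # combining with "and" is an assumption


graph = {
    "cloud_cover": {
        "process_graph": {
            "cc": {
                "process_id": "lt",
                "arguments": {"x": {"from_parameter": "value"}, "y": 20},
            }
        }
    }
}
print(to_cql2(graph))
```

Keeping the operator table as plain data makes it straightforward to see which processes are covered; the array, pattern matching, null and logical operators listed earlier would need their own handling.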
Performance Considerations
The backend is optimized for on-the-fly processing and serving of raster data. Key considerations:
- Processing time increases with the extent of data
- Larger extents may lead to timeouts
- The backend can be easily replicated and scaled
- No additional middleware required for deployment