
What is data indexing?

Data indexing is the core feature of Naiad: it allows more advanced and focused queries to run in a reasonable amount of time, without requiring huge storage or bandwidth resources.

Data tiles

The idea is that the data content of earth observation products can be described and catalogued in the same way as houses or apartments in an estate agency (described by location, year of construction, price, included facilities such as a garage or a parking lot, etc.), which then makes it possible to search for objects using precise or loose criteria.

So what would a house or an apartment be in the context of satellite earth observation? It would be a subset of the considered observation source (a data archive, which can span a long period of time and contain several terabytes of data). Such a subset, which we refer to as a tile in our terminology, can be defined as a bounded set of spatially and temporally continuous single observations, matching a specific biogeophysical event or feature (such as a storm, an algal bloom, a slick, ...) or simply an area or a section of a swath. This means that an index can reference the complete content of a dataset (coverage oriented tiles) or just a part of the content, depending on the focus of the index (feature oriented tiles).
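
To make the notion of a tile more concrete, here is a minimal Python sketch of what a tile record could look like. The `Tile` class, its field names and the `overlaps` method are hypothetical and are not taken from Naiad's actual schema; the sketch also ignores bounding boxes that cross the date line.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Tile:
    """Hypothetical index record for one tile (not Naiad's actual schema)."""
    dataset: str
    lon_min: float
    lat_min: float
    lon_max: float
    lat_max: float
    start: datetime
    end: datetime
    # Free-form content descriptors, e.g. {"cloud_fraction": 0.35}
    properties: dict = field(default_factory=dict)

    def overlaps(self, lon_min, lat_min, lon_max, lat_max, start, end):
        """Return True if the tile intersects the given box and time window."""
        in_space = (self.lon_min <= lon_max and self.lon_max >= lon_min
                    and self.lat_min <= lat_max and self.lat_max >= lat_min)
        in_time = self.start <= end and self.end >= start
        return in_space and in_time


# Illustrative values only
tile = Tile("avhrr_sst", -10.0, 40.0, -5.0, 45.0,
            datetime(2010, 1, 1, 12, 0), datetime(2010, 1, 1, 12, 5),
            {"cloud_fraction": 0.35})
```

With such a record, a space and time search reduces to calling `overlaps` on each candidate tile; a real index would use a spatial database rather than a linear scan.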

Feature oriented tiles

Feature oriented tiles only reference the subsets of a dataset where a specific type of event or feature (such as a storm in a surface wind vector dataset) is detected. This minimizes the volume of data to download and also spares the user the burden of detecting such events in a massive data collection. Setting up such an index requires applying dedicated data mining algorithms and image processing methods in order to extract and classify these features. Some properties of these features can be computed from the raw data or from external sources (such as a hurricane history catalogue) and inserted into the index, which allows advanced selection filters when querying the data (for instance, searching for all the observations of a known hurricane). Typical feature indexes are the StormWatch indexes processed over several archives of scatterometer data.

Coverage oriented tiles

Coverage oriented tiles are intended for a wide range of queries. Typically, complete archives of datasets are parsed: the orbit files are split into regular and homogeneous tiles (usually about 500 x 500 km) that are registered in the index. Properties describing the content are computed for each tile (such as the cloud contamination fraction for infra-red radiometers, the mean sea surface temperature (SST) for SST datasets, the mean wind speed, etc.) so that selection filters can be used when searching for the data. This is usually the default index provided for any dataset, since it assumes nothing about the user's interest, provides just a few relevant properties for filtering, and covers the complete dataset. At the very least, it makes fast data selection on space and time range criteria possible.
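
The tiling step described above can be sketched as follows. This is only an illustration under simplifying assumptions: the function `tile_swath` and the property names are hypothetical, the swath is a small 2D grid held in plain Python lists, and the tile size is given in pixels rather than in kilometres.

```python
def tile_swath(sst, cloud_mask, tile_size):
    """Split a 2D swath into square tiles and compute per-tile properties.

    sst and cloud_mask are lists of rows with the same shape; cloud_mask
    holds True where a pixel is cloudy. tile_size is expressed in pixels.
    """
    n_rows, n_cols = len(sst), len(sst[0])
    tiles = []
    for r0 in range(0, n_rows, tile_size):
        for c0 in range(0, n_cols, tile_size):
            rows = range(r0, min(r0 + tile_size, n_rows))
            cols = range(c0, min(c0 + tile_size, n_cols))
            pixels = [(r, c) for r in rows for c in cols]
            # Only cloud-free pixels contribute to the mean SST
            clear = [sst[r][c] for r, c in pixels if not cloud_mask[r][c]]
            tiles.append({
                "row_offset": r0,
                "col_offset": c0,
                "cloud_fraction": 1 - len(clear) / len(pixels),
                "mean_sst": sum(clear) / len(clear) if clear else None,
            })
    return tiles


# Toy swath: 2 rows x 4 columns, split into two 2 x 2 tiles
sst = [[10.0, 11.0, 12.0, 13.0],
       [10.0, 11.0, 12.0, 13.0]]
cloud = [[False, False, True, True],
         [False, True, True, True]]
tiles = tile_swath(sst, cloud, tile_size=2)
```

In this toy case the first tile has one cloudy pixel out of four, while the second is fully cloudy and therefore has no mean SST.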



Using indexes in advanced queries

The purpose of indexes, as explained above, is to offer finer search filters that place constraints on the content of the searched data. These constraints have to be specified in the search query. Practical selection and usage of the search filters are detailed in the tutorial section. The examples below illustrate the benefits of using advanced query criteria.

Limiting the amount of useless data

Some products may contain only a small amount of valid or useful data. For instance, sea surface temperature and ocean color are heavily affected or masked by clouds, and there is no need to download images or orbit files over cloudy areas since no (or very little) data will be usable. Applying a filter on cloud contamination limits the extracted data to subsets with a sufficient rate of valid observations.

Tiles of an AVHRR swath retrieved during a search without a filter on cloud contamination
Tiles retrieved during a search with a filter on cloud contamination (set to be lower than 60%)
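
The effect of such a filter can be illustrated with a short Python sketch. The function `select_tiles`, the record layout and the property name `cloud_fraction` are assumptions made for this example; the actual query syntax is described in the tutorial section. Bounds are treated as inclusive here.

```python
def select_tiles(tiles, **ranges):
    """Keep tiles whose properties fall within every (low, high) range.

    A bound of None means "unbounded" on that side. A tile that lacks
    one of the requested properties is rejected.
    """
    selected = []
    for tile in tiles:
        keep = True
        for name, (low, high) in ranges.items():
            value = tile.get(name)
            if value is None:
                keep = False
            elif low is not None and value < low:
                keep = False
            elif high is not None and value > high:
                keep = False
        if keep:
            selected.append(tile)
    return selected


# Illustrative index content, not real data
index = [
    {"id": "t1", "cloud_fraction": 0.20},
    {"id": "t2", "cloud_fraction": 0.85},
    {"id": "t3", "cloud_fraction": 0.55},
]
usable = select_tiles(index, cloud_fraction=(None, 0.60))
```

Only the tiles `t1` and `t3` pass the 60% threshold, so the mostly cloudy tile `t2` would never be downloaded.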

Selecting particular data

Data indexes can also be used to focus on a specific type of data pattern, situation, or feature. Users specify the conditions that the content of the retrieved data must match, and only the tiles matching these criteria are selected. For instance, one may search for observations of a given hurricane (using a StormWatch type of index) or for swell wave observations within a given range of wavelengths.

Swell observations from ENVISAT ASAR without any selection filter
Swell observations with a selection filter on the measured wavelength (between 400 and 500 meters) in order to keep storm-generated waves