1 Introduction
The raster model is a logical model widely used for the representation of data in Geographic Information Systems (GIS) Worboys and Duckham (2004); Rigaux et al. (2002) and for the storage of images in general. In GIS it is mainly used to store information about continuous variables, which cover the whole space and may take a specific value at each point in space. Essentially, the raster model represents this information as matrices of values. A matrix is built by dividing the space into fixed-size cells, so each cell represents the value of the spatial feature in the corresponding region. Raster image representations store the value of each pixel in a cell of the matrix.
The raster model is frequently used in GIS to store data related to natural geographic phenomena like temperature, wind speed, rainfall level, land elevation, atmospheric pressure, etc. Other information that is not nature-related, such as land use, is also suitable to be represented by this model. The alternative model, the vector model, usually represents discrete variables that have well-defined boundaries, using a collection of points and segments. This is a good fit for the representation of information related to human-made constructions, rivers, boundaries of lakes and forests, etc., but not for features that cannot be described with a few points and lines.
In this paper, we focus on the efficient representation of raster data. As stated before, a raster is essentially a matrix, so an uncompressed raster representation would use much space (for instance, a raster image with a resolution of just 0.5 km and worldwide coverage would require around 13 GB to store an integer per cell; modern high-resolution raster imagery can reach much higher spatial resolution, and therefore requires much larger storage space). Because of this, plain raster representations usually have to be stored in secondary memory. Compressed raster representations exist, but they are mainly designed to reduce storage, and do not provide efficient access. Most of them are based on well-known compression techniques such as run-length encoding or LZW Welch (1984). In these compressed solutions the space requirements become much smaller, due to the locality of raster datasets (spatial continuity): close cells tend to have similar values. However, in most of them the full file, or at least large chunks of the file, must be decompressed even to display a small region of the space. A well-known technique, called tiling Shekhar et al. (2017), divides the raster into smaller, fixed-size tiles and compresses each tile independently, providing some level of direct access while still taking advantage of the locality of values to improve compression. For example, the TIFF image format and its extension for geographical information (http://trac.osgeo.org/geotiff/) support this partition into tiles with different compression techniques, including LZW. Still, tiles must be relatively large for compression to be effective.
When data collections are stored by a GIS in a compressed format, such as those described above, some of the processing tasks that involve the complete raster can be performed by simply decompressing the data. However, many operations would benefit from direct access to regions (e.g., to display a local map), or from the ability to find the cells whose value is within some range. A classic example of this is the visualization of pressure or temperature bands Zhang and You (2010), where the raster is filtered to display cells differently according to the range of values to which they belong. Another example involves retrieving the regions of a raster with an elevation above a given threshold, to find zones with snow alert, or below a value, to find regions with risk of floods Martinis et al. (2009). Regular compressed raster representations lack the indexing capabilities on the values stored in the raster that would be required to answer this type of query. Therefore, these representations need to traverse the complete raster in order to return the cells that contain a given value, even when the results are restricted to a small subset of the cells.
There are several approaches to provide direct access to values in a raster dataset. For instance, we can consider the raster as a 3-dimensional matrix and use computational geometry solutions to answer any query involving spatial ranges or ranges of values by means of range reporting queries Chan et al. (2011). However, these solutions require superlinear space and therefore they are not suitable for the large datasets involved. Other representations of raster data that aim at efficient querying are usually based on quadtrees Finkel and Bentley (1974), particularly variations of the linear quadtree Gargantini (1982), a data structure originally devised for secondary memory. There exist other quadtree representations Chang and Chang (1994); Lin (1997a) that can work in internal memory, and they are very efficient for processing complete rasters, but they usually lack query capabilities to access specific regions or cells with specific values. An extension of the quadtree to 3 dimensions, or octree Samet (1984), could support those queries in a similar way to the computational geometry solutions. This structure does not require superlinear space, but does not provide compression either.
Compact data structures have been a very active research topic for the last few decades. They aim to represent any kind of information (texts, permutations, trees, graphs, etc.) in compressed space, while supporting query and processing algorithms that are able to work over the compact representation. This allows compact data structures to improve the efficiency of classical data structures, thanks to being stored in upper levels of the memory hierarchy. However, regarding spatial information, and more precisely raster data representation and querying, most of the previous work based on compact representations lacks advanced query support Chang and Chang (1994); Lin (1997b).
A simple solution to store raster data using compact data structures could be achieved by reading the raster row-wise and storing the sequence of values. We could use any compressed sequence representation Grossi et al. (2003); Golynski et al. (2006); Barbay et al. (2010) to return the cells with a given value (or a range of values Grossi et al. (2003); Navarro (2012)) efficiently, but in this kind of approach restricting the search to a spatial range becomes difficult. Furthermore, these sequence representations achieve at best the zero-order entropy of the sequence, which is not a significant space reduction in many cases, since it cannot fully exploit the spatial locality of values in raster data.
In this paper, we propose several compact data structures for raster data that efficiently support different queries, particularly those combining spatial indexing (filtering cells in a spatial window) with filters on values (retrieving cells with a specific value or in a range of values). We build on existing compact data structures that represent sets of points in a kind of compressed linear quadtree, and upgrade them to efficiently store and query raster data in different forms: simple binary images, general raster matrices, and even time-evolving raster data. We experimentally test our proposals to demonstrate their low space requirements and good performance in these new application domains. Notice that our data structures can be conceived as a compact representation for any kind of matrix. Nevertheless, they rely on the locality of values to achieve compression, so we focus our evaluation on raster data that displays spatial continuity.
2 Previous Concepts
2.1 The $k^2$-tree
The $k^2$-tree Brisaboa et al. (2014b) is a compact data structure for the representation of sparse binary matrices. It was initially devised to represent the adjacency matrix of Web graphs, and later applied to the compression of social networks Claude and Ladra (2011) and RDF databases Álvarez-García et al. (2015). Given a binary matrix of size $n \times n$, the $k^2$-tree conceptually represents it as a $k^2$-ary tree, for a given $k$ (the size of the matrix is assumed to be a power of $k$; if it is not, the matrix is expanded to the next power of $k$, filling the new cells with 0s). The root of the conceptual tree corresponds to the complete matrix. Then, the matrix is partitioned into $k^2$ equal-sized submatrices of size $n/k \times n/k$, and each of them (taken from left to right and top to bottom) is represented as a child of the root node. A single bit is associated with each node: a 1 is used if the submatrix associated with the node contains at least one 1; otherwise, the bit is set to 0. The subdivision is applied recursively for each node with value 1, until we reach a matrix full of 0s or the cells of the original matrix. The conceptual tree is then stored using two bitmaps: $T$ stores all the bits in the upper levels of the tree, following a levelwise traversal, and $L$ stores only the bits in the last level. Figure 1 shows an example of a $k^2$-tree.
To navigate the $k^2$-tree, a rank structure over $T$ is built. This structure is used to compute the number of ones in the bitmap up to any position ($rank_1$ operation) in constant time, using sublinear additional space Munro (1996). The $k^2$-tree exhibits a property that provides simple navigation over the conceptual tree using only the bitmaps and the rank structure: given a value 1 at any position $p$ in $T$, its $k^2$ children will start at position $p_{children} = rank_1(T, p) \cdot k^2$ of $T$. When the last level is reached, $p_{children} \geq |T|$, so the excess $p_{children} - |T|$ determines their position in $L$. A $k^2$-tree can answer single cell queries, queries reporting a complete row/column or general range queries (i.e., retrieve all the 1s in a range) using only rank operations to traverse the tree, by visiting all the necessary subtrees.
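The construction and the rank-based navigation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_k2tree`, `get_cell`, `rank1`) are hypothetical, bitmaps are plain Python lists, positions are 0-based, and `rank1` is a linear scan standing in for the constant-time rank structure.

```python
def rank1(bits, p):
    """Number of 1s in bits[0..p], inclusive (linear scan used only for illustration)."""
    return sum(bits[:p + 1])

def build_k2tree(M, k=2):
    """Return bitmaps (T, L) for a square 0/1 matrix whose side is a power of k."""
    T, L = [], []
    level = [(0, 0, len(M))]              # the root covers the whole matrix
    while level:
        next_level = []
        for r, c, size in level:
            sub = size // k
            for i in range(k):            # children: left to right, top to bottom
                for j in range(k):
                    rr, cc = r + i * sub, c + j * sub
                    bit = int(any(M[x][y] for x in range(rr, rr + sub)
                                  for y in range(cc, cc + sub)))
                    if sub == 1:
                        L.append(bit)     # last level: actual cells
                    else:
                        T.append(bit)
                        if bit:
                            next_level.append((rr, cc, sub))
        level = next_level
    return T, L

def get_cell(T, L, n, k, row, col):
    """Value of cell (row, col), following a single branch with rank operations."""
    size = n // k
    pos = (row // size) * k + (col // size)
    while pos < len(T):
        if T[pos] == 0:
            return 0                      # empty submatrix
        row, col = row % size, col % size
        size //= k
        pos = rank1(T, pos) * k * k + (row // size) * k + (col // size)
    return L[pos - len(T)]                # the excess over |T| indexes L

# Small sparse example (assumed data, 8 x 8, k = 2).
M = [[0] * 8 for _ in range(8)]
for r, c in [(0, 1), (2, 3), (3, 3), (7, 0), (5, 6)]:
    M[r][c] = 1
T, L = build_k2tree(M)
```

Row, column and range queries follow the same scheme, descending into every child that intersects the query region instead of a single one.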
Some improvements have been proposed by the original authors of the $k^2$-tree Brisaboa et al. (2014b) to enhance its compression and query efficiency. For instance, a hybrid $k^2$-tree uses different values of $k$ in the upper and lower levels of decomposition. Other techniques use statistical compression of the bitmap $L$. A dynamic variant of the $k^2$-tree, called d$k^2$-tree, has also been proposed Brisaboa et al. (2017). The d$k^2$-tree is based on a custom implementation of dynamic bitmaps for $T$ and $L$. By supporting update operations over $T$ and $L$, in addition to rank and select operations, the d$k^2$-tree is able to handle changes in the bits of the binary matrix, as well as the insertion of new rows/columns at the end of the matrix.
2.2 The I$k^2$-tree
The Interleaved $k^2$-tree Álvarez-García et al. (2017) (I$k^2$-tree) is a data structure based on the $k^2$-tree and devised to deal with RDF triples. Given a ternary relation, the I$k^2$-tree uses vertical partitioning to decompose it into a collection of binary relations, one for each different value of one of its dimensions. Hence, each adjacency matrix stores the pairs of elements of the other two dimensions that are related with that value. The dimension used for the decomposition is called the partitioning variable. After this transformation, each of the binary relations could simply be stored in a separate $k^2$-tree, but the I$k^2$-tree is able to represent all those matrices simultaneously in the same tree, providing indexing capabilities also on the partitioning variable.
Conceptually, building an I$k^2$-tree is equivalent to building a collection of $k^2$-trees and merging the equivalent branches of the conceptual trees into a single tree, where each node stores the bits of all the $k^2$-trees. This means that the children of the root node always have one bit per matrix, but nodes at lower levels of the tree have as many bits as 1s exist in their parent node (i.e., as many bits as $k^2$-trees contain that node). The conceptual tree is stored in two bitmaps $T$ and $L$, exactly like a $k^2$-tree. Figure 2 displays an example of an I$k^2$-tree, for $k = 2$ and three matrices. Note that the fourth node at the first level of the I$k^2$-tree has 3 bits, one per matrix; its first bit is 0, because the bottom-right submatrix of the first matrix is full of 0s, and its second and third bits are set to 1; therefore, its children have 2 bits each.
The I$k^2$-tree can be navigated in a similar fashion to $k^2$-trees: at the root level we have $k^2$ nodes of $m$ bits each, where $m$ is the number of matrices; given a node that starts at position $p$ in $T$, its children are located at position $(rank_1(T, p-1) + m) \cdot k^2$ in $T$, where $m$ acts as a fixed correction factor that accounts for the bits of the first level; each child of the node will have $o$ bits, where $o$ is the number of bits set to 1 in the current node. Observe also that, thanks to having the bits for all the matrices together in the same node, it is possible to restrict the traversal of the tree to a specific value in the partitioning dimension. However, pruning the tree by the partitioning dimension requires more complex operations than filtering branches on the other dimensions, so the structure is usually limited to domains where a partitioning variable of small size can be selected. See Álvarez-García et al. (2017) for further details on the implementation of query operations in the I$k^2$-tree.
2.3 The $k^2$-treap
The $k^2$-treap Brisaboa et al. (2014a) is another proposal based on the $k^2$-tree, and also inspired by the treap Seidel and Aragon (1996). It is specifically designed to answer range top-$k$ queries on multidimensional grids (e.g. OLAP cubes). Given a matrix and a spatial window inside the matrix, a range top-$k$ query asks for the locations of the $k$ highest values in the query window.
Starting with a matrix in which each cell can be empty or store a numeric value, the $k^2$-treap follows a recursive partition of the matrix into $k^2$ submatrices, similar to the $k^2$-tree. The decomposition works as follows: the root of the tree stores the coordinates of the cell whose weight is the maximum value in the matrix, as well as the cell value. Then, the cell is marked as empty and removed from the matrix. Next, the resulting matrix is subdivided into $k^2$ submatrices and we add the corresponding children nodes to the root of the tree. The assignment process is repeated for each child, taking the cell with the maximum value and its coordinates from the corresponding submatrix, and deleting the value of the cell before continuing. Decomposition eventually stops when a completely empty submatrix is found or the cells of the original matrix are reached.
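The decomposition just described can be sketched conceptually as follows. This is only an illustration of the recursive assignment of local maxima, not the compact representation: the tree is kept as nested Python dictionaries, empty cells are `None`, ties are broken by row-major order, and the names `build_treap` and `count_nodes` are hypothetical.

```python
def build_treap(M, k=2):
    """Conceptual k^2-treap of a square matrix; None marks an empty cell."""
    M = [row[:] for row in M]                 # work on a copy: maxima get removed

    def build(r, c, size):
        cells = [(M[i][j], i, j)
                 for i in range(r, r + size)
                 for j in range(c, c + size) if M[i][j] is not None]
        if not cells:
            return None                       # completely empty submatrix
        value, i, j = max(cells, key=lambda t: t[0])
        M[i][j] = None                        # remove the local maximum
        node = {"value": value, "coord": (i, j), "children": []}
        if size > 1:
            sub = size // k
            node["children"] = [build(r + a * sub, c + b * sub, sub)
                                for a in range(k) for b in range(k)]
        return node

    return build(0, 0, len(M))

def count_nodes(node):
    """Number of non-empty nodes in the conceptual tree."""
    if node is None:
        return 0
    return 1 + sum(count_nodes(ch) for ch in node["children"])

# Assumed example data (4 x 4, k = 2).
M = [[3, None, 1, 4],
     [None, 9, None, 2],
     [5, 6, None, None],
     [7, None, None, 8]]
root = build_treap(M)
```

Because every node stores the maximum of its submatrix once the maxima of its ancestors have been removed, values never increase along a root-to-leaf path, which is the property that top-$k$ queries exploit.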
The conceptual $k^2$-treap is stored using three elements: a sequence of coordinates per level, keeping the coordinates of the local maxima, stored as relative offsets to the origin of the current submatrix (note that empty nodes are dismissed, and coordinates are not needed in the last level of the tree since submatrices are of size 1); the values of the local maxima (i.e. their weights), differentially encoded with respect to their parent node and compressed using DACs Brisaboa et al. (2013) (notice that a small array is also stored to mark the offset in the array where each level of the tree starts); and the tree topology, stored like a $k^2$-tree but with a single bit array, $T$, with rank support. This change is necessary since, in the $k^2$-treap, rank operations are also needed in the last level of the tree.
Figure 3 shows an example of $k^2$-treap construction. The top of the image shows the state of the matrix at each level of decomposition, and the cells selected as local maxima at each level are highlighted, except in the last level where all the cells are local maxima. Empty submatrices are represented in the tree with a special symbol.
The $k^2$-treap provides support for cell access, basic range and top-$k$ queries, and also interval queries regarding cell weights. A detailed description of the structure and navigation algorithms is presented in Brisaboa et al. (2016).
3 Representation of binary rasters
Binary images can be considered as the simplest form of raster data. In a binary image we store a matrix that uses a single bit per cell, to determine whether a single feature is present or not within the region of the space corresponding to that cell. Hence, a binary raster is essentially a simplified version of a general raster, limiting the range of possible values to two. Several GIS applications make use of this simple technique to represent binary attributes of the space. Examples of this would be information of events like oil spills, plagues or cloud cover in their simplest version, as well as simple rasterized representations of vectorial data.
Due to the simplicity of these binary images, their representation usually requires specific techniques to achieve the best compression and query performance. The $k^2$-tree, introduced in Section 2.1, is an example of those. However, it was devised to compress Web graphs, so it works well mostly on binary matrices that are very sparse.
In this section we propose a solution, which we call $k^2$-ones, based on the $k^2$-tree and designed to efficiently compress the kind of binary images that usually appear in GIS applications. Essentially, our technique is devised to overcome the limitation of the $k^2$-tree to sparse matrices: it is designed to efficiently compress binary matrices with a large percentage of 1s, as long as there is some clustering of the values, which is typical of most real-world raster data.
Our proposal is based on the same decomposition of the binary matrix used by the $k^2$-tree, but we recursively divide the matrix until we reach any uniform region, be it full of 0s or 1s. This means that in our $k^2$-ones we have 3 possible types, or “colors”, of node, following the usual naming of quadtrees: black and white nodes are regions full of ones and zeros, respectively; the internal nodes, which are regions containing both ones and zeros, are gray. Note that the main difference between our proposal and a $k^2$-tree is that we are able to represent large regions of ones with a single node, instead of using a full subtree.
The $k^2$-ones can efficiently answer cell retrieval queries, as well as row/column or range queries, simply by performing a top-down traversal of the tree branches that intersect the region of interest. The only consideration is that when a black node is found, we need to output all the cells of the region of interest that fall within the submatrix covered by that node.
The goal of the $k^2$-ones is to be efficiently traversed in a similar manner to the $k^2$-tree, using just rank operations. To achieve this purpose, we devised a small set of implementation alternatives that store the conceptual tree in different ways. Figure 4 shows all our implementation variants for the same conceptual tree. For each variant we will describe how the components are built, and how the basic traversal operations are implemented, since query algorithms are based on the same conceptual traversal of the tree in all cases.
3.1 Naive 2-bit coding
A simple representation for the new conceptual tree, where three kinds of nodes exist, is just to use 2 bits per node instead of one. According to this idea, we use the following encoding: internal (gray) nodes are encoded with 10; white nodes with 00; black nodes with 01. Then, the first bit of each node is stored in a bitmap $T$, and the second bit in a second bitmap $T'$. With this setup, the bitmap $T$ marks internal nodes with 1 and leaves with 0, and $T'$ stores the type of leaf. Note that this is not necessary in the last level of the tree, where no internal nodes can exist, so in the last level we use a bitmap $L$ like in regular $k^2$-trees.
The naive 2-bit $k^2$-ones can be efficiently navigated, much like a $k^2$-tree, as follows: given an internal node at position $p$, we can compute the position where its children start as $rank_1(T, p) \cdot k^2$, since each 1 in $T$ is an internal node that yields exactly $k^2$ children. If we find a leaf node ($T[p] = 0$), we check $T'[p]$ to determine whether it is a black or white node. Notice that the rank structure to perform traversal in constant time is only needed in $T$ but not in $T'$ or $L$.
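A minimal Python sketch of this naive 2-bit encoding is given next, under the same assumptions as a plain $k^2$-tree sketch would use: list-based bitmaps, 0-based positions, and a linear-scan `rank1` in place of the constant-time structure. The names `build_naive`, `get_cell`, `color` and the bitmap name `T2` (standing for the second bitmap) are hypothetical.

```python
def rank1(bits, p):
    """Number of 1s in bits[0..p], inclusive (linear scan used only for illustration)."""
    return sum(bits[:p + 1])

def color(M, r, c, size):
    """Classify the size x size submatrix at (r, c) as white, black or gray."""
    ones = sum(M[x][y] for x in range(r, r + size) for y in range(c, c + size))
    return "white" if ones == 0 else "black" if ones == size * size else "gray"

def build_naive(M, k=2):
    """Bitmaps (T, T2, L): gray = 10, white = 00, black = 01; L holds the last level."""
    T, T2, L = [], [], []
    level = [(0, 0, len(M))]
    while level:
        next_level = []
        for r, c, size in level:
            sub = size // k
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i * sub, c + j * sub
                    if sub == 1:
                        L.append(M[rr][cc])
                        continue
                    col = color(M, rr, cc, sub)
                    T.append(int(col == "gray"))     # first bit: internal or leaf
                    T2.append(int(col == "black"))   # second bit: type of leaf
                    if col == "gray":
                        next_level.append((rr, cc, sub))
        level = next_level
    return T, T2, L

def get_cell(T, T2, L, n, k, row, col):
    size = n // k
    pos = (row // size) * k + (col // size)
    while pos < len(T):
        if T[pos] == 0:
            return T2[pos]                # leaf: 1 if black, 0 if white
        row, col = row % size, col % size
        size //= k
        pos = rank1(T, pos) * k * k + (row // size) * k + (col // size)
    return L[pos - len(T)]

# Assumed example: an 8 x 8 matrix whose top-left quadrant is full of ones.
M = [[1 if r < 4 and c < 4 else 0 for c in range(8)] for r in range(8)]
M[5][5] = 1
M[6][7] = 1
T, T2, L = build_naive(M)
```

In this example the 16 ones of the top-left quadrant are represented by a single black node in the first level, whereas a plain $k^2$-tree would need a full subtree for them.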
3.2 Improved 2-bit encoding
The naive encoding uses 2 bits per node to represent only three possible node types in the tree. We can use a more space-efficient encoding using just 1 bit for internal nodes. In this variant, internal nodes are stored with a bit 1 in $T$, but do not have a second bit in $T'$. Again, the last level is stored with a single bitmap $L$.
In this improved variant we can compute the children of a node using the same formula of the previous approach, since $T$ and $L$ are identical. The only difference is that, when we reach a leaf node at position $p$, the corresponding bit in $T'$ will not be located at position $p$ but at the position given by the number of leaves up to $p$, that is, $rank_0(T, p)$.
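The index mapping for leaves can be checked on a small hand-built example. The sketch below assumes 0-based list indices and an inclusive `rank0`, so the bit of the leaf at position `p` is found at index `rank0(T, p) - 1`; the bitmaps and the names `rank0`, `leaf_color` and `T2` are hypothetical.

```python
def rank0(bits, p):
    """Number of 0s in bits[0..p], inclusive (linear scan used only for illustration)."""
    return (p + 1) - sum(bits[:p + 1])

# Assumed level with four nodes: gray, black, gray, white.
T = [1, 0, 1, 0]     # 1 = internal (gray) node, 0 = leaf
T2 = [1, 0]          # one bit per leaf only: 1 = black, 0 = white

def leaf_color(T, T2, p):
    """Color of the leaf stored at position p of T."""
    assert T[p] == 0, "only leaves have a bit in T2"
    return "black" if T2[rank0(T, p) - 1] else "white"
```

The extra `rank0` is the price paid for not storing a second bit for internal nodes; since $rank_0(T, p) = p + 1 - rank_1(T, p)$ under these conventions, it can be derived from the rank structure that $T$ already needs.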
3.3 Navigable DF-expression
Following the same idea of the previous variants of using two bitmaps $T$ and $T'$, we propose a variant based on the DF-expression encoding Kawaguchi and Endo (1980). In this variant, we encode internal nodes with 10, white nodes with 0 and black nodes with 11. We use the same bitmaps $T$ and $T'$ for the first and second bits of each node, and a single bitmap $L$ in the last level.
The DF-expression encoding has been suggested to be more space efficient than the previous ones, but it is not as efficient in our implementation since it requires more complex computations. Particularly, to compute the children of a gray node at position $p$ we need to count the number of internal nodes up to the current position: among the first $rank_1(T, p)$ bits of $T'$, the internal nodes are those set to 0, so the children start at $rank_0(T', rank_1(T, p)) \cdot k^2$. This increases complexity and forces us to add a rank structure not only to $T$ but also to bitmap $T'$ in order to perform the previous computation. Notice also that, unlike in previous variants, we now need to check the bitmap $T'$ to know if the current node is internal or a (black) leaf.
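The double rank needed by this variant can be illustrated on a hand-built first level. The sketch assumes 0-based list indices and an inclusive `rank`, so the second bit of the node at position `p` of `T` is at index `rank(T, p, 1) - 1` of the second bitmap; the bitmaps and the names `rank`, `node_type`, `children_start` and `T2` are hypothetical.

```python
def rank(bits, p, b):
    """Number of bits equal to b in bits[0..p], inclusive (linear scan for illustration)."""
    return sum(1 for x in bits[:p + 1] if x == b)

# Assumed first level for k = 2: gray, white, black, gray.
# DF-expression codes: gray = 10, white = 0, black = 11.
T = [1, 0, 1, 1]     # first bit of every node
T2 = [0, 1, 0]       # second bit, stored only for nodes whose first bit is 1

def node_type(T, T2, p):
    if T[p] == 0:
        return "white"
    return "black" if T2[rank(T, p, 1) - 1] else "gray"

def children_start(T, T2, p, k=2):
    """Position in T where the children of the gray node at p start."""
    q = rank(T, p, 1)                 # nodes with first bit 1 up to p
    internal = rank(T2, q - 1, 0)     # of those, the gray ones (second bit 0)
    return internal * k * k
```

Compared with the 2-bit variants, every step down the tree needs a rank on both bitmaps, which is consistent with the higher query times this variant is expected to show.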
3.4 An asymmetric approach
Our last proposal aims at storing our conceptual tree, with three types of nodes, using the same data structure of the original $k^2$-tree, and an almost identical encoding. Internal nodes are encoded with 1 and white leaves with 0, like in the original $k^2$-tree. Black leaves are encoded as a small subtree: an internal node with $k^2$ white leaves as children. We take advantage of this configuration, which is not possible in a $k^2$-tree, to mark black nodes using $k^2 + 1$ (typically 5) bits.
The asymmetric $k^2$-ones can be traversed exactly like a $k^2$-tree. The only difference is that, when we are performing traversal, if we reach a node encoded with 1, we need to check its children: if all of them are white, the current node is black. In practice, this can be performed when checking the node, or we can simply traverse it like an internal node in the $k^2$-tree, and in the next step check whether it was indeed an internal or black node.
The asymmetric $k^2$-ones uses a very asymmetric encoding for the nodes, requiring $k^2 + 1$ bits for black nodes and only 1 bit for white nodes. However, it also has an interesting property: since it is identical to a $k^2$-tree where regions of ones are encoded using a shorter subtree, this approach will never exceed the space of the original $k^2$-tree.
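The asymmetric encoding can be sketched in Python as follows. As in any such illustration, this is not the authors' code: bitmaps are lists, positions are 0-based, `rank1` is a linear scan, and the names `build_asym`, `get_cell_asym` and `color` are hypothetical. The sketch checks the children of a node as soon as the node is reached, which is one of the two options mentioned above.

```python
def rank1(bits, p):
    """Number of 1s in bits[0..p], inclusive (linear scan used only for illustration)."""
    return sum(bits[:p + 1])

def color(M, r, c, size):
    ones = sum(M[x][y] for x in range(r, r + size) for y in range(c, c + size))
    return "white" if ones == 0 else "black" if ones == size * size else "gray"

def build_asym(M, k=2):
    """Bitmaps (T, L): white = 0, gray = 1, black = 1 followed by k*k zero children."""
    T, L = [], []
    level = [(0, 0, len(M), False)]           # (row, col, size, is_black)
    while level:
        next_level = []
        for r, c, size, black in level:
            sub = size // k
            out = L if sub == 1 else T        # bitmap receiving this node's children
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i * sub, c + j * sub
                    if black:
                        out.append(0)         # all-white children mark a black node
                    elif sub == 1:
                        L.append(M[rr][cc])
                    else:
                        col = color(M, rr, cc, sub)
                        T.append(0 if col == "white" else 1)
                        if col != "white":
                            next_level.append((rr, cc, sub, col == "black"))
        level = next_level
    return T, L

def get_cell_asym(T, L, n, k, row, col):
    bits = T + L                              # conceptual concatenation of T and L
    size = n // k
    pos = (row // size) * k + (col // size)
    while True:
        if pos >= len(T) or bits[pos] == 0:
            return bits[pos]                  # last-level cell or white node
        first = rank1(T, pos) * k * k
        if not any(bits[first:first + k * k]):
            return 1                          # all children white: black node
        row, col = row % size, col % size
        size //= k
        pos = first + (row // size) * k + (col // size)

# Assumed example: an 8 x 8 matrix whose top-left quadrant is full of ones.
M = [[1 if r < 4 and c < 4 else 0 for c in range(8)] for r in range(8)]
M[5][5] = 1
M[6][7] = 1
T, L = build_asym(M)
```

In this example the black quadrant costs $k^2 + 1 = 5$ bits (one 1 and four 0s), where a plain $k^2$-tree would expand it down to the individual cells.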
3.5 Experimental evaluation
In this section we compare the $k^2$-ones with the original $k^2$-tree. We focus on two different types of data, with fundamentally different characteristics: Web graph datasets, which are very sparse, and raster data, where there can be a large percentage of ones. Table 1 shows the datasets used. The Web graph datasets (provided by the Laboratory for Web Algorithmics (LAW) at http://law.di.unimi.it/datasets.php) are very sparse (less than 0.005% of ones). The raster datasets have been extracted from the Digital Land Model (MDT05) of the Spanish Geographic Institute (http://www.cnig.es). They are high-resolution elevation rasters. We took several fragments of the overall dataset, numbered as shown in Table 1. Datasets mdtA and mdtB are built by combining several adjacent pieces to build larger rasters. Note that the original datasets store decimal values; we select a reference value and build binary matrices by selecting all cells with value below the given threshold.
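Table 1: Datasets used in the experiments.
Type       Dataset    Size
Web graph  cnr
Web graph  eu
Web graph  indochina
Web graph  uk
Raster     mdt200
Raster     mdt400
Raster     mdt500
Raster     mdt600
Raster     mdt700
Raster     mdt900
Raster     mdtA
Raster     mdtB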
We ran all the experiments on an AMD Phenom II X4 955 @ 3.2 GHz, with 8 GB of DDR2 RAM, running Ubuntu 12.04.1. Our implementations are written in C and compiled with gcc version 4.6.2 with -O9 optimizations.
3.5.1 Space analysis
We evaluate our $k^2$-ones implementation variants by comparing them with original $k^2$-trees. For the comparison we use Web graphs and raster datasets, with a threshold set to have 50% of ones. For all the approaches we use a hybrid version, with one value of $k$ in the first three levels of decomposition and a different value in the remaining levels.
Table 2 shows the compression achieved by our techniques and the original $k^2$-tree. We highlight the best compression results for each dataset. In the first four datasets, Web graphs, the asymmetric $k^2$-ones slightly improves the compression of the original $k^2$-tree, thanks to being able to exploit the slightly larger clusters of ones that appear in most Web graphs. However, the sparsity of the datasets makes all of our other variants larger than the $k^2$-tree. In raster datasets, due to the much higher percentage of ones, the $k^2$-tree becomes much less efficient than our variants: our techniques are roughly 6 to 22 times smaller than the $k^2$-tree, depending on the dataset. The improved 2-bit $k^2$-ones achieves the best compression results in all of them, but all the variants are relatively close.
Table 2: Space used by the $k^2$-tree and by the four $k^2$-ones variants (columns follow the order of Sections 3.1 to 3.4).
Dataset    $k^2$-tree  Naive 2-bit  Improved 2-bit  DF-expression  Asymmetric
cnr        3.15        4.36         3.79            3.69           3.14
eu         3.81        5.17         4.50            4.47           3.79
indochina  2.03        2.60         2.25            2.26           1.92
uk         2.95        4.02         3.49            3.43           2.91
mdt200     0.25        0.04         0.03            0.03           0.04
mdt400     0.22        0.02         0.01            0.02           0.02
mdt500     0.23        0.03         0.02            0.02           0.03
mdt600     0.22        0.01         0.01            0.01           0.01
mdt700     0.23        0.02         0.02            0.02           0.02
mdt900     0.24        0.04         0.03            0.04           0.04
To better demonstrate the difference in performance when compared with the $k^2$-tree, we extended the evaluation to binary rasters with a varying percentage of ones. Figure 5 (left) displays the compression obtained for one of the raster datasets, with thresholds set to get between 1% and 90% of ones. Results show that all the $k^2$-ones variants are already smaller than the $k^2$-tree baseline with just 1% of ones in the dataset, due to the larger size of the clusters of ones. The right-hand plot in Figure 5 focuses on the differences among our proposals. All of them achieve similar results and evolve almost in parallel, but the improved 2-bit variant is the best in general and the naive 2-bit variant is the worst. The asymmetric variant is slightly worse when the percentage of ones is around 50%.
3.5.2 Query times
In this section we focus on the query performance of the $k^2$-ones, particularly compared to that of the $k^2$-tree. Specifically, we measure performance on cell retrieval queries, i.e. returning the value of a given cell. These queries involve the traversal of a single branch of the tree to locate a cell, so they provide a clearer comparison of the differences in traversal cost among variants. We perform tests using Web graphs and binary rasters; the latter are again generated using a threshold over the original datasets, to get binary images with 50% and 10% of ones. We use a set of 10 million random queries for each dataset, and show the average query times in µs/query.
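Table 3: Average time to retrieve a cell, in µs/query (columns follow the order of Sections 3.1 to 3.4).
Family        Dataset    $k^2$-tree  Naive 2-bit  Improved 2-bit  DF-expression  Asymmetric
Web graphs    cnr        0.46        0.47         0.52            0.66           0.49
Web graphs    eu         0.43        0.43         0.48            0.61           0.45
Web graphs    indochina  0.50        0.51         0.58            0.75           0.53
Web graphs    uk         0.58        0.60         0.66            0.89           0.61
Raster (50%)  mdt200     0.54        0.37         0.41            0.58           0.42
Raster (50%)  mdt400     0.50        0.29         0.33            0.43           0.34
Raster (50%)  mdt500     0.53        0.33         0.37            0.51           0.38
Raster (50%)  mdt600     0.54        0.27         0.30            0.40           0.31
Raster (50%)  mdt700     0.51        0.29         0.32            0.43           0.33
Raster (50%)  mdt900     0.55        0.36         0.41            0.56           0.41
Raster (10%)  mdt200     0.27        0.24         0.27            0.34           0.26
Raster (10%)  mdt400     0.25        0.22         0.24            0.30           0.24
Raster (10%)  mdt500     0.28        0.25         0.28            0.36           0.27
Raster (10%)  mdt600     0.26        0.23         0.25            0.32           0.25
Raster (10%)  mdt700     0.19        0.15         0.17            0.20           0.17
Raster (10%)  mdt900     0.30        0.28         0.31            0.41           0.31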
Table 3 shows the results for all the datasets, grouped by family. In Web graphs, the $k^2$-tree achieves the best query times, due to the simpler navigation required, and our encodings obtain higher query times. Nevertheless, the overhead of our fastest variant, the naive 2-bit $k^2$-ones, is very low. The asymmetric variant is also very efficient, whereas the improved 2-bit variant and especially the DF-expression variant are slower, due to the extra rank operations required. In the raster datasets, our fastest solutions are always more efficient than the $k^2$-tree, due to the improved access to regions full of ones. Again, the naive 2-bit variant is the fastest and the DF-expression variant the slowest. There is also a difference in performance depending on the percentage of ones in the dataset: the improved 2-bit and asymmetric variants are very similar in both cases, but the asymmetric variant is slightly better when the percentage of ones is lower. Considering that the improved 2-bit variant achieves the best compression in all the raster datasets, we consider it to yield the best space-time tradeoff overall. The naive 2-bit variant can offer slightly better query times at the cost of space, whereas the asymmetric variant can be an alternative when the number of ones is expected to be relatively low.
3.6 Comparison with linear quadtrees
The decomposition of the space in $k^2$ submatrices used in the $k^2$-ones is a generalization of the quadrant decomposition used by generic quadtrees. Hence, our technique can be seen as a compact quadtree representation, since the conceptual tree we are representing in our variants, for $k = 2$, can also be stored as a classical quadtree.
The linear quadtree Gargantini (1982) is a representation devised to work efficiently from secondary storage. In the linear quadtree, the quadrants are numbered 0-3 from left to right and top to bottom. Each entry in the matrix (i.e. each 1 in binary matrices) is represented by a sequence containing the quadrant chosen at each decomposition step to reach the corresponding cell. These sequences, called quadcodes, can be sorted and stored in a B-tree in secondary memory. Cell retrieval queries can be implemented as a simple search for the corresponding quadcode in the B-tree.
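The quadcode idea can be illustrated with a short Python sketch. It keeps the sorted quadcodes in an in-memory list and uses binary search instead of a B-tree, and it stores one quadcode per cell set to 1, as described above; the names `quadcode`, `build_linear_quadtree` and `get_cell` are hypothetical.

```python
from bisect import bisect_left

def quadcode(row, col, n):
    """Quadrants (0-3, left to right and top to bottom) chosen to reach cell (row, col)."""
    code = []
    size = n
    while size > 1:
        size //= 2
        code.append((row // size) * 2 + (col // size))
        row, col = row % size, col % size
    return tuple(code)

def build_linear_quadtree(M):
    """Sorted list of the quadcodes of all cells set to 1."""
    n = len(M)
    return sorted(quadcode(r, c, n) for r in range(n) for c in range(n) if M[r][c])

def get_cell(codes, n, row, col):
    """Cell retrieval as a binary search for the quadcode of the cell."""
    q = quadcode(row, col, n)
    i = bisect_left(codes, q)
    return int(i < len(codes) and codes[i] == q)

# Assumed example data (4 x 4).
M = [[0, 0, 1, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 0],
     [1, 0, 0, 0]]
codes = build_linear_quadtree(M)
```

Each stored entry takes one quadrant digit per level of decomposition, which gives an intuition of why this representation needs many more bits per one than a bitmap-based tree.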
In terms of space, our $k^2$-ones variants are in practice closer to compact quadtree representations designed for main memory. However, those are usually designed for operations involving the full raster, whereas our techniques retain the ability to efficiently access a subregion of the space, something that can be easily performed with linear quadtrees but not with other compact representations. In this section we compare the performance of our techniques against linear quadtree implementations when answering cell retrieval queries. We implemented an in-memory version of the linear quadtree, which uses a B-tree maintained in main memory. Additionally, since the linear quadtree is a dynamic data structure that allows efficient modifications, we perform different comparisons for a static and a dynamic setup. In the static comparison, we use our $k^2$-ones, and compare it with a linear quadtree that stores quadcodes in an array in main memory, using binary search to answer queries. In the dynamic comparison, we use a linear quadtree with a regular B-tree, fully in main memory, and a dynamic version of the $k^2$-ones, which is a straightforward adaptation of the existing d$k^2$-tree data structure to properly handle the new semantics for regions of ones. The machine and configuration of our variants are the same as in Section 3.5.
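Table 4: Space, in bits per one, of the $k^2$-ones and the linear quadtree in the static and dynamic setups.
Dataset  Static $k^2$-ones  Static quadtree  Dynamic $k^2$-ones  Dynamic quadtree
mdt600   0.02               0.25             0.04                0.31
mdt700   0.02               0.17             0.04                0.23
mdtA     0.01               0.22             0.02                0.23
cnr      3.14               41.32            4.95                41.46
eu       3.81               49.92            5.86                50.07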
Table 4 shows the compression, in bits per one, achieved by the $k^2$-ones and the corresponding static and dynamic linear quadtrees (QT). We only show results for a subset of the collections, since results are similar among all Web graphs, and among all raster datasets. Results show that our variants are around 10 times smaller than linear quadtrees in all the datasets.
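Table 5: Average time to retrieve a cell with the $k^2$-ones and the linear quadtree in the static and dynamic setups.
Dataset  Static $k^2$-ones  Static quadtree  Dynamic $k^2$-ones  Dynamic quadtree
mdt600   0.25               0.84             0.56                0.89
mdt700   0.28               0.88             0.61                0.92
mdtA     0.26               0.98             0.71                1.23
cnr      0.77               2.08             2.55                2.28
eu       1.10               2.62             3.80                2.94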
Table 5 displays a comparison of query times. We measure the average query time over a query set with 1 million random cell retrieval queries. As shown, our query times are 2 to 3 times faster than those of the linear quadtree in the static setup. In the dynamic setup, the overhead required by the dynamic implementation of our structure causes it to become 2 to 3 times slower than the static version, so query times become similar to those of linear quadtrees. Due to this, we are faster than linear quadtrees in the raster datasets, but slower in Web graphs. We consider the raster datasets to be more representative of the actual performance of the solutions, since they are designed for this kind of data, but even the worse query times obtained in Web graphs are easily compensated by the much better (8x) compression.
4 Representation of general rasters and spatiotemporal data
In this section we introduce solutions based on the ones that can handle more complex raster data. Particularly, we focus on the representation of general raster data and temporal raster data. In general rasters we have a matrix of non-binary values in which each cell contains a numeric value. Temporal rasters store the evolution of a raster over time. We will describe the usual problems for both kinds of raster data and then introduce our proposals to store them.
In our representations for general raster data we aim at providing support for queries involving not only the spatial dimension of the dataset, but also the possible values stored. For instance, the values above a given threshold in an elevation raster can be selected to yield snow alerts in a given region. Our solutions are designed to efficiently answer this kind of query, combining a spatial constraint with a filter on the possible values, as well as simpler queries involving constraints only on space or values.
Due to their characteristics, some of the data structures we introduce for integer rasters can also be adapted to the representation of spatio-temporal data, or time-evolving regional data. We consider temporal rasters containing the evolution of a binary raster dataset over time. Hence, we essentially have a collection of rasters corresponding to the same feature at different time points. In these datasets, we also have two ways to filter the data: spatial constraints, to obtain values in a region, and temporal constraints, to obtain values in a given time interval. We consider the following temporal constraints:

Time-instant, or time-slice, queries refer to a single point in time.

Time-interval queries refer to a time interval. We consider three different semantics: standard queries return all the results found, possibly with multiple occurrences for the same cell; weak queries return the set of cells that fulfilled the query constraints at any point in the interval (e.g., in a cloud cover raster, find the regions that were covered at any time); strong queries return the set of cells that fulfilled the constraints during the full interval.
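The three time-interval semantics can be made concrete with a small sketch over an uncompressed collection of binary rasters. This is only a reference definition of the expected answers, not one of the compact structures discussed in this paper; the function name `interval_query`, the `(r1, r2, c1, c2)` window layout and the `mode` parameter are assumptions introduced for the example.

```python
def interval_query(snapshots, window, t1, t2, mode="standard"):
    """Answer a time-interval query over a list of binary matrices.

    snapshots[t][r][c] is 1 if cell (r, c) is set at time instant t.
    window = (r1, r2, c1, c2) and [t1, t2] are inclusive ranges.
    """
    r1, r2, c1, c2 = window
    cells = [(r, c) for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]
    times = range(t1, t2 + 1)
    if mode == "standard":
        # One result per (time, cell) occurrence; cells may repeat.
        return [(t, r, c) for t in times for (r, c) in cells if snapshots[t][r][c]]
    # Weak: set at any instant of the interval; strong: set at every instant.
    combine = any if mode == "weak" else all
    return {(r, c) for (r, c) in cells if combine(snapshots[t][r][c] for t in times)}
```

A time-instant query is the special case `t1 == t2`, where the three semantics return the same cells.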
4.1 Our proposals
The proposals we introduce next are k²-tree variants, in most cases built from our ones implementations. For general rasters, we assume that our input is a matrix whose cells contain integer values in a bounded range. Note that this implies the assumption that the number of different values is not too large; raster datasets with floating-point values can either be rounded or mapped to an integer range. For temporal rasters, we assume that we have a collection of binary rasters of the same size. Most of our proposals can be applied to both cases, with adjusted algorithms to answer the relevant queries.
4.1.1 Multiple ones: Mones
The Mones uses a collection of ones to store the original data. If we see the input matrix as a collection of binary matrices, one for each possible value, the representation of the input is reduced to the representation of a collection of binary rasters. The Mones simply stores each binary matrix (i.e., the cells with each possible value) in a different ones.
In this approach, queries involving cells with a given value can be answered by checking a single ones. Queries involving a range of values, however, require checking all the trees in the range, so they become less efficient. The worst performance, therefore, is expected in queries with no constraints on values, where all the trees have to be checked.
The same approach can be used for temporal raster data: we use a different tree per time instant. Time-instant queries are executed on a single tree, but time-interval queries require a synchronized traversal of several trees. Note that in standard time-interval queries we can just return all the results by querying each tree separately, but for weak and strong queries we need to traverse all the trees simultaneously and compute the OR (weak) or the AND (strong) of their corresponding bits to filter out branches that do not fulfill the query semantics.
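The cost profile of the one-structure-per-value approach can be seen in a minimal sketch that replaces each compressed tree with a plain Python set of coordinates. The sets are only a stand-in chosen for this example, and the names `build_layers`, `cells_with_value`, `cells_in_range` and `cell_value` are hypothetical.

```python
def build_layers(matrix):
    """One set of (row, col) coordinates per distinct value in the raster."""
    layers = {}
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            layers.setdefault(v, set()).add((r, c))
    return layers

def cells_with_value(layers, v):
    """Single-value query: only one layer is inspected."""
    return layers.get(v, set())

def cells_in_range(layers, lo, hi):
    """Range query: every layer whose value falls in [lo, hi] is inspected."""
    result = set()
    for v, cells in layers.items():
        if lo <= v <= hi:
            result |= cells
    return result

def cell_value(layers, r, c):
    """No value constraint: in the worst case all layers are checked."""
    for v, cells in layers.items():
        if (r, c) in cells:
            return v
    return None
```

The single-value query touches one layer, whereas the other two operations grow with the number of values involved, which matches the behavior described above.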
4.1.2 Cumulative ones: CMones
Our second proposal, the CMones, is based on the same idea of building a tree per value, but uses a cumulative approach: the first tree will store the cells with the minimum value; each consecutive tree will store the cells with the next value, plus all the cells stored in previous trees. Figure 7 shows the CMones representation, for the same input matrix of Figure 6.
In this approach, the trees store a much larger number of ones. However, taking advantage of the ability of the ones to store large regions of ones, the space of the final structure is not expected to increase too much with respect to the previous approach. In some raster datasets, where values tend to form concentric curves, the use of cumulative values can even improve compression by generating larger clusters.
The CMones can answer any query involving a single value, or a range of values, using the same strategy: for a range of values, we compute the results in the tree of the upper limit of the range and subtract those in the tree of the value immediately below the lower limit (in practice, we can traverse both trees simultaneously to filter out branches as soon as possible). Hence, its performance is independent of the length of the range. Additionally, it can answer queries not involving value constraints more efficiently: to find the value of a single cell, instead of checking every tree, we can use binary search to look for the leftmost tree that contains the cell.
The CMones relies on the fact that the leftmost tree containing a 1 for the cell yields the actual value of the cell. This approach cannot be used for time-evolving data, where the same cell can change value several times.
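The cumulative strategy can be sketched in the same simplified setting, again using plain sets instead of compressed trees. The names `build_cumulative`, `cell_value` and `cells_in_range` are assumptions made for this example, and the sketch assumes that every cell of the raster holds a value.

```python
from bisect import bisect_left, bisect_right

def build_cumulative(matrix):
    """Return (values, layers): layers[i] holds every cell with value <= values[i]."""
    values = sorted({v for row in matrix for v in row})
    layers, acc = [], set()
    for v in values:
        acc = acc | {(r, c) for r, row in enumerate(matrix)
                     for c, x in enumerate(row) if x == v}
        layers.append(acc)
    return values, layers

def cell_value(values, layers, r, c):
    """Binary search for the leftmost layer that contains the cell."""
    lo, hi = 0, len(layers) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if (r, c) in layers[mid]:
            hi = mid
        else:
            lo = mid + 1
    return values[lo]

def cells_in_range(values, layers, a, b):
    """Cells with value in [a, b]: layer of b minus layer just below a."""
    hi = bisect_right(values, b) - 1      # last layer with value <= b
    lo = bisect_left(values, a) - 1       # last layer with value < a
    if hi < 0:
        return set()
    lower = layers[lo] if lo >= 0 else set()
    return layers[hi] - lower
```

Regardless of the length of the range, `cells_in_range` reads exactly two layers, which is the property that makes the cumulative variant attractive for range queries.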
4.1.3 k³-tree
The k³-tree is a straightforward extension of the k²-tree to three dimensions. The conceptual decomposition of a bidimensional matrix can be extended to any number of dimensions, creating submatrices at each step to build a tree. Navigation of the tree is similar, just considering constraints in the new dimensions and adjusting the formulas to nodes with k³ children.
Our approach uses a k³-tree to store the complete raster matrix. Particularly, it stores a 3-dimensional binary matrix, where the third dimension is the value of the cell. Hence, for each spatial coordinate the only 1 in the third dimension corresponds to the value of that cell.
Retrieval algorithms in the k³-tree are quite simple: to get the value of a cell, we simply traverse the conceptual tree looking at all the branches for that coordinate; to find cells with a given value or range of values, we fix the range in the third dimension and search for all the ones in the corresponding slice of the matrix.
The k³-tree can also be applied to temporal raster data. Considering the third dimension as time, we can combine all the raster datasets in a single 3-dimensional matrix. Time-instant and standard time-interval queries are similar to queries on values. Weak and strong time-interval queries can be processed as standard queries, filtering out repeated values during or after traversal.
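The conceptual three-dimensional decomposition can be sketched with a pointer-based tree in which every internal node has eight children (two per dimension). This is only an illustration of the traversal logic and does not use the compact bitmap encoding of the actual structure. The names `build`, `cell_value` and `cells_in_range` are hypothetical, and the sketch assumes an n x n matrix with integer values in [0, n), where n is a power of two.

```python
def build(matrix, r, c, z, size):
    """Decompose the cube where bit (r, c, z) is 1 iff matrix[r][c] == z.

    Returns None for an empty subcube, True for a single set cell,
    or a list of 8 children ordered by (row half, column half, value half).
    """
    if size == 1:
        return True if matrix[r][c] == z else None
    h = size // 2
    kids = [build(matrix, r + dr, c + dc, z + dz, h)
            for dr in (0, h) for dc in (0, h) for dz in (0, h)]
    return kids if any(k is not None for k in kids) else None

def cell_value(node, r, c, size, z=0):
    """Follow the branches covering (r, c), trying both halves of the value axis."""
    if node is None:
        return None
    if size == 1:
        return z
    h = size // 2
    base = (4 if r >= h else 0) + (2 if c >= h else 0)
    for dz in (0, 1):
        found = cell_value(node[base + dz], r % h, c % h, h, z + dz * h)
        if found is not None:
            return found
    return None

def cells_in_range(node, size, lo, hi, r=0, c=0, z=0):
    """Report (row, col, value) for all cells with value in [lo, hi]."""
    if node is None or z > hi or z + size - 1 < lo:
        return []                         # empty subcube or outside the value slice
    if size == 1:
        return [(r, c, z)]
    h = size // 2
    out, i = [], 0
    for dr in (0, h):
        for dc in (0, h):
            for dz in (0, h):
                out += cells_in_range(node[i], h, lo, hi, r + dr, c + dc, z + dz)
                i += 1
    return out
```

A spatial window could be added to `cells_in_range` with the same kind of interval test on the row and column axes, so the three constraints are filtered at once during the descent.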
4.1.4 Iones
The Ik²-tree has been shown to improve the performance of a collection of k²-trees in other application domains. Therefore, our next proposal is an adaptation of the same data structure to work with our ones. This just requires adjustments in the data structures and basic navigation operations, similar to those performed in individual ones. For instance, using the variant based on the ones, a second bitmap must be added, and additional operations are defined to check the color of a node and traverse the tree to reach its children.
Figure 8 shows the Iones, for the same input matrix used in previous examples. We display the actual bits used by the ones encoding, and the final bitmaps generated. Notice that the bits of each node correspond to the concatenation of the corresponding bits in the equivalent Mones representation.
The Iones can answer queries involving a single value or a range of values using the same traversal techniques of the original Ik²-tree. Even if navigation is slower than in individual ones, making simple queries slower, the ability to combine all the trees into one provides a much more efficient way to perform checks in queries involving ranges of values or not involving value constraints.
The Iones can also be adapted to temporal raster data. Particularly, most timeinterval queries can be efficiently answered by keeping track of the corresponding limits of the range for each node: in weak queries, if the current node contains at least a one in our interval, we can confirm the result immediately; in strong queries, if a node has at least a 0 in the interval, we can discard the result.
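The early-termination checks for weak and strong queries can be illustrated with a simplified stand-in in which a node keeps one bit per time instant, packed into a Python integer. This is not the actual layout of the interleaved structure; `interval_mask`, `weak_hit` and `strong_possible` are names invented for the example, and bit t of `node_bits` is assumed to be 1 when the node contains at least a one at time instant t.

```python
def interval_mask(t1, t2):
    """Bitmask with ones in positions t1..t2 (inclusive)."""
    return ((1 << (t2 - t1 + 1)) - 1) << t1

def weak_hit(node_bits, t1, t2):
    """Weak query: at least one set bit inside the interval is enough."""
    return (node_bits & interval_mask(t1, t2)) != 0

def strong_possible(node_bits, t1, t2):
    """Strong query: a single 0 inside the interval discards the node."""
    mask = interval_mask(t1, t2)
    return (node_bits & mask) == mask
```

Both checks cost a constant number of word operations per node when the interval fits in a machine word, independently of the interval length.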
4.2 Experimental Evaluation for General Rasters
We test the performance of our proposals using the real elevation rasters described in Section 3. Since the values stored are floating-point values obtained from interpolation, we round the values to a precision of 1 m.
We compare the compression of our techniques with a GeoTIFF (http://trac.osgeo.org/geotiff/) representation of the same datasets, in two variants. The plain tiff variant simply stores the matrix row-wise, using a 16-bit integer per cell; the compressed tiff variant uses the default compression options: the matrix is partitioned into fixed-size tiles, and LZW compression is applied to each tile.
To measure the query efficiency of our proposals, we compare them with GeoTIFF using the libtiff library, version 4.0.3. All time measurements correspond to CPU time. We consider the following representative queries: cell retrieval queries, which ask for the value of a given cell; single-value queries, which ask for all the cells with a given value; and combined queries, which ask for cells within a spatial region and with values in a given range.
Table 6: Compression results for general rasters.
Dataset  #values  Entropy  Mones  CMones  k³-tree  Iones  tiff (plain)  tiff (compressed)
mdt500  578  5.43  2.75  2.21  1.83  2.53  16.01  1.52
mdt700  472  4.39  2.07  2.30  1.38  1.84  16.01  1.12
mdtA  978  5.86  3.24  2.83  1.94  3.10  16.01  1.52
mdtB  2,142  5.32  3.15  4.36  1.62  3.12  16.00  1.35
Table 6 shows the compression obtained for different raster datasets. For each dataset, we show the number of different values existing in the dataset, as well as the zero-order entropy of the matrix, read in row order. The best space results are obtained by the compressed TIFF representation, and the best of our proposals is the k³-tree, which is only 10-20% larger. Note that the compressed tiff variant is designed mainly for compression, and it does not provide support for efficient access.
Table 7: Query times for cell retrieval queries.
Dataset  Mones  CMones  k³-tree  Iones  tiff (plain)  tiff (compressed)
mdt500  123.6  7.1  2.2  30.7  2.6  491.7
mdt700  65.8  6.1  1.6  27.5  2.7  461.9
mdtA  131.9  10.2  2.8  46.2  5.2  499.0
mdtB  421.0  11.1  2.9  75.6  87.9  494.8
Table 7 shows the results obtained for cell retrieval queries. Our best approach, the k³-tree, is much faster than the compressed tiff variant, and even faster than the plain version (this is an artifact of the library, which is not designed to access specific cells and always processes the data in chunks). Among our techniques, the CMones variant is several times slower than the k³-tree, but still efficient. The Mones and Iones variants are much less efficient in this simple query, in the first case due to the need for a sequential search in all the trees, and in the second case because of the added complexity of the structure.
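Table 8: Query times to retrieve all the cells with a given value.
Dataset  Mones  CMones  k³-tree  Iones  tiff (plain)  tiff (compressed)
mdt500  3.9  5.8  9.4  5.9  39.5  221.4
mdt700  3.0  6.0  7.3  4.5  37.5  199.5
mdtA  8.2  13.6  18.9  12.7  142.6  799.0
mdtB  110.2  255.1  196.6  173.5  3,838.9  19,913.4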
Table 8 displays the query times to retrieve all cells with a given value. This query demonstrates the indexing capabilities of our techniques, all of them being much faster than the TIFF-based implementations, because we can filter results by value while they have to traverse the complete dataset. The Mones is the fastest technique, since it has a specific structure per value. The CMones, as expected, is roughly two times slower. The Iones is also less efficient, due to the more complex navigation of the structure. Finally, the k³-tree is now slightly slower than the other techniques, due to the locality of values: many regions with values close to the target generate branches in the tree that have to be checked but will be discarded later.
Table 9: Query times for combined queries (spatial window and range of values).
Dataset  Window size  Range length  Mones  CMones  k³-tree  Iones  tiff (plain)  tiff (compressed)
mdt500  10  10  9.0  1.9  1.8  25.9  33.0  533.0
mdt500  10  50  43.1  2.1  2.6  27.9  25.0  528.0
mdt500  50  10  13.5  3.4  5.0  29.0  119.0  694.0
mdt500  50  50  69.7  5.9  16.0  41.2  120.0  695.0
mdt700  10  10  9.6  2.1  1.7  24.1  32.0  506.0
mdt700  10  50  45.3  2.1  2.3  25.1  25.0  496.0
mdt700  50  10  13.5  4.0  4.4  29.2  123.0  649.0
mdt700  50  50  68.5  5.4  13.4  37.7  123.5  649.0
mdtA  10  10  9.9  2.6  2.0  37.2  81.0  548.0
mdtA  10  50  43.6  2.8  2.6  38.6  47.0  532.0
mdtA  50  10  13.4  3.8  4.2  39.2  228.0  703.0
mdtA  50  50  62.2  4.9  11.0  46.9  229.0  697.0
mdtB  10  10  11.6  3.9  2.3  55.7  1,329.0  1,265.0
mdtB  10  50  56.9  3.9  2.5  58.1  881.0  892.0
mdtB  50  10  14.5  4.5  3.2  59.1  2,007.0  1,237.0
mdtB  50  50  49.8  5.5  21.2  89.0  5,715.0  2,038.0
Table 9 shows the query times obtained for combined queries involving different spatial windows and value ranges. Results confirm that all our proposals are again faster than the TIFF-based solutions, which are unable to filter small subsets of data. The CMones is now the fastest of our techniques in most cases, thanks to its ability to efficiently compute the difference between any two values. The k³-tree also achieves good query times overall, and is the fastest technique in some of our tests, thanks to its ability to filter in the 3 dimensions at the same time. The Mones is very inefficient, especially with longer ranges, whereas the Iones is also inefficient but scales better to longer ranges.
4.3 Experimental Evaluation for Temporal Rasters
Next we test the application of our proposals to temporal raster data. We perform an experimental evaluation on real and synthetic datasets. CFCA and CFCB contain cloud fractional cover data (obtained from the Satellite Application Facility on Climate Monitoring, at http://www.cmsaf.eu), covering the whole world with a resolution of 0.25 degrees. CFCA uses data from years 1982 to 1985, and CFCB data from 2007 to 2009. Our threshold to determine the value of the raster is a cover value above 50%. RegionsA and RegionsB are synthetic datasets created by randomly grouping circles and altering their borders to build random but generally smooth and connected regions. Time evolution in these datasets simulates slow movement and changes/deformations of the original shapes.
The experiments in this section were run on a machine with 4 Intel Xeon E5520 cores at 2.27 GHz and 72 GB of RAM, running Ubuntu 9.10. Our code is compiled with gcc 4.4.1, with -O9 optimizations.
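Table 10: Temporal raster datasets and compression results. The values of the Size column were not preserved and are shown as [...].
Dataset  Size  #snaps.  % ones  k³-tree  Mones  Iones  Quadcodes (base)  Quadcodes (diff)
CFCA  [...]  1,111  67.6  1.11  0.71  0.55  6.73  5.01
CFCB  [...]  918  58.4  1.37  0.83  0.65  7.53  5.77
RegionsA  [...]  1,000  23.7  0.04  0.09  0.06  0.64  0.16
RegionsB  [...]  1,000  24.2  0.03  0.08  0.06  0.53  0.13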
Table 10 displays the spatial size, number of time instants and percentage of ones in each dataset. The remaining columns of the table show the compression results obtained by our proposals. As a baseline, we show the space that would be necessary to store the quadcodes of the corresponding raster datasets with two approaches: using a separate representation per time instant (base); and using a differential approach where only the changes are stored at each time instant (diff). The latter corresponds to the minimum space that would be required by a linear quadtree that uses differential encoding, like the OLQ Tzouramanis et al. (2004). Results show that our techniques are much more space-efficient than the baseline. The Mones and the Iones, which do not take advantage of similarities between consecutive time instants, achieve the best results in the CFCA and CFCB datasets. However, in RegionsA and RegionsB the k³-tree is much more efficient. This is due to the change rate of the datasets: in the CFC datasets a large fraction of values change between consecutive time instants, whereas in our Regions datasets changes are more gradual. Therefore, the k³-tree can take advantage of similarities between consecutive time instants in the latter, but not in the former.
In order to confirm the effect of the change rate, we build smaller datasets taking subsets of 100 snapshots from RegionsA. We build datasets taking every time instant, every second time instant, and so on, hence representing the same temporal raster with different time granularity. We also create a new dataset, built like RegionsA but with a larger spatial size, and generate a similar group of subsets from it.
Figure 9 shows the compression results obtained in the datasets built from RegionsA (left) and in the datasets built from the larger raster (right). Each plot displays how the compression obtained by our structures evolves as the change rate (measured as the percentage of ones that change on average between consecutive time instants) increases. The k³-tree is the most efficient of our proposals for datasets with a very small change rate, but when the number of changes reaches a given threshold, the implementations that ignore similarity between snapshots (Mones and Iones) become more efficient. Hence, the k³-tree is the best alternative for slowly changing datasets, but it is not able to exploit similarities when changes exceed a relatively low percentage. Notice that datasets with a high change rate would also be difficult to compress using any other state-of-the-art techniques based on exploiting this similarity, like OLQs.
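The change rate used in this analysis can be computed directly from the snapshots. The sketch below follows one possible reading of the definition, which is an assumption of this example: for each pair of consecutive time instants, the number of cells whose value differs is divided by the number of ones in the earlier snapshot, and the resulting percentages are averaged. The name `change_rate` is hypothetical.

```python
def change_rate(snapshots):
    """Average percentage of change between consecutive binary snapshots.

    Assumed definition: changed cells / ones in the earlier snapshot,
    averaged over all consecutive pairs that contain at least a one.
    """
    rates = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        ones = sum(sum(row) for row in prev)
        changed = sum(a != b
                      for row_p, row_c in zip(prev, curr)
                      for a, b in zip(row_p, row_c))
        if ones:
            rates.append(100.0 * changed / ones)
    return sum(rates) / len(rates) if rates else 0.0
```

Sampling every second or every third snapshot of the same sequence, as done to build the subsets, increases the value returned by this function because more cells differ between the retained instants.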
We compare the query performance of our proposals using snapshot queries and time-interval queries. We select different window sizes and time interval lengths, and build random query sets for each of them. For time-instant queries, our query sets for each configuration contain 1,000 random queries. For time-interval queries, we also consider different interval lengths, and build query sets with 10,000 random queries per window size, interval length and dataset. In all cases, we measure CPU times, and average the times over a number of repetitions of the full query set to obtain precise results.
Figure 10 shows the results obtained for all the datasets in snapshot queries, for different spatial window sizes. Results are consistent with those in the previous section: the Mones is the fastest technique, since it only has to query one tree. The Iones is around two times slower, but still faster than the k³-tree, which must traverse many branches corresponding to time instants close to the target.
Figure 11 shows the query times for standard time-interval queries (i.e., queries returning all occurrences for the same cell). We display results for a representative window of size 32, and interval lengths 1 to 40. The k³-tree is the most efficient technique for long intervals, whereas the Iones is competitive in shorter intervals. Notice that the Mones is only the fastest for snapshot queries.
In addition to standard time-interval queries we also check weak and strong queries. The results are shown in Figures 12 and 13 respectively. The evolution of query times is significantly different for these queries: the Mones technique still achieves query times roughly proportional to the length of the interval, since it must perform a search in all the trees involved. However, the k³-tree and the Iones are much less affected by the interval length. The Iones obtains similar times for any interval length, and is the best solution in general in this case, since it can efficiently check any time interval at any node of the conceptual tree. The k³-tree, on the other hand, cannot improve the query times of the standard query algorithm, being forced to check all the branches and then remove duplicates, so it becomes much slower than the Iones. In strong interval queries, in which many search branches could potentially be filtered by checking the intervals, the k³-tree is the slowest technique in general, especially in the CFC datasets, due to their higher change rate.
5 Top-k range queries in raster data
In this section we describe how to apply the same ideas devised in previous sections to obtain structures that solve top-k range queries, i.e., given a spatial window, queries that retrieve the k cells with maximum values inside it. The k²-treap, introduced in Section 2.3, is able to answer this kind of query in general matrices. We introduce next two variants that extend the original k²-treap to efficiently handle raster matrices where values are highly clustered. Then, we compare our proposals with a naive technique based on the Mones, which simply searches for cells in the tree corresponding to the maximum value, and keeps searching in consecutive trees until the desired number of results is obtained.
5.1 k²-treap variants
Our first variant, called k²-treap-uniform, is built in a similar manner to the original k²-treap. Yet, as in our ones, the decomposition of the matrix stops whenever a “uniform” submatrix is found. This can happen when an empty region is identified or when the same value is shared by all its cells. Figure 14 shows an example of this tree decomposition. The matrices display the consecutive steps of the k²-treap construction, where the top cells (cells with the maximum value) for each step are highlighted. Observe that any dataset can be represented in a more compact way if similar values are present in many of its submatrices. Notice also that in uniform nodes all the cells in the submatrix share the same value, so we do not have to keep the coordinates of the cell with the maximum.
After these changes in the conceptual tree, we use a ones to store the tree shape. Uniform nodes are marked like black nodes in a binary raster, and empty nodes like white nodes. Using the same techniques explained for binary matrices, we can easily check whether a node is empty or uniform. In empty nodes we stop traversal, and in uniform nodes we can immediately output all the cells in the submatrix with the same value. The actual representation uses, in addition to the ones, auxiliary arrays that work essentially as in the original k²-treap.
Only minor adjustments are required to traverse the conceptual tree in our variant. Unlike node values, which are kept for all the nodes in the k²-treap, coordinates are just stored for non-leaf nodes. Therefore, the offset in the list of coordinates for the current position and level in the tree is obtained by counting the non-leaf nodes up to that position. To compute the offset of the node in the list of values, we also have to consider uniform nodes (marked with a 1 in the corresponding bitmap): the offset combines the number of internal nodes and the number of uniform nodes that exist up to the current position.
Our second proposal, called k²-treap-uniform-or-empty, tries to improve compression even more, at the expense of increasing query times. This approach differs slightly from the previous one. Here, we stop decomposition at any node as long as all the values in the corresponding submatrix are equal (even if some cells have a value and others are empty). For instance, in Figure 14, the bottom-left quadrant becomes uniform with this new definition. This variant essentially builds the same k²-treap representation, but taking into account that these regions are now also considered uniform. Figure 15 depicts an example of this new approach.
This proposal will cut many branches earlier during the construction of the tree. Even so, it has a drawback: since we cannot easily tell apart uniform and empty regions, some results may be emitted more than once. For instance, if a cell of the bottom-left quadrant had the maximum value in the matrix, it will be emitted at the root of the tree. But when traversing the bottom-left quadrant, if we identify that region as “uniform”, it may be emitted a second time. Hence, to solve top-k queries we use an additional data structure to keep track of already emitted results (any binary search tree or hash table suffices for this purpose). The additional overhead may become significant in space and/or time for large k, providing a space/time tradeoff between this proposal and the k²-treap-uniform.
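The two stopping rules can be contrasted with a small sketch that classifies the cells of a submatrix, using `None` for empty cells (for instance, cells whose value was already extracted as a local maximum). The function name `classify`, the `allow_empty` flag and the `emit` helper with its `seen` set are assumptions introduced for this example.

```python
def classify(cells, allow_empty=False):
    """Classify a submatrix, given as a flat list of values (None = empty cell).

    allow_empty=False follows the first variant: uniform means that every
    cell holds the same value. allow_empty=True follows the second variant:
    empty cells are tolerated as long as all non-empty cells share one value.
    """
    values = {v for v in cells if v is not None}
    if not values:
        return "empty"
    if len(values) == 1 and (allow_empty or None not in cells):
        return "uniform"
    return "mixed"

def emit(result, seen, output):
    """Report a result once; the seen set filters repeated emissions."""
    if result not in seen:
        seen.add(result)
        output.append(result)
```

With `allow_empty=True` more submatrices stop the decomposition early, which is where the space savings come from, and the `seen` set is the extra bookkeeping that the text identifies as the cost for large result sets.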
5.2 Experimental evaluation
To test the query efficiency of our proposals, we compare them with the Mones representation of raster data. Notice that, despite its simplicity, the Mones can answer top-k queries by querying the individual trees, starting from the one corresponding to the highest value, so it should be relatively efficient for this kind of query.
Table 11 shows the compression results obtained by our k²-treap variants and the Mones for different raster datasets. Our first variant, the k²-treap-uniform, is larger than the Mones, but the k²-treap-uniform-or-empty achieves better compression. Both variants obtain reasonable results in terms of space, at least comparable to the solutions described for general raster data, so they are a viable alternative if top-k queries are relevant.
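Table 11: Compression results of the k²-treap variants and the Mones.
Dataset  Mones  k²-treap-uniform  k²-treap-uniform-or-empty
mdt500  2.75  3.18  2.87
mdt700  2.07  2.19  1.98
mdtA  3.24  3.51  3.20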
Table 12 shows the query times for top-k queries obtained by all the tested data structures. For each dataset several window sizes and values of k are tested, by generating sets of random square windows within the bounds of the raster. Results show that the k²-treap-uniform performs well regardless of the window size or the value of k, as it is the alternative that achieves the best results in most of the cases. The k²-treap-uniform-or-empty, the most compact of the k²-treap variants, still behaves well for small values of k, but when k increases the overhead of keeping track of previous results dominates the query cost. Also, observe that for larger values of k the Mones becomes more competitive with the k²-treap data structures: when the query retrieves many cells from one or more trees, it requires less computation than extracting values one by one from the k²-treap.
Table 12: Query times for top-k queries.
Dataset  Window  k  Mones  k²-treap-uniform  k²-treap-uniform-or-empty
mdt500  100  10  164.2  15.3  15.8
mdt500  100  100  178.3  40.8  49.8
mdt500  100  1,000  279.8  233.7  385.5
mdt500  500  10  131.3  15.8  16.0
mdt500  500  100  143.5  41.8  50.5
mdt500  500  1,000  217.0  230.0  372.5
mdt500  1000  10  125.5  15.8  16.3
mdt500  1000  100  134.5  41.5  51.0
mdt500  1000  1,000  200.0  219.8  358.0
mdt700  100  10  357.0  12.3  12.8
mdt700  100  100  381.5  31.3  37.3
mdt700  100  1,000  455.3  185.0  284.8
mdt700  500  10  309.0  15.8  16.0
mdt700  500  100  346.3  41.3  49.0
mdt700  500  1,000  495.8  244.5  383.0
mdt700  1000  10  281.0  16.8  17.3
mdt700  1000  100  318.0  46.3  54.0
mdt700  1000  1,000  514.0  281.8  446.3
mdtA  100  10  493.6  20.4  19.2
mdtA  100  100  478.0  43.2  50.8
mdtA  100  1,000  665.0  239.2  376.0
mdtA  500  10  426.8  23.2  21.6
mdtA  500  100  422.0  50.4  56.0
mdtA  500  1,000  581.2  253.6  396.0
mdtA  1000  10  422.4  22.4  22.4
mdtA  1000  100  419.2  52.8  60.0
mdtA  1000  1,000  547.2  260.0  408.0
6 Conclusions
We have presented several compact data structures for the representation of general raster data with advanced query support. Our representations store real raster datasets in small space and provide not only efficient access to regions of the raster, but also advanced query capabilities, such as selecting cells with a particular value or range of values, queries that involve spatio-temporal restrictions, or even top-k queries.
Most of the proposals are based on variants of the k²-tree. We propose a representation, called ones, that enhances the k²-tree so that we can efficiently compress any kind of clustered binary matrix. Building on this, we propose compact and indexed solutions for different application domains. Additionally, most of the approaches introduced can be transformed into dynamic solutions using a dynamic k²-tree.
Overall, our proposals obtain good compression results and are able to answer a variety of interesting queries. In our experiments we show that our proposals are very compact, several times smaller than state-of-the-art representations based on linear quadtrees, and still able to store and query large datasets in main memory. We evaluate our representations for general raster data, showing their relative strengths and drawbacks: the k³-tree obtains very good space results, being close to a compressed GeoTIFF representation, and shows competitive times in most cases, but the variant with independent ones obtains the best time results to retrieve all the cells with a given value, and the variant with cumulative ones obtains the best results in most of the queries involving ranges of values. In all cases, the query times of our proposals are clearly better than those of the representations based on GeoTIFF images. We also apply some of the proposals to the representation of time-evolving raster data. Results show again relative strengths among our proposals: a k³-tree is the best solution for slowly-changing datasets, but as soon as the change rate increases the approaches based on multiple ones become smaller. Finally, we also test new proposals to answer top-k queries in raster data. Our experiments confirm the space efficiency of the k²-treap variants, which are competitive in space with our other representations of raster data and faster to answer top-k queries.
We show the scalability of our representations to efficiently represent rasters with several thousands of different values. Nevertheless, our proposals assume that the number of different values in the dataset is not too high, and the space efficiency of most of them will degrade otherwise. We claim that in many real-world datasets, even though the values actually stored may have a high precision, that precision does not add quality or accuracy after a given threshold: when measuring features such as temperature, elevation, pressure, etc., the actual measurements may have high precision, but the interpolation of values, or even the simple averaging of measurements, distorts the precision of the measurements, so for many purposes we can safely reduce the precision of the values significantly without reducing the quality of the dataset.
The preliminary version of this work inspired several other research lines. In particular, limitations in handling large ranges of values were recently addressed in follow-up research Ladra et al. (2017), which extends our original work to support higher-precision datasets. Our representations are preferable when high-resolution values are not available or not relevant (e.g., in some applications, high-resolution values are just interpolations), as well as in domains where the number of different values is small (e.g., land-use rasters). Additionally, we have extended our proposals to efficiently store and query time-evolving data, a challenging problem where other solutions are difficult to apply due to the particularities of spatio-temporal queries.
References
 A succinct data structure for selfindexing ternary relations. Journal of Discrete Algorithms 43, pp. 38 – 53. Cited by: §2.2, §2.2.
 Compressed vertical partitioning for efficient RDF management. Knowledge and Information Systems 44 (2), pp. 439–474. Cited by: §2.1.
 Alphabet partitioning for compressed rank/select and applications. In Proc. of the 21st International Symposium on Algorithms and Computation (ISAAC 2010), pp. 315–326. Cited by: §1.
 Compressed representation of dynamic binary relations with applications. Information Systems 69, pp. 106 – 123. Cited by: §2.1.
 Aggregated 2d range queries on clustered points. Information Systems, pp. 34–49. Cited by: §2.3.
 Treaps: range top queries in compact space. In Proc. 21st Int. Symp. on String Processing and Information Retrieval (SPIRE 2014), pp. 215–226. Cited by: §2.3.
 DACs: bringing direct access to variablelength codes. Information Processing and Management 49 (1), pp. 392–404. Cited by: §2.3.
 Compact representation of web graphs with extended functionality. Information Systems 39 (1), pp. 152–174. Cited by: §2.1, §2.1.
 Orthogonal range searching on the RAM, revisited. In Proc. of the 27th International Symposium on Computational Geometry (SoCG 2011), pp. 1–10. External Links: ISBN 9781450306829 Cited by: §1.
 Fixed binary linear quadtree coding scheme for spatial data. Visual Communications and Image Processing, pp. 1214–1220. External Links: Document, Link Cited by: §1, §1.
 Practical representations for web and social graphs. In Proc. 20th ACM Int. Conf. on Information and Knowledge Management (CIKM 2011), pp. 1185–1190. Cited by: §2.1.
 Compact querieable representations of raster data. In Proc. of the 20th Int. Sym. on String Processing and Information Retrieval (SPIRE 2013), pp. 96–108. Cited by: footnote 1.
 New data structures and algorithms for the efficient management of large spatial datasets. Ph.D. Thesis, Department of Computer Science, University of A Coruña, Spain. Cited by: footnote 1.
 Quad trees: a data structure for retrieval on composite keys. Acta Informatica 4, pp. 1–9. Cited by: §1.
 An effective way to represent quadtrees. Communications of the ACM 25 (12), pp. 905–910. Cited by: §1, §3.6.
 Rank/select operations on large alphabets: a tool for text indexing. In Proc. of the 17th ACM-SIAM Symposium on Discrete Algorithms (SODA 2006), pp. 368–373. External Links: ISBN 0898716055 Cited by: §1.
 High-order entropy-compressed text indexes. In Proc. of the 14th ACM-SIAM Symposium on Discrete Algorithms (SODA 2003), pp. 841–850. External Links: ISBN 0898715385 Cited by: §1.
 On a method of binary-picture representation and its application to data compression. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-2 (1), pp. 27–35. External Links: Document, ISSN 01628828 Cited by: §3.3.
 Scalable and queryable compressed storage structure for raster data. Information Systems 72, pp. 179–204. Cited by: §6.
 Set operations on constant bit-length linear quadtrees. Pattern Recognition 30 (7), pp. 1239–1249. External Links: ISSN 00313203 Cited by: §1.
 Towards operational near real-time flood detection using a split-based automatic thresholding procedure on high resolution TerraSAR-X data. Natural Hazards and Earth System Sciences 9, pp. 303–314. External Links: Document Cited by: §1.
 Tables. In Proc. of the 16th Foundations of Software Technology and Theoretical Computer Science Conference (FSTTCS 1996), pp. 37–42. Cited by: §2.1.
 Wavelet trees for all. In Proc. of the 23rd Annual Symposium on Combinatorial Pattern Matching (CPM 2012), pp. 2–26. Cited by: §1.
 Spatial databases - with applications to GIS. Elsevier. External Links: ISBN 9781558605886 Cited by: §1.
 The quadtree and related hierarchical data structures. ACM Comput. Surv. 16 (2), pp. 187–260. External Links: ISSN 03600300, Link, Document Cited by: §1.
 Randomized search trees. Algorithmica 16 (4/5), pp. 464–497. Cited by: §2.3.
 Encyclopedia of GIS. Springer. External Links: Link, Document, ISBN 9783319178844 Cited by: §1.
 Benchmarking access methods for time-evolving regional data. Data & Knowledge Engineering 49 (3), pp. 243–286. Cited by: §4.3.
 A technique for high-performance data compression. Computer 17 (6), pp. 8–19. External Links: ISSN 00189162 Cited by: §1.
 GIS: a computing perspective, 2nd edition. CRC Press, Inc. External Links: ISBN 0415283752 Cited by: §1.
 Supporting web-based visual exploration of large-scale raster geospatial data using binned min-max quadtree. In Scientific and Statistical Database Management, M. Gertz and B. Ludäscher (Eds.), Berlin, Heidelberg, pp. 379–396. External Links: ISBN 9783642138188 Cited by: §1.