Hierarchical Information Quadtree: Efficient Spatial Temporal Image Search for Multimedia Stream

06/09/2018 ∙ by Chengyuan Zhang, et al. ∙ Central South University

Massive amounts of multimedia data that contain timestamps and geographical information are being generated at an unprecedented scale in many emerging applications such as photo sharing web sites and social network applications. Due to their importance, a large body of work has focused on efficiently computing various spatial image queries. In this paper, we study the spatial temporal image query, which considers three important constraints during the search: time recency, spatial proximity and visual relevance. A novel index structure, namely the Hierarchical Information Quadtree (HI-Quadtree), is proposed to efficiently insert/delete spatial temporal images arriving at high rates. Based on it, an efficient algorithm is developed to support the spatial temporal image query. Extensive experiments on real spatial databases clearly demonstrate the efficiency of our methods.




1 Introduction

Due to the rapid development of modern web services and the popularization of mobile smart devices, massive amounts of multimedia data that contain both text information and geographical location information are being generated at an unprecedented scale. For instance, Facebook, the most popular social network service, reported 350 million photos uploaded daily as of November 2013. More than 400 million tweets containing texts and images are generated by 140 million Twitter users every day. Tweets, each containing up to 140 characters, can be associated with locations, which may be coordinates (latitude and longitude) or semantic locations. Flickr, the largest photo sharing web site, had a total of 87 million registered members and more than 3.5 million new images uploaded daily in March 2013. 100 hours of video are uploaded to YouTube every minute, resulting in more than 2 billion videos in total by the end of 2013. Other social network applications such as WeChat, Instagram, Weibo and Pinterest generate vast amounts of multimedia data day by day, shared all over the world. Mobile smart devices such as smartphones, tablets and smart watches, which are equipped with GPS and wireless communication modules, can take photos, make videos or post messages to social platforms. These multimedia data generally contain timestamps and geographical information. As another example, check-ins and reviews in location based social networks (e.g., Foursquare) contain both text descriptions and locations of points of interest (POIs). The emergence of massive multimedia data leads to new requirements such as temporal spatial multimedia data search.

Top-k temporal spatial image queries are intuitive and constitute a useful tool for many applications. Such a query aims to find temporal spatial image objects that satisfy three criteria simultaneously: they are relevant to the query in terms of visual content, they are inside the spatial area of interest, and they satisfy the query time constraints. The figure illustrates spatial temporal images in a three dimensional space: longitude, latitude and time. However, processing top-k temporal spatial image queries efficiently is complex and requires a hybrid index combining information retrieval and spatial indexes. Besides, it is costly and involves accessing a huge amount of temporal geo-tagged multimedia data before finding the result set. To the best of our knowledge, we are the first to study this important issue; previous works focused on top-k spatial image search without temporal information. The state-of-the-art approach proposed by Alfarrarjeh et al. DBLP:journals/corr/AlfarrarjehS17 employs a hybrid index that evaluates both spatial and visual features in tandem.

Challenges. There are two key challenges in efficiently processing spatial temporal image queries over spatial temporal multimedia streams. Firstly, a massive number of geo-temporal images, typically in the order of millions, are posted in many applications, and hence even a small increase in efficiency results in significant savings. Secondly, the streaming geo-temporal images may continuously arrive at a rapid rate, which also calls for high throughput performance for better user satisfaction.

Based on the above challenges, we propose a novel index technique, the Hierarchical Information Quadtree (HI-Quadtree for short), to effectively organize continuous spatial temporal multimedia streams. In a nutshell, the HI-Quadtree consists of two parts: the temporal segment and the inverted quadtree. The temporal segment ensures that newly incoming spatial temporal images are always inserted into the most recent segment, and provides timely answers to spatial temporal image queries. The inverted quadtree is essentially a quadtree, each node of which is enriched with a reference to an inverted file for the images contained in the sub-tree rooted at the node. Through the inverted quadtree, we take the spatial and visual dimensions into consideration synchronously during query processing. Extensive experiments show that our HI-Quadtree based spatial temporal image search algorithm achieves very substantial improvements over the natural extensions of existing techniques due to its strong filtering power.

Contributions. The principle contributions of this paper are summarized as follows.

  • We formulate the problem of spatial temporal image query and identify its applications.

  • To facilitate the spatial temporal image search, we propose a novel indexing structure namely HI-Quadtree to effectively organize spatial temporal images.

  • Based on HI-Quadtree, we develop an efficient spatial temporal image search algorithm.

  • Comprehensive experiments show that our new matching algorithm achieves substantial improvements (up to three to five times speedup) over the natural extensions of state-of-the-art techniques.

Roadmap. The rest of this paper is organized as follows. Section 1.1 introduces the related work. Section 2 first gives an overview of the architecture of the spatial temporal image search system, then formally defines the problem of spatial temporal image search. Baseline techniques are presented in Section 3. We introduce the proposed indexing and query processing techniques in Section 4. Extensive experiments are reported in Section 5. Section 6 concludes the paper.

1.1 Related Work

In this section, we review the techniques of top-k spatial keyword queries, temporal spatial keyword queries and content-based image retrieval, which are related to our work.

Top-k Spatial Keyword Query. The spatial keyword query is a hot issue attracting many researchers in the database and information retrieval communities. A spatial keyword query takes a user location and user-supplied keywords as arguments and returns web objects that are spatially and textually relevant to these arguments DBLP:conf/er/CaoCCJQSWY12 . Many efficient indexing techniques have been proposed, such as the R-tree DBLP:conf/sigmod/Guttman84 , the R*-tree DBLP:conf/sigmod/BeckmannKSS90 and the IR-tree DBLP:journals/pvldb/CongJW09 . For the top-k spatial keyword query problem, João B. Rocha-Junior et al. DBLP:conf/ssd/RochaGJN11 propose a novel index called the Spatial Inverted Index (S2I) which maps each distinct term to a set of objects containing the term. Based on S2I, they designed efficient algorithms named SKA and MKA to improve the performance of top-k spatial keyword queries. Li et al. DBLP:journals/tkde/LiLZLLW11 proposed the IR-tree that, together with a top-k document search algorithm, facilitates four major tasks in document searches. Zhang et al. DBLP:conf/edbt/ZhangTT13 proposed a scalable integrated inverted index which adopts the Quadtree structure to hierarchically partition the data space into cells. Besides, they presented a new storage mechanism for efficient retrieval of keyword cells, and preserve additional summary information to facilitate pruning. In their other work DBLP:conf/sigir/ZhangCT14 , they proposed an effective approach to address the top-k distance-sensitive spatial keyword query by modeling it as the well-known top-k aggregation problem. Moreover, a novel and efficient approach named the Rank-aware CA (RCA) algorithm was designed by them to improve the effectiveness of pruning. Zheng et al. DBLP:conf/icde/ZhengSZSXLZ15 studied the interactive top-k spatial keyword (ITSK) query and designed a three-phase solution focusing on both effectiveness and efficiency. In order to solve the problem of top-k spatial keyword search (TOPK-SK) efficiently, Zhang et al. DBLP:journals/tkde/ZhangZZL16 proposed a novel index structure called the inverted linear quadtree (IL-Quadtree), which is designed to utilize both spatial and keyword based pruning techniques to effectively reduce the search space.

João B. Rocha-Junior et al. DBLP:conf/edbt/Rocha-JuniorN12 solved the problem of processing top-k spatial keyword queries on road networks for the first time. In this setting, the distance between the query location and a spatial object is the shortest path distance, rather than the Euclidean distance. They presented novel indexing structures and algorithms that are able to process such queries efficiently. Guo et al. DBLP:journals/geoinformatica/GuoSAT15 studied continuous top-k spatial keyword queries on road networks for the first time. They presented two methods that can monitor such moving queries in an incremental manner and reduce repetitive traversal of network edges for better performance.

The above-mentioned approaches search spatial objects using spatial information and keywords. However, they are not adequately suited to the problem of top-k temporal spatial image search.

Temporal Spatial Keyword Query. The aforementioned approaches consider only the spatial information and textual content of objects. However, temporal information is another significant dimension which should be considered during query processing. The temporal spatial keyword query is another important problem studied by many researchers in recent years. Mehta et al. DBLP:conf/gis/MehtaSSV16 proposed a novel type of spatial-temporal-keyword query named the CD-STK query, which combines keyword search with the task of maximizing the spatio-temporal coverage and diversity of the returned top-k results. Furthermore, an efficient approach which utilizes a hybrid spatial-temporal-keyword index is introduced by them to substantially improve query efficiency. Nepomnyachiy et al. DBLP:conf/gir/NepomnyachiyGJM14 introduced a search framework named 3W for geo-temporal stamped documents. Their system can efficiently process multi-dimensional queries over text, space, and time. Chen et al. DBLP:conf/icde/ChenCCT15 considered the temporal spatial-keyword top-k subscription (TaSK) query. The TaSK query takes into account three aspects of objects: text relevance, spatial proximity and recency. They introduced a new concept, the Conditional Influence Region (CIR), to represent the TaSK query and proposed an algorithm that makes use of the filtering conditions (of each group of queries on each spatial cell) to efficiently address the TaSK problem. However, these approaches are only suitable for textual queries, not image retrieval.

Content-Based Image Retrieval. Content-based image retrieval (CBIR for short) DBLP:journals/pami/JingB08 ; DBLP:journals/tomccap/LewSDJ06 ; DBLP:conf/mm/WangLWZ15 ; DBLP:journals/tip/WangLWZ17 ; DBLP:conf/mm/WangLWZZ14 ; DBLP:conf/sigir/WangLWZZ15 is widely studied in the multimedia community and is one of its fundamental research challenges. CBIR aims to search for images by analyzing their visual contents, and thus image representation is a core ingredient DBLP:conf/mm/WanWHWZZL14 ; DBLP:journals/tip/WangLWZZH15 ; YANGTCYB ; YANGNeurocomputing ; YANGKAIS ; YangPAKDD14 ; LYACMMM13 ; YANGINS ; DBLP:conf/cikm/WangLZ13 ; DBLP:journals/tnn/WangZWLZ17 ; DBLP:journals/pr/WuWLG18 . Local feature representations such as the bag-of-visual-words (BoVW) model DBLP:conf/iccv/SivicREZF05 ; NNLS2018 ; DBLP:journals/ivc/WuW17 apply local feature descriptors such as SIFT DBLP:conf/iccv/Lowe99 ; DBLP:journals/ijcv/Lowe04 ; DBLP:journals/cviu/WuWGHL18 and SURF DBLP:conf/eccv/BayTG06 ; TC2018 . BoVW represents an image by a vector of visual words which is constructed by vector quantization of feature descriptors DBLP:conf/iccv/SivicZ03 . Many researchers have worked on this issue over the years. For example, Irtaza et al. DBLP:journals/mta/IrtazaJAC14 proposed a neural network based architecture for content based image retrieval. In order to improve the capabilities of their approach, they designed an efficient feature extraction algorithm based on the concept of in-depth texture analysis. Bunte et al. DBLP:journals/pr/BunteBJP11 used two different methods to learn favorable feature representations: Limited Rank Matrix Learning Vector Quantization (LiRaMLVQ) and Large Margin Nearest Neighbor (LMNN). Zhao et al. DBLP:conf/mm/ZhaoYYZ14 studied affective image retrieval and the performance of different features on different kinds of images. Xie et al. DBLP:conf/icmcs/XieYH14 presented a hypergraph-based framework integrating image content, user-generated tags and geo-location information into the image ranking problem. Zhu et al. studied the problem of content-based landmark image search and proposed the multimodal hypergraph (MMHG) to characterize the complex associations between landmark images. Based on it, they designed a novel content-based visual landmark search system to facilitate effective image search. However, these methods do not consider the temporal information and geographical proximity of images. Thus they are not suitable for the problem of temporal spatial image search.

2 System Overview

This section first gives an overview of the architecture of the spatial temporal image search system, then introduces the problem definition of spatial temporal image search.

2.1 System Architecture

The proposed spatial temporal image search system consists of three components, namely the preprocess, update, and query modules, as shown in Figure 1.

Figure 1: Spatial temporal image system architecture

Preprocess Module. This module receives the incoming spatial temporal images, extracts the location of each geo-temporal image, and forwards each geo-temporal image along with its extracted location to the update module in the form (id, location, timestamp, visual word list), which describes the geo-temporal image’s identifier, geo-location, issuing time, and image content. The location is either a precise latitude/longitude coordinate pair or the center of a Minimum Bounding Rectangle. This module will not be further discussed in the following sections, because we directly use an existing preprocessing program to extract the location information from public datasets.

Update Module. The update module ensures that all incoming spatial temporal images are inserted into in-memory indexes as soon as possible, and that all incoming spatial temporal image queries can be answered accurately from the in-memory indexes with the minimum possible memory consumption. This is done through two main tasks: (1) inserting newly arriving spatial temporal images into the latest in-memory index structure; (2) deleting expired spatial temporal images from the oldest in-memory index structure without sacrificing the query answer quality.

Query Module. Given a spatial temporal image search query, the query module employs spatio-temporal visual pruning techniques that reduce the number of visited images needed to return the final answer. As the query module only retrieves the images in the index, the accuracy of the result is mainly determined by the decisions taken at the update module on which spatial temporal images will expire from the in-memory index.

2.2 Problem Statement

In this section, we present the problem definition and necessary preliminaries of top-k spatial temporal image search. Table 1 below summarizes the mathematical notations used throughout this section.

Notation Definition
  I (q) a geo-temporal image (a query)
  I.ψ (q.ψ) the set of visual words used to describe image I (query q)
  I.loc (q.loc) the location of the image I (query q)
  I.t (q.t) the timestamp of the image I (query q)
  v a visual word in q.ψ
  m the number of query visual words in q.ψ
  k the number of results to be returned
  α, β, γ the preference parameters to balance the spatial proximity, visual relevance and temporal recency
  SP(q, I) the spatial proximity between q and I
  TR(q, I) the temporal recency between q and I
  VR(q, I) the visual relevance between q and I
  STV(q, I) the spatial temporal visual ranking score between q and I
Table 1: Notations

In this section, S denotes a sequence of incoming stream geo-temporal images. A geo-temporal image is an image message with geo-location and timestamp, such as a geo-temporal photo in Flickr. Formally, a geo-temporal image is modeled as I = (I.ψ, I.loc, I.t), where I.ψ denotes a set of distinct visual words from a visual vocabulary V, I.loc represents a geo-location with latitude and longitude, and I.t represents the creation timestamp of the image.
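For concreteness, the data model above can be sketched as a small record type; the field names (psi, loc, t) are our own shorthand for the paper's notation, not identifiers from the authors' code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GeoTemporalImage:
    """A geo-temporal image I = (psi, loc, t): visual words, location, timestamp."""
    id: int
    psi: frozenset   # set of distinct visual word ids drawn from a vocabulary V
    loc: tuple       # (latitude, longitude)
    t: float         # creation timestamp

# Example image carrying three (hypothetical) visual words.
img = GeoTemporalImage(id=1, psi=frozenset({3, 17, 42}), loc=(28.17, 112.94), t=1528531200.0)
```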

Definition 1 (Top-k Spatial Temporal Image Query)

A top-k spatial temporal image query is defined as q = (q.ψ, q.loc, q.t, k), where q.ψ is a set of distinct visual words extracted from the query image, q.loc is the query location, q.t is the timestamp at which the user submits the query, and k is the number of results the user expects.

Definition 2 (Spatial Proximity SP(q, I))

Let maxD denote the maximal distance in the space. The spatial proximity between the query q and the image I, denoted by SP(q, I), is defined as SP(q, I) = dist(q.loc, I.loc) / maxD, where dist(·, ·) is the Euclidean distance.

Similar to DBLP:journals/pvldb/CongJW09 , we adopt a language model based function to measure the visual word relevance of the image I with regard to the query q, which is defined as follows.

Definition 3 (Visual Relevance VR(q, I))

Let ŵ(v, I) denote the weight of the visual word v with regard to image I, and

ŵ(v, I) = (1 − λ) · tf(v, I.ψ) / |I.ψ| + λ · tf(v, D) / |D|

where tf(v, I.ψ) and tf(v, D) are the term frequencies of v in I.ψ and in D respectively. Here, D represents the visual word information of all images in the whole dataset and λ is a smoothing parameter. Then the visual relevance between q and I is defined as follows:

VR(q, I) = (1 / Z) · Σ_{v ∈ q.ψ} ŵ(v, I)

where Z is used for normalization.

Definition 4 (Temporal Recency TR(q, I))

The temporal recency between the query q and the image I, denoted by TR(q, I), is calculated by the following exponential decay function:

TR(q, I) = base^(−(q.t − I.t))

where base > 1 determines the rate of the recency decay. The function is monotonically decreasing with the time gap q.t − I.t. It was introduced in DBLP:conf/sigmod/LuLC11 and has been applied (e.g., in DBLP:conf/cikm/AmatiAG12 ) as a measurement of recency for stream data. Based on the experimental studies in DBLP:conf/sigir/EfronG11 , the exponential decay function has been shown to be effective in blending the recency and text relevancy of objects. Thus, we use the exponential decay function to blend the recency and visual relevancy of images.

Based on the spatial proximity, visual relevance and temporal recency between the query and the spatial temporal image, the spatial temporal visual ranking score of an image with regard to the query can be defined as follows.

Definition 5 (Spatial Temporal Visual Score STV(q, I))

Let α, β, γ (α + β + γ = 1) be the preference parameters specifying the trade-off among spatial proximity, visual relevance and temporal recency; then

STV(q, I) = α · SP(q, I) + β · (1 − VR(q, I)) + γ · (1 − TR(q, I))

Note that the images with small score values are preferred (i.e., ranked higher).
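Since the extraction lost the paper's exact formulas, the following sketch shows one consistent instantiation of Definitions 2, 4 and 5: distance normalized by the maximal distance, recency decaying exponentially with the time gap, and the three components blended so that smaller scores are better. The complement form (1 − VR, 1 − TR) is our assumption for making the blend smaller-is-better, not the authors' published equation.

```python
import math

def spatial_proximity(q_loc, i_loc, max_dist):
    """SP(q, I): Euclidean distance normalized by the maximal distance (Definition 2)."""
    return math.hypot(q_loc[0] - i_loc[0], q_loc[1] - i_loc[1]) / max_dist

def temporal_recency(q_t, i_t, base=2.0):
    """TR(q, I): exponential decay in the time gap q.t - I.t (Definition 4), base > 1."""
    return base ** (-(q_t - i_t))

def stv_score(sp, vr, tr, alpha=1/3, beta=1/3, gamma=1/3):
    """STV(q, I): weighted blend of the three components; smaller is better, so the
    larger-is-better relevance and recency enter as complements."""
    return alpha * sp + beta * (1.0 - vr) + gamma * (1.0 - tr)
```

A perfectly matching image (zero distance, full relevance, full recency) scores 0, the best possible value under this formulation.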

Definition 6 (Spatial Temporal Image Search)

Given a set of geo-temporal images and a spatial temporal image query q, we aim to find the k geo-temporal images with the smallest spatial temporal visual scores.

In addition, we require that only images with VR(q, I) > 0 can be returned as query results, so as to avoid giving completely irrelevant answers to users’ queries. This implies that q.ψ and I.ψ should have at least one common visual word.

In the sections hereafter, we abbreviate the geo-temporal image and the geo-temporal query as I and q respectively, if there is no ambiguity. We assume there is a total order for the visual words in V, and the words in each query and image are sorted accordingly.

3 Baseline

Before proceeding to present the proposed solution, we discuss the possibility of using conventional techniques for the processing of spatial temporal image queries. In the following, we develop two baselines by utilizing existing techniques: Inverted File Append (IFA for short) and the Spatial Temporal Visual Inverted Index (STV2I).

3.1 Inverted File Append

A nice property of the inverted indexing structure is that, for a given query q, only the objects containing at least one query keyword are involved in the search. Thus, inverted files are widely used to process textual queries efficiently [14].

To adopt the inverted file for spatial temporal image search, the simplest approach is to treat each geo-temporal image as a document, and sort the entries in each posting list in ascending order of the corresponding geo-temporal images’ timestamps. This approach is highly efficient in terms of geo-temporal image insertions, as a new geo-temporal image can simply be appended to the ends of the posting lists without affecting the ordering of entries. In terms of query efficiency, however, the aforementioned approach faces a great challenge. This is because our spatial temporal visual ranking score evaluates a geo-temporal image based on three factors: its spatial proximity, visual relevance, and temporal recency. If the entries in a posting list are sorted in ascending order of timestamps, the corresponding geo-temporal images are ordered by temporal recency only, regardless of their spatial proximity or visual relevance. In other words, the entry order does not provide any hint about the overall score of each geo-temporal image. As a consequence, when answering a query q, we have to examine most of the entries in all posting lists relevant to q, since the omission of any entry may render the query results incomplete.
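The IFA baseline can be sketched as follows; this is an illustrative toy (class and method names are ours, not the paper's), meant only to show why appends are O(1) per word while a query must scan every relevant posting list in full.

```python
from collections import defaultdict

class InvertedFileAppend:
    """Baseline IFA: one posting list per visual word, append-only, so entries
    stay in ascending timestamp order automatically."""
    def __init__(self):
        self.postings = defaultdict(list)   # visual word -> [(timestamp, image_id)]

    def insert(self, image_id, visual_words, timestamp):
        # O(1) append per word: timestamps arrive in order, so order is preserved.
        for v in visual_words:
            self.postings[v].append((timestamp, image_id))

    def candidates(self, query_words):
        # Timestamp order says nothing about the overall STV score, so the query
        # must scan (nearly) every entry of every relevant posting list.
        seen = set()
        for v in query_words:
            for _, image_id in self.postings[v]:
                seen.add(image_id)
        return seen
```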

3.2 Spatial Temporal Visual Inverted Index

To the best of our knowledge, there is no index in the literature that can filter objects taking into account all three criteria: spatial, temporal and visual. Thus, in this section, we present a hybrid index, namely the spatial temporal visual inverted index (STV2I), which can filter geo-temporal images taking the spatial, temporal and visual information into account simultaneously.

The STV2I is similar to the traditional spatial keyword search index S2I. The major difference is how the spatial index is used. S2I uses an R-tree to select objects that are spatially relevant, whereas STV2I employs a 3D-Rtree DBLP:conf/icmcs/TheodoridisVS96 , which takes the spatial and temporal dimensions into consideration synchronously.

Obviously, the 3D-Rtree can filter geo-temporal images that are spatially and temporally unrelated in the early phase of query processing. Unfortunately, it may exhibit poor performance in some situations. The minimum bounding regions of the 3D-Rtree are 3D rectangles, because the geo-temporal images stored in the 3D-Rtree have three dimensions: time, latitude, and longitude. Meanwhile, space and time are not correlated dimensions. Thus, the minimum bounding regions of the 3D-Rtree may cover large areas of the space, which may result in large overlaps. In this scenario, long periods of time or large spatial regions may give rise to poor performance.

4 Spatial Temporal Visual Indexing

Since massive amounts of spatial temporal images and queries are being generated at an unprecedented scale, three main objectives have to be satisfied by our spatial temporal visual indexing. First, the proposed index has to be able to handle high arrival rates of incoming spatial temporal images. Second, expired objects must be deletable from the index at approximately the same rate as insertion. Third, a large number of unpromising spatial temporal images must be filterable at a cheap cost.

4.1 Our proposed: Hierarchical Information Quadtree

Based on the above requirements, in this subsection we present the Hierarchical Information Quadtree (HI-Quadtree for short), which supports updates at high arrival rates and provides the following functions required for spatial temporal image search and ranking: I) temporal filtering: all temporally irrelevant trees, nodes and images have to be accessed as late as possible to follow the chronological order; II) spatial filtering: all spatially irrelevant nodes have to be filtered out as early as possible to shrink the search space; III) visual word filtering: all visually irrelevant trees, nodes and images have to be discarded as early as possible to cut down the search cost; and IV) relevance computation and ranking: since only the top-k images are returned and k is expected to be much smaller than the total number of similar images, it is desirable to have an incremental search process that integrates the computation of the joint relevance and image ranking seamlessly, so that the search process can stop as soon as the top-k images are identified. Figure 2 shows the two levels of the HI-Quadtree, namely the temporal segment and the inverted quadtree.

Figure 2: Example of hierarchical information quadtree

Temporal Segment

To achieve fast insertion and deletion, all spatial temporal images are temporally partitioned into successive disjoint index segments; for example, each segment only indexes the data of a fixed number of hours. This ensures that newly incoming spatial temporal images are always inserted into the most recent segment. Once a segment spans its full number of hours of data, it is sealed and a new empty segment is generated to hold the newly arriving spatial temporal images.
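A minimal sketch of this segment-rotation idea, with the per-segment index abstracted away as a plain list (all names are ours, and the sealing rule is simplified to "open a new segment once the current one is span time units old"):

```python
class TemporalSegments:
    """Successive disjoint index segments; new images always go into the most
    recent segment, and whole segments are dropped on expiry."""
    def __init__(self, span):
        self.span = span
        self.segments = []   # oldest .. newest; each entry is (start_time, image_ids)

    def insert(self, image_id, timestamp):
        # Seal the current segment and open a new one once it spans `span` time units.
        if not self.segments or timestamp - self.segments[-1][0] >= self.span:
            self.segments.append((timestamp, []))
        self.segments[-1][1].append(image_id)

    def expire(self, horizon):
        # Deletion is as cheap as insertion: drop whole segments older than the horizon.
        self.segments = [s for s in self.segments if s[0] >= horizon]
```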

Inverted Quadtree

To support high arrival rates of incoming objects, a space-partitioning index (e.g., the Quadtree DBLP:journals/cacm/Gargantini82 ; DBLP:conf/pakdd/WangLZW14 ; DBLP:journals/corr/abs-1708-02288 ; DBLP:conf/icde/ZhangZZL13 ; DBLP:journals/tkde/ZhangZZL16 ; TSMCS2018 , and the Pyramid DBLP:conf/pods/ArefS90 ; DBLP:conf/ijcai/WangZWLFP16 ; DBLP:journals/pr/WuWGL18 ; TII2018 ) is preferable to an object-partitioning index (e.g., the R-tree). A space-partitioning index is better suited to a high-update system because of its disjoint space decomposition policy, while the shape of an object-partitioning index is highly affected by the rate and order of incoming data, which may trigger a large amount of node splitting and merging. Meanwhile, the inverted file, which is the most efficient index for text information retrieval, can easily be extended to visual words. Thus, we propose a hybrid indexing structure, namely the inverted quadtree, that utilizes both indexing structures in a combined fashion.

The inverted quadtree is essentially a quadtree, each node of which is enriched with a reference to an inverted file for the images contained in the sub-tree rooted at the node. In particular, each node of an inverted quadtree contains spatial, temporal, and visual word information; the first is in the form of a rectangle, the second in the form of a timestamp, and the last in the form of an inverted file.

More formally, a leaf node of the inverted quadtree has the form (O, R, t), where O refers to the set of images belonging to the current node, R is the area covered by the current node, and t is the latest timestamp aggregated from the images. A leaf node also contains a pointer to a visual inverted file for the visual words of the images being indexed.

A visual inverted file consists of a vocabulary of all distinct visual words in a collection of images and a set of posting lists related to this vocabulary. Each posting list is a sequence of pairs (I, w(v, I)), where I refers to an image containing the visual word v, and w(v, I) is the weight of v in image I.

An inner node contains a number of entries of the form (A, R, t), where A holds the addresses of the children nodes, R is the area covered by the current node, and t is the latest timestamp aggregated from its children nodes. An inner node also contains a pointer to a visual inverted file which is aggregated from the visual inverted files of its child nodes. This inverted file covers all images in the entries of the current node, enabling us to estimate a bound on the visual relevancy to a query of all images contained in the subtree rooted at the current node. The weight of each visual word in this inverted file is the maximum weight of the visual word among the images contained in the subtree rooted at the current node. Figure 2 depicts an inverted quadtree indexing structure.
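The aggregation rules just described (latest timestamp propagated upward, per-word maximum weight in the subtree's inverted file) can be sketched as below; field names are illustrative and the quadtree split logic is omitted.

```python
class IQNode:
    """Inverted quadtree node: spatial rectangle, latest timestamp in the
    subtree, and an inverted file mapping each visual word to its maximum
    weight in the subtree. Leaf nodes additionally hold the images."""
    def __init__(self, rect):
        self.rect = rect            # (min_x, min_y, max_x, max_y)
        self.t = float('-inf')      # latest timestamp aggregated from the subtree
        self.inv = {}               # visual word -> maximum weight in the subtree
        self.children = []          # empty for a leaf node
        self.images = []            # populated at leaves only

    def add_image(self, image_id, timestamp, weights):
        """Insert into a leaf, aggregating the timestamp and word weights."""
        self.images.append(image_id)
        self.t = max(self.t, timestamp)
        for v, w in weights.items():
            self.inv[v] = max(self.inv.get(v, 0.0), w)
```

Keeping the maximum weight per word is what allows an upper bound on the visual relevance of any image in the subtree, which Definition 7 turns into a lower bound on the score.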

4.2 Processing of Spatial Temporal Image Queries

0:   Input: the spatial temporal image query q, the number k of images to return, the current HI-Quadtree index HIQ
0:   Output: top-k query result set R
1:  R ← ∅; θ ← +∞
2:  U ← a new min-first heap
3:  build the frequency signature for query q
4:  U.Enqueue(root of current segment, MinSTV(q, root))
5:  while U is not empty do
6:      e ← the node popped from U
7:      if e is a leaf node then
8:          for each image I in node e do
9:              if STV(q, I) ≤ θ then
10:                 insert I into R
11:                 update θ by R
12:              end if
13:          end for
14:      else
15:          for each child c in node e do
16:              if MinSTV(q, c) ≤ θ then
17:                 U.Enqueue(c, MinSTV(q, c))
18:              end if
19:          end for
20:      end if
21:      process the root node of the next HI-Quadtree segment
22:  end while
23:  return R
Algorithm 1 Spatial Temporal Image Search(q, k, HIQ)

We proceed to present an important metric, the minimum spatial temporal visual score MinSTV, which is used in query processing. Given a query q and a node e in the HI-Quadtree, the metric MinSTV(q, e) offers a lower bound on the actual spatial temporal visual score between the query q and the images enclosed in the rectangle of node e. This bound can be used to order and efficiently prune the paths of the search space in the HI-Quadtree.

Definition 7 (MinSTV(q, e))

The minimum score of a query point q with respect to a node e in the HI-Quadtree, denoted as MinSTV(q, e), is defined as follows:

MinSTV(q, e) = α · minSP(q, e) + β · (1 − maxVR(q, e)) + γ · (1 − maxTR(q, e))

where α, β, and γ are the same as in Equation 4; minSP(q, e) is obtained from the minimum Euclidean distance between q.loc and the rectangle of e; maxVR(q, e) is the maximum possible visual relevance between q and any image in e, computed from the node's inverted file; and maxTR(q, e) is the maximum possible time recency, computed from the node's aggregated timestamp.

A salient feature of the proposed HI-Quadtree structure is that it inherits the nice properties of the Quadtree for query processing.

Theorem 4.1

Given a query point q, a node e, and the set of geo-temporal images in node e, for any image I in e, we have MinSTV(q, e) ≤ STV(q, I).

Proof. Since the geo-temporal image I is enclosed in the rectangle of node e, the minimum Euclidean distance between q and e is no larger than the Euclidean distance between q and I, so minSP(q, e) ≤ SP(q, I).

The timestamp e.t aggregated at node e is the maximum over all the geo-temporal images in node e. Hence maxTR(q, e) ≥ TR(q, I).

For each visual word v, the weight of v in the inverted file of node e is the maximum over all the geo-temporal images in node e. Thus maxVR(q, e) ≥ VR(q, I).

Combining these inequalities according to Equation 4 and Equation 5, we obtain MinSTV(q, e) ≤ STV(q, I), thus completing the proof.

When searching the HI-Quadtree for the objects nearest to a query q, one must decide at each visited node which entry to search first. The metric MinSTV offers an approximation of the spatial temporal ranking score of every entry in the node and can therefore be used to direct the search. Note that only nodes satisfying the constraint of the query's visual words need to be loaded into memory for the computation of MinSTV.

To process spatial temporal image queries with the HI-Quadtree framework, we exploit the best-first traversal algorithm for retrieving the top-k objects. With the best-first traversal algorithm, a priority queue is used to keep track of the nodes and objects that have yet to be visited. The values of STV and MinSTV are used as the keys of objects and nodes, respectively.

When deciding which node to visit next, the algorithm picks the node with the smallest MinSTV value among all nodes that have yet to be visited. The algorithm terminates when the k nearest objects (ranked according to Equation 4) have been found.

Algorithm 1 illustrates the details of the HI-Quadtree based spatial temporal image query. A minimum heap U is employed to hold the HI-Quadtree's nodes, where the key of a node is its minimal spatial temporal visual ranking score. For the input query, we calculate its frequency signature in Line 3. In Line 4, we locate the root node of the current time segment, calculate its minimal spatial temporal visual ranking score, and push the root node into U. Then the algorithm executes the while loop (Lines 5-22) until the top-k results are ultimately reported in Line 23.

In each iteration, the top entry with the minimum spatial temporal visual ranking score is popped from the heap. When the popped node is a leaf node (Line 7), we iterate over the signatures in the node, extract each corresponding image, and compare its spatial temporal visual score against the current threshold: if the score is not larger than the threshold, we push the image into the result set and update the threshold. When the popped node is a non-leaf node (Line 15), a child node is pushed into the heap if its minimal spatial temporal visual ranking score with respect to the query is not larger than the threshold (Lines 15-17). We process the root node of the next time interval in Line 21. The algorithm terminates when the heap is empty, and the results are kept in the result set.
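The best-first traversal described above can be sketched with a standard priority queue. The `Entry` structure and the keys below are illustrative assumptions, not the paper's implementation: entries stand in for HI-Quadtree nodes carrying lower-bound keys and for images carrying exact scores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative best-first top-k traversal over a hierarchical index.
// Internal nodes carry a lower-bound key; leaf-level images carry
// their exact ranking score. Smaller keys are better.
public class BestFirstSearch {
    static class Entry {
        final double key;            // exact score, or lower bound for nodes
        final List<Entry> children;  // null for leaf-level images
        Entry(double key, List<Entry> children) {
            this.key = key;
            this.children = children;
        }
        boolean isImage() { return children == null; }
    }

    // Pops the entry with the smallest key; images popped in this
    // order are final, because node keys never overestimate the score
    // of any image they enclose.
    static List<Double> topK(Entry root, int k) {
        PriorityQueue<Entry> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Entry e) -> e.key));
        heap.add(root);
        List<Double> scores = new ArrayList<>();
        while (!heap.isEmpty() && scores.size() < k) {
            Entry e = heap.poll();
            if (e.isImage()) scores.add(e.key);     // exact score: report it
            else heap.addAll(e.children);           // expand node lazily
        }
        return scores;
    }

    public static void main(String[] args) {
        Entry img1 = new Entry(0.3, null), img2 = new Entry(0.7, null);
        Entry img3 = new Entry(0.5, null);
        Entry left = new Entry(0.2, Arrays.asList(img1, img2));
        Entry right = new Entry(0.4, Arrays.asList(img3));
        Entry root = new Entry(0.1, Arrays.asList(left, right));
        System.out.println(topK(root, 2)); // [0.3, 0.5]
    }
}
```

Note how the loop expands a node only when its lower bound is the smallest key in the queue, so subtrees whose bound exceeds the k-th best score found so far are never visited.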

5 Performance Evaluation

In this section, we present the results of a comprehensive performance study to evaluate the effectiveness and efficiency of our techniques proposed in this paper.

Workload. A workload for this experiment consists of 100 input queries, and the average query response time is employed to evaluate the performance of the algorithms. The query locations are randomly selected from the underlying dataset. By default, the number of query visual words varies from 10 to 200, the number of results grows from 10 to 100, and the preference parameter changes from 0.1 to 0.9.

Experiments are run on a PC with an Intel i7 7700K 4.20GHz CPU and 16GB RAM running the Ubuntu 16.04 LTS operating system. All algorithms in the experiments are implemented in Java. For a fair comparison, we tune the important parameters of the competitor algorithms for their best performance. In particular, the node capacity of all algorithms is set to 100. Our measures of performance include insertion time, deletion time, storage overhead, number of node accesses, and response time. The rest of this section evaluates index maintenance and query processing.

Dataset  Number of Images  Number of Distinct Visual Words  Avg. Visual Words per Image
200K 200000 616347 124.7
400K 400000 613940 118.2
600K 600000 613026 114.3
800K 800000 607401 128.6
1M 1000000 612905 134.8
Table 2: Information of datasets

Dataset. We first evaluate the scalability and performance of our system on a dataset of over one million images crawled from the photo-sharing site Flickr, using Oxford landmarks as queries. For the scalability and performance evaluation, we randomly sampled five sub-datasets whose sizes vary from 200,000 to 1,000,000 images, as summarized in Table 2.

5.1 Index Maintenance

In this subsection, we evaluate the insertion time, deletion time, and storage overhead of all the algorithms.

Evaluation on insertion time. Fig. gives the performance when varying the arrival rate from 200 to 3200. The average insertion time of all three algorithms gradually grows as the arrival rate increases. Both HIQ and IFA perform better than STVI, as shown in Fig.(a). Since IFA adopts a simpler data structure, its insertions need less time than those of HIQ. Fig.(b) illustrates the average insertion time of HIQ and STVI when varying the node capacity from 100 to 500. As IFA has no notion of node capacity, we compare only HIQ and STVI; the time cost of STVI is nearly 4 times that of HIQ.

Evaluation on deletion time. Fig. shows the average deletion time of the three algorithms. As the arrival rate rises from 200 to 3200, the deletion time of all of them increases step by step. As in the insertion evaluation, IFA performs best due to its simple structure, and the deletion time of our method is less than that of STVI. We can see from Fig.(b) that when varying the node capacity from 100 to 500, the performance of HIQ and STVI fluctuates only slightly; the former fluctuates between 70 and 75. As expected, HIQ has the best performance as the node capacity changes.

5.2 Time Evaluation

In this subsection, we evaluate the query response time of all the algorithms.

Effect of the number of query visual words. To investigate the response time of IFA, STVI and our HIQ algorithm under queries with different numbers of visual words, we increase the number of query visual words from 10 to 200. Fig.(a) reports the response time of the three methods over this range. Not surprisingly, the response time of all of them increases gradually with the number of query visual words, and HIQ performs best. The time cost of HIQ and STVI grows faster than that of IFA for larger numbers of query visual words.

Effect of the number of returned results. We increase the number of results k from 10 to 200 and evaluate the time cost of the three methods. In Fig.(b), our method HIQ shows superior performance in comparison with the other algorithms, increasing step by step with the growth of k. The trend of STVI is similar to that of HIQ, while the response time of IFA is almost unchanged.

Effect of dataset size. We study the response time under different sizes of image datasets. The experimental results are shown in Fig.(c). As the dataset size increases, the time cost of IFA, STVI and HIQ grows by degrees, and the growth slows down for the larger datasets. Not surprisingly, our method performs best among them.

Effect of the preference weights. In the next experiment, we vary the first preference weight from 1/7 to 5/7, setting the other two weights equal to each other. Fig.(d) demonstrates the processing cost of each method as a function of this weight. We can observe that the performance of the three methods is practically unchanged over this range; as in the evaluations above, HIQ performs best among them. Fig.(e) illustrates the results of varying the second weight: the response time of all three algorithms grows slightly. In Fig.(f), the response time of HIQ and IFA gently decreases, while that of STVI is almost unchanged.

6 Conclusion

To the best of our knowledge, this is the first work to study the problem of spatial temporal image queries over streaming spatial temporal multimedia data, which has a wide spectrum of applications. To tackle this problem, we propose a novel spatial temporal visual indexing structure, namely HI-Quadtree, to efficiently organize a massive number of streaming spatial temporal images such that each incoming query submitted by users can rapidly find the top-k results. An efficient spatial temporal image search algorithm based on HI-Quadtree is designed to solve this problem. Extensive experiments demonstrate that our technique achieves high throughput over streaming spatial temporal multimedia data.

Acknowledgments: This work was supported in part by the National Natural Science Foundation of China (61379110, 61472450, 61702560), the Key Research Program of Hunan Province (2016JC2018), and project 2018JJ3691 of the Science and Technology Plan of Hunan Province.

