With the rapid development of the mobile Internet and cloud computing technology, large-scale multimedia data DBLP:conf/cikm/WangLZ13 ; DBLP:conf/mm/WangLWZZ14 ; DBLP:conf/mm/WangLWZ15 , e.g., texts, images DBLP:journals/tip/WangLWZ17 , audio and videos, have been generated, collected, stored and shared. For example, Facebook, one of the most famous online social network services, reported more than 300 million photos uploaded and shared daily as of November 2013. More than 3.5 million photos had been uploaded by the 87 million registered users of Flickr, the largest online photo sharing service. More than 140 million Twitter users post 400 million tweets containing 140-character texts and images with geographical information such as latitude and longitude. YouTube, the largest video sharing web site in the world, received more than 100 hours of video uploads every minute at the end of 2013. Himalaya, a popular audio sharing platform, has more than 470 million users, and as of December 2015 the total amount of audio on it had exceeded 15 million items. Beyond all doubt, unlike the situation in the past, massive multimedia data accounts for nearly 80% of the total amount of data in the big data environment. As is well known, advanced mobile devices equipped with wireless network modules, high-definition cameras and microphones, such as smartphones and tablets, together with popular mobile applications and location-based services (LBS for short) such as WeChat, Uber and Amap, bring a lot of convenience to people every day by collecting and sharing massive multimedia data with geo-location information. In the meanwhile, more challenges are raised concerning multimedia data retrieval DBLP:journals/tnn/WangZWLZ17 .
Content-based image retrieval DBLP:journals/cviu/WuWGHL18 ; DBLP:journals/pr/WuWLG18 is applied in many applications such as image processing and retrieval DBLP:journals/corr/abs-1708-02288 ; DBLP:conf/pakdd/WangLZW14 . The basic objective of CBIR is to automatically extract low-level visual features from images, such as color, texture, edge and shape. There are two important parts of CBIR research, i.e., image representation and visual similarity measurement. Many researchers have worked on these two problems and many approaches have been proposed. Thomee et al. DBLP:journals/ijmir/ThomeeL12 presented a review of interactive search in image retrieval. Fu et al. DBLP:conf/ICCC/Fu2016
proposed a solution which combines convolutional neural networks DBLP:journals/pr/WuWGL18 (CNN) and CBIR for image retrieval. They utilized support vector machines (SVM) to train a hyperplane which separates similar image pairs from dissimilar image pairs to a large degree. Norouzi et al. DBLP:conf/nips/0002FS12
designed an approach for large-scale multimedia applications that learns a mapping from high-dimensional data to binary codes preserving semantic similarity.
Smart devices equipped with high-definition cameras, wireless mobile communication modules and GPS modules, such as smartphones and tablets, are used by large numbers of users to take photos and share them on the Internet easily, every day and everywhere. These photos record nice scenes and, on the other hand, carry geo-tags which record the geographical information of where they were taken. We can treat these images with geo-tags as geo-visual objects which contain two types of features, i.e., visual content features and geographical features. As with the problem of spatial textual queries on road networks, more and more people are beginning to pay close attention to geo-tagged image search on road networks.
To the best of our knowledge, we are the first to study the problem of continuous top-k geo-visual objects query on road network, which aims to search out the best geo-visual objects generated from images with geo-tags, taking into account two types of relevancy: (1) road network distance proximity between the query location and objects; (2) visual content similarity between the query and objects. Both the query and the objects consist of geographical information and visual content information, and as the query moves on the road network, the query continuously returns satisfactory results. In order to describe this new problem more clearly and concretely, we introduce two examples as follows:
As illustrated in Figure 1, a user is driving back from work and she wants to buy a handbag which is the same as the one in a photo. However, she does not know its brand or item number. Obviously, describing a handbag in great detail with a few words or a sentence is very challenging. Furthermore, since she has no idea which nearby shop has the satisfactory style of handbag, she wants to find out which shop carries this type of handbag on her way home. In this case, she can submit this photo and her current location from her smartphone as a top-k geo-visual object query on road network to a geo-tagged multimedia data retrieval system while driving. The system then, according to the driving route, returns a set of geo-visual objects meeting her requirements, which tells the user which shops have this kind of handbag and are close to her location. As the location of the user changes, the system dynamically updates the result set. In Figure 1, the geo-visual objects are the small cyan points, the query point is in the red circle, and the red arrow indicates its moving direction.
As illustrated in Figure 2, a photographer traveling in a natural scenic area wants to take some photos of the sunset. As she is unfamiliar with the environment of this scenic area, she cannot choose a good position near her current location to produce a good work. In this case, she does not need to depict the picture in her mind in detailed words. It is only necessary to select a photo which can represent her intention as the query image and then submit it together with her location to the geo-tagged multimedia retrieval system as she walks through the scenery. A top-k geo-visual objects query will be processed, and the result set containing photos taken by other photographers or tourists can tell her which position is a better choice and which route is the nearest.
This paper aims to implement the challenging applications described in Example 1 and Example 2, namely, efficient continuous top-k geo-visual objects query on road network. We propose a novel hybrid indexing framework named VIG-Tree which combines G-Tree and the visual inverted index technique. To process geo-visual queries with VIG-Tree, we exploit a best-first traversal algorithm for retrieving the top-k geo-visual objects. In order to further reduce the computational cost, we introduce the notion of safe interval in road network and design the result-updating rule for the process of query movement. An efficient algorithm named the moving monitor algorithm is developed to improve search efficiency.
Contributions. Our main contributions can be summarized as follows:
To the best of our knowledge, we are the first to study continuous top-k geo-visual objects query on road network. We first propose the definitions of geo-visual object and top-k geo-visual object query on road network. A score function containing a road network distance proximity component and a visual content similarity component is designed, and then we define the continuous top-k geo-visual object query on road network.
We present a hybrid index framework named VIG-Tree to support geo-visual object search on road network, which combines G-Tree and the visual inverted index technique. Then an algorithm named geo-visual search on road network is developed.
In order to reduce the computational cost of queries, we propose the notion of safe interval and a result-updating rule, and then introduce the moving monitor algorithm.
We have conducted extensive experiments on a real multimedia dataset and a real road network dataset. Experimental results demonstrate that our solution outperforms the state-of-the-art method.
Roadmap. The remainder of this paper is organized as follows: We review the related work in Section 2. Section 3 introduces the definition of the continuous top-k geo-visual objects query on road network as well as relevant notions. We propose a novel hybrid index framework named VIG-Tree and then present the geo-visual search on road network algorithm in Section 4. Section 5 presents the definition of safe interval and the moving monitor algorithm. The experimental results are shown in Section 6, and finally we conclude this paper in Section 7.
2 Related work
In this section, we present an overview of existing research on image retrieval and spatial keyword query on road network, which are related to this work. To the best of our knowledge, there is no existing work on the problem of continuous top-k geo-visual objects query on road network.
2.1 Content-Based Image Retrieval
Content-based image retrieval (CBIR for short) is an important research problem in the area of multimedia systems and retrieval NNLS2018 ; DBLP:journals/ivc/WuW17 ; TC2018 ; DBLP:conf/mm/WuWS13 . Over the last decade, many researchers have focused on this hot issue and made great progress. The Scale Invariant Feature Transform (SIFT for short) proposed by Lowe DBLP:conf/iccv/Lowe99
is a popular approach which transforms an image into a large collection of local feature vectors. These local features are invariant to image translation, scaling and rotation, and partially invariant to illumination changes and affine or 3D projection. This method consists of four stages DBLP:journals/ijcv/Lowe04 : (1) scale-space peak selection; (2) keypoint localization; (3) orientation assignment; (4) keypoint descriptor. Based on SIFT, Ke et al. DBLP:conf/cvpr/KeS04
proposed a descriptor named PCA-SIFT which encodes the salient aspects of the image gradient in the feature point's neighborhood, applying Principal Components Analysis (PCA) to the normalized gradient patch. Mortensen et al. DBLP:conf/cvpr/MortensenDS05 introduced a feature descriptor which augments SIFT with a global context vector that adds curvilinear shape information from a much larger neighborhood. This descriptor is robust to local appearance ambiguity and non-rigid transformations. Liu et al. DBLP:journals/inffus/LiuLW15 proposed a novel image fusion method for multi-focus images by applying dense SIFT. In order to improve object retrieval performance, Zhang et al. DBLP:journals/ijon/ZhangZZZW17 presented a novel method that employs CNN evidence to improve SIFT matching accuracy.
Many other techniques have been designed to improve the effectiveness and efficiency of content-based image retrieval. For the content-based landmark image search problem, Zhu et al. DBLP:journals/tcyb/ZhuSJZX15 proposed the multimodal hypergraph (MMHG) to characterize the complex associations between landmark images. Furthermore, they developed a novel content-based visual landmark search system based on MMHG to improve the effectiveness of searching. Xiao et al. DBLP:journals/mta/XiaoQ14
presented a complementary relevance feedback-based content-based image retrieval system using short-term and long-term learning techniques to improve the retrieval performance. Jaffar et al. proposed a semantic image retrieval system in a web 3.0 environment which incorporates genetic algorithms with support vector machines and user feedback for image retrieval purposes.
The above-mentioned CBIR approaches are not applicable to the problem of geo-tagged image search on road network because they only consider the visual content, ignoring the geographical information during query processing.
2.2 Spatial Keyword Queries
There are a great number of studies on spatial keyword query techniques, a hotspot of interest in the spatial database community.
Snapshot query. Spatial keyword snapshot query DBLP:conf/icde/ZhangZZL13 techniques can be divided into three categories: text priority indexes, spatial priority indexes and loosely structured indexes. A text priority index first extracts relevant inverted files using the text index, and then uses the spatial index for spatial filtering. For a given spatial keyword query, we can perform an incremental nearest neighbor search on the extracted inverted documents DBLP:journals/tods/HjaltasonS99 until the objects that satisfy the keywords are found. Text priority index query techniques include IF-R DBLP:conf/cikm/ZhouXWGM05 , S2I DBLP:conf/ssd/RochaGJN11 , I DBLP:conf/edbt/ZhangTT13 , SFC-QUAD DBLP:conf/cikm/ChristoforakiHDMS11 , and IL-Quadtree DBLP:conf/icde/ZhangZZL13 ; DBLP:journals/tkde/ZhangZZL16 .
A spatial priority index uses the spatial index to prune the space, and then extracts the corresponding inverted files by keywords. This category includes R-tree with inverted files, R-tree with bitmap files, and grid with inverted files. R-tree with inverted files techniques comprise R-IF DBLP:conf/cikm/ZhouXWGM05 , KR-tree DBLP:conf/ssdbm/HariharanHLM07 , and the IR-tree and its variations DBLP:conf/ssdbm/HariharanHLM07 ; DBLP:journals/pvldb/CongJW09 ; DBLP:journals/vldb/WuCJ12 such as IR-tree(Li) DBLP:journals/tkde/LiLZLLW11 , WIR-tree DBLP:journals/tkde/WuYCJ12 , and LBAK-tree DBLP:conf/gis/AlsubaieeBL10 . R-tree with bitmap files indexes include IR2-tree DBLP:conf/icde/FelipeHR08 , SKI DBLP:conf/ssdbm/CaryWR10 , bR-tree DBLP:conf/icde/ZhangCMTK09 , and MHR-tree DBLP:conf/icde/YaoLHH10 . IRGI DBLP:conf/icde/ChenC2013 is the only grid-with-inverted-files approach, which was proposed to address the spatial keyword query problem in wireless data broadcast environments.
A loosely structured index DBLP:conf/icde/ZhangOT10 ; DBLP:conf/sigir/ZhangCT14 constructs a spatial index and a textual index separately for the spatial dimension and the textual dimension. However, there is no connection, or only a loose connection, between the spatial index and the textual index. During query processing, objects that satisfy the spatial or textual constraints are retrieved from the spatial index and the textual index respectively, and the results are then returned by taking their intersection. Zhang et al. DBLP:conf/icde/ZhangOT10 utilized an R-tree to index spatial information, where each node adds a tag that records the path from the root node to this node. Their method applies an inverted file to index documents, but the objects in the inverted file also carry information that can be used for spatial judgement. Zhang et al. DBLP:conf/sigir/ZhangCT14 used only an inverted table to organize spatial information and textual information based on the CA algorithm DBLP:journals/jcss/FaginLN03 . They proposed a top-k query algorithm to solve top-k aggregate queries based on spatial keywords.
Continuous query. Wu et al. DBLP:conf/icde/WuYJC11 were the first to propose continuous nearest neighbor spatial keyword queries. They used the doubly weighted Voronoi cell as the safe region of the query, within which the user can move without changing the results. Since the solution introduced by Wu et al. requires a special ranking function, Huang et al. DBLP:conf/cikm/HuangLTF12 proposed an approach based on a widely used ranking function that performs better in terms of efficiency and communication cost. Because safe-region-based query processing methods find it difficult to achieve the optimal update frequency and single-update cost simultaneously, and existing continuous nearest neighbor spatial keyword query processing methods are designed on a sequential computation model, Li et al. DBLP:journals/pvldb/Li0QYZ014 presented a continuous nearest neighbor spatial keyword query processing method based on influence sets. However, these studies focused on spatial textual queries rather than image retrieval with geo-tagged information.
2.3 Spatial Keyword Queries on Road Network
Spatial keyword query on road network is another hot issue in the area of spatial data, as it reflects the environment of daily life more realistically. Although research on keyword query techniques on road network started late, some important achievements have been made. Like spatial keyword queries in Euclidean space, this query problem can be divided into two categories: snapshot query and continuous query.
Snapshot query on road network. Rocha-Junior et al. DBLP:conf/edbt/Rocha-JuniorN12 were the first to study keyword query in road network space. They proposed a spatial textual index composed of a spatial component, an adjacency component, a mapping component and an inverted file component, and proposed building an overlay network on the road network to improve query performance. For the same problem, Fang et al. DBLP:conf/adc/FangZSWXLC15 proposed a hybrid index, SG-Tree, which combines G-tree with signature files. Gao et al. DBLP:journals/tkde/GaoQZ015
studied spatial keyword reverse kNN queries on road network and developed an algorithm based on a filter-and-refine framework. Furthermore, they introduced a count tree to improve query efficiency. Luo et al. DBLP:journals/kbs/LuoJLWLL16 and Lin et al. DBLP:conf/icde/LinXH16 also studied spatial keyword reverse kNN queries on road network. Luo et al. DBLP:journals/kbs/LuoJLWLL16 proposed an algorithm based on network expansion, and an algorithm that makes use of the characteristics of the network Voronoi diagram. Lin et al. DBLP:conf/icde/LinXH16 designed a hybrid index structure, KcR-tree, to store and summarize the spatial and keyword information of objects, and then proposed three kinds of optimization techniques for the query. In our previous work DBLP:conf/edbt/ZhangZZLCW14 , we studied the keyword diversity query on road network and developed an inverted index based on signature files. Besides, a segmentation-based method is used to improve the validity of signature files, and an efficient incremental and diversified spatial keyword search algorithm is designed.
Continuous query on road network. For the continuous query problem, Li et al. DBLP:journals/Huazhong/Li2013 studied the problem of continuous top-k spatial keyword query on road network. They proposed a data structure consisting of a PMR-quadtree and three memory tables to store and retrieve relevant information of the road network and objects. Moreover, they proposed an adjustable formula for computing a comprehensive distance to satisfy the different emphases on keyword similarity and road network distance in various practical applications. The query result is corrected by monitoring changes of the comprehensive distance values of the candidates, so as to realize continuous processing of the query. Guo et al. DBLP:journals/geoinformatica/GuoSAT15 proposed the notion of safe road segment, i.e., a segment on which the top-k query results remain unchanged while the query moves. They presented two algorithms for continuous query on road network, QCA and OCA.
Apparently, the studies above only consider textual similarity and road network distance proximity; they do not consider the situation where a user needs to search for a geo-tagged image on the road network. To the best of our knowledge, we are the first to study the problem of continuous top-k geo-visual query on road network.
3 Preliminaries
In this section, we first review two basic approaches to image representation, namely the scale-invariant feature transform and bag-of-visual-words. Then we formally define the problem of continuous top-k geo-visual query on road network and related concepts. Furthermore, we introduce the outline of our framework. Table 1 summarizes the notations frequently used throughout this paper to facilitate the discussion.
|A given database of geo-tagged images|
|The -th geo-tagged image|
|A given dataset of geo-visual objects|
|The -th geo-visual object in|
|The graph model of a road network|
|The -th node in a road network|
|An edge connecting nodes and|
|The weight of edge|
|The geo-location information descriptor of|
|A visual content descriptor of|
|A location point on the road network|
|A vector of visual words|
|A visual word|
|A network node|
|A VIG-Tree index|
|A top-k geo-visual query on road network|
|A path connecting and|
|The shortest path connecting and|
|The travel distance between and|
|The distance of path|
|A parameter used to balance the importance between road network distance proximity and visual content similarity|
|The longitude of a geo-location|
|The latitude of a geo-location|
|The number of final results|
|Score function measuring the relevance of and|
|The diameter of the road network|
|The road network distance proximity of and|
|The result set of TGVQ on road network|
|The visual content similarity between and|
|The safe segment of edge|
|A safe interval|
3.1 Problem Definition
Definition 1 (Road Network)
Without loss of generality, a road network is modeled as a simple weighted undirected planar graph represented by and , wherein is the set of nodes which represent road intersections or road endpoints, and is the total number of nodes. The set of edges denotes all the road segments connecting two nodes, i.e., represents the road segment connecting nodes and . Let be a location point on ; if is located on edge , we denote it as . denotes the set of weights which represent the travel distances of road segments or trip times, i.e., is the trip distance or time cost of traveling through from node to . To simplify the presentation, we use travel distance hereinafter. We define the travel distance between any two positions and on the road network as , and for an edge , . Furthermore, in this paper we assume that each edge is bi-directional, and that its distance or time cost is irrelevant to the direction. This is described formally in Assumption 1.
Given a road network and . , and .
All the concepts and algorithms stated in the following sections are meaningful and effective under Assumption 1 in this work.
Definition 2 (Shortest Path)
Given a road network , let and be two nodes. A path between and is defined as , where is the number of edges in this path. Its distance is . Obviously, the distance of a path is equal to the sum of the weights of all edges in this path, i.e.,
where is the number of edges in the path and represents the -th edge. Based on the notion of path, we define the shortest path connecting and as , which has the smallest distance among all paths connecting these two nodes. Formally, the distance of is described as:
where the function returns the minimum value of the elements in the set , and represents the set of positive integers.
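The shortest-path distance of Definition 2 can be computed with any standard algorithm, e.g., Dijkstra's algorithm on the undirected network of Definition 1. A minimal sketch follows; the function and edge names are illustrative, not part of this paper's implementation:

```python
import heapq
from collections import defaultdict

def shortest_path_distance(edges, source, target):
    """Dijkstra's algorithm on an undirected weighted road network.

    edges: iterable of (u, v, w) road segments with travel distance w.
    Returns dist(source, target), or float('inf') if unreachable.
    """
    graph = defaultdict(list)
    for u, v, w in edges:          # Assumption 1: edges are bi-directional
        graph[u].append((v, w))
        graph[v].append((u, w))

    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d               # settled: this is the shortest distance
        if d > dist.get(u, float('inf')):
            continue               # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float('inf')
```

Because the graph is undirected, the computed distance is symmetric, matching Assumption 1.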
Figure 3(a) shows an example road network containing 14 nodes and 16 edges. The set of nodes is . The weight of each edge is shown beside it, e.g., , . The shortest path between and consists of , and , which is highlighted by the red dashed line with an arrow. Its distance is . The blue dashed line denotes the shortest path connecting and , .
Definition 3 (Geo-visual Object)
Given a geo-tagged image database which stores images, i.e., , a geo-visual object dataset is defined as , containing objects. Each geo-visual object resides on an edge of a road network, and it is associated with a geographical information descriptor and a visual content descriptor . More specifically, is represented by a 2-dimensional geo-location extracted from the geo-tag of image , i.e., , wherein and are the longitude and latitude respectively. The visual content descriptor is defined as , which is a visual words vector generated from a geo-tagged image
by low-level feature extraction.
If a road network contains several geo-visual objects located on its edges, we denote it as . As defined above, is the set of geo-visual objects. On the other hand, based on Definition 2, we define the distances between a geo-visual object and the endpoint nodes and of the edge on which it lies as and . The shortest distance between two objects and is denoted as .
Definition 4 (Top-k Geo-visual Query (TGVQ))
Given a road network with geo-visual objects , a top-k geo-visual query is defined as , in which denotes the location of the query, is a visual words vector generated from the query image, and is the number of requested objects. TGVQ aims to search out geo-visual objects from , which are ranked according to the score measured by function , i.e.,
and the score function is defined as follows:
where the function measures the distance proximity between and , the function calculates the visual content relevance between these two visual words vectors, and is used to balance the importance between road network distance proximity and visual content similarity. If , the road network distance proximity is more important than the visual content similarity; conversely, if , the visual content similarity plays a more considerable role. Note that geo-visual objects with small score values are preferred (i.e., ranked higher).
According to the definition of TGVQ, the score of a geo-visual object is determined by road network distance proximity and visual content similarity, which are measured by the functions and respectively. Next, we describe these two functions formally to explain how to calculate the distance proximity and visual content similarity. First, an important notion named road network diameter is presented in the following definition.
Definition 5 (Road Network Diameter)
Given a road network , the road network diameter is defined as follows:
where the function returns the maximum value of the elements in the set . It is apparent that the diameter of a road network is the maximum shortest-path distance between any two nodes.
Based on the notions of road network diameter and shortest path, we next propose the definition of road network distance proximity.
Definition 6 (Road Network Distance Proximity)
Given a road network with geo-visual objects , a geo-visual object and a top-k geo-visual query , the road network distance proximity measure is defined as:
where is the distance of the shortest path connecting and . Obviously, the road network distance proximity is determined by the shortest path between the query location and the object. Besides, it is easy to prove that the value range of is .
We prove the proposition by contradiction. (1) First, we assume that , i.e., . Thus, according to Definition 5, . Without loss of generality, assume is located on the edge and is located on the edge , and contains and . Thus . It is clear that and . Thus, . This contradicts the assumption that . (2) It is easy to see that the distance of a shortest path and the network diameter are positive, and if and are at the same place, then , i.e., and . Therefore .
The other important part of the score function, the visual content similarity function , relies on the Bag-of-Visual-Words (BoVWs for short) model and the SIFT technique. The local features of an image, which are extracted from image keypoints, are encoded into a visual words vector.
SIFT is used to detect and describe the local features of an image; it transforms an image into a large collection of local feature vectors. These local features are invariant to image translation, scaling and rotation, and partially invariant to illumination changes and affine or 3D projection DBLP:journals/ijcv/Lowe04 . It has four main stages, introduced as follows:
(1) Scale-space extrema detection. The first stage, named scale-space extrema detection, searches over all scales and image locations. The algorithm recognizes potential interest points by using a difference-of-Gaussian (DoG) function convolved with the image.
(2) Keypoint localization. The second stage is keypoint localization, aiming to precisely localize the keypoints. In this stage, two kinds of extremum points are discarded: low-contrast extreme points and unstable edge response points.
(3) Orientation assignment. The third stage is named orientation assignment. In this stage, one or more orientations are designated to each keypoint based on the gradient directions of the local image. All computations are performed in a scale-invariant manner for each image. An orientation histogram is formed by computing the gradient magnitudes and orientations within a region around the keypoint, and the dominant orientation of the keypoint is designated by the highest peak in this histogram.
(4) Keypoint descriptor. The last stage, named keypoint descriptor, measures the gradients of the local image at the selected scale in the region around each keypoint. These gradients are transformed into a representation which allows for relatively large local shape distortion and illumination changes.
An image is represented as a vector of visual words denoted as , constructed by vector quantization of feature descriptors using BoVWs, which extends the bag-of-words (BoW for short) technique widely used in textual similarity measurement DBLP:conf/iccv/SivicZ03 . In this approach, -means is applied to create the visual words. In other words, visual words are defined by the -means cluster centers, and the SIFT features in each image are then assigned to the nearest cluster center to give a visual word representation DBLP:conf/bmvc/ChumPZ08 .
Based on the notion introduced above, the similarity of two images and can be calculated by the following equation:
where and are two sets of visual words generated from and . In this method, all the words are equally important, and the similarity value obviously falls within a bounded range.
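The quantization step described above, i.e., mapping local feature descriptors to their nearest cluster centers, can be sketched as follows. Since the exact similarity formula is not reproduced above, the Jaccard set overlap is assumed here as the equal-weight similarity between two word sets; both the function names and the choice of Jaccard are illustrative assumptions:

```python
import math

def quantize(descriptors, centers):
    """Map each local feature descriptor to the id of its nearest
    k-means cluster center (its 'visual word'), returning the set of
    visual words present in the image."""
    words = set()
    for d in descriptors:
        best = min(range(len(centers)),
                   key=lambda i: math.dist(d, centers[i]))
        words.add(best)
    return words

def word_set_similarity(w1, w2):
    """Set-overlap similarity in which all words are equally important.
    Jaccard similarity is assumed; the paper's exact formula may differ."""
    if not w1 and not w2:
        return 1.0
    return len(w1 & w2) / len(w1 | w2)
```

In a real system the descriptors would be 128-dimensional SIFT vectors and the centers would come from k-means over a training corpus; here both are plain tuples for illustration.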
According to the conception of BoVWs and the image similarity measurement, we propose the definition of visual content similarity to measure the visual relevance between the query and a geo-visual object.
Definition 7 (Visual Content Similarity)
Given a TGVQ denoted as and a geo-visual object , the visual content similarity between and is measured by the following function:
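Combining the distance proximity of Definition 6 with this visual content similarity, the overall score of Definition 4 can be sketched as follows. Because the paper's exact formulas are not reproduced here, the convex combination below is an assumed instantiation; it is consistent with the described behavior of the balance parameter, and smaller scores rank higher:

```python
def tgvq_score(dist_pq, diameter, visual_sim, alpha):
    """Assumed instantiation of the TGVQ score: a convex combination of
    road-network distance proximity (Definition 6, a value in [0, 1])
    and visual dissimilarity. alpha balances the two components as
    described in Definition 4; smaller values rank higher."""
    proximity = dist_pq / diameter          # Definition 6
    return alpha * proximity + (1 - alpha) * (1 - visual_sim)
```

For example, with alpha above 0.5 a nearby object can outrank a visually closer but distant one, matching the stated role of the balance parameter.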
Definition 8 (Continuous Top-k Geo-visual Query (CTGVQ))
Given a road network with geo-visual objects , a continuous top-k geo-visual query is defined as , wherein , and have the same meanings as the corresponding parts of Definition 4, and is the time range in which a CTGVQ is running. For each new location , CTGVQ returns a new result set .
To demonstrate the CTGVQ problem plainly, we consider a simple example in Figure 3(b). A CTGVQ denoted as is represented by a red triangle which is moving along the route marked by a red line with an arrow. Clearly, it will pass through nodes , and successively. During the movement, the result of will be updated continuously. When is near , the result set is . When it moves to , the result set becomes .
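A naive baseline for CTGVQ simply re-evaluates the whole top-k ranking at every location update; the safe-interval technique of Section 5 exists precisely to avoid this repeated work. A minimal sketch of the baseline, in which all names and the convex score combination are illustrative assumptions rather than the paper's algorithm:

```python
def top_k_geo_visual(objects, query_loc, k, alpha, diameter, dist_fn, sim_fn):
    """Naive CTGVQ baseline: fully re-rank all geo-visual objects at each
    query location. dist_fn gives the road-network distance from the query
    location to an object; sim_fn gives the visual content similarity of
    an object to the query image, in [0, 1]. Smaller scores rank higher
    (an assumed convex-combination score)."""
    def score(o):
        proximity = dist_fn(query_loc, o) / diameter
        return alpha * proximity + (1 - alpha) * (1 - sim_fn(o))
    return sorted(objects, key=score)[:k]
```

Calling this function once per location update gives correct results but wastes computation whenever the query stays within a region where the top-k set cannot change, which motivates the safe-interval optimization.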
4 Hybrid Indexing for Network Aware Continuous Image Retrieval
In this section, we present the Visual Inverted G-tree (VIG-Tree for short), which supports the following functions required for continuous geo-visual search and ranking on road network: (1) visual filtering: all visually irrelevant nodes and objects have to be discarded as early as possible to cut down the search cost; (2) network filtering: all nodes which are farther away in the network have to be accessed as late as possible to avoid unnecessary network expansion; (3) relevance computation and ranking: since only the top-k geo-visual objects are returned and k is expected to be much smaller than the total number of matching objects, it is desirable to have an incremental search process that seamlessly integrates the computation of the joint relevance and the object ranking, so that the search process can stop as soon as the top-k objects are identified.
4.1 Hybrid Index Framework: VIG-Tree
VIG-Tree is a combination of a G-Tree and a visual inverted index. In particular, each node of a VIG-Tree contains both network information and visual word information; the former consists of a distance matrix and a set of minimum bounding rectangles (MBRs) of children nodes, and the latter takes the form of a visual inverted index file for the edges or nodes rooted at the node.
Fig. 5 presents the basic indexing architecture of VIG-Tree. In the VIG-Tree, a leaf node L consists of four parts: an MBR component, a matrix component, a visual component and a subnetwork component. In the following, we describe each component in more detail.
MBR component. The MBR component contains a set of MBRs that enclose the corresponding edges rooted at the current node, and these are used to find the edges that cover the original query location.
Matrix component. The matrix component is a distance matrix whose rows are all borders of the node and whose columns are all vertices rooted at the node; each entry records the shortest-path distance between the corresponding border and vertex.
Visual component. The visual component consists of a list of inverted indexes, one for each unique visual word of the leaf node. Each visual inverted index corresponds to a visual word and points to the list of edges that contain that visual word.
Subnetwork component. The subnetwork component combines network connectivity information with detailed visual word information of the geo-visual objects on each edge. It points to an edge array list of the current node. Each edge array entry consists of the edge id, the length of the edge, and a detailed polyline describing the edge. The detailed polyline is used to locate the edge on which the query lies and to start a network expansion from the end nodes of that edge.
A non-leaf node of the VIG-Tree is composed of an MBR component, a matrix component, and a visual component. To be specific, the MBR component contains a set of MBRs that enclose the corresponding nodes rooted at the current node. Similar to a leaf node, the matrix component of a non-leaf node is also a distance matrix, but here both the rows and the columns are the borders rooted at the node, and each entry is the shortest-path distance between the two borders. Similarly, the visual component of a non-leaf node is composed of a set of visual inverted indexes generated by aggregating the visual information from its child nodes; each index points to the list of child nodes that contain the corresponding visual word.
Similar to DBLP:conf/edbt/Rocha-JuniorN12 , we also use a B-tree to manage the edge information. The key of the B-tree is composed of an edge id and a visual word id, and it points to an inverted list that stores the geo-visual objects lying on the corresponding edge that have the visual word in their description. For each geo-visual object, the object id and the network distance between the object and the related end node of the edge are stored. This information is used to compute the visual content similarity between the geo-visual object and the query.
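For concreteness, the node layout described above can be sketched as plain Python dataclasses. This is only a minimal sketch: the paper does not fix a concrete representation, and all field names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class EdgeEntry:
    """One entry of a leaf node's subnetwork component."""
    edge_id: Tuple[int, int]                 # the two end nodes of the edge
    length: float                            # length of the edge
    polyline: List[Tuple[float, float]]      # detailed polyline locating the edge

@dataclass
class VIGLeafNode:
    mbrs: List[Tuple[float, float, float, float]]   # MBRs of the enclosed edges
    # distance matrix: (border, vertex) -> shortest-path distance
    dist_matrix: Dict[Tuple[int, int], float] = field(default_factory=dict)
    # visual inverted index: visual word id -> ids of edges containing it
    inv_index: Dict[int, List[Tuple[int, int]]] = field(default_factory=dict)
    edges: List[EdgeEntry] = field(default_factory=list)  # subnetwork component

@dataclass
class VIGInnerNode:
    mbrs: List[Tuple[float, float, float, float]]   # MBRs of child nodes
    # distance matrix: (border, border) -> shortest-path distance
    dist_matrix: Dict[Tuple[int, int], float] = field(default_factory=dict)
    # aggregated inverted index: visual word id -> child nodes containing it
    inv_index: Dict[int, List[object]] = field(default_factory=dict)
    children: List[object] = field(default_factory=list)
```

The only structural difference between the two node types is the shape of the distance matrix (border-to-vertex in leaves, border-to-border in internal nodes) and that internal nodes aggregate their children's inverted indexes.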
4.2 Processing of Geo-visual Queries on Road Network
We proceed to present an important metric, the minimum visual network distance, which will be used in geo-visual query processing. Given a geo-visual query and a network node in the VIG-Tree, the metric offers a lower bound on the actual visual network distance between the query and the geo-visual objects enclosed in the rectangle of the node. The search space of the VIG-Tree can be efficiently pruned using this bound.
Definition 9 (Minimum Visual Network Distance)
The minimum visual network distance of a geo-visual query from a node in the VIG-Tree is defined as follows:
where the first component is the length of the shortest path between the query and the node, and the second component is the minimum visual content relevance between them.
Given a geo-visual query, a node, and the set of geo-visual objects enclosed in the node, the minimum visual network distance of the node is no larger than the ranking score of any object in the set.
Since each geo-visual object is enclosed by the border nodes of the node, the minimum network distance between the query and the node's borders is no larger than the network distance between the query and the object:
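To make the pruning bound concrete, here is a minimal sketch assuming the joint ranking score is a linear combination of the normalized network distance and (one minus) the visual relevance, weighted by the preference parameter; the paper's exact Equation 3 may differ in form, and both function names are hypothetical.

```python
def ranking_score(net_dist, visual_rel, alpha=0.5, max_dist=1.0):
    """Joint score of one object w.r.t. the query: smaller is better.
    Assumes a linear combination of the normalized network distance and
    (1 - visual relevance); the paper's Equation 3 may differ in detail."""
    return alpha * (net_dist / max_dist) + (1 - alpha) * (1 - visual_rel)

def min_visual_network_dist(border_dist, max_visual_rel, alpha=0.5, max_dist=1.0):
    """Lower bound for a VIG-Tree node: combine the shortest query-to-border
    network distance with the best (maximum) visual relevance achievable
    inside the node. No enclosed object can score better than this."""
    return ranking_score(border_dist, max_visual_rel, alpha, max_dist)
```

Because every enclosed object is at least `border_dist` away and at most `max_visual_rel` relevant, the node's bound never exceeds any enclosed object's exact score, which is exactly the property the pruning relies on.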
When searching the VIG-Tree for the geo-visual objects nearest to a query, the first step is to decide which node to enter first. The metric offers an approximation of the visual network distance ranking score for every child of a node and can therefore be used to guide the search.
To process geo-visual queries with the VIG-Tree, we exploit the best-first traversal algorithm for retrieving the top-k geo-visual objects. With best-first traversal, a priority queue keeps track of the nodes and objects that have yet to be visited. When deciding which node to visit next, the algorithm picks the node with the smallest minimum visual network distance among all unvisited nodes. The algorithm terminates when the k nearest objects (ranked according to Equation 3) have been found.
Algorithm 1 illustrates the details of the VIG-Tree based geo-visual query on road networks. A minimum heap is employed to keep the VIG-Tree's nodes, where the key of a node is its minimum visual network ranking score. For the input query, we find the root node, calculate its minimal visual network ranking score, and push it into the heap in Line 3. The proposed algorithm then executes the while loop (Lines 4-27) until the top-k results are ultimately reported in Line 28.
In each iteration, the top element with the minimum visual network ranking score is popped from the heap. When the popped element is a leaf node (Line 5), each edge enclosed in it is pushed into the heap if its minimal visual network distance ranking score with respect to the query is not larger than the current threshold (Lines 6-9). When the popped element is an edge (Line 13), for each geo-visual object that shares a visual word with the query and lies on this edge, if its minimal visual network distance ranking score with respect to the query is not larger than the threshold, we push it into the result set and update the threshold. Otherwise, if the popped element is a non-leaf node (Line 21), each child node is pushed into the heap if its minimal visual network distance ranking score with respect to the query is not larger than the threshold (Lines 21-23). The algorithm terminates when the heap is empty, and the results are kept in the result set.
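The best-first skeleton underlying Algorithm 1 can be sketched generically as follows. Here `expand` and `lower_bound` are hypothetical callables standing in for the VIG-Tree traversal and the minimum visual network distance; the sketch omits the visual-word filtering and threshold bookkeeping of the full algorithm.

```python
import heapq
from itertools import count

def best_first_topk(root, k, lower_bound, expand):
    """Best-first top-k traversal in the spirit of Algorithm 1.
    `expand(item)` yields the children of an internal item (node or edge)
    and returns None for a geo-visual object; `lower_bound(item)` must not
    exceed the exact score of any object beneath `item`, and equals the
    exact score for an object itself."""
    tie = count()                      # tie-breaker so the heap never compares items
    heap = [(lower_bound(root), next(tie), root)]
    results = []
    while heap and len(results) < k:
        key, _, item = heapq.heappop(heap)
        kids = expand(item)
        if kids is None:               # a geo-visual object: its key is exact
            results.append((item, key))
        else:                          # a node or edge: push children lazily
            for child in kids:
                heapq.heappush(heap, (lower_bound(child), next(tie), child))
    return results
```

The correctness of stopping after k pops of objects relies exactly on the lower-bound property established for the minimum visual network distance above: once an object is the cheapest element in the heap, no unexplored subtree can contain a better one.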
5 Moving Monitor Algorithm
In this section, we propose an efficient algorithm to solve the problem of the continuous top-k geo-visual objects query on road networks. First we introduce the notion of the safe interval, which plays a key role in this solution. A result updating rule is then designed to guide result set updates during the movement of the query. After that, we describe the moving monitor algorithm in detail.
5.1 Safe Interval
If the snapshot-query solution were applied to the continuous top-k geo-visual object query on a road network, the computation cost would be quite high due to the continuous movement of the query. To solve this problem effectively, we propose a novel concept on road networks named the safe interval, which is the basis of our solution. We first introduce a lemma on which the notion of the safe interval rests, and then give the definition of the safe interval.
Given a CTGVOQ located on an edge, its result set consists of three parts: (1) the set of relevant objects located on the edge, (2) the result set of the TGVOQ at one end node of the edge, and (3) the result set of the TGVOQ at the other end node, that is,
where a relevant object refers to an object that matches at least one of the query visual words.
We prove Lemma 1 by contradiction. Assume that an object belongs to the result set of the query but is contained in none of the three parts above. Since the query lies on the edge, the shortest path connecting the query and the object must pass through one of the two end nodes of the edge; without loss of generality, assume it passes through the end node closer to the object. Because the object does not belong to the result set at that end node, there are at least k objects there with better ranking scores. Since the distance from the query to that end node can be added to the road network distance proximity component on both sides of each such inequality while the visual content similarity component remains constant, those k objects also rank better than the object with respect to the query. Hence the object should not be in the result set of the query, which contradicts the initial assumption.
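Under Lemma 1, a client can assemble its candidates from three cached pieces and re-rank them locally at any position on the edge. A small sketch with hypothetical helper names:

```python
def candidate_set(edge_objects, result_vi, result_vj):
    """Lemma 1: at any location on an edge, the top-k result set is a
    subset of the union of the relevant objects on the edge and the
    result sets precomputed at the two end nodes of the edge."""
    return set(edge_objects) | set(result_vi) | set(result_vj)

def topk_from_candidates(k, candidates, score):
    """Rank the cached candidates by the joint score at the current query
    location (smaller is better) and keep the best k; `score` is a
    hypothetical per-object scoring callable."""
    return sorted(candidates, key=score)[:k]
```

The point of the lemma is that `candidate_set` never needs to be recomputed while the query stays on the same edge; only the per-object scores change with the query's position.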
Definition 10 (Safe Segment)
Given a road network and a CTGVQ moving on an edge, a safe segment is a subsegment of the edge such that, while the query is located on it, the result set does not change. Formally, for any two locations on the safe segment, the result sets of the query at these two locations are identical.
Based on the safe segment, a client-server architecture is adopted in this paper. When a client submits a continuous top-k geo-visual object query, the edge on which the query is located is first found, and the result sets of the query at the two end nodes of this edge are returned. The top-k results of the query are then generated from the result sets of the end nodes. Only when the client leaves the safe segment does it need to send a new location to the server and repeat the above process.
According to the above definition, each edge has a safe segment within which the result set of a query does not need to be updated. This can greatly reduce the computation cost. Moreover, if a long path contains several segments and has no crossroads, we can treat the safe segments of these edges as a single one within which the result set still does not need to be updated. Using this concept in CTGVQ processing can further cut the cost. We call it a safe interval and define it as follows.
Definition 11 (Safe Interval)
Given a road network, consider two nodes and the branch-free path connecting them, and let the safe segments of the edges along the path be given; the safe interval of the path is defined as the concatenation of these safe segments.
Theorem 5.1 (Results Updating Rule)
Given a CTGVQ located on an edge and moving from one end node toward the other, the result set will be updated under the following rule: (1) the geo-visual objects located behind the query will be discarded; (2) the geo-visual objects located in front of the query will be added. Formally, assume that the query passes through two locations successively.
(1)If , then
(2)If , then
As shown in Figure 5, the query, denoted by a red triangle, is located on an edge, and three geo-visual objects, denoted by small yellow circles, are located on three edges respectively. The query is moving from one location to another along the edge. It is easy to see that one object falls behind the query and thus may be discarded from the result set; the other two objects lie in front of the query, which means they may be added to the result set.
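The updating rule of Theorem 5.1 amounts to an incremental shift of the network-distance component of each cached candidate. A minimal sketch, using a hypothetical per-object dict with keys `ahead` and `net_dist`:

```python
def shift_network_distances(objects, delta):
    """Theorem 5.1 sketch: when the query advances by `delta` along its edge,
    objects behind it move `delta` farther away and objects ahead of it move
    `delta` closer. Each object is a dict with the hypothetical keys
    'ahead' (bool) and 'net_dist' (network distance to the query)."""
    for o in objects:
        if o["ahead"]:
            d = o["net_dist"] - delta
            if d < 0:                          # the query has passed the object
                o["ahead"], o["net_dist"] = False, -d
            else:
                o["net_dist"] = d
        else:
            o["net_dist"] += delta
    return objects
```

After the shift, the candidates are re-ranked with the joint score: objects falling behind see their scores worsen and may drop out of the top-k, while objects ahead see their scores improve and may enter it, exactly as in the example above.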
5.2 Moving Monitor Algorithm
A standard client-server architecture is utilized to implement a continuous top-k geo-visual object query. During processing, two cases must be considered: (1) CASE 1, the client moves within a safe interval; (2) CASE 2, the client leaves a safe interval and moves onto a new edge. To make our approach easy to follow, we first present these two cases in detail before introducing the query algorithm.
CASE 1. When the client moves into a safe interval, the server first returns the candidate set according to Lemma 1. Then, as the client moves, it generates the result set from the candidate set and continuously updates it according to the result updating rule described above.
CASE 2. When the client leaves a safe interval and moves onto a new edge, it promptly asks the server to recompute the candidate set for the new edge. The client then moves into a new safe interval of this edge, and the result set is computed as in CASE 1.
Based on the concept of the safe segment and the result updating rule, we design an algorithm named the Moving Monitor Algorithm (MMA for short), which runs on the client-server architecture. The pseudo-code is shown as follows.
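The client side of MMA can be sketched as follows. For brevity, locations are 1-D numbers and the score is pure distance `|o - loc|` (the real score also includes visual relevance); `server.candidates` is a hypothetical stand-in for the server-side geo-visual search, which returns the Lemma 1 candidate set together with the safe interval.

```python
def mma_client(route, server, k):
    """Client side of the Moving Monitor Algorithm (a sketch).
    While the client stays inside the cached safe interval (CASE 1), it
    re-ranks the cached candidates locally; leaving the interval (CASE 2)
    triggers one round of communication with the server."""
    interval, candidates = None, []
    comm_rounds, results_log = 0, []
    for loc in route:
        if interval is None or not (interval[0] <= loc <= interval[1]):
            candidates, interval = server.candidates(loc)   # CASE 2: one round trip
            comm_rounds += 1
        # CASE 1: re-rank locally from the cached candidates
        topk = sorted(candidates, key=lambda o: abs(o - loc))[:k]
        results_log.append(topk)
    return results_log, comm_rounds
```

This is where the savings measured in the experiments come from: the number of communications grows with the number of safe-interval crossings, not with the number of reported locations.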
6 Experimental Evaluation
In this section, we present the results of a comprehensive performance study on real road network datasets to evaluate the efficiency and scalability of the proposed techniques. Specifically, we evaluate the effectiveness of the following indexing techniques for continuous geo-visual search on road networks.
Datasets. The performance of the various algorithms is evaluated on real road network datasets. The following three datasets are deployed in the experiments. Road network NA is obtained from the North America Road Network (http://www.cs.utah.edu/~lifeifei/SpatialDataset.htm); road network SF is obtained from the San Francisco Road Network (http://www.cs.utah.edu/~lifeifei/SpatialDataset.htm); road network AU is obtained from DBLP:conf/edbt/Rocha-JuniorN12 . The locations of the objects are randomly chosen from the spatial datasets at Rtree-Portal (http://www.rtreeportal.org). Note that we move an object to its closest road segment if it does not lie on any edge of the road network. In the experiments, the locations in all datasets are scaled to a 2-dimensional space. Similarly, we generate the image dataset by crawling millions of images from the photo-sharing site Flickr (http://www.flickr.com/). The dataset size varies from 400K to 2M to evaluate the scalability of our proposed algorithms. Table 2 summarizes the important statistics of these image datasets.
Datasets | Number of Images | Distinct Visual Words | Avg. Visual Words per Image
Workload. A workload for the continuous geo-visual query consists of queries. The query response time and the number of communications are employed to evaluate the performance of the algorithms. The query locations are randomly selected from the locations of the underlying objects, and each query contains a sequence of locations. As both Overlay-SI and VIG-SI adopt the safe interval technique, we only compare Overlay and VIG-SI when measuring the number of communications. The query length, which indicates the number of locations the moving object reports, varies from 100 to 500; the number of query visual words varies from 20 to 100; the number of returned results grows from 10 to 50; the preference parameter varies from 0.1 to 0.9; the image dataset size grows from 400K to 2M. By default, the query length, the number of query visual words, the result number, the preference parameter, and the dataset size are set to 100, 40, 10, 0.5, and 800K respectively. Experiments are run on a PC with dual Intel Xeon 2.60GHz CPUs and 16 GB memory running Ubuntu. All algorithms in the experiments are implemented in Java.
Evaluation on query length. We investigate the query response time of the three algorithms using query lengths of 100, 200, 300, 400, and 500. Figure 6(a) shows that the performance of the three algorithms in terms of query response time degrades as the query length grows. As expected, Overlay-SI significantly outperforms Overlay in Figure 6(a), since the safe interval reduces the number of geo-visual searches on the server. VIG-SI achieves better performance than Overlay-SI because the G-tree significantly reduces the network expansion cost. Similarly, in Figure 6(b), the number of communications increases with the query length, because more query locations mean a higher probability of invoking a geo-visual search.
Evaluation on the number of query visual words. We evaluate the effect of the number of query visual words on dataset SF in Figure 7, where the number varies from 20 to 100. A larger number of query visual words gives each object a greater chance to meet the keyword constraint, which means more nodes and objects must be taken into consideration; thus, the response time of all three algorithms increases. However, the communication costs of both Overlay and VIG-SI are stable, and VIG-SI communicates less: since VIG-SI adopts the safe interval technique, its number of communications is significantly reduced, whereas Overlay executes a geo-visual search at every query location, and the probability of invoking a geo-visual search within a safe interval does not change as the number of query visual words increases.
Evaluation on the number of results. Figure 8 reports the performance of the algorithms when the number of results varies from 10 to 50. Not surprisingly, the response times of all of them increase gradually with k, and the performance of VIG-SI is always the best. As in the previous experiment, the communication costs of both Overlay and VIG-SI remain stable, and VIG-SI performs better.
Evaluation on the preference parameter. The effect of the query preference parameter is shown in Figure 9. A large value indicates a high preference for network proximity, while a small value means a high preference for visual similarity. As can be seen in Figure 9(a), all algorithms are sensitive to the preference parameter, and their performance improves as it increases. Figure 9(b) shows that the number of communications is insensitive to it.
Evaluation on the dataset size. Figure 10(a) illustrates the query response time under different sizes of image datasets. All algorithms are sensitive to the growth of the dataset size. As expected, Figure 10(b) shows that the numbers of communications of both algorithms remain stable as the dataset size varies from 400K to 2M.
Evaluation on different datasets. We evaluate the response time and the number of communications on the three real road network datasets of different sizes in Figure 11. As can be seen in Figure 11(a), all algorithms achieve their best performance on dataset NA and their worst performance on dataset AU, because more edges must be accessed as the number of edges increases. The algorithm Overlay performs comparatively better on dataset AU, while Overlay-SI and VIG-SI achieve better performance on dataset NA, because the probability of invoking a geo-visual search becomes higher as the number of edges increases. Figure 11(b) shows that Overlay has a constant number of communications since it invokes a geo-visual search at every location, while the communication costs of Overlay-SI and VIG-SI increase with the number of edges.
7 Conclusion
In this paper, we propose and study a novel query problem named the continuous top-k geo-visual objects query (CTGVOQ) on road networks. Given a set of geo-visual objects containing both geographical information and visual content information, a CTGVOQ aims to find the k best geo-visual objects ranked in terms of visual content similarity or relevance to the query and road network distance proximity to the query location. We first define the CTGVOQ formally and propose its score function. To improve search efficiency, we present a novel hybrid indexing framework called the VIG-Tree and an efficient geo-visual search algorithm on road networks. To further reduce the computational cost while the query moves and to extract the top-k results from the candidate set faster, we propose the notion of the safe interval and introduce an efficient algorithm named the moving monitor algorithm. The experimental evaluation on real multimedia and road network datasets shows that our solution outperforms the state-of-the-art method.
Acknowledgments: This work was supported in part by the National Natural Science Foundation of China (61702560), projects (2018JJ3691, 2016JC2011) of the Science and Technology Plan of Hunan Province, and the Research and Innovation Project of Central South University Graduate Students (2018zzts177).
- (1) Alsubaiee, S., Behm, A., Li, C.: Supporting location-based approximate-keyword queries. In: 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2010, November 3-5, 2010, San Jose, CA, USA, Proceedings, pp. 61–70 (2010)
- (2) Cary, A., Wolfson, O., Rishe, N.: Efficient and scalable method for processing top-k spatial boolean queries. In: Scientific and Statistical Database Management, 22nd International Conference, SSDBM 2010, Heidelberg, Germany, June 30 - July 2, 2010. Proceedings, pp. 87–95 (2010)
- (3) Chen, C., Chen, C., Sun, W.: Spatial keyword queries in wireless broadcast environment. Journal of Computer Research and Development (2013)
- (4) Christoforaki, M., He, J., Dimopoulos, C., Markowetz, A., Suel, T.: Text vs. space: efficient geo-search query processing. In: Proceedings of the 20th ACM Conference on Information and Knowledge Management, CIKM 2011, Glasgow, United Kingdom, October 24-28, 2011, pp. 423–432 (2011)
- (5) Chum, O., Philbin, J., Zisserman, A.: Near duplicate image detection: min-hash and tf-idf weighting. In: Proceedings of the British Machine Vision Conference 2008, Leeds, UK, September 2008, pp. 1–10 (2008)
- (6) Cong, G., Jensen, C.S., Wu, D.: Efficient retrieval of the top-k most relevant spatial web objects. PVLDB 2(1), 337–348 (2009)
- (7) Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), 614–656 (2003)
- (8) Fang, H., Zhao, P., Sheng, V.S., Wu, J., Xu, J., Liu, A., Cui, Z.: Effective spatial keyword query processing on road networks. In: Databases Theory and Applications - 26th Australasian Database Conference, ADC 2015, Melbourne, VIC, Australia, June 4-7, 2015. Proceedings, pp. 194–206 (2015)
- (9) Felipe, I.D., Hristidis, V., Rishe, N.: Keyword search on spatial databases. In: Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, April 7-12, 2008, Cancún, Mexico, pp. 656–665 (2008)
- (10) Fu, R., Li, B., Gao, Y., Wang, P.: Content-based image retrieval based on CNN and SVM. In: Proceedings of the 2nd IEEE International Conference on Computer and Communications (2016)
- (11) Gao, Y., Qin, X., Zheng, B., Chen, G.: Efficient reverse top-k boolean spatial keyword queries on road networks. IEEE Trans. Knowl. Data Eng. 27(5), 1205–1218 (2015)
- (12) Guo, L., Shao, J., Aung, H.H., Tan, K.: Efficient continuous top-k spatial keyword queries on road networks. GeoInformatica 19(1), 29–60 (2015)
- (13) Hariharan, R., Hore, B., Li, C., Mehrotra, S.: Processing spatial-keyword (SK) queries in geographic information retrieval (GIR) systems. In: 19th International Conference on Scientific and Statistical Database Management, SSDBM 2007, 9-11 July 2007, Banff, Canada, Proceedings, p. 16 (2007)
- (14) Hjaltason, G.R., Samet, H.: Distance browsing in spatial databases. ACM Trans. Database Syst. 24(2), 265–318 (1999)
- (15) Huang, W., Li, G., Tan, K., Feng, J.: Efficient safe-region construction for moving top-k spatial keyword queries. In: 21st ACM International Conference on Information and Knowledge Management, CIKM’12, Maui, HI, USA, October 29 - November 02, 2012, pp. 932–941 (2012)
- (16) Ke, Y., Sukthankar, R.: PCA-SIFT: A more distinctive representation for local image descriptors. In: 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), pp. 506–513 (2004)
- (17) Li, C., Gu, Y., Qi, J., Yu, G., Zhang, R., Yi, W.: Processing moving knn queries using influential neighbor sets. PVLDB 8(2), 113–124 (2014)
- (18) Li, Y., Li, G., Zhang, C.: Processing continuous top-k spatial keyword queries over road networks. J. Huazhong Univ. of Sci. and Tech. (Natural Science Edition), pp. 29–60 (2013)
- (19) Li, Z., Lee, K.C.K., Zheng, B., Lee, W., Lee, D.L., Wang, X.: Ir-tree: An efficient index for geographic document search. IEEE Trans. Knowl. Data Eng. 23(4), 585–599 (2011)
- (20) Lin, X., Xu, J., Hu, H.: Reverse keyword search for spatio-textual top-k queries in location-based services. In: 32nd IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, Finland, May 16-20, 2016, pp. 1488–1489 (2016)
- (21) Liu, Y., Liu, S., Wang, Z.: Multi-focus image fusion with dense SIFT. Information Fusion 23, 139–155 (2015)
- (22) Lowe, D.G.: Object recognition from local scale-invariant features. In: ICCV, pp. 1150–1157 (1999)
- (23) Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2), 91–110 (2004)
- (24) Luo, C., Li, J., Li, G., Wei, W., Li, Y., Li, J.: Efficient reverse spatial and textual k nearest neighbor queries on road networks. Knowl.-Based Syst. 93, 121–134 (2016)
- (25) Mortensen, E.N., Deng, H., Shapiro, L.G.: A SIFT descriptor with global context. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 20-26 June 2005, San Diego, CA, USA, pp. 184–190 (2005)
- (26) Norouzi, M., Fleet, D.J., Salakhutdinov, R.: Hamming distance metric learning. In: Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States., pp. 1070–1078 (2012)
- (27) Rocha-Junior, J.B., Gkorgkas, O., Jonassen, S., Nørvåg, K.: Efficient processing of top-k spatial keyword queries. In: Advances in Spatial and Temporal Databases - 12th International Symposium, SSTD 2011, Minneapolis, MN, USA, August 24-26, 2011, Proceedings, pp. 205–222 (2011)
- (28) Rocha-Junior, J.B., Nørvåg, K.: Top-k spatial keyword queries on road networks. In: 15th International Conference on Extending Database Technology, EDBT ’12, Berlin, Germany, March 27-30, 2012, Proceedings, pp. 168–179 (2012)
- (29) Sivic, J., Zisserman, A.: Video google: A text retrieval approach to object matching in videos. In: 9th IEEE International Conference on Computer Vision (ICCV 2003), 14-17 October 2003, Nice, France, pp. 1470–1477 (2003)
- (30) Thomee, B., Lew, M.S.: Interactive search in image retrieval: a survey. IJMIR 1(2), 71–86 (2012)
- (31) Wang, Y., Lin, X., Wu, L., Zhang, W.: Effective multi-query expansions: Robust landmark retrieval. In: Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM ’15, Brisbane, Australia, October 26 - 30, 2015, pp. 79–88 (2015)
- (32) Wang, Y., Lin, X., Wu, L., Zhang, W.: Effective multi-query expansions: Collaborative deep networks for robust landmark retrieval. IEEE Trans. Image Processing 26(3), 1393–1404 (2017)
- (33) Wang, Y., Lin, X., Wu, L., Zhang, W., Zhang, Q.: Exploiting correlation consensus: Towards subspace clustering for multi-modal data. In: Proceedings of the ACM International Conference on Multimedia, MM ’14, Orlando, FL, USA, November 03 - 07, 2014, pp. 981–984 (2014)
- (34) Wang, Y., Lin, X., Wu, L., Zhang, W., Zhang, Q.: LBMCH: learning bridging mapping for cross-modal hashing. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, August 9-13, 2015, pp. 999–1002 (2015)
- (35) Wang, Y., Lin, X., Wu, L., Zhang, W., Zhang, Q., Huang, X.: Robust subspace clustering for multi-view data by exploiting correlation consensus. IEEE Trans. Image Processing 24(11), 3939–3949 (2015)
- (36) Wang, Y., Lin, X., Zhang, Q.: Towards metric fusion on multi-view data: a cross-view based graph random walk approach. In: 22nd ACM International Conference on Information and Knowledge Management, CIKM’13, San Francisco, CA, USA, October 27 - November 1, 2013, pp. 805–810 (2013)
- (37) Wang, Y., Lin, X., Zhang, Q., Wu, L.: Shifting hypergraphs by probabilistic voting. In: Advances in Knowledge Discovery and Data Mining - 18th Pacific-Asia Conference, PAKDD 2014, Tainan, Taiwan, May 13-16, 2014. Proceedings, Part II, pp. 234–246 (2014)
- (38) Wang, Y., Wu, L.: Beyond low-rank representations: Orthogonal clustering basis reconstruction with optimized graph structure for multi-view spectral clustering. Neural Networks 103, 1–8 (2018)
- (39) Wang, Y., Wu, L., Lin, X., Gao, J.: Multiview spectral clustering via structured low-rank matrix factorization. IEEE Trans. Neural Networks and Learning Systems (2018)
- (40) Wang, Y., Zhang, W., Wu, L., Lin, X., Fang, M., Pan, S.: Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, pp. 2153–2159 (2016)
- (41) Wang, Y., Zhang, W., Wu, L., Lin, X., Zhao, X.: Unsupervised metric fusion over multiview data by graph random walk-based cross-view diffusion. IEEE Trans. Neural Netw. Learning Syst. 28(1), 57–70 (2017)
- (42) Wu, D., Cong, G., Jensen, C.S.: A framework for efficient spatial web object retrieval. VLDB J. 21(6), 797–822 (2012)
- (43) Wu, D., Yiu, M.L., Cong, G., Jensen, C.S.: Joint top-k spatial keyword query processing. IEEE Trans. Knowl. Data Eng. 24(10), 1889–1903 (2012)
- (44) Wu, D., Yiu, M.L., Jensen, C.S., Cong, G.: Efficient continuously moving top-k spatial keyword query processing. In: Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11-16, 2011, Hannover, Germany, pp. 541–552 (2011)
- (45) Wu, L., Wang, Y.: Robust hashing for multi-view data: Jointly learning low-rank kernelized similarity consensus and hash functions. Image Vision Comput. 57, 58–66 (2017)
- (46) Wu, L., Wang, Y., Gao, J., Li, X.: Deep adaptive feature embedding with local sample distributions for person re-identification. Pattern Recognition 73, 275–288 (2018)
- (47) Wu, L., Wang, Y., Ge, Z., Hu, Q., Li, X.: Structured deep hashing with convolutional neural networks for fast person re-identification. Computer Vision and Image Understanding 167, 63–73 (2018)
- (48) Wu, L., Wang, Y., Li, X., Gao, J.: Deep attention-based spatially recursive networks for fine-grained visual recognition. IEEE Trans. Cybernetics (2018)
- (49) Wu, L., Wang, Y., Li, X., Gao, J.: What-and-where to match: Deep spatially multiplicative integration networks for person re-identification. Pattern Recognition 76, 727–738 (2018)
- (50) Wu, L., Wang, Y., Shepherd, J.: Efficient image and tag co-ranking: a bregman divergence optimization method. In: ACM Multimedia Conference, MM ’13, Barcelona, Spain, October 21-25, 2013, pp. 593–596 (2013)
- (51) Xiao, Z., Qi, X.: Complementary relevance feedback-based content-based image retrieval. Multimedia Tools Appl. 73(3), 2157–2177 (2014)
- (52) Yao, B., Li, F., Hadjieleftheriou, M., Hou, K.: Approximate string search in spatial databases. In: Proceedings of the 26th International Conference on Data Engineering, ICDE 2010, March 1-6, 2010, Long Beach, California, USA, pp. 545–556 (2010)
- (53) Zhang, C., Zhang, Y., Zhang, W., Lin, X.: Inverted linear quadtree: Efficient top k spatial keyword search. In: 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8-12, 2013, pp. 901–912 (2013)
- (54) Zhang, C., Zhang, Y., Zhang, W., Lin, X.: Inverted linear quadtree: Efficient top K spatial keyword search. IEEE Trans. Knowl. Data Eng. 28(7), 1706–1721 (2016)
- (55) Zhang, C., Zhang, Y., Zhang, W., Lin, X., Cheema, M.A., Wang, X.: Diversified spatial keyword search on road networks. In: Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, March 24-28, 2014., pp. 367–378 (2014)
- (56) Zhang, D., Chan, C., Tan, K.: Processing spatial keyword query as a top-k aggregation query. In: The 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’14, Gold Coast , QLD, Australia - July 06 - 11, 2014, pp. 355–364 (2014)
- (57) Zhang, D., Chee, Y.M., Mondal, A., Tung, A.K.H., Kitsuregawa, M.: Keyword search in spatial databases: Towards searching by document. In: Proceedings of the 25th International Conference on Data Engineering, ICDE 2009, March 29 2009 - April 2 2009, Shanghai, China, pp. 688–699 (2009)
- (58) Zhang, D., Ooi, B.C., Tung, A.K.H.: Locating mapped resources in web 2.0. In: Proceedings of the 26th International Conference on Data Engineering, ICDE 2010, March 1-6, 2010, Long Beach, California, USA, pp. 521–532 (2010)
- (59) Zhang, D., Tan, K., Tung, A.K.H.: Scalable top-k spatial keyword search. In: Joint 2013 EDBT/ICDT Conferences, EDBT ’13 Proceedings, Genoa, Italy, March 18-22, 2013, pp. 359–370 (2013)
- (60) Zhang, G., Zeng, Z., Zhang, S., Zhang, Y., Wu, W.: SIFT matching with CNN evidences for particular object retrieval. Neurocomputing 238, 399–409 (2017)
- (61) Zhou, Y., Xie, X., Wang, C., Gong, Y., Ma, W.: Hybrid index structures for location-based web search. In: Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management, Bremen, Germany, October 31 - November 5, 2005, pp. 155–162 (2005)
- (62) Zhu, L., Shen, J., Jin, H., Zheng, R., Xie, L.: Content-based visual landmark search via multimodal hypergraph learning. IEEE Trans. Cybernetics 45(12), 2756–2769 (2015)