The advent of manifold representation and graph-based techniques as diffusion approaches has affected several computer vision research fields, such as Content-Based Image Retrieval (CBIR). This is a computer vision task, tailored for mobile devices, aimed at ranking the database images (which can number millions or more) according to their similarity to a query. Similarity is computed between the vectors that represent the images. The task seems simple but poses several challenges. The algorithm needs to be invariant to image resolution, illumination conditions, viewpoints, and the presence of distractors such as cars, people and trees (Magliani et al., 2019a). Furthermore, the method adopted for the retrieval task needs to be precise (i.e., to obtain a good retrieval performance) and fast (i.e., to retrieve the results in as short a time as possible). Unfortunately, it is not always possible to obtain excellent results in a short time, so the final target is a trade-off between these two metrics. The use of descriptors from pre-trained CNNs has allowed researchers to obtain good results in a very simple manner: extracting the features from an intermediate layer and then applying pooling and normalization techniques. Furthermore, different embedding algorithms have been proposed to improve the results by making the descriptors more discriminative and invariant to rotation, changes of scale, occlusions, and so on (Babenko and Lempitsky, 2015; Kalantidis et al., 2016; Magliani and Prati, 2018; Gordo et al., 2016).
Recently, Iscen et al. (Iscen et al., 2017) and Yang et al. (Yang et al., 2018) outperformed the state of the art on several public image retrieval datasets by applying the diffusion process to R-MAC descriptors (Gordo et al., 2017). The reason for the success of diffusion for retrieval (Zhou et al., 2004) is that the manifold representation makes it possible to find more neighbors that are close to the query than the Euclidean one does. Although diffusion improves retrieval results, creating the kNN graph that it needs takes a long time. To solve this issue we follow the technique of Magliani et al. (Magliani et al., 2019b), which creates an approximate kNN graph effectively and efficiently and is suitable for the application of the diffusion approach. On this graph, diffusion obtains the same or better retrieval performance than on a graph built with a brute-force approach, while requiring a shorter computation time.
As noted above, the diffusion process works well on this task, but several parameters must be configured to obtain the best retrieval performance on each dataset. Among them are the number of walks to execute, the number of neighbors in the graph, and the number of database images to consider for the random walk process. Currently, these parameters are configured by extensively testing several different configurations. A brute-force approach could be applied as an alternative, but it is infeasible because of the huge time needed to test all possible combinations of the parameters.
In this paper, we propose to use genetic algorithms to find an optimal configuration of the parameters of the diffusion approach applied to several CBIR datasets. In addition, executing the diffusion process with the correct configuration yields very interesting results on several public image datasets, outperforming the state of the art.
The main contributions of this paper are:
- the use of genetic algorithms for tuning the diffusion parameters;
- a comparison with other optimization methods that can solve the same problem;
- a test of the optimization methods on several public image datasets.
The paper is structured as follows. Section 2 introduces the general techniques used in the state of the art. Section 3 describes in detail the graphs and the diffusion mechanism. Section 4 describes the proposed approach. Section 5 reports the experimental results on three public datasets: Oxford5k, Paris6k and Oxford105k. Finally, some concluding remarks are reported.
2. Related work
The setting of algorithm parameters has a relevant impact on the performance of machine learning methods. Finding an optimal parameter configuration can be treated as a search problem, aimed at maximizing the quality of a machine learning model, according to some performance metrics (e.g., accuracy).
One of the main challenges of parameter setting optimization is given by the complex interactions between the parameters. Configuring the parameters individually may lead to suboptimal results, whereas trying all different combinations is often impossible due to the curse of dimensionality.
Parameter setting can be approached in two ways:
- Parameter tuning: the parameter values are chosen offline and then the algorithm is run using those values, which do not change during execution. This is the case of interest for this paper;
- Parameter control: the parameter values may vary during the execution, according to a strategy that depends on the results that are being achieved (Karafotias et al., 2015).
The importance of parameter tuning has been frequently addressed in recent years (Montero et al., 2018; Sipper et al., 2018). Several algorithms for parameter tuning have been proposed (Hoos, 2011; Bergstra et al., 2011; Falkner et al., 2018), among which the simplest strategies are grid search and random search. In (Bergstra and Bengio, 2012), the authors compare the performance of neural networks whose hyperparameters have been configured using grid search and random search. They show that random search is more efficient than grid search and finds models that are as good or better while requiring much less computation time. Random search performs better especially when only a few hyperparameters affect the final performance of the machine learning algorithm. In this case, grid search allocates too many trials to the exploration of dimensions that do not matter and suffers from poor coverage of the dimensions that are important.
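The coverage argument can be illustrated with a small sketch. It assumes a two-dimensional search space in which only the first dimension matters and counts how many distinct values of that dimension each strategy tests with the same budget. The function names and the budget of 16 trials are illustrative assumptions, not taken from the cited work.

```python
import itertools
import random


def distinct_values_grid(points_per_axis):
    """Return (number of trials, distinct values tested on the first axis) for a full 2-D grid."""
    axis = [i / (points_per_axis - 1) for i in range(points_per_axis)]
    trials = list(itertools.product(axis, axis))
    return len(trials), len({x for x, _ in trials})


def distinct_values_random(budget, seed=0):
    """Same count for uniform random sampling of the unit square."""
    rng = random.Random(seed)
    trials = [(rng.random(), rng.random()) for _ in range(budget)]
    return len(trials), len({x for x, _ in trials})


if __name__ == "__main__":
    print(distinct_values_grid(4))     # 16 trials, only 4 values of the important axis
    print(distinct_values_random(16))  # 16 trials, 16 values of the important axis
```

With the same 16 evaluations, the grid probes the important dimension at 4 points, while random sampling probes it at 16.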
When the search space is non-continuous, high-dimensional, non-convex or multi-modal, local search methods are consistently outperformed by stochastic optimization algorithms (Grefenstette, 1986). Metaheuristics are general-purpose stochastic procedures designed to solve complex optimization problems (Glover and Kochenberger, 2006; Engelbrecht, 2007).
These optimization algorithms are non-deterministic and approximate, i.e., they do not always guarantee that they find the optimal solution, but they can find a good one in reasonable time. Metaheuristics require no particular knowledge about the problem structure other than the objective function itself, when defined, or a sampling of it (Mesejo et al., 2016). The main objective of metaheuristics is to achieve a trade-off between diversification (exploration) and intensification (exploitation). Diversification implies generating diverse solutions to explore the search space on a global scale, while exploitation implies focusing the search onto a local region where good solutions have been found. An overview of the main proofs of convergence of metaheuristics to optimal solutions can be found in (Gutjahr, 2010).
Classes of metaheuristics include:
- Trajectory methods, in which the search process describes a trajectory in the search space and can be seen as the evolution in (discrete) time of a discrete dynamical system (e.g., simulated annealing (Kirkpatrick et al., 1983));
- Memetic algorithms, which are hybrid global/local search methods in which a local improvement procedure is combined with a population-based algorithm (e.g., scatter search (Glover et al., 2003)).
In particular, evolutionary computing has been very successful in solving hard, multi-modal, multi-dimensional problems in many different tasks (e.g., parameter tuning (Rasku et al., 2019)). When the dimension of the search space is large, evolutionary computing allows one to perform an efficient directed search, taking inspiration from biological evolution to guide the search (Eiben and Smith, 2015). In (Konstantinov et al., 2019), the authors present an experimental comparison of evolutionary algorithms and random search algorithms for the optimal control of mobile robots, showing that evolutionary algorithms can find better solutions with the same number of fitness function calculations.
Genetic algorithms (GAs) are evolutionary algorithms inspired by the process of natural selection (survival of the fittest, crossover, mutation, etc.) (Goldberg, 1989) commonly used to solve optimization problems. In this paper we use a genetic algorithm to optimize the diffusion process, which is a promising approach for image retrieval whose performance depends on the setting of several parameters over different ranges.
3. Graphs and diffusion
The k-Nearest Neighbor (kNN) graph is an undirected graph denoted by $G = (V, E)$, where $V$ is the set of nodes and $E$ represents the set of edges. The nodes represent the dataset images, while the edges are the connections between the nodes. The edges are weighted, and the weights indicate how similar the connected images are: the larger the weight, the more similar the two images.
More formally, starting from a dataset $S$, composed of $N$ images, and a similarity measure $\theta$, it is possible to construct the kNN graph for $S$. It contains edges between nodes $i$ and $j$ whose value is given by the similarity measure $\theta(s_i, s_j)$. The similarity measure adopted can change depending on the application. In our case, the cosine similarity is used, so the similarity is calculated as the dot product between the image descriptors.
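As an illustration, a brute-force kNN graph with cosine similarity can be sketched in plain Python as follows. The dictionary-of-dictionaries representation, the function names, and the clipping of negative similarities to zero are assumptions made for this example; a practical implementation would use sparse matrices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two descriptors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_graph(descriptors, k):
    """Brute-force kNN graph as {node: {neighbor: weight}}, made undirected."""
    n = len(descriptors)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        sims = [(cosine(descriptors[i], descriptors[j]), j) for j in range(n) if j != i]
        for weight, j in sorted(sims, reverse=True)[:k]:
            weight = max(weight, 0.0)  # assumption: negative similarities are clipped
            graph[i][j] = weight
            graph[j][i] = weight       # symmetrize, since the graph is undirected
    return graph


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(knn_graph(toy, k=1))
```

Every node is compared with every other node, which is the quadratic cost that motivates approximate graph construction.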
3.1. Approximate kNN graph creation
The creation of the kNN graph usually requires much computation time. The most frequently used approach is brute force, which consists in connecting each node to all the others. To reduce computation time and resources, an approximate graph creation method can be used. There are different methods for constructing an approximate kNN graph. The main strategies are divide and conquer and local search (e.g., NN-descent (Dong et al., 2011)). The divide-and-conquer strategy consists of subdividing the dataset into subsamples (divide) and creating the graph by brute force for the elements of each subsample (conquer).
We follow the idea of Magliani et al. (Magliani et al., 2019b), which exploits LSH (Locality Sensitive Hashing) (Indyk and Motwani, 1998) to approximately split the elements into several buckets using the hash table mechanism. This method can reduce the time required to create the kNN graph while maintaining or, in some cases, improving the final retrieval performance obtained after diffusion.
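The divide step can be sketched with random-hyperplane LSH, a common hashing scheme for cosine similarity. This is a minimal illustration under stated assumptions and not the exact procedure of Magliani et al.: the function names, the number of bits and the single hash table are all hypothetical choices.

```python
import random
from collections import defaultdict


def lsh_buckets(descriptors, n_bits, seed=0):
    """Group descriptor indices by the sign pattern of n_bits random projections."""
    rng = random.Random(seed)
    dim = len(descriptors[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    buckets = defaultdict(list)
    for idx, desc in enumerate(descriptors):
        key = tuple(sum(p * x for p, x in zip(plane, desc)) >= 0 for plane in planes)
        buckets[key].append(idx)
    return dict(buckets)


def candidate_pairs(buckets):
    """Pairs that the brute-force (conquer) step compares inside each bucket."""
    return {(i, j)
            for members in buckets.values()
            for i in members for j in members if i < j}


if __name__ == "__main__":
    toy = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
    buckets = lsh_buckets(toy, n_bits=4)
    print(buckets, candidate_pairs(buckets))
```

Only descriptors that share a bucket are compared, so the number of similarity computations drops below the quadratic cost of the brute-force graph, at the price of possibly missing some true neighbors.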
Fig. 1 shows two exemplar data distributions where the diffusion approach is able to improve the final retrieval performance.
Diffusion is usually applied starting from the query point, with the objective of finding its neighbors, i.e., the images most similar to the query. As mentioned before, diffusion can be applied only to a kNN graph, which is created from the database images. The graph is mandatory because it helps to establish the best path from the query to the database points. It is also possible to exploit the similarities between the images (the nodes of the graph) to find the best path from the database images to the query point. Indirectly, the nodes crossed on the graph to reach the query represent the neighbors of the query itself; finding them is the objective of the image retrieval task. The path to follow on the graph is chosen by applying the random walk process over several iterations. Wrong paths are discarded by exploiting the weights of the edges of the kNN graph, which indicate the similarity between two nodes: the greater the weight, the more similar the two nodes. Mathematically, the entire process is represented by a system of equations $Ax = y$, where $A$ is the affinity matrix of the database images (the mathematical representation of the graph), $y$ represents the query vector and $x$ is the solution of the system (the ranking vector).
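A linear system of this kind can be solved iteratively. The sketch below uses the update rule of Zhou et al. (2004), in which the scores are repeatedly propagated through the symmetrically normalized affinities and mixed with the query vector. The graph representation, the function name and the default values are assumptions for illustration; every node is assumed to have at least one edge with positive weight.

```python
import math


def diffuse(graph, query, alpha=0.85, iterations=100):
    """Iterate f <- alpha * S f + (1 - alpha) * y on a weighted undirected graph.

    graph: {node: {neighbor: weight}}; query: {node: initial score}.
    S is the symmetrically normalized affinity, S_ij = w_ij / sqrt(d_i * d_j).
    """
    degree = {i: sum(nbrs.values()) for i, nbrs in graph.items()}
    scores = {i: query.get(i, 0.0) for i in graph}
    for _ in range(iterations):
        scores = {
            i: alpha * sum(w / math.sqrt(degree[i] * degree[j]) * scores[j]
                           for j, w in graph[i].items())
               + (1 - alpha) * query.get(i, 0.0)
            for i in graph
        }
    return scores


if __name__ == "__main__":
    # A chain 0 - 1 - 2 - 3 with the query placed on node 0.
    chain = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
    print(diffuse(chain, {0: 1.0}))
```

On the chain, node 3 receives a nonzero score even though it has no direct edge to the query, which is how diffusion reaches neighbors along the manifold.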
4. Genetic Algorithms for diffusion parameters optimization
The diffusion process is regulated by several parameters, which can be optimized to improve the retrieval performance.
In this section we present the diffusion parameters and propose a genetic algorithm for diffusion parameter tuning.
4.1. Diffusion parameters
The diffusion approach consists in the resolution of the following equation system: . The diffusion applied in this paper is similar to the Google PageRank algorithm (Page et al., 1999) where a graph is solved by using diffusion iteratively. To achieve this result, the affinity matrix is modified as follows: , where represents the damping factor used in the Google algorithm to adjust the connections between nodes. In their case, the best value for this parameter is set to 0.85, which is obtained after executing many experiments. In our case, this parameter is a real value in the interval . Moreover, the elements of the sparse affinity matrix can be raised to power by a factor (, where ) in order to remove useless neighbors, similarly to the power iteration method (Mises and Pollaczek-Geiringer, 1929) used for the resolution of the PageRank problem. The same reasoning can also be applied to the query vector , where . The other parameters to optimize are: i) , that is the number of steps to execute on the graph during the random walk process; ii) , that is the number of neighbors to find; iii) the maximum number of iterations allowed for the algorithm to converge to the equation system solution (); iv) the number of database elements to be used during the application of the diffusion ().
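The effect of raising the affinities to a power can be seen on a toy example. In this sketch, the function name `sharpen`, the exponent name `gamma` and the sample weights are hypothetical; the only point illustrated is that, for weights between 0 and 1, an exponent larger than 1 shrinks weak edges much more than strong ones.

```python
def sharpen(graph, gamma):
    """Raise every edge weight of {node: {neighbor: weight}} to the power gamma."""
    return {i: {j: w ** gamma for j, w in nbrs.items()} for i, nbrs in graph.items()}


if __name__ == "__main__":
    toy = {0: {1: 0.9, 2: 0.3}}
    print(sharpen(toy, 3))  # the strong edge keeps most of its weight, the weak one nearly vanishes
```

Before sharpening, the strong edge is 3 times heavier than the weak one; with an exponent of 3 it is 27 times heavier, so weak neighbors contribute little to the random walk.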
4.2. Genetic algorithm
The diffusion parameters have been tuned using a genetic algorithm. Each individual corresponds to a specific setting of diffusion parameters and is represented by a string of seven values, corresponding to the seven parameters. The values have been set in the following ranges: , , , , , , .
The fitness function to be maximized corresponds to the mean Average Precision (mAP) obtained by the diffusion process in the retrieval phase. It summarizes, averaged over all queries, how many of the dataset images relevant to each query are retrieved and how highly they are ranked. In order to compare a query image with the dataset images, the Euclidean distance is employed.
The initial population, of size
, is obtained by generating random individuals according to the constraints on the parameter ranges. During the selection operation, each individual is replaced by the best of three individuals extracted randomly from the current generation (tournament selection). The selected individuals are crossed with a probability, generating new individuals by means of a single-point crossover. An individual is mutated with a probability , while each gene is mutated with a probability . The population is then entirely replaced by the offspring (generational GA). The evolutionary process is iterated for generations.
A buffer has been introduced to store the best individuals (those leading to the largest mAP) found during the evolutionary process, and their corresponding fitness (mAP) values. Thus, at the end of the run, the best parameter setting can be found not only among the individuals of the last population, but also among the best ones found during the whole evolutionary process, which are stored in the buffer.
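The evolutionary loop described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: all default values, the buffer size, the use of real-valued genes only, and the toy fitness in the usage example are assumptions (the real fitness is the mAP of the diffusion process, which requires the full retrieval pipeline).

```python
import random


def run_ga(fitness, ranges, pop_size=20, generations=30, cx_prob=0.5,
           mut_prob=0.2, gene_mut_prob=0.1, buffer_size=5, seed=0):
    """Generational GA; returns the buffer of best (fitness, individual) pairs."""
    rng = random.Random(seed)
    n_genes = len(ranges)

    def random_gene(g):
        low, high = ranges[g]
        return rng.uniform(low, high)

    population = [[random_gene(g) for g in range(n_genes)] for _ in range(pop_size)]
    buffer = []
    for generation in range(generations + 1):
        scored = [(fitness(ind), tuple(ind)) for ind in population]
        # Keep the best individuals seen during the whole run.
        buffer = sorted(buffer + scored, key=lambda p: p[0], reverse=True)[:buffer_size]
        if generation == generations:
            break
        # Tournament selection: each slot receives the best of three random individuals.
        offspring = [list(max(rng.sample(scored, 3), key=lambda p: p[0])[1])
                     for _ in range(pop_size)]
        # Single-point crossover on consecutive pairs.
        for a, b in zip(offspring[::2], offspring[1::2]):
            if rng.random() < cx_prob:
                cut = rng.randint(1, n_genes - 1)
                a[cut:], b[cut:] = b[cut:], a[cut:]
        # Mutation: first decide per individual, then per gene.
        for ind in offspring:
            if rng.random() < mut_prob:
                for g in range(n_genes):
                    if rng.random() < gene_mut_prob:
                        ind[g] = random_gene(g)
        population = offspring  # generational replacement
    return buffer


if __name__ == "__main__":
    # Toy fitness with a known optimum at 0.5 for each of seven genes.
    toy_fitness = lambda ind: -sum((x - 0.5) ** 2 for x in ind)
    best_fitness, best_individual = run_ga(toy_fitness, [(0.0, 1.0)] * 7)[0]
    print(best_fitness, best_individual)
```

Because the population is entirely replaced at each generation, the best individual can be lost; the buffer guarantees that the returned setting is never worse than any setting evaluated during the run.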
5. Experimental results
In this section we illustrate the experimental results we have obtained on three public datasets: Oxford5k, Paris6k and Oxford105k.
Mean Average Precision (mAP) is used on all image datasets to evaluate the accuracy in the retrieval phase.
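For reference, a plain (non-interpolated) computation of mAP can be sketched as follows. The function names and the toy rankings are illustrative; the official evaluation protocols of the Oxford and Paris datasets differ in some details (for instance, they handle "junk" images separately), which this sketch ignores.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values measured at the rank of each relevant item."""
    relevant = set(relevant_ids)
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, ground_truth):
    """Mean of the average precision over all queries."""
    return sum(average_precision(r, g) for r, g in zip(rankings, ground_truth)) / len(rankings)


if __name__ == "__main__":
    rankings = [["a", "b", "c", "d"], ["x", "a"]]
    ground_truth = [["a", "c"], ["a"]]
    print(mean_average_precision(rankings, ground_truth))
```

In the first toy query the relevant images appear at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2; in the second, the only relevant image appears at rank 2, giving 1/2.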
The results of the GA optimization are compared to the results obtained by other commonly used techniques for parameter tuning.
To evaluate the optimization of the diffusion parameters, the experiments are applied on several CBIR public image datasets:
- Oxford5k (Philbin et al., 2007) contains 5063 images belonging to 11 classes.
- Paris6k (Philbin et al., 2008) contains 6412 images belonging to 12 classes.
- Flickr1M (Huiskes and Lew, 2008) contains 1 million Flickr images used for large-scale evaluation. The images are divided into multiple classes and are not specifically selected for the image retrieval task.
With the addition of 100k images of Flickr1M it is possible to create the dataset Oxford105k.
5.2. Results on Oxford5k
Different experiments have been executed on the Oxford5k dataset. In order to find the best configuration of the diffusion parameters, several combinations of genetic algorithm parameters have been tested.
Tables 1-5 report the results obtained on Oxford5k by varying one parameter of the genetic algorithm at a time. Starting from a standard configuration of the GA (, and ), the number of generations and the population size have been varied from to , considering a maximum budget of fitness computations. The best configurations, as shown in Table 1 and 2, correspond to the largest numbers of fitness computations (, and , ). Since these configurations lead to the same mAP (), the remaining parameters of the GA have been varied starting from the configuration which is fastest to compute (, ).
Table 3 shows that the precision reaches its highest value for a crossover probability () of (). Regarding the mutation probability (), the best results have been achieved with values () and (), as shown in Table 4. Considering the mutation probability for each gene (), the highest precision has been achieved with value ().
Therefore, as shown in Table 5, the best set of parameters for the genetic algorithm thus obtained is: = , = , = , = , = . The corresponding configuration obtained for the diffusion parameters is: , , , , , , .
After this preliminary analysis, another set of experiments has been performed. The number of generations has been increased in order to check the convergence status of the GA, obtaining a further improvement in the mAP. It is to be noticed that this set of experiments is less structured than the previous one, due to the longer computation time. The best set of GA parameters thus obtained () is: generations, population size equal to , crossover probability set to , mutation probability to and mutation probability for each gene to . The corresponding configuration obtained for the diffusion parameters is: , , , , , , . Given the stochastic nature of the GA, five independent runs of the algorithm have been executed to assess how repeatable the results are (avg = 94.39%, stdev = 0.038, max = 94.44%, min = 94.34%).
| Method | Fitness computations | Time | mAP |
|---|---|---|---|
| genetic algorithms | 5000 | 17695 s | 94.44% |
| PSO (Poli et al., 2007) | 5000 | 27767 s | 94.30% |
| random search (Bergstra and Bengio, 2012) | 20000 | 27045 s | 93.67% |
| grid search | 200000 | 1036800 s | 94.43% |
| manual configuration (Magliani et al., 2019b) | 1 | 2 s | 90.95% |
Table 6 reports the results of different optimization techniques applied on the diffusion process. For each technique the table shows the result of the best configuration found. The results have been compared in terms of mAP, running time and number of fitness computations.
The random search (Bergstra and Bengio, 2012) sampled, in this case, 20k configurations, drawing all the parameters under test from a uniform distribution.
The Particle Swarm Optimization (Poli et al., 2007) has been executed using the same number of fitness computation of the GA (population of particles, iterations). Moreover, the minimum speed is set to and the maximum speed to .
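For comparison with the GA, a basic PSO with velocity clamping can be sketched as follows. It is a generic textbook variant under stated assumptions: the inertia weight, the acceleration coefficients, the symmetric speed limit `v_max` and all default values are hypothetical and are not the settings used in the experiments.

```python
import random


def pso(fitness, ranges, n_particles=10, iterations=30, v_max=0.2,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize fitness with a basic PSO; returns (best fitness, best position)."""
    rng = random.Random(seed)
    dim = len(ranges)
    pos = [[rng.uniform(lo, hi) for lo, hi in ranges] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(p) for p in pos]
    for _ in range(iterations):
        # Best personal position found so far by any particle.
        leader = best_pos[max(range(n_particles), key=best_fit.__getitem__)]
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(ranges):
                v = (inertia * vel[i][d]
                     + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                     + c2 * rng.random() * (leader[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))              # clamp the speed
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))  # stay in range
            f = fitness(pos[i])
            if f > best_fit[i]:
                best_fit[i], best_pos[i] = f, pos[i][:]
    i_best = max(range(n_particles), key=best_fit.__getitem__)
    return best_fit[i_best], best_pos[i_best]


if __name__ == "__main__":
    toy_fitness = lambda p: -sum((x - 0.5) ** 2 for x in p)
    print(pso(toy_fitness, [(0.0, 1.0)] * 7))
```

Each iteration costs one fitness computation per particle, so the total budget is the number of particles multiplied by the number of iterations, plus the initial evaluation.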
The grid search has been performed over 200k different parameter settings. Given the large number of fitness computations, it can be seen as a brute-force strategy.
“Manual configuration” means that the configuration of the parameters of the diffusion mechanism was taken from the literature.
The “manual configuration” technique obviously requires less time than the other methods, but it obtains the worst final results. The genetic algorithm achieves an excellent result in a much shorter time than the other optimization methods. Note that, in all the previous experiments, the GA performed better than manual configuration and random search. Thus, only the manual configuration and the GA have been tested on the other datasets. The results of PSO are comparable, but PSO needs more computation time than the GA to perform the same number of fitness computations.
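The running times and numbers of fitness computations reported in Table 6 can be turned into an average cost per fitness computation, which makes the comparison between methods with different budgets easier to read. The variable names below are illustrative; the figures are those of Table 6.

```python
# (fitness computations, running time in seconds) as reported in Table 6
results = {
    "genetic algorithms": (5000, 17695),
    "PSO": (5000, 27767),
    "random search": (20000, 27045),
    "grid search": (200000, 1036800),
}

# Average time spent on one fitness computation by each method.
seconds_per_evaluation = {
    method: seconds / evaluations
    for method, (evaluations, seconds) in results.items()
}

if __name__ == "__main__":
    for method, cost in seconds_per_evaluation.items():
        print(f"{method}: {cost:.3f} s per fitness computation")
```

The cost of a single fitness computation is not constant across methods, which is expected since the time needed by the diffusion process depends on the parameter values being evaluated.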
5.3. Results on Paris6k
| Method | Time | mAP |
|---|---|---|
| genetic algorithms | 18787 s | 97.32% |
| manual configuration (Magliani et al., 2019b) | 4 s | 97.01% |
Table 7 reports the results of different optimization methods applied on Paris6k. The best result (97.32%) has been obtained with the following GA configuration: , , , , . The final configuration of the diffusion parameters is: , , , , , , .
As in the previous dataset, the GAs need more computation time than the “manual configuration”, but they improve the final performance of the diffusion process for retrieval.
5.4. Results on Oxford105k
Given the large dimension of Oxford105k dataset, the ranges of parameters and have been extended to and , respectively.
Table 8 reports the results of different optimization methods applied on Oxford105k.
| Method | Time | mAP |
|---|---|---|
| genetic algorithms | 63911 s | 94.20% |
| manual configuration (Magliani et al., 2019b) | 13 s | 92.50% |
The best result (94.20%) is obtained with the following GA configuration: , , , , . The final configuration of the diffusion parameters is: , , , , , , .
The “manual configuration” is faster than the GAs, but the final performance is very different: the GAs obtain 94.20% while the “manual configuration” achieves only 92.50%.
In this paper we propose to use genetic algorithms to search for the optimal configuration of the diffusion parameters using kNN graphs within the field of Content-Based Image Retrieval (CBIR). By applying genetic algorithms to this optimization problem, a better set of parameters has been obtained, resulting in a higher retrieval precision on several public image datasets. Compared with other techniques, such as random search, grid search and PSO, our optimization approach is faster and obtains the same or better retrieval results. It should be noted that, despite our objective of finding a common set of parameters for all the datasets, the optimization needs to be tailored to a specific dataset in order to achieve the best result.
As future work, we will further study the dependence of the GA on its parameters, to improve its effectiveness using Meta-EAs, methods that tune the parameters of evolutionary algorithms to optimize their performance.
The work by Federico Magliani and Laura Sani was funded by Regione Emilia Romagna within the “Piano triennale alte competenze per la ricerca, il trasferimento tecnologico e l’imprenditorialità” framework. The work of Laura Sani was also co-funded by Infor srl.
- Aggregating local deep features for image retrieval. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp. 1269–1277.
- An overview of evolutionary algorithms for parameter optimization. Evolutionary computation 1 (1), pp. 1–23.
- Random search for hyper-parameter optimization. Journal of Machine Learning Research 13 (Feb), pp. 281–305.
- Algorithms for hyper-parameter optimization. In Advances in neural information processing systems, pp. 2546–2554.
- Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the 20th International Conference on World Wide Web, pp. 577–586.
- Introduction to evolutionary computing. 2nd edition, Springer Publishing Company, Incorporated.
- Parameter control in evolutionary algorithms. IEEE Transactions on evolutionary computation 3 (2), pp. 124–141.
- Computational intelligence: an introduction. 2nd edition, Wiley Publishing.
- Bohb: robust and efficient hyperparameter optimization at scale. arXiv preprint arXiv:1807.01774.
- DEAP: evolutionary algorithms made easy. Journal of Machine Learning Research 13 (Jul), pp. 2171–2175.
- Scatter search. In Advances in evolutionary computing, pp. 519–537.
- Handbook of metaheuristics. Vol. 57, Springer Science & Business Media.
- Genetic algorithms in search, optimization and machine learning. 1st edition, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
- Deep image retrieval: learning global representations for image search. In European Conference on Computer Vision, pp. 241–257.
- End-to-end learning of deep visual representations for image retrieval. International Journal of Computer Vision 124 (2), pp. 237–254.
- Optimization of control parameters for genetic algorithms. IEEE Transactions on systems, man, and cybernetics 16 (1), pp. 122–128.
- Convergence analysis of metaheuristics. In Matheuristics: Hybridizing Metaheuristics and Mathematical Programming, V. Maniezzo, T. Stützle, and S. Voß (Eds.), pp. 159–187.
- Automated algorithm configuration and parameter tuning. In Autonomous search, pp. 37–71.
- The MIR flickr retrieval evaluation. In Proceedings of the 1st ACM international conference on Multimedia Information Retrieval, pp. 39–43.
- Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pp. 604–613.
- Efficient diffusion on region manifolds: recovering small objects with compact CNN representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 3.
- Cross-dimensional weighting for aggregated deep convolutional features. In European Conference on Computer Vision, pp. 685–701.
- Parameter control in evolutionary algorithms: trends and challenges. IEEE Transactions on Evolutionary Computation 19 (2), pp. 167–187.
- Optimization by simulated annealing. Science 220 (4598), pp. 671–680.
- Comparative research of random search algorithms and evolutionary algorithms for the optimal control problem of the mobile robot. Procedia Computer Science 150, pp. 462–470.
- Landmark recognition: from small-scale to large-scale retrieval. In Recent Advances in Computer Vision, pp. 237–259.
- An efficient approximate knn graph method for diffusion on image retrieval. arXiv preprint arXiv:1904.08668.
- An accurate retrieval through R-MAC+ descriptors for landmark recognition. In Proceedings of the 12th International Conference on Distributed Smart Cameras, pp. 6.
- A survey on image segmentation using metaheuristic-based deformable models: state of the art and critical analysis. Applied Soft Computing 44, pp. 1–29.
- Praktische verfahren der gleichungsauflösung. ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik 9 (2), pp. 152–164.
- Tuners review: how crucial are set-up values to find effective parameter values? Engineering Applications of Artificial Intelligence 76, pp. 108–118.
- The pagerank citation ranking: bringing order to the web. Technical report, Stanford InfoLab.
- Object retrieval with large vocabularies and fast spatial matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Lost in quantization: improving particular object retrieval in large scale image databases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.
- Particle swarm optimization. Swarm intelligence 1 (1), pp. 33–57.
- On automatic algorithm configuration of vehicle routing problem solvers. Journal on Vehicle Routing Algorithms.
- Investigating the parameter space of evolutionary algorithms. BioData Mining 11 (1), pp. 2.
- What can we learn from multi-objective meta-optimization of evolutionary algorithms in continuous domains? Mathematics 7 (3), pp. 232.
- Efficient image retrieval via decoupling diffusion into online and offline processing. arXiv preprint arXiv:1811.10907.
- Ranking on data manifolds. In Advances in Neural Information Processing Systems, pp. 169–176.