1 Introduction
Usually, huge volumes of data are required to train deep neural networks for target applications such as image classification, speech recognition, and machine translation. Moreover, when new data sets are given, conventional deep learning methods start training networks from scratch, which may take days to weeks even with many high-performance devices such as GPUs. Humans, on the other hand, using information continually accumulated through study and experience, are able to learn novel tasks much more efficiently from a few examples or even just one (Carey & Bartlett, 1978).

One of the attempts to bridge the gap between human learning and deep learning is meta-learning, or learning-to-learn (Thrun & Pratt, 1998), where neural networks are trained to quickly adapt to new environments or solve unseen tasks with a limited number of examples. Few-shot regression and classification are typical meta-learning tasks in the supervised learning domain and have been addressed in many different ways, such as recurrent model-based methods (Munkhdalai & Yu, 2017; Mishra et al., 2018), optimization-based methods (Ravi & Larochelle, 2017; Finn et al., 2017), metric-based methods (Vinyals et al., 2016; Snell et al., 2017), and variants or combinations of these (Lee & Choi, 2018). Most of the above approaches adopt an episodic training framework in which a collection of tasks (e.g., N different classes with K examples sampled from each class) is treated as a training data set in each learning step, without the pretrained large networks utilized in recently proposed methods (Oreshkin et al., 2018; Gidaris & Komodakis, 2018; Qiao et al., 2018).

Similar to nearest neighbor algorithms, metric-based approaches are arguably the simplest and most efficient way to solve few-shot learning. In Prototypical Networks (Snell et al., 2017), for example, a representation space is learned so that each class has one representative, called a prototype, around which the examples of that class are clustered. New examples are then classified by choosing the nearest prototype. In the representation space, the prototype of a class is defined as the mean of the examples belonging to that class, a choice justified when distances are computed from a Bregman divergence (Banerjee et al., 2005b).
Our method is based on the same inductive bias, namely that such prototypes exist in the embedding space. We consider a von Mises-Fisher (vMF) mixture model on the learned feature space and choose prototypes from a discriminative point of view, relying on the popular belief that discriminative models usually solve classification problems better than generative models. However, it is hard to obtain a closed-form solution for the discriminative parameters in our setting, so we derive an approximate solution with an additional neural network.
The reason for choosing the vMF distribution rather than the Gaussian distribution is that the vMF has only one controllable parameter, the concentration parameter, which acts as a variance. For a Gaussian distribution, in contrast, the number of parameters in the covariance matrix is proportional to the square of the number of variables. In addition, the vMF distribution is known to be more expressive than the Gaussian in high-dimensional spaces. We can also observe that the classifiers naturally induced by the vMF mixture model take the form of a softmax over scaled cosine similarities, which has been successfully applied in recent few-shot learning literature
(Gidaris & Komodakis, 2018; Qiao et al., 2018; Qi et al., 2018).

In this paper, we propose a new metric-based few-shot learning algorithm that is (i) based on directional statistics, realized by a mixture of vMF distributions, and (ii) derived from a discriminative perspective that accounts for inter-class correlation, which has not been addressed by conventional algorithms. In summary, our key contribution is a novel prototype generator with a theoretical basis in the vMF mixture model. We show the effectiveness of the proposed algorithm on two widely used few-shot classification benchmarks, miniImageNet and tieredImageNet.
The remainder of the paper is organized as follows. In Section 2, we summarize related work. After introducing notation in Section 3, we describe the proposed algorithm in Section 4. In Section 5, experimental results are given. Finally, we conclude and offer some directions for future research in Section 6.
2 Related Work
Metric-based few-shot learning aims to learn a feature representation such that samples from the same category are clustered together in the learned representation space. Matching Networks (Vinyals et al., 2016) use attention LSTMs to create contextual embeddings considering both a few labeled examples (i.e., the support set) and new instances. Prototypical Networks (Snell et al., 2017) assume that each class has one prototype, and a classifier assigns classes to given samples by choosing the nearest prototype. Our method is based on a similar idea, but the prototypes are created differently, on a theoretical basis following from the use of the von Mises-Fisher (vMF) distribution from a discriminative point of view. Relation Network (Sung et al., 2018) uses two neural networks: one for general embedding and the other for metric learning. Our method also uses an additional neural network to learn a metric, but it measures distances between support examples only, whereas Relation Network measures distances between support and query examples.
Pretrained neural networks can be utilized to supply not only an appropriate embedding but also a good configuration for the final classifier. In (Qiao et al., 2018), activations from pretrained networks are used to train another network whose outputs are predicted parameters for the last layer. In (Gidaris & Komodakis, 2018), a classifier is trained first, and then an attention network over the learned parameters is trained to create parameters for novel examples. These ideas are similar to ours, but we use a neural network to create parameters indirectly through an induced formula. Moreover, our method does not resort to pretraining.
Some recent few-shot learning methods adopt scaled cosine similarity in the final softmax layer. In (Gidaris & Komodakis, 2018), the cosine similarity is used to overcome the difference in weight magnitudes between seen and novel categories. In (Qi et al., 2018), a cosine-similarity-based recognition model is trained with scaling factors because a softmax over bounded cosine similarities cannot produce a ground-truth one-hot distribution. The importance of the scaling factor is also explained in (Oreshkin et al., 2018) by observing how the gradients change as the scale factor approaches extreme values. We show that the form of scaled cosine similarity is naturally induced from a vMF mixture model in few-shot learning settings (Section 4).

Finally, it is known that directional statistics are better at modeling directional data (Mardia, 1975). The vMF distribution has been observed to be a more appropriate prior than the Gaussian for learning representations of hyperspherical data (Davidson et al., 2018). It has also been used for various image classification tasks (Hasnat et al., 2017; Wang et al., 2018; Zhe et al., 2018), clustering (Banerjee et al., 2005a; Gopal & Yang, 2014), and machine translation (Kumar & Tsvetkov, 2019).
3 Preliminaries
We consider the standard episodic few-shot learning setting (i.e., $N$-way $K$-shot). Each episode consists of a support set $S = \{(x_i, y_i)\}$ and a query set $Q$, where $x_i \in \mathbb{R}^D$ are $D$-dimensional examples and $y_i \in \{1, \dots, N\}$ denote the corresponding classes. Let $S_k$ be the set of examples of class $k$ from $S$, i.e., $S_k = \{x_i : (x_i, y_i) \in S,\ y_i = k\}$.
Von Mises-Fisher Distribution. On the $(d-1)$-dimensional hypersphere $\mathbb{S}^{d-1} = \{x \in \mathbb{R}^d : \|x\| = 1\}$, the von Mises-Fisher distribution (vMF) is defined as
$$f(x; \mu, \kappa) = C_d(\kappa) \exp(\kappa \mu^\top x),$$
where $\mu \in \mathbb{S}^{d-1}$ is called the mean direction, $\kappa \ge 0$ is the concentration parameter, and $C_d(\kappa)$ is the normalizing constant, which is equal to
$$C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)}.$$
Here, $I_\nu$ is the modified Bessel function of the first kind at order $\nu$, which cannot be written as a closed formula. The vMF distribution shares a property with the multivariate Gaussian distribution in the sense that the maximum entropy density on $\mathbb{S}^{d-1}$, subject to the constraint that $\mathbb{E}[x]$ is fixed, is a vMF density (Rao, 1973).
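As a sanity check on these definitions, the normalizing constant can be evaluated numerically. The sketch below (illustrative code, not part of the proposed method) computes $I_\nu$ by its power series and verifies $C_d(\kappa)$ against the closed form available for $d = 3$, where $C_3(\kappa) = \kappa / (4\pi \sinh \kappa)$:

```python
import math

def bessel_i(nu: float, kappa: float, terms: int = 60) -> float:
    """Modified Bessel function of the first kind I_nu(kappa), via its power series."""
    total = 0.0
    for m in range(terms):
        total += (kappa / 2.0) ** (2 * m + nu) / (math.factorial(m) * math.gamma(m + nu + 1))
    return total

def vmf_log_normalizer(d: int, kappa: float) -> float:
    """log C_d(kappa) for the vMF density f(x) = C_d(kappa) * exp(kappa * mu . x)."""
    nu = d / 2.0 - 1.0
    return nu * math.log(kappa) - (d / 2.0) * math.log(2.0 * math.pi) - math.log(bessel_i(nu, kappa))

# Check against the closed form in d = 3: C_3(kappa) = kappa / (4 * pi * sinh(kappa)).
kappa = 2.5
approx = math.exp(vmf_log_normalizer(3, kappa))
exact = kappa / (4.0 * math.pi * math.sinh(kappa))
```

The agreement in $d = 3$ follows from $I_{1/2}(\kappa) = \sqrt{2/(\pi\kappa)}\sinh\kappa$.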
4 Methodology
We assume that there is an encoder $f_\phi$ that maps samples to a representation space. With abuse of notation, throughout this section we write $x$ for the normalized representation $f_\phi(x)/\|f_\phi(x)\|$ for simplicity. The key concept of our method and the process for handling an episode are illustrated in Figure 1.
4.1 vMF Mixture Model and Generative Parameters
First of all, we try to find generative parameters for few-shot classification tasks using a mixture of vMFs. Suppose each class $k$ has its own vMF distribution $f(x; \mu_k, \kappa_k)$ that generates the examples of class $k$. Let $z$ be the latent variable for $x$ so that $z = k$ if $x$ is from class $k$. With a prior $p(z = k)$, we can write
$$p(x) = \sum_{k=1}^{N} p(z = k)\, f(x; \mu_k, \kappa_k),$$
which represents a mixture of vMFs.

In episodic few-shot learning, we know all the labels of the support set elements during the training phase. Thus, for a support element $(x, y)$, $p(z = y \mid x) = 1$ by the assumption.

Since we consider the same number of examples from each class in few-shot learning settings, it is reasonable to assume a uniform prior $p(z = k) = 1/N$. For simplicity, we also assume that all the $\kappa_k$ coincide, i.e., $\kappa_k = \kappa$ for all $k$. Then, note that
$$p(z = k \mid x) = \frac{C_d(\kappa) \exp(\kappa \mu_k^\top x) / N}{\sum_{j=1}^{N} C_d(\kappa) \exp(\kappa \mu_j^\top x) / N} = \frac{\exp(\kappa \mu_k^\top x)}{\sum_{j=1}^{N} \exp(\kappa \mu_j^\top x)}, \qquad (1)$$
which is a softmax over scaled cosine similarities.
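Equation (1) can be illustrated directly. The sketch below (with hypothetical mean directions and helper names of our own) computes the posterior as a softmax over scaled cosine similarities and shows how $\kappa$ sharpens it:

```python
import math

def vmf_posterior(x, mus, kappa):
    """p(z = k | x) for a vMF mixture with uniform prior and shared kappa:
    a softmax over the scaled cosine similarities kappa * (mu_k . x)."""
    logits = [kappa * sum(a * b for a, b in zip(mu, x)) for mu in mus]
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Two unit mean directions and a unit query on the circle (d = 2).
mus = [(1.0, 0.0), (0.0, 1.0)]
x = (math.cos(0.3), math.sin(0.3))        # closer to the first mean direction
p_soft = vmf_posterior(x, mus, kappa=1.0)
p_sharp = vmf_posterior(x, mus, kappa=20.0)
```

A larger $\kappa$ only rescales the logits, so the arg-max class is unchanged but the distribution concentrates on it.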
Now, let
$$L_{\mathrm{gen}}(\mu) = \sum_{(x, y) \in S} \log p(x \mid z = y), \qquad L_{\mathrm{dis}}(\mu) = \sum_{(x, y) \in S} \log p(z = y \mid x).$$
We will treat $\kappa$ as a hyperparameter and try to find the parameters $\mu = (\mu_1, \dots, \mu_N)$ that maximize $L_{\mathrm{gen}}$ or $L_{\mathrm{dis}}$, which we call generative parameters and discriminative parameters, respectively. Since we use the vMF distribution, we have the constraints $\|\mu_k\| = 1$ for all $k$. To solve this, we set up the following Lagrangians:
$$\mathcal{L}_{\mathrm{gen}} = L_{\mathrm{gen}}(\mu) - \sum_{k} \lambda_k (\mu_k^\top \mu_k - 1), \qquad \mathcal{L}_{\mathrm{dis}} = L_{\mathrm{dis}}(\mu) - \sum_{k} \lambda_k (\mu_k^\top \mu_k - 1).$$
First, we solve for the generative parameters. Taking partial derivatives of $\mathcal{L}_{\mathrm{gen}}$ with respect to $\mu_k$ and $\lambda_k$, we have
$$\frac{\partial \mathcal{L}_{\mathrm{gen}}}{\partial \mu_k} = \kappa \sum_{x \in S_k} x - 2\lambda_k \mu_k, \qquad \frac{\partial \mathcal{L}_{\mathrm{gen}}}{\partial \lambda_k} = -(\mu_k^\top \mu_k - 1).$$
By setting the above equations to zero and solving, one can check that the solutions are
$$\mu_k = \mathrm{Norm}\Big(\sum_{x \in S_k} x\Big), \qquad \text{where } \mathrm{Norm}(v) = v / \|v\|.$$
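The generative solution is simply the normalized sum of the (normalized) class examples. The toy sketch below contrasts this mean direction with the Euclidean mean used by Prototypical Networks; the example values are hypothetical:

```python
import math

def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return tuple(a / n for a in v)

def generative_prototype(class_examples):
    """mu_k = Norm(sum of the normalized representations of class k):
    the mean direction of the class on the hypersphere."""
    feats = [normalize(x) for x in class_examples]
    s = tuple(sum(f[i] for f in feats) for i in range(len(feats[0])))
    return normalize(s)

# Hypothetical 2-D "features" for one class; magnitudes differ on purpose.
examples = [(2.0, 0.1), (4.0, -0.1), (100.0, 0.0)]
proto = generative_prototype(examples)

# The Euclidean mean (Prototypical Networks) is dominated by the large-magnitude
# example, while the mean direction weights every example's direction equally.
euclidean_mean = tuple(sum(x[i] for x in examples) / len(examples) for i in range(2))
```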
A similar argument with the vMF distribution was also presented in (Banerjee et al., 2005a). Note that $\mu_k$ intuitively represents the mean direction of class $k$ on the hypersphere. This is not exactly the same as the mean vector defined in Prototypical Networks, even if the cosine similarity were used there, since $\mu_k$ is the mean of normalized representations on the hypersphere.

4.2 Main Algorithm
Now we try to find the discriminative parameters. As in the previous subsection, we take partial derivatives of $\mathcal{L}_{\mathrm{dis}}$ with respect to $\mu_k$, which, by Equation (1), gives
$$\frac{\partial \mathcal{L}_{\mathrm{dis}}}{\partial \mu_k} = \kappa \sum_{(x, y) \in S} \big(\mathbb{1}[y = k] - p(z = k \mid x)\big)\, x - 2\lambda_k \mu_k.$$
Again, by setting the above equations to zero and solving, we have the following equation for the local optima $\mu_k$:
$$\mu_k = \pm\, \mathrm{Norm}\Big(\sum_{(x, y) \in S} \big(\mathbb{1}[y = k] - p(z = k \mid x)\big)\, x\Big) = \pm\, \mathrm{Norm}\Big(\sum_{x \in S_k} \big(1 - p(z = k \mid x)\big)\, x - \sum_{x \in S \setminus S_k} p(z = k \mid x)\, x\Big), \qquad (2)$$
where Norm is the normalizing operator that makes a unit vector, and the second equality follows by separating the examples of class $k$ from the rest. Note that $\mu_k$ now depends not only on the examples of class $k$, as in the generative parameters, but also on the examples of the other classes. Strictly speaking, we have two possibilities for the sign in the above equation. Since the generative parameters are the mean directions of the corresponding classes, we take the plus sign so as to have positive coefficients on the examples of the corresponding class. Empirically, taking the minus signs proved to have bad performance.
To use Equation (2), we make a learnable function $g$, whose output is a coefficient $c_{k,x}$ for each $x \in S$ and class $k$, that substitutes for $\mathbb{1}[y = k] - p(z = k \mid x)$. This function is implemented by a neural network and used to approximate $\mu_k$ by $\hat{\mu}_k$:
$$\hat{\mu}_k = \mathrm{Norm}\Big(\sum_{x \in S} c_{k,x}\, x\Big). \qquad (3)$$
Then, for a query example $q$, we can classify with the following class distribution:
$$p(y = k \mid q) = \frac{\exp(\kappa \hat{\mu}_k^\top q)}{\sum_{j=1}^{N} \exp(\kappa \hat{\mu}_j^\top q)}.$$
Hence, with a query set $Q$, we define the loss
$$L = -\frac{1}{|Q|} \sum_{(q, y) \in Q} \log p(y \mid q)$$
and jointly train $f_\phi$ and $g$ by gradient descent. The overall procedure of our method is given in Algorithm 1.
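For concreteness, the episode loss can be sketched as follows, in pure Python with hypothetical prototypes and queries standing in for encoder outputs:

```python
import math

def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return tuple(a / n for a in v)

def episode_loss(prototypes, queries, query_labels, kappa):
    """Mean negative log-likelihood of the query labels under
    p(y = k | q) = softmax_k(kappa * mu_k . q)."""
    total = 0.0
    for q, y in zip(queries, query_labels):
        qn = normalize(q)
        logits = [kappa * sum(a * b for a, b in zip(mu, qn)) for mu in prototypes]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))  # log partition
        total += log_z - logits[y]                                  # -log p(y | q)
    return total / len(queries)

prototypes = [(1.0, 0.0), (0.0, 1.0)]
good = episode_loss(prototypes, [(0.9, 0.1), (0.1, 0.9)], [0, 1], kappa=5.0)
bad = episode_loss(prototypes, [(0.9, 0.1), (0.1, 0.9)], [1, 0], kappa=5.0)
```

Correctly labeled queries yield a much smaller loss than mislabeled ones, which is the signal that trains the encoder and $g$ jointly.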
4.3 Description of $g$
Here we describe how to construct the function $g$ described in the previous subsection. We train a neural network $m$ that we expect to operate like a metric: $m$ outputs a real number for any pair $(x, x')$, but we do not impose on it properties of a metric such as symmetry or nonnegativity. The detailed architecture of $m$ is given in Section 5.
Given such an $m$, we compute all the pairwise distances $m(x, x')$ for $x, x' \in S$, and let $\bar{d}_k(x)$ be the average of $m(x, x')$ over $x' \in S_k$. Finally, we define
$$c_{k,x} = \mathbb{1}[x \in S_k] - \frac{\exp(-\bar{d}_k(x))}{\sum_{j=1}^{N} \exp(-\bar{d}_j(x))}. \qquad (4)$$
The motivation for this definition is that, since $c_{k,x}$ substitutes for $\mathbb{1}[y = k] - p(z = k \mid x)$ as shown in Equation (3), we want $c_{k,x}$ to be larger than $c_{k,x'}$ if $x \in S_k$ and $x' \notin S_k$. After training the networks, we expect the features of the images to be clustered according to classes, so $m(x, x')$ will be relatively small if $x$ and $x'$ are from the same class, and similarly for $\bar{d}_k(x)$; thus we can satisfy our goal through Equation (4). Note that the property $\sum_k c_{k,x} = 0$, which holds for the exact coefficients, also holds for Equation (4).
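The construction can be sketched as follows. This is one instantiation consistent with the motivation above: for each support example, a softmax over negative average within-class distances is subtracted from the class indicator. The `dist` function is a stand-in for the metric network $m$, and the data are hypothetical:

```python
import math

def coefficients(support, labels, n_classes, dist):
    """c_{k,x} = 1[x in S_k] - softmax_k(-dbar_k(x)), where dbar_k(x) is the
    mean learned distance from x to the support examples of class k."""
    cs = []
    for x, y in zip(support, labels):
        dbar = []
        for k in range(n_classes):
            ds = [dist(x, xp) for xp, yp in zip(support, labels) if yp == k]
            dbar.append(sum(ds) / len(ds))
        m = max(-d for d in dbar)                 # stabilize the softmax
        exps = [math.exp(-d - m) for d in dbar]
        s = sum(exps)
        cs.append([(1.0 if y == k else 0.0) - e / s for k, e in enumerate(exps)])
    return cs

# Stand-in metric: squared Euclidean distance (the real m is a neural network).
def dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

support = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = [0, 0, 1, 1]
cs = coefficients(support, labels, 2, dist)
```

The coefficients of each example sum to zero, and the own-class coefficient is positive, mirroring the exact coefficients $\mathbb{1}[y = k] - p(z = k \mid x)$.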
5 Experiments
The proposed few-shot learning method was compared with strong baselines. To evaluate the performance of the different approaches, we considered two widely used benchmark datasets for few-shot classification: the miniImageNet (Vinyals et al., 2016) and tieredImageNet (Ren et al., 2018) datasets.
Table 1: 5-way few-shot classification accuracies (%) on miniImageNet with 95% confidence intervals (^{*}no reported confidence intervals). Rows: MatchingNet (Vinyals et al., 2016)^{*}, MatchingNet FCE (Vinyals et al., 2016)^{*}, Meta-Learner LSTM (Ravi & Larochelle, 2017), ProtoNet (Snell et al., 2017), RelationNet (Sung et al., 2018), Ours (Metric1, M1), and Ours (Metric2, M2); columns: 1-shot and 5-shot.

Table 2: 5-way few-shot classification accuracies (%) on tieredImageNet. Rows: ProtoNet (Snell et al., 2017)^{*}, Ours (Metric1, M1), and Ours (Metric2, M2); columns: 1-shot and 5-shot.
5.1 Network Architecture
For the encoder network $f_\phi$, we followed the same architecture used in (Vinyals et al., 2016; Snell et al., 2017; Sung et al., 2018), which is composed of four convolutional blocks. Each of the first three blocks consists of a convolutional layer with 64 $3 \times 3$ filters, followed by a batch normalization layer, a ReLU layer, and a $2 \times 2$ max pooling layer. The last block has only a convolutional layer and a max pooling layer of the same sizes, so that the features lie on the entire hypersphere. In all of our experiments, images of size $84 \times 84$ are used, so the features encoded by $f_\phi$ become 1,600-dimensional ($= 64 \times 5 \times 5$) vectors.

For the distance metric network $m$, we propose two neural networks. The first one, which we call M1, consists of a flatten layer, a subtraction layer, and a two-layer MLP. For an ordered input pair $(x, x')$, we first flatten each input, and the subtraction layer outputs their difference. The first fully-connected layer then reduces the dimension to 8, the output is passed through a ReLU layer, and the second fully-connected layer reduces the dimension to 1. This metric network adds only a small number of parameters compared to those of the encoder.
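A minimal forward pass of M1 can be sketched as follows. The weights are random stand-ins for trained parameters, the feature dimension is reduced for illustration, and we assume the subtraction layer outputs the plain difference $x - x'$:

```python
import random
random.seed(0)

def mlp_metric(x, xp, W1, b1, W2, b2):
    """M1 forward pass: flatten both inputs, subtract, then a 2-layer MLP
    (hidden size 8, scalar output)."""
    diff = [a - b for a, b in zip(x, xp)]
    hidden = [max(0.0, sum(w * d for w, d in zip(row, diff)) + bb)  # ReLU
              for row, bb in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2

dim = 16  # stand-in for the 1,600-dimensional encoder features
W1 = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(8)]
b1 = [0.0] * 8
W2 = [random.gauss(0, 0.1) for _ in range(8)]
b2 = 0.0
x = [random.gauss(0, 1) for _ in range(dim)]
xp = [random.gauss(0, 1) for _ in range(dim)]
d_xy = mlp_metric(x, xp, W1, b1, W2, b2)
d_xx = mlp_metric(x, x, W1, b1, W2, b2)
```

With zero biases, $m(x, x) = 0$ here, but as noted in Section 4.3, neither symmetry nor nonnegativity is imposed on $m$ in general.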
The second distance metric, called M2, was inspired by the Relation Network (Sung et al., 2018). Its architecture is the same as that of the relation module of Relation Network: a depth-wise concatenation layer and two convolutional blocks of the same form as those in the encoder, followed by two fully-connected layers with ReLU of sizes 8 and 1, respectively. Because of its complexity, this network is heavier than M1 in terms of the number of parameters.
5.2 MiniImageNet Results
The miniImageNet dataset, proposed by (Vinyals et al., 2016), is a subset of the ILSVRC-12 ImageNet dataset (Russakovsky et al., 2015). The original ImageNet dataset is a notoriously huge dataset composed of more than a million color images depicting 1,000 object categories, and training deep neural networks for its classification consumes a large amount of resources; miniImageNet was proposed to reduce this burden. It contains 60,000 color images of size $84 \times 84$ from 100 classes, each having 600 examples. We adopted the class splits of (Ravi & Larochelle, 2017), which use 64 classes for training, 16 for validation, and 20 for testing.
Through $f_\phi$, we encoded both the support sets and the query sets, while only the support sets were used to build the prototypes. All models were trained via stochastic gradient descent with the Adam optimizer.

We experimented with our algorithm on 5-way 1-shot and 5-way 5-shot classification. The models with metric 1 (M1) were trained using 35-way episodes for 1-shot classification and 20-way episodes for 5-shot classification. The models with metric 2 (M2) were trained using 30-way and 15-way episodes for 1-shot and 5-shot classification, respectively. We used 15 query points per episode for both the training and test data. For fair comparison, the proposed method was compared with recent relevant studies with similar base encoder architectures and training strategies as ours (i.e., a four-layer convolutional network with 64 filters, without pretraining): Matching Network (Vinyals et al., 2016), Meta-learner LSTM (Ravi & Larochelle, 2017), Prototypical Network (Snell et al., 2017), and Relation Network (Sung et al., 2018).
As shown in Table 1, our method had the best performance in both the 1-shot and 5-shot experiments, with accuracies of 52.44% and 68.60%, respectively. In the 5-shot experiments, the accuracy difference between ours and Prototypical Networks was smaller than the standard deviation; in the 1-shot experiments, however, ours outperformed Prototypical Networks by more than the standard deviation.
5.3 TieredImageNet Results
The tieredImageNet dataset was first proposed by (Ren et al., 2018). Like miniImageNet, it is a subset of the ILSVRC-12 ImageNet dataset (Russakovsky et al., 2015), but it is larger, containing 608 classes whereas miniImageNet contains 100. In tieredImageNet, these classes are grouped into 34 broader categories corresponding to nodes located higher in the ImageNet hierarchy (Deng et al., 2009). Each broader category contains 10 to 30 classes, and the 34 categories are split into 20 for training, 6 for validation, and 8 for testing.
As in the miniImageNet experiments, the models were trained using the Adam optimizer. Models with M1 were trained using 30-way episodes for 1-shot classification and 25-way episodes for 5-shot classification; models with M2 were trained using 35-way and 25-way episodes, respectively. As before, we used 15 query points per episode for both the training and test data.
In the tieredImageNet experiments, our method with metric M1 showed the best performance in both the 1-shot and 5-shot experiments, with accuracies of 52.94% and 69.44%, as shown in Table 2. The differences between metrics M1 and M2 were tiny, smaller even than the standard deviations; nevertheless, our method outperformed Prototypical Networks in both the 1-shot and 5-shot problems.
5.4 Effects of Scaling the Concentration Parameter
5.4.1 Classification Accuracy
We study how the concentration parameter $\kappa$ of the von Mises-Fisher distribution affects the performance of our method. The concentration parameter characterizes how strongly the vectors drawn from $f(x; \mu, \kappa)$ are concentrated around the mean direction $\mu$. When $\kappa = 0$, the density is equivalent to the uniform density on $\mathbb{S}^{d-1}$, and as $\kappa \to \infty$, it tends to a point density. Thus, when $\kappa$ is too small, the vMFs of different classes are indistinguishable; when $\kappa$ is too large, only those points that are sufficiently close to the mean directions have decent probability.

Figure 2 shows how the classification accuracy changes over various $\kappa$. The graphs have a concave form: the accuracy decreases when $\kappa$ is too small or too large. This implies that there exists an optimal value of $\kappa$ that gives the best description of the class distributions, at which the accuracy reaches its highest point.
5.4.2 Configuration of features
We further explore how $\kappa$ affects the distribution of features on the hypersphere by measuring pairwise cosine similarities of feature vectors. Figure 3 shows how the cosine similarities between feature vectors vary over various values of $\kappa$ when tieredImageNet test examples are applied to the encoder.

We can see that as $\kappa$ increases, the features as a whole converge toward a point, which can be read off from the narrower range of cosine similarities. Although the distribution of features changes drastically with $\kappa$, the features of a particular class ("intra-class") remain more concentrated than those of all classes ("all pairs"), and the mean pairwise cosine similarity between examples from the same class is consistently and clearly higher than that of all pairs. This partly explains why our algorithm shows decent classification accuracy over a wide range of $\kappa$ values.
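The intra-class versus all-pairs comparison can be mimicked on synthetic hyperspherical features; the directions, sample counts, and noise level below are arbitrary stand-ins for encoder outputs:

```python
import math
import random
random.seed(1)

def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def cos(u, v):
    return sum(a * b for a, b in zip(u, v))

# Synthetic unit features: two classes clustered around different directions.
centers = [normalize([1.0, 0.0, 0.0]), normalize([0.0, 1.0, 0.0])]
feats, labels = [], []
for k, c in enumerate(centers):
    for _ in range(50):
        feats.append(normalize([ci + random.gauss(0, 0.2) for ci in c]))
        labels.append(k)

def mean_pairwise_cos(idx):
    vals = [cos(feats[i], feats[j]) for i in idx for j in idx if i < j]
    return sum(vals) / len(vals)

intra = mean_pairwise_cos([i for i, y in enumerate(labels) if y == 0])
allp = mean_pairwise_cos(range(len(feats)))
```

As in Figure 3, the intra-class mean similarity exceeds the all-pairs mean because cross-class pairs pull the latter down.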
6 Conclusion
We have proposed an algorithm for few-shot learning based on the von Mises-Fisher mixture model. To the best of our knowledge, this is the first research that uses the vMF distribution for few-shot learning. Furthermore, we proposed a novel method that approximates the discriminative parameters on a theoretical basis. Together with an embedding network, our method trains another network whose outputs yield prototypes for each episode. Using two standard image datasets, we showed that our method outperforms baseline approaches whose convolutional encoders are similar to ours in capacity and training method.
Recently proposed few-shot learning methods (Rusu et al., 2019; Ye et al., 2018) adopt encoders much deeper than ours, such as Wide Residual Networks (Zagoruyko & Komodakis, 2016), which are usually pretrained before the few-shot learning phase. Large networks and their pretraining could thus be considered in future work.
As shown in our results, the scaling factor (i.e., the concentration parameter $\kappa$) affects the classification accuracy. Further improvement could be made by designing an algorithm that automatically chooses the optimal value of this parameter.
References
 Banerjee et al. (2005a) Banerjee, A., Dhillon, I. S., Ghosh, J., and Sra, S. Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res., 6:1345–1382, December 2005a. ISSN 1532-4435.
 Banerjee et al. (2005b) Banerjee, A., Merugu, S., Dhillon, I. S., and Ghosh, J. Clustering with Bregman divergences. J. Mach. Learn. Res., 6:1705–1749, December 2005b. ISSN 1532-4435.
 Carey & Bartlett (1978) Carey, S. and Bartlett, E. Acquiring a single new word. Proceedings of the Stanford Child Language Conference, 15:17–29, 01 1978.

 Davidson et al. (2018) Davidson, T. R., Falorsi, L., De Cao, N., Kipf, T., and Tomczak, J. M. Hyperspherical variational auto-encoders. 34th Conference on Uncertainty in Artificial Intelligence (UAI-18), 2018.
 Deng et al. (2009) Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 248–255. IEEE, 2009.

 Finn et al. (2017) Finn, C., Abbeel, P., and Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 1126–1135, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
 Gidaris & Komodakis (2018) Gidaris, S. and Komodakis, N. Dynamic few-shot visual learning without forgetting. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
 Gopal & Yang (2014) Gopal, S. and Yang, Y. Von Mises-Fisher clustering models. In Xing, E. P. and Jebara, T. (eds.), Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pp. 154–162, Beijing, China, 22–24 Jun 2014. PMLR.
 Hasnat et al. (2017) Hasnat, M. A., Bohné, J., Milgram, J., Gentric, S., and Chen, L. Von Mises-Fisher mixture model-based deep learning: Application to face verification. CoRR, abs/1706.04264, 2017. URL http://arxiv.org/abs/1706.04264.
 Kumar & Tsvetkov (2019) Kumar, S. and Tsvetkov, Y. Von Mises-Fisher loss for training sequence to sequence models with continuous outputs. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=rJlDnoA5Y7.
 Lee & Choi (2018) Lee, Y. and Choi, S. Gradient-based meta-learning with learned layerwise metric and subspace. In Dy, J. and Krause, A. (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 2927–2936, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR.
 Mardia (1975) Mardia, K. V. Statistics of directional data. Journal of the Royal Statistical Society, 1975. ISSN 0035-9246. doi: 10.2307/2984782.
 Mishra et al. (2018) Mishra, N., Rohaninejad, M., Chen, X., and Abbeel, P. A simple neural attentive meta-learner. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=B1DmUzWAW.
 Munkhdalai & Yu (2017) Munkhdalai, T. and Yu, H. Meta networks. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 2554–2563, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
 Oreshkin et al. (2018) Oreshkin, B. N., Rodríguez, P., and Lacoste, A. TADAM: Task dependent adaptive metric for improved few-shot learning. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 31, pp. 719–729. Curran Associates, Inc., 2018.
 Qi et al. (2018) Qi, H., Brown, M., and Lowe, D. G. Low-shot learning with imprinted weights. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
 Qiao et al. (2018) Qiao, S., Liu, C., Shen, W., and Yuille, A. L. Few-shot image recognition by predicting parameters from activations. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
 Rao (1973) Rao, C. R. Linear Statistical Inference and its Applications. Wiley, 1973. doi: 10.1002/9780470316436.
 Ravi & Larochelle (2017) Ravi, S. and Larochelle, H. Optimization as a model for fewshot learning. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=rJY0Kcll.
 Ren et al. (2018) Ren, M., Triantafillou, E., Ravi, S., Snell, J., Swersky, K., Tenenbaum, J. B., Larochelle, H., and Zemel, R. S. Meta-learning for semi-supervised few-shot classification. In Proceedings of the 6th International Conference on Learning Representations (ICLR), 2018.
 Russakovsky et al. (2015) Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and FeiFei, L. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015. doi: 10.1007/s112630150816y.
 Rusu et al. (2019) Rusu, A. A., Rao, D., Sygnowski, J., Vinyals, O., Pascanu, R., Osindero, S., and Hadsell, R. Meta-learning with latent embedding optimization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=BJgklhAcK7.
 Snell et al. (2017) Snell, J., Swersky, K., and Zemel, R. Prototypical networks for fewshot learning. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 30, pp. 4077–4087. Curran Associates, Inc., 2017.
 Sung et al. (2018) Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P. H., and Hospedales, T. M. Learning to compare: Relation network for few-shot learning. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
 Thrun & Pratt (1998) Thrun, S. and Pratt, L. (eds.). Learning to Learn. Kluwer Academic Publishers, Norwell, MA, USA, 1998. ISBN 0792380479.
 Vinyals et al. (2016) Vinyals, O., Blundell, C., Lillicrap, T., and Wierstra, D. Matching networks for one shot learning. In Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 29, pp. 3630–3638. Curran Associates, Inc., 2016.

 Wang et al. (2018) Wang, H., Wang, Y., Zhou, Z., Ji, X., Gong, D., Zhou, J., Li, Z., and Liu, W. CosFace: Large margin cosine loss for deep face recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
 Ye et al. (2018) Ye, H.-J., Hu, H., Zhan, D.-C., and Sha, F. Learning embedding adaptation for few-shot learning. CoRR, abs/1812.03664, 2018.
 Zagoruyko & Komodakis (2016) Zagoruyko, S. and Komodakis, N. Wide residual networks. CoRR, abs/1605.07146, 2016. URL http://arxiv.org/abs/1605.07146.
 Zhe et al. (2018) Zhe, X., Chen, S., and Yan, H. Directional statistics-based deep metric learning for image classification and retrieval. CoRR, abs/1802.09662, 2018. URL http://arxiv.org/abs/1802.09662.