Machine learning models, especially neural networks, have shown great potential in many fields in recent years [6, 5, 4]. Most of these models rely on an implicit assumption: the classes of the testing samples must belong to the set of classes seen during training. In real-life applications, however, the classes of some testing samples never appear in the training data set. We call such classes unseen classes, in contrast to the classes that do appear in the training data set (i.e., the seen classes). Traditional machine learning algorithms are often unable to correctly predict the classes of samples from unseen classes. Zero-Shot Learning (ZSL) is an effective paradigm proposed to solve this problem [19, 25, 1, 39, 23].
In ZSL, both the seen and unseen classes are described by specific semantic vectors in the semantic space (i.e., the side information). Commonly used semantic vectors include attributes, word2vec, sentences, and gaze. For a specific data set, the side information is encoded into vectors of the same dimension, and each class corresponds to one vector. ZSL learns a mapping between the visual space, where the image feature vectors of the seen classes lie, and the semantic space, where the side information lies, in order to obtain a learner with strong generalization ability; this learner is then used to predict the classes of the testing samples. In other words, ZSL builds the relationship between the visual space and the semantic space from the training samples of the seen classes and then extends it to predict the labels of the testing samples.
According to their prediction targets, current ZSL algorithms can be grouped into two categories: Conventional Zero-Shot Learning (CZSL) and Generalized Zero-Shot Learning (GZSL). In CZSL the testing samples come only from the unseen classes, whereas in GZSL they can come from both the seen and unseen classes. GZSL is therefore more consistent with practical applications, where testing samples may come from either group. However, GZSL suffers from a serious bias problem [38, 45]: because the ZSL model has never seen samples of the unseen classes, it tends to assign them to seen classes that resemble their true classes. Although ZSL has great potential for real-world tasks, research on ZSL is still in its infancy owing to the difficulty of the problem, and many fundamental problems remain unsolved; for example, the quality of the features extracted for ZSL currently cannot be guaranteed.
Specifically, most current ZSL algorithms use pre-trained models, usually trained on the ImageNet data set, to transform the training data set of the ZSL task into feature vectors, and then focus on building the mapping between the feature space and the semantic space. This approach improves the feature extraction ability of the ZSL model by exploiting the rich feature information embedded in the pre-trained models. Intuitively, however, it may not work well if the training data set of the current ZSL task differs too much from the data set used to train the pre-trained model. For example, suppose we have two pre-trained models: the first is trained on a data set containing only fruit images, and the second on a data set containing only cat images. If the current ZSL task is to classify dog species, then intuitively the second model is more likely to help the final ZSL model classify accurately. In other words, we conjecture that if the data set used by the pre-trained model is strongly correlated with the training data set of the current ZSL task, the pre-trained model may improve the predictive ability of the ZSL model, and vice versa. However, collecting and labeling a large relevant data set like ImageNet to train a task-specific pre-trained model is too expensive. The best compromise is to fine-tune existing pre-trained models to fit the current task.
To solve this problem, we design a dual-channel learning framework based on the idea of multi-task learning, which uses auxiliary data sets to enhance the feature extraction ability of the pre-trained model used in ZSL. Specifically, we select from ImageNet the image samples most relevant to the seen classes of the current ZSL task to form the auxiliary data set, and then train the model on this auxiliary data set together with the original ZSL training data set under the proposed framework. The auxiliary data set regularizes the feature extractor of ZSL (i.e., the pre-trained model) and makes it provide features that are more relevant to the current task.
But how should the relevant auxiliary data set be chosen? In other words, how can we measure the strength of the correlation between an image data set and the current image classification task? Measuring the similarity of two image data sets mathematically is still very difficult. For human beings, even for children, however, this problem is not difficult, because biologists have built a relatively complete body of knowledge that describes the relationship between any two species.
Inspired by this observation, we propose a novel biological taxonomy-based data set selection method to select the auxiliary data set. Specifically, biologists currently describe the degree of kinship among all living things with seven ranks: Kingdom, Phylum, Class, Order, Family, Genus, and Species. Species is the most basic and specific taxonomic rank; in other words, belonging to the same species means the strongest correlation. As the taxonomic rank broadens (i.e., Species → Genus → Family → Order → Class → Phylum → Kingdom), the degree of correlation gradually decreases. We use this biological knowledge system to guide the selection of the auxiliary data set.
To the best of our knowledge, we are the first to design a dual-channel learning framework that enhances the feature extractor of ZSL and to introduce biological correlation for selecting the auxiliary data set. The main contributions of this study are summarized as follows.
(1) Based on the idea of multi-task learning, we propose a dual-channel learning framework for ZSL, which can enhance the feature extractor of ZSL with the help of the auxiliary data set and improve the generalization ability of the ZSL model.
(2) We propose a novel auxiliary data set selection strategy based on the knowledge of biological taxonomy, which can effectively measure the correlation degree between image data sets.
(3) We find that, under the proposed learning framework, the performance of the ZSL model increases steadily as the correlation between the auxiliary data set and the current task increases, which implies that the performance of the ZSL model can be greatly improved when the feature extractor is fine-tuned with the most relevant auxiliary tasks. This finding may offer a new perspective for follow-up studies of ZSL algorithms.
(4) Experimental results on three benchmark data sets show that the proposed method effectively improves the generalization ability of the ZSL model and achieves state-of-the-art results on all three. Moreover, we use feature visualization to explain the experimental phenomena.
The remainder of this paper is organized as follows. In Sec. II, we introduce the necessary preliminaries including the general knowledge of biological taxonomy, multi-task learning, and zero-shot learning. We give the details of our proposed method in Sec. III. The experimental results and the corresponding analysis are given in Sec. IV. In Sec. V, we conclude this paper.
II-A General knowledge of biological taxonomy
Biological taxonomy is an important branch of biological research whose goal is to clarify the kinship between different organisms. The earliest related research can be traced back to 1735 [22, 21], when the Swedish botanist Linnaeus proposed dividing nature into three kingdoms: plant, animal, and mineral. Plants and animals were further divided into four levels, Class, Order, Genus, and Species, forming an early classification system. Later biologists refined this system into the seven-level classification system "Kingdom-Phylum-Class-Order-Family-Genus-Species" (as shown in Fig. 1). Next, we briefly introduce this classification system.
As shown in Fig. 1, the smallest and most basic unit is Species. Two organisms belonging to the same species share a genetic heritage and can produce offspring by mating. For example, two arctic foxes (Vulpes lagopus) have the closest relationship because they belong to the same species.
One level higher than Species is Genus, which refers to a group of species that evolved from a relatively recent common ancestor. For example, Canis lupus lycaon and the poodle belong to the same genus (i.e., Canis), but their kinship is slightly more distant than that of organisms of the same species.
Similarly, Family is one level higher than Genus, and related genera belong to the same family. For example, Vulpes lagopus and Canis lupus lycaon belong to the same family but different genera (as shown in Fig. 1).
Family is subordinate to Order, Order is subordinate to Class, Class is subordinate to Phylum, and Phylum is subordinate to Kingdom. Correspondingly, as the scope of the rank expands, the kinship gradually decreases.
The closer the kinship, the more traits organisms share and the more similar their characteristics are. This principle provides a way to measure the degree of correlation between biological image data sets.
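The principle above can be turned into a simple kinship score. The Python sketch below is our own illustration rather than a procedure given in this paper: it assumes each organism is described by a hypothetical lineage dict mapping the seven rank names to taxa, and it counts how many ranks two organisms share, starting from Kingdom (7 means the same species; 0 means different kingdoms or a non-biological object).

```python
# Seven taxonomic ranks, ordered from the broadest to the most specific.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def lowest_shared_rank(a, b):
    """Return the most specific rank at which lineages a and b agree, or None.

    A lineage is a dict {rank: taxon} (hypothetical format); an empty dict
    stands for a non-biological object such as a water bottle.
    """
    shared = None
    for rank in RANKS:
        taxon = a.get(rank)
        if taxon is None or taxon != b.get(rank):
            break  # the two lineages diverge at this rank
        shared = rank
    return shared


def kinship_score(a, b):
    """Number of shared ranks: 7 = same species, 0 = nothing shared."""
    rank = lowest_shared_rank(a, b)
    return 0 if rank is None else RANKS.index(rank) + 1


def lineage(*taxa):
    """Build a lineage dict from taxa listed in RANKS order."""
    return dict(zip(RANKS, taxa))


ARCTIC_FOX = lineage("Animalia", "Chordata", "Mammalia", "Carnivora",
                     "Canidae", "Vulpes", "Vulpes lagopus")
# Canis lupus lycaon, recorded at species level as Canis lupus.
EASTERN_WOLF = lineage("Animalia", "Chordata", "Mammalia", "Carnivora",
                       "Canidae", "Canis", "Canis lupus")
HOUSE_CAT = lineage("Animalia", "Chordata", "Mammalia", "Carnivora",
                    "Felidae", "Felis", "Felis catus")
TOKAY_GECKO = lineage("Animalia", "Chordata", "Reptilia", "Squamata",
                      "Gekkonidae", "Gekko", "Gekko gecko")

if __name__ == "__main__":
    others = [("eastern wolf", EASTERN_WOLF), ("house cat", HOUSE_CAT),
              ("tokay gecko", TOKAY_GECKO), ("water bottle", {})]
    for name, other in others:
        # wolf: family 5, cat: order 4, gecko: phylum 2, bottle: None 0
        print(name, lowest_shared_rank(ARCTIC_FOX, other),
              kinship_score(ARCTIC_FOX, other))
```

With this score the arctic fox is closer to the eastern wolf (same family) than to the house cat (same order only), matching the ordering described above.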
II-B Multi-task learning
In some real-life applications, such as those involving medical data or rare species, it is difficult to collect enough training data and expensive to label the samples accurately. Multi-task learning (MTL) [28, 44] uses related tasks to assist the decision-making of the model, which can alleviate data scarcity to some extent. MTL resembles human learning: humans acquire complementary knowledge from related tasks and thereby solve the current task better. Likewise, MTL improves the generalization ability of the model by fusing useful information across multiple related tasks.
Taking neural network-based MTL algorithms as an example, useful information is shared among tasks through the network structure and parameter constraints. The commonly used sharing mechanisms are hard parameter sharing and soft parameter sharing [13, 41]. In hard parameter sharing, all tasks share the hidden layers and only the last few layers are task-specific; the dual-channel learning framework proposed in this paper is based on this mechanism (as shown in Fig. 2). In soft parameter sharing, each task has its own independent structure, but the distance between the parameters of different tasks is constrained to keep them similar. The constraint can be, for example, the $\ell_2$ distance or the trace norm.
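The two sharing mechanisms can be contrasted in a few lines of plain Python. This is a minimal sketch under our own assumptions: tiny bias-free dense layers stand in for real network layers, and the soft-sharing constraint is taken to be a squared $\ell_2$ distance between two tasks' weight matrices (the trace norm would be an alternative).

```python
def linear(weights, x):
    """Bias-free dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(v):
    return [max(0.0, u) for u in v]


def hard_sharing_forward(x, shared, head_a, head_b):
    """Hard sharing: both tasks reuse the SAME hidden-layer weights;
    only the output heads are task-specific."""
    h = relu(linear(shared, x))
    return linear(head_a, h), linear(head_b, h)


def soft_sharing_penalty(layer_a, layer_b, strength):
    """Soft sharing: each task owns its layer; the squared l2 distance
    between the two weight matrices, scaled by `strength`, would be added
    to the training loss to keep the layers similar."""
    sq_dist = sum((a - b) ** 2
                  for row_a, row_b in zip(layer_a, layer_b)
                  for a, b in zip(row_a, row_b))
    return strength * sq_dist


if __name__ == "__main__":
    shared = [[1.0, 0.0], [0.0, 1.0]]
    out_a, out_b = hard_sharing_forward([2.0, -3.0], shared,
                                        [[1.0, 1.0]], [[0.5, 2.0]])
    print(out_a, out_b)  # [2.0] [1.0]
    print(soft_sharing_penalty([[1.0, 2.0]], [[1.0, 0.0]], strength=0.5))  # 2.0
```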
One of the difficulties in MTL is the selection of auxiliary tasks. Some authors argued that tasks that make decisions using the same features are related; others pointed out that tasks sharing a common inductive bias are related; and it has also been suggested that two tasks are related if their classification boundaries are similar. Although these suggestions are helpful in specific scenarios, they have not been widely adopted because the resulting selection strategies are complex.
II-C Zero-shot learning
ZSL algorithms are mainly used to establish the relationship between the visual space (i.e., the feature vectors extracted from the training data) and the semantic space (i.e., the semantic vectors of the seen and unseen classes). Currently, the training methods of these algorithms can be divided into three categories as follows.
1) Mapping from the visual space to the semantic space. After training, the model maps the feature vector of a testing sample into the semantic space and classifies it there, typically by finding the closest class semantic vector. Typical algorithms include DAP, ESZSL, and SAE.
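To make category 1) concrete, the Python sketch below classifies a sample by projecting its visual feature with a linear map `W` and picking the class whose semantic vector has the highest cosine similarity. `W` is a hand-set toy matrix (an assumption); DAP, ESZSL, and SAE each learn their mapping differently, and this sketch does not reproduce any of them.

```python
import math


def project(W, x):
    """Map a visual feature x into the semantic space with a linear map W."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def cosine(a, b):
    dot = sum(u * v for u, v in zip(a, b))
    norm = math.sqrt(sum(u * u for u in a)) * math.sqrt(sum(v * v for v in b))
    return dot / norm if norm > 0 else 0.0


def predict(W, x, class_vectors):
    """Return the class whose semantic vector is most similar to W x."""
    s = project(W, x)
    return max(class_vectors, key=lambda c: cosine(s, class_vectors[c]))


if __name__ == "__main__":
    # Hypothetical 3-d visual features mapped to 2-d semantic vectors.
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    # Class vectors may include unseen classes: no image of them is needed.
    classes = {"zebra": [1.0, 0.1], "whale": [0.0, 1.0]}
    print(predict(W, [0.9, 0.2, 5.0], classes))  # zebra
    print(predict(W, [0.1, 0.8, 0.0], classes))  # whale
```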
2) Mapping both the visual space and the semantic space to a third embedding space. These methods aim to obtain better feature representations in the embedding space and then use them for classification. For example, the CADA-VAE algorithm used in this paper belongs to this category.
3) Generating samples of the unseen classes. At present, most GZSL algorithms are generative models. As long as a model can generate samples of the unseen classes, it can transform a ZSL task into a traditional classification task. This approach can also alleviate the hubness [20, 30] and bias [38, 45] problems. For example, CVAE-ZSL generates samples of the unseen classes by learning a Conditional Variational Autoencoder (CVAE), and f-CLSWGAN uses a classifier to make the features generated by the WGAN generator more conducive to the final classification.
Considering that GZSL has many advantages and its prediction objectives include both the seen and unseen classes, which is more suitable for real-life application scenarios, the dual-channel learning framework designed in this paper is mainly for GZSL.
III The details of the proposed biologically inspired feature enhancement framework
As mentioned in Sec. I, inspired by multi-task learning and biological taxonomy, in this section we design a novel dual-channel learning framework for ZSL and propose a biologically inspired auxiliary data set selection method for it. We also study how auxiliary data sets with different degrees of correlation to the current task affect the performance of the ZSL model.
III-A The biologically inspired auxiliary data set selection method
Given a ZSL task, we use the names of the seen classes in its training data set, especially the biological classes, to select three auxiliary data sets with different degrees of correlation from ImageNet based on biological taxonomy (i.e., the seven-level classification system shown in Fig. 1). For example, if one of the seen classes is dogs, we select from ImageNet images with three different degrees of kinship to dogs to form the three auxiliary data sets. For a fair comparison, the three auxiliary data sets contain the same number of samples. The details of our experimental settings are given in Sec. IV-A2.
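Keeping the three auxiliary data sets the same size can be sketched as follows. This is a hypothetical helper, not code from the paper: it assumes candidate images are already grouped by auxiliary set and ImageNet class, and it draws the same number of classes and the same number of images per class for every set, with a fixed seed so that the draw is reproducible.

```python
import random


def build_balanced_sets(candidates, n_classes, per_class, seed=0):
    """Draw the same number of classes, and of images per class, for every set.

    candidates: {set_name: {class_name: [image_id, ...]}} (hypothetical format).
    Returns {set_name: [(image_id, class_name), ...]}.
    """
    rng = random.Random(seed)
    balanced = {}
    for name, by_class in candidates.items():
        # Only classes with enough images can contribute `per_class` samples.
        eligible = sorted(c for c, imgs in by_class.items() if len(imgs) >= per_class)
        if len(eligible) < n_classes:
            raise ValueError(f"{name}: only {len(eligible)} classes have {per_class} images")
        chosen = rng.sample(eligible, n_classes)
        balanced[name] = [(img, c) for c in chosen
                          for img in rng.sample(by_class[c], per_class)]
    return balanced


if __name__ == "__main__":
    pool = {level: {f"{level}_cls{k}": [f"{level}_{k}_{i}" for i in range(5)]
                    for k in range(3)}
            for level in ("low", "middle", "high")}
    sets = build_balanced_sets(pool, n_classes=2, per_class=3)
    print({name: len(items) for name, items in sets.items()})
    # {'low': 6, 'middle': 6, 'high': 6}
```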
III-B The details of the dual-channel learning framework
Our proposed dual-channel learning framework is shown in Fig. 2. One channel is trained on the auxiliary data set and the other on the data set of the current ZSL task (composed only of the training samples from the seen classes). The two channels work together to fine-tune the feature extractor. All modules in the framework (i.e., the feature extractor, the auxiliary task classifier, and the current task classifier) are trained jointly with the following optimization objective:

$$\min_{\theta_F,\,\theta_A,\,\theta_C}\ \mathcal{L}_{aux}(\theta_F,\theta_A) + \lambda\,\mathcal{L}_{cur}(\theta_F,\theta_C),$$

where $\theta_F$, $\theta_A$, and $\theta_C$ are the parameters of the feature extractor, the auxiliary task classifier, and the current task classifier, respectively. Model training minimizes both the loss on the auxiliary task (i.e., $\mathcal{L}_{aux}$) and the loss on the current task (i.e., $\mathcal{L}_{cur}$), and $\lambda$ is the trade-off factor between the auxiliary task and the current task.
Feature extractor: The feature extractor $F$ used in our experiments is ResNet-101, and its parameters are denoted by $\theta_F$. Note that other deep neural networks, such as VGG and Inception, can also be used here.
Auxiliary task classifier: The parameters of the auxiliary task classifier $C_A$ are denoted by $\theta_A$. $C_A$ provides additional back-propagated gradients for $F$ through the auxiliary task, so auxiliary tasks with different degrees of correlation to the current task affect the features extracted by $F$ differently. Most traditional classification algorithms, such as SVM and KNN, can be used as $C_A$.
Current task classifier: The parameters of the current task classifier $C_C$ are denoted by $\theta_C$. $C_C$ is used to fine-tune the parameters of $F$ with the training samples of the seen classes, so that the extracted features are better suited to the current ZSL task.
III-B3 The training mechanism
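The losses $\mathcal{L}_{aux}$ and $\mathcal{L}_{cur}$ can then be expressed as

$$\mathcal{L}_{aux}(\theta_F,\theta_A) = \ell\big(C_A(f^{a};\theta_A),\,y^{a}\big),\qquad \mathcal{L}_{cur}(\theta_F,\theta_C) = \ell\big(C_C(f^{c};\theta_C),\,y^{c}\big),$$

where $f^{a}$, $y^{a}$, $f^{c}$, and $y^{c}$ refer to the output of the feature extractor for the auxiliary task, the labels of the auxiliary samples, the output of the feature extractor for the current task, and the labels of the samples of the current ZSL task, respectively, and $\ell(\cdot,\cdot)$ denotes the classification loss.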
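The joint objective can be evaluated with the minimal Python sketch below. It rests on assumptions the text does not fix: the classification loss is taken to be softmax cross-entropy, the objective is taken to be the auxiliary loss plus λ times the current-task loss, and a one-layer ReLU map (`extract`, hypothetical) stands in for ResNet-101. What it illustrates is that both channels call the same `theta_F`, so gradients from both losses would reach the shared extractor; computing those gradients is left to an autodiff framework and is not shown.

```python
import math


def linear(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def extract(theta_F, x):
    """Hypothetical one-layer ReLU stand-in for the shared extractor F."""
    return [max(0.0, v) for v in linear(theta_F, x)]


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def mean_loss(batch, theta_F, theta_head):
    """Average cross-entropy of one channel: shared extractor + its own head."""
    return sum(cross_entropy(linear(theta_head, extract(theta_F, x)), y)
               for x, y in batch) / len(batch)


def joint_objective(aux_batch, cur_batch, theta_F, theta_A, theta_C, lam):
    """Assumed objective: L_aux + lam * L_cur, with theta_F shared by both."""
    loss_aux = mean_loss(aux_batch, theta_F, theta_A)  # auxiliary channel
    loss_cur = mean_loss(cur_batch, theta_F, theta_C)  # current ZSL channel
    return loss_aux + lam * loss_cur, loss_aux, loss_cur


if __name__ == "__main__":
    theta_F = [[1.0, 0.0], [0.0, 1.0]]
    theta_A = [[0.0, 0.0], [0.0, 0.0]]              # 2 auxiliary classes
    theta_C = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # 3 seen classes
    total, l_aux, l_cur = joint_objective([([1.0, 2.0], 0)], [([0.5, 0.5], 2)],
                                          theta_F, theta_A, theta_C, lam=0.5)
    print(total)  # equals ln 2 + 0.5 * ln 3 for these all-zero heads
```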
III-C Performance evaluation
In this study, we use CADA-VAE to test the impact of different auxiliary data sets on the ZSL model under the proposed framework.
CADA-VAE obtains domain-agnostic representations by aligning the distributions of the visual feature vectors and the side information in the VAE latent space. After training, CADA-VAE feeds the feature vectors of the seen classes' samples into its encoder to obtain the corresponding low-dimensional latent features. For the unseen classes, it feeds the side information of the classes into the corresponding encoder to obtain their low-dimensional latent features. A classifier for ZSL can then be trained on the latent features of the seen and unseen classes.
In the testing phase, one can input the feature vectors of the testing samples to the encoder of CADA-VAE to get the low-dimensional latent features, and then use the classifier to predict the corresponding labels.
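The evaluation pipeline described above can be sketched in plain Python. All components are simplified stand-ins, not CADA-VAE itself: the two encoders are fixed toy linear maps (`enc_img`, `enc_attr`, hypothetical), unseen-class latents are drawn as Gaussian samples around the encoded side information, and a nearest-centroid rule replaces the classifier trained in the latent space.

```python
import random


def encode(W, v):
    """Toy linear encoder returning a latent mean (hypothetical, not learned)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def latent_training_set(seen, unseen_attrs, enc_img, enc_attr,
                        n_per_class=20, sigma=0.05, seed=0):
    """Seen classes: encode real image features. Unseen classes: sample
    latents around the encoding of each class's side information."""
    rng = random.Random(seed)
    data = [(encode(enc_img, feat), label) for feat, label in seen]
    for label, attr in unseen_attrs.items():
        mu = encode(enc_attr, attr)
        data += [([m + rng.gauss(0.0, sigma) for m in mu], label)
                 for _ in range(n_per_class)]
    return data


def fit_centroids(data):
    """Nearest-centroid classifier standing in for the latent-space classifier."""
    sums, counts = {}, {}
    for z, label in data:
        acc = sums.setdefault(label, [0.0] * len(z))
        for i, v in enumerate(z):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, enc_img, feat):
    """Testing phase: encode the image feature, then pick the nearest centroid."""
    z = encode(enc_img, feat)
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], z)))


if __name__ == "__main__":
    enc_img = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 3-d features -> 2-d latent
    enc_attr = [[1.0, 0.0], [0.0, 1.0]]           # 2-d attributes -> 2-d latent
    seen = [([1.0, 0.0, 9.0], "horse"), ([0.9, 0.1, 3.0], "horse")]
    data = latent_training_set(seen, {"zebra": [0.0, 1.0]}, enc_img, enc_attr)
    cents = fit_centroids(data)
    print(predict(cents, enc_img, [0.05, 0.95, 7.0]))  # zebra (unseen class)
    print(predict(cents, enc_img, [1.0, 0.05, 0.0]))   # horse (seen class)
```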
IV Experimental settings and results
IV-A1 Zero-shot data sets
Three benchmark ZSL tasks are chosen to test the performance of the proposed method, namely Animals with Attributes 2 (AWA2), Caltech-UCSD Birds-200-2011 (CUB), and aPascal-aYahoo (APY).
Specifically, AWA2 includes 30475 images of 50 kinds of animals, of which 40 are seen classes and 10 are unseen classes; each class is represented by an 85-dimensional vector. CUB includes 11788 images of 200 kinds of birds, of which 150 are seen classes and 50 are unseen classes; each class is described by a 312-dimensional vector. APY contains 15339 images from 32 classes: 20 classes from the PASCAL VOC2008 database and 12 classes from the Yahoo database. Each class is represented by a 64-dimensional vector. We regard the samples from the PASCAL VOC2008 database as the seen classes and those from the Yahoo database as the unseen classes. The details of each data set and the division of the seen and unseen classes are shown in Table I.
| Data set | CUB | AWA2 | APY |
|---|---|---|---|
| Number of images | 11788 | 30475 | 15339 |
| Attribute dimension | 312 | 85 | 64 |
| Number of seen classes | 150 | 40 | 20 |
| Number of unseen classes | 50 | 10 | 12 |
IV-A2 Auxiliary data sets
As mentioned in Sec. III, based on the knowledge of biological taxonomy, we select three types of samples from ImageNet that have very low correlation, moderate correlation, and strong correlation with the seen classes of the current ZSL task respectively, and then use them to construct the corresponding low-relevant, middle-relevant, and high-relevant auxiliary data sets.
It is worth mentioning that most pre-trained models used in ZSL are trained on ImageNet, and the samples used to construct our auxiliary data sets also come from ImageNet, so the auxiliary data sets are very easy to obtain. Note that the auxiliary data set can also be obtained from other sources, such as the Internet, following the proposed selection strategy. Here we take the auxiliary samples directly from ImageNet, which is easier to implement and also lets us verify whether the proposed framework improves the performance of the ZSL model without adding new data sources.
Specifically, given a ZSL task, we choose the low-relevant auxiliary samples from ImageNet based on the filtering criteria of and then use them to construct the corresponding low-relevant auxiliary data set. Similarly, we construct the middle-relevant auxiliary data set with ”the same but different ” as the filtering criteria and the high-relevant auxiliary data set with ”the same ” as the filtering criteria.
For example, all the seen classes in AWA2 are mammals. In this case, we choose non-biological ImageNet images, such as water bottles and napkins, to construct the low-relevant auxiliary data set. The middle-relevant auxiliary data set consists of non-mammal animals such as geckos and tortoises, and the high-relevant auxiliary data set consists of mammals such as cats. Each auxiliary data set contains 50 classes, and each class has the same number of samples.
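The AWA2 example above can be written as a small filtering rule. This sketch is our reading of that example under stated assumptions: candidates are described by hypothetical lineage dicts (`{}` for non-biological objects), "high" means the same taxonomic class as the seen classes (Mammalia for AWA2), "middle" means any other animal, and candidates the example does not cover (e.g., plants) are left unassigned.

```python
def relevance_level(candidate, seen_class="Mammalia"):
    """Assign an ImageNet candidate to an auxiliary set, mirroring the AWA2 example.

    candidate: {rank: taxon} lineage dict, or {} for a non-biological object.
    """
    if not candidate:
        return "low"      # non-biological, e.g., water bottles, napkins
    if candidate.get("kingdom") != "Animalia":
        return None       # e.g., plants: not covered by the example
    if candidate.get("class") == seen_class:
        return "high"     # same taxonomic class as the seen classes
    return "middle"       # other animals, e.g., geckos, tortoises


def group_candidates(candidates, seen_class="Mammalia"):
    """Split {name: lineage} into the low/middle/high candidate pools."""
    groups = {"low": [], "middle": [], "high": []}
    for name, lineage in candidates.items():
        level = relevance_level(lineage, seen_class)
        if level is not None:
            groups[level].append(name)
    return groups


if __name__ == "__main__":
    reptile = {"kingdom": "Animalia", "class": "Reptilia"}
    candidates = {
        "water bottle": {}, "napkin": {},
        "gecko": reptile, "tortoise": reptile,
        "tabby cat": {"kingdom": "Animalia", "class": "Mammalia"},
        "daisy": {"kingdom": "Plantae"},
    }
    print(group_candidates(candidates))
    # {'low': ['water bottle', 'napkin'], 'middle': ['gecko', 'tortoise'], 'high': ['tabby cat']}
```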
In our experiment, the ResNet-101 pre-trained on ImageNet was chosen as the baseline model. For each ZSL task, we only use the training samples of its seen classes to fine-tune the pre-trained model.
IV-C Experimental settings
The original feature extractor used in our framework is ResNet-101 without its classification layer, and both the auxiliary task classifier and the current task classifier are single-layer networks. We use Stochastic Gradient Descent (SGD) as the optimizer with a learning rate of 0.001. We use the commonly adopted Proposed Split (PS) as the data division for CADA-VAE. The other parameter settings of CADA-VAE are the same as in the original CADA-VAE paper.
In our experiments, the testing samples can come from both the seen and unseen classes. Let $S$ and $U$ denote the average per-class prediction accuracy of the model on the seen and unseen classes, respectively. Their harmonic mean $H$ is

$$H = \frac{2 \times S \times U}{S + U}.$$
At present, the harmonic mean $H$ is one of the most important metrics for measuring the performance of GZSL algorithms, and we also use it as the evaluation criterion in this paper.
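The metric can be computed as below. This is a minimal sketch assuming labels are plain Python values; following the definition above, accuracy is first averaged per class (over the seen classes for S and the unseen classes for U, with predictions allowed to fall in either group) and only then combined. The example values are illustrative, not results from Table II.

```python
def mean_per_class_accuracy(y_true, y_pred, classes):
    """Average of per-class accuracies over the given classes present in y_true."""
    accs = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        if idx:
            accs.append(sum(y_pred[i] == c for i in idx) / len(idx))
    if not accs:
        raise ValueError("no test samples for the given classes")
    return sum(accs) / len(accs)


def harmonic_mean(s, u):
    """H = 2 * S * U / (S + U); defined as 0 when both accuracies are 0."""
    return 0.0 if s + u == 0 else 2 * s * u / (s + u)


if __name__ == "__main__":
    y_true = ["a", "a", "b", "b", "b", "b"]
    y_pred = ["a", "b", "b", "b", "b", "a"]
    print(mean_per_class_accuracy(y_true, y_pred, ["a", "b"]))  # 0.625
    print(round(harmonic_mean(0.9, 0.1), 3))                    # 0.18
```

For a model biased toward the seen classes, e.g. S = 0.9 and U = 0.1, the arithmetic mean would still be 0.5 while the harmonic mean drops to 0.18, which is why the harmonic mean is preferred for GZSL.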
IV-D Experimental results
The experimental results on the three benchmark data sets are shown in Table II.
Note: $S$ and $U$ refer to the accuracy of the model on the seen classes and the unseen classes, respectively, and $H$ refers to their harmonic mean.
From Table II, one can observe that the performance of the ZSL model improves as the correlation between the auxiliary task and the current ZSL task increases. For example, on AWA2 the harmonic mean $H$ of the models trained with the low-relevant, middle-relevant, and high-relevant auxiliary data sets is 65.7%, 66.0%, and 67.1%, respectively.
CADA-VAE (high-relevant) vs Baseline. Compared with the baseline model (i.e., CADA-VAE-fine-tuning), our method (using the high-relevant auxiliary data sets) can achieve higher prediction accuracy. The accuracy improvement rates on the benchmark data sets CUB, AWA2, and APY are 2.4%, 4.7%, and 4.1%, respectively.
CADA-VAE (high-relevant) vs CADA-VAE. Compared with the original CADA-VAE, one of the best ZSL algorithms, our method can also achieve higher prediction accuracy. The accuracy improvement rates on the CUB and AWA2 are 23.3% and 5.0%, respectively.
In conclusion, the dual-channel learning framework proposed in this paper can effectively improve the generalization ability of the ZSL model with the help of appropriate auxiliary data sets. Specifically, with auxiliary data sets that are highly related to the current ZSL task, our algorithm achieves state-of-the-art results on all three benchmark data sets.
IV-E An explanation for our experimental phenomena
Here we explain the experimental phenomena from the perspective of feature visualization. Specifically, we analyze the influence of different auxiliary data sets on the final extracted features through dimension reduction and visualization. Because the results on the three ZSL tasks are similar, we take AWA2 as an example (as shown in Fig. 3). In Fig. 3, each color denotes the features of the samples belonging to one class after dimension reduction.
From Fig. 3, one can observe that as the correlation between the auxiliary task and the current ZSL task increases, the features produced by the ZSL feature extractor follow a clear pattern: features of samples from the same class become more tightly clustered, and features of samples from different classes become more widely separated.
Such changes are very conducive to correct decisions by the final classifier, because compact classes that lie far apart from each other are easier to separate. Fig. 3 thus explains the experimental phenomena of this paper to some extent: with the proposed dual-channel learning framework and the auxiliary data sets, the features of the ZSL task become more separable, so the final ZSL model performs better.
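Fig. 3 supports this argument qualitatively. If a number is wanted, one simple proxy (our assumption; it is not used in this paper) is the ratio of the mean distance from each feature to its class centroid to the mean distance between class centroids; smaller values correspond to the tighter, better-separated classes described above. The Python sketch below computes it for features already reduced to a few dimensions.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def within_between_ratio(features_by_class):
    """Mean point-to-own-centroid distance divided by the mean distance
    between class centroids (requires at least two classes)."""
    cents = {c: centroid(pts) for c, pts in features_by_class.items()}
    within = [math.dist(p, cents[c])
              for c, pts in features_by_class.items() for p in pts]
    between = [math.dist(cents[a], cents[b]) for a, b in combinations(cents, 2)]
    return (sum(within) / len(within)) / (sum(between) / len(between))


if __name__ == "__main__":
    tight = {"a": [[0.0, 0.0], [0.1, 0.0]], "b": [[5.0, 5.0], [5.1, 5.0]]}
    loose = {"a": [[0.0, 0.0], [2.0, 0.0]], "b": [[5.0, 5.0], [7.0, 5.0]]}
    # The tight, well-separated classes give the smaller ratio.
    print(within_between_ratio(tight) < within_between_ratio(loose))  # True
```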
V Conclusion
In this paper, we design a novel dual-channel learning framework for ZSL and propose a new guideline, based on biological taxonomy, for selecting auxiliary data sets for this framework. Specifically, the correlation degree of image samples can be measured according to the seven-level classification system of biological taxonomy. We propose to choose the samples with the closest kinship to the seen classes of the current ZSL task to construct the auxiliary data set, and then use it to enhance the feature extractor under the proposed framework. Experimental results on three benchmark data sets (i.e., CUB, AWA2, and APY) show that, under the proposed framework, the performance of the ZSL model improves gradually as the correlation between the auxiliary data sets and the current ZSL task increases. This phenomenon may provide a new direction for future ZSL research. It is worth mentioning that our algorithm achieves state-of-the-art results on all three data sets. In the future, we will try to explain the experimental phenomena of this paper mathematically.
[1] Z. Akata, F. Perronnin, Z. Harchaoui, and C. Schmid, Label-embedding for attribute-based classification.
[2] Z. Akata, S. Reed, D. Walter, H. Lee, and B. Schiele, Evaluation of output embeddings for fine-grained image classification, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2927–2936.
[3] J. Baxter, A model of inductive bias learning, Journal of Artificial Intelligence Research, 12 (2000), pp. 149–198.
[4] W. Cao, J. Gao, Z. Ming, S. Cai, and Z. Shan, Fuzziness based random vector functional-link network for semi-supervised learning, in 2017 International Conference on Computational Science and Computational Intelligence (CSCI), IEEE, 2017, pp. 782–786.
[5] W. Cao, J. Gao, Z. Ming, S. Cai, and Z. Shan, Fuzziness-based online sequential extreme learning machine for classification problems, Soft Computing, 22 (2018), pp. 3487–3494.
[6] W. Cao, X. Wang, Z. Ming, and J. Gao, A review on neural networks with random weights, Neurocomputing, 275 (2018), pp. 278–287.
[7] R. Caruana, Multitask learning: A knowledge-based source of inductive bias, Machine Learning, (1997), pp. 41–48.
[8] R. Caruana, Multitask learning, Autonomous Agents and Multi-Agent Systems, 27 (1998), pp. 95–133.
[9] C. H. Lampert, H. Nickisch, and S. Harmeling, Attribute-based classification for zero-shot visual object categorization, IEEE Transactions on Pattern Analysis and Machine Intelligence, 36 (2013), pp. 453–465.
[10] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, 20 (1995), pp. 273–297.
[11] T. Cover and P. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory, 13 (1967), pp. 21–27.
[12] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ImageNet: A large-scale hierarchical image database, in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, pp. 248–255.
[13] L. Duong, T. Cohn, S. Bird, and P. Cook, Low resource dependency parsing: Cross-lingual parameter sharing in a neural network parser, in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2015, pp. 845–850.
[14] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, Describing objects by their attributes, in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, pp. 1778–1785.
[15] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[16] H. Jiang, R. Wang, S. Shan, and X. Chen, Transferable contrastive network for generalized zero-shot learning, in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 9765–9774.
[17] N. Karessli, Z. Akata, B. Schiele, and A. Bulling, Gaze embeddings for zero-shot image classification, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4525–4534.
[18] E. Kodirov, T. Xiang, and S. Gong, Semantic autoencoder for zero-shot learning, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3174–3183.
[19] C. H. Lampert, H. Nickisch, and S. Harmeling, Learning to detect unseen object classes by between-class attribute transfer, in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, pp. 951–958.
[20] A. Lazaridou, G. Dinu, and M. Baroni, Hubness and pollution: Delving into cross-space mapping for zero-shot learning, in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015, pp. 270–280.
[21] C. Linnaeus, Systema naturae, vol. 1, Stockholm Laurentii Salvii, 1758.
[22] C. Linnaeus, Species plantarum, vol. 3, Impensis GC Nauk, 1799.
[23] Y. Luo, X. Wang, and W. Cao, A novel dataset-specific feature extractor for zero-shot learning, Neurocomputing, (2020), pp. 1–18.
[24] A. Mishra, S. Krishna Reddy, A. Mittal, and H. A. Murthy, A generative model for zero shot learning using conditional variational autoencoders, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018, pp. 2188–2196.
[25] M. Palatucci, D. Pomerleau, G. E. Hinton, and T. M. Mitchell, Zero-shot learning with semantic output codes, in Advances in Neural Information Processing Systems, 2009, pp. 1410–1418.
[26] S. Reed, Z. Akata, H. Lee, and B. Schiele, Learning deep representations of fine-grained visual descriptions, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 49–58.
[27] B. Romera-Paredes and P. Torr, An embarrassingly simple approach to zero-shot learning, in International Conference on Machine Learning, 2015, pp. 2152–2161.
[28] S. Ruder, An overview of multi-task learning in deep neural networks, arXiv preprint arXiv:1706.05098, (2017).
[29] E. Schonfeld, S. Ebrahimi, S. Sinha, T. Darrell, and Z. Akata, Generalized zero- and few-shot learning via aligned variational autoencoders, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 8247–8255.
[30] Y. Shigeto, I. Suzuki, K. Hara, M. Shimbo, and Y. Matsumoto, Ridge regression, hubness, and zero-shot learning, in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2015, pp. 135–151.
[31] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, (2014).
[32] R. Socher, M. Ganjoo, C. D. Manning, and A. Ng, Zero-shot learning through cross-modal transfer, in Advances in Neural Information Processing Systems, 2013, pp. 935–943.
[33] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales, Learning to compare: Relation network for few-shot learning, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1199–1208.
[34] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, Going deeper with convolutions, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[35] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, The Caltech-UCSD Birds-200-2011 dataset, (2011).
[36] Y. Xian, Z. Akata, G. Sharma, Q. Nguyen, M. Hein, and B. Schiele, Latent embeddings for zero-shot classification, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 69–77.
[37] Y. Xian, C. H. Lampert, B. Schiele, and Z. Akata, Zero-shot learning: A comprehensive evaluation of the good, the bad and the ugly, IEEE Transactions on Pattern Analysis and Machine Intelligence, (2018).
[38] Y. Xian, T. Lorenz, B. Schiele, and Z. Akata, Feature generating networks for zero-shot learning, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5542–5551.
[39] Y. Xian, B. Schiele, and Z. Akata, Zero-shot learning: The good, the bad and the ugly, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4582–4591.
[40] Y. Xue, X. Liao, L. Carin, and B. Krishnapuram, Multi-task learning for classification with Dirichlet process priors, Journal of Machine Learning Research, 8 (2007), pp. 35–63.
[41] Y. Yang and T. M. Hospedales, Trace norm regularised deep multi-task learning, arXiv preprint arXiv:1606.04038, (2016).
[42] F. Zhang and G. Shi, Co-representation network for generalized zero-shot learning, in International Conference on Machine Learning, 2019, pp. 7434–7443.
[43] L. Zhang, T. Xiang, and S. Gong, Learning a deep embedding model for zero-shot learning, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2021–2030.
[44] Y. Zhang and Q. Yang, An overview of multi-task learning, National Science Review, 5 (2017), pp. 30–43.
[45] Y. Zhu, M. Elhoseiny, B. Liu, X. Peng, and A. Elgammal, A generative adversarial approach for zero-shot learning from noisy texts, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1004–1013.