
TGG: Transferable Graph Generation for Zero-shot and Few-shot Learning

08/30/2019
by   Chenrui Zhang, et al.
Peking University

Zero-shot and few-shot learning aim to improve generalization to unseen concepts, which are promising in many realistic scenarios. Due to the lack of data in the unseen domain, relation modeling between seen and unseen domains is vital for knowledge transfer in these tasks. Most existing methods capture the seen-unseen relation implicitly via semantic embedding or feature generation, resulting in inadequate use of relations, and some issues remain unsolved (e.g., domain shift). To tackle these challenges, we propose a Transferable Graph Generation (TGG) approach, in which the relation is modeled and utilized explicitly via graph generation. Specifically, our proposed TGG contains two main components: (1) Graph generation for relation modeling. An attention-based aggregate network and a relation kernel are proposed, which generate an instance-level graph based on a class-level prototype graph and visual features. Proximity information aggregation is guided by a multi-head graph attention mechanism, where seen and unseen features synthesized by a GAN are revised as node embeddings. The relation kernel further generates edges with GCNs and a graph kernel method, to capture instance-level topological structure while tackling data imbalance and noise. (2) Relation propagation for relation utilization. A dual relation propagation approach is proposed, where relations captured by the generated graph are propagated separately from the seen and unseen subgraphs. The two propagations learn from each other in a dual learning fashion, which performs as an adaptation mechanism for mitigating domain shift. All components are jointly optimized with a meta-learning strategy, and our TGG acts as an end-to-end framework unifying conventional zero-shot, generalized zero-shot and few-shot learning. Extensive experiments demonstrate that it consistently surpasses existing methods in all three settings by a significant margin.



1. Introduction

In the past decade, traditional supervised learning has advanced rapidly due to deep learning techniques and large-scale labeled datasets. However, towards an ultimate machine learning paradigm, supervised learning is far from satisfactory in various real-world situations. On the one hand, the heavy reliance on large-scale labeled data makes it unscalable, as annotating sufficient data is laborious and costly, and instances of some classes are quite rare under a long-tailed data distribution. On the other hand, supervised learning cannot deal with recognition tasks with ever-growing novel classes, which is urgently needed in many realistic scenarios.

To tackle the challenges stated above, Zero-Shot Learning (ZSL) and Few-Shot Learning (FSL) have recently emerged (Frome et al., 2013; Xian et al., 2017; Vinyals et al., 2016; Schönfeld et al., 2019). Typically, ZSL aims to recognize unseen classes with no labeled instances during training, while a few representative instances of unseen classes are provided in FSL. The key to the success of ZSL/FSL is relation modeling between the seen and unseen domains, which transfers knowledge from the seen domain to the unseen domain to improve the model's generalization to novel concepts.

Previous ZSL methods mainly focus on semantic embedding (Akata et al., 2013; Zhang and Saligrama, 2015), which learns a projection between the visual space and the semantic space. The principle of this paradigm is to utilize side information (e.g., attributes or word vectors) shared by the seen and unseen domains for projection learning, and to measure similarity in the resulting semantic space for final classification. Such a projection-based paradigm is limited by the heterogeneity between visual features and side information, as well as by domain shift (Romera-Paredes and Torr, 2015) when the learned projection is applied directly to the unseen domain without adaptation. Moreover, each class is represented as a fixed embedding point in the semantic space, while the intra-class variation and discriminative information implied in the visual data distribution are ignored (Wang et al., 2018a).

Recently, deep generative models have been introduced as alternative frameworks for ZSL (Zhang and Peng, 2018; Huang et al., 2018; Xian, 2019). In this paradigm, visual features and side information of the seen domain are utilized to capture the visual-semantic joint distribution, so that visual features of the unseen domain can be synthesized conditioned on the associated side information. Hence, ZSL can be converted into a supervised problem, as the synthesized visual features can be fed straightforwardly to typical classifiers for supervised training. However, the inherent handicap of this paradigm is that evaluating how well the dummy features capture the targeted unseen domain distribution remains ambiguous. Furthermore, the instability of generative models (e.g., mode collapse of generative adversarial networks (Goodfellow et al., 2014)) leads to noisy synthesized features with poor diversity, which is harmful for downstream classifier training.

The paradigms stated above fall under the taxonomy of implicit relation modeling methods, in which the use of relations is inadequate and some key issues (e.g., domain shift) remain unsolved. In contrast, another novel paradigm has been proposed (Wang et al., 2018b) to explicitly utilize knowledge from a knowledge graph (KG) for ZSL. Typically, these methods are built upon the graph convolutional network (GCN) (Kipf and Welling, 2016), which distills knowledge from the KG for class-level relation modeling. The graph nodes denote the class embeddings, while the edges describe the relations between different classes. Despite the promising performance, they still have some shortcomings. First, they simply learn an independent classifier for each class, while the unseen class labels are not involved, and thus domain shift remains. Second, the relation is modeled only at the class level, while the instance-level relation is ignored, resulting in a loss of discriminative ability. Third, the utilization of relations in these methods is still implicit, where the distilled knowledge can get diluted during classification.

To overcome the above limitations, in this paper, we propose to explicitly model and utilize relations at both the class level and the instance level, via graph generation and relation propagation. Specifically, we propose a Transferable Graph Generation (TGG) approach, which contains a graph generation module and a relation propagation module. The details are presented as follows.

The graph generation module aims to capture relations among class concepts, attributes and visual instances. In this module, an attention-based aggregate network and a relation kernel are proposed, which take a class-level prototype graph and visual instances as inputs, and output instance-level graphs with the revised instance embeddings as nodes and their relations as edges. The prototype graph is derived from an off-the-shelf knowledge graph; it acts as a relation template and is enriched by integrating visual information during graph generation. In order to model comprehensive seen-unseen relations and reduce the domain gap, we introduce unseen information at both the class level and the instance level from the very beginning. At the class level, the prototype graph is constructed to contain class concepts of both the seen and unseen domains. At the instance level, the graph generation module is also fed with instances of both domains; here we unify ZSL and FSL through dummy feature synthesis. Concretely, we synthesize dummy features for unseen classes via Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), and they are treated in the same way as the few provided instances in FSL. Hence, graph generation can be fully supervised, which is beneficial for the downstream relation utilization.

Our aggregate network aims to learn a revised node embedding space, which revises the input visual features by aggregating neighbors' information at both the class and instance levels. A multi-head graph attention mechanism is proposed to enhance the aggregation procedure, which prevents information dilution and negative knowledge transfer. The relation kernel is proposed to explicitly generate relations/edges over the revised nodes, where GCNs and graph kernel methods are used to tackle data imbalance and noise.

The relation propagation module aims to make full use of the learned relations for final classification. Compared with implicit embedding methods, knowledge transfer in a graph manifold space with explicit relation inference is more efficient, and it helps to learn better decision boundaries. Motivated by this, and with the advantage of fully-supervised graph generation, we propose a dual relation propagation approach to explicitly infer supervision via relation propagation and to further alleviate domain shift with dual learning. Relations in the generated graph start propagating separately from the seen and unseen subgraphs, and the two reverse propagations learn from each other in a dual learning manner.

Moreover, we jointly optimize all the above components end-to-end with an episodic training strategy from meta-learning. Graph nodes of both seen and unseen classes are randomly divided into training and test subsets, where relations are used for missing-label prediction. Such a strategy ensures that the training and test settings of our TGG are consistent, reducing inductive bias significantly.

The main contributions are summarized as follows:

  • We propose a Transferable Graph Generation (TGG) approach, to explicitly model and utilize seen-unseen relation for ZSL/FSL via graph generation. We design an attention-based aggregate network and a relation kernel, which capture multi-granular relations and are robust to data imbalance and dummy data noise.

  • We propose a dual relation propagation approach to utilize relation explicitly, which alleviates domain shift with fully-supervised relation propagation in a dual learning manner. An episodic training strategy is designed based on meta-learning, combining all components of our TGG for end-to-end joint optimization.

  • Our TGG acts as a unified framework for conventional zero-shot, generalized zero-shot and few-shot learning, and as demonstrated by extensive experiments, it consistently outperforms existing methods by a large margin. The code of our work is available at: https://github.com/zcrwind/tgg-pytorch.

2. Related work

Figure 1. Architecture of the proposed TGG. $\mathcal{G}_c$ and $\mathcal{G}_I$ denote the class-level and instance-level graphs, respectively; $X_s$ and $X_u$ are the instances of the seen and unseen classes, respectively; $s$ is the side information.

2.1. Zero-shot and few-shot learning

2.1.1. Zero-shot learning (ZSL)

According to the label space setting for evaluation, existing ZSL methods can be divided into two categories, i.e., conventional ZSL and generalized ZSL (GZSL) (Xian et al., 2017). Conventional ZSL aims to learn a classifier based on seen instances and then evaluate the trained model on unseen instances, where the label spaces of the seen and unseen domains are totally disjoint and model evaluation is performed only on the unseen domain. In contrast, GZSL aims to classify instances over the combination of seen and unseen classes, which is more realistic in practice.

From the algorithm perspective, existing ZSL and GZSL methods can be grouped into three paradigms (Wang et al., 2018b; Huang et al., 2018). The first paradigm, known as semantic embedding, learns a projection between the visual and semantic spaces with the aid of side information; the learned projection is then applied directly to the unseen domain at test time, where unseen instances can be classified by a certain similarity measurement (Akata et al., 2013; Zhang and Saligrama, 2015). Due to the heterogeneity between visual and semantic features, such a paradigm suffers from the information degradation issue. Recently, attention has shifted to another paradigm, which uses generative models to synthesize unseen features. Huang et al. (Huang et al., 2018) utilize GANs (Goodfellow et al., 2014) to learn the visual-semantic joint distribution, and unseen instances can be synthesized as dummy data, which are used to convert ZSL to a typical supervised problem. In contrast to the above paradigms, a new paradigm is rising lately that borrows power from structured knowledge. (Wang et al., 2018b) and (Kampffmeyer et al., 2018) use knowledge graphs and GCNs to predict a classifier for each class. The constraint is a mean-square error between the predicted and ground-truth classifiers of seen classes. While promising, these methods have two main shortcomings. First, generalization is limited by the fixed ground-truth classifiers. Second, the relation focuses only on the seen domain at the class level. Our TGG captures relations among seen and unseen classes at both the class level and the instance level.

2.1.2. Few-shot learning (FSL)

The data sparsity issue makes the typical fine-tuning strategy unsuitable for FSL, as overfitting occurs easily. Thus, current FSL research turns to meta-learning, a supervised learning setup that performs optimization over batches of tasks rather than batches of data. The task that the meta-learner tries to solve, called an episode task, corresponds to an independent learning problem that simulates the few-shot setting within episodes, and thus helps the model generalize well. Siamese Networks (Koch et al., 2015) learn a pair-wise distance under the principle that similar instances should be close, and then perform one-shot classification by nearest-neighbor search. Matching network (Vinyals et al., 2016) is an end-to-end trainable k-nearest-neighbors framework for FSL, in which the pair-wise distance is computed by cosine similarity. Prototypical network (Snell et al., 2017) extends (Vinyals et al., 2016) by replacing cosine distance with Euclidean distance, and learns class prototypes for similarity measurement. However, meta-learning based models typically cannot scale to ZSL, and we address this limitation via graph generation.

2.2. Graph learning

Our work is conceptually related to graph neural networks (GNNs) w.r.t. architecture, as well as graph generation w.r.t. application.

GNNs were first introduced by (Gori et al., 2005; Franco et al., 2009), whose target is to learn a state embedding that contains neighborhood information for each node in a graph. In (Gori et al., 2005; Franco et al., 2009), a parametric local transition function is applied to all nodes in a stacked manner, where a recurrent message propagation is learned discriminatively. (Li et al., 2015) proposes the gated graph neural network (GGNN), which uses Gated Recurrent Units (GRUs) in the propagation step to untie the recurrent layer weights and to increase nonlinearity via a gating mechanism. Bruna et al. (Bruna et al., 2013) propose to learn spectral convolutions in the Fourier domain via eigendecomposition of the graph Laplacian. Subsequent work (Defferrard et al., 2016) reduces the computational complexity of (Bruna et al., 2013) by learning polynomials of the graph Laplacian. As one of the most representative graph convolutional networks, GCN (Kipf and Welling, 2016) is proposed to solve the semi-supervised problem via spectral methods, learning layer-wise propagation operations directly on graphs. GraphSAGE (Hamilton et al., 2017) acts as a spatial graph convolutional method, which uniformly samples a fixed number of neighbors for each node and then uses different aggregating functions for large-graph node embedding.

Recently there has been a surge of interest in graph generation, due to its wide applications in molecule discovery, social network analysis and knowledge graph construction. NetGAN (Bojchevski et al., 2018) has made a preliminary attempt at graph generation via random walks, converting graph generation to a walk-sequence generation problem via generative adversarial training (Goodfellow et al., 2014). MolGAN (De Cao and Kipf, 2018) utilizes GANs (Goodfellow et al., 2014) and reinforcement learning (RL) to generate discrete graph structures, where a permutation-invariant discriminator is designed to handle node variants, and an RL-based reward function is developed to endow the generated molecules with desired chemical properties. Li et al. (Li et al., 2018) propose to generate graph nodes and edges sequentially, where GNNs are applied to learn latent states of the current graph, and the latent states are then used as the history memory for deciding the next generation action.

All the above graph generation methods rely on the existence of real graph data for distribution fitting, while our work focuses on generating graphs without prior distribution information, and can generalize to unseen node types.

3. Our TGG Approach

As illustrated in Figure 1, our Transferable Graph Generation (TGG) framework mainly contains two components, i.e., graph generation and relation propagation. The graph generation module takes the class-level graph and real/dummy visual instances of seen/unseen classes as inputs, and learns both node embeddings and relations with the aggregate network and relation kernel. The relation propagation module exploits the generated relation graph for classification, via a dual relation propagation approach with a meta-learning strategy.

3.1. Preliminaries

Figure 2. Class-level graphs of the aPY and AwA2 datasets.

3.1.1. Problem Formulation

Let $\mathcal{D}_{tr} = \{(x_i, y_i)\}$ denote the training set of image instances, and $\mathcal{D}_{te}$ denote the test set of image instances. Their corresponding label spaces are $\mathcal{Y}_s$ and $\mathcal{Y}_u$ with $\mathcal{Y}_s \cap \mathcal{Y}_u = \emptyset$. $S$ and $U$ here denote the total numbers of seen and unseen classes, respectively. $x_i \in \mathbb{R}^d$ is the $d$-dimensional visual feature of the $i$-th instance with label $y_i$, and $s_y$ denotes the side information (e.g., attributes or word vectors) uniquely associated with the class label $y$. Based on these symbol definitions, we formulate the three problems addressed in this paper as below.

  • Zero-shot Learning (ZSL): The image features of unseen classes are not available during training. The goal of ZSL is to predict the label $y_u \in \mathcal{Y}_u$ of an unseen-class instance using its visual feature $x_u$.

  • Generalized Zero-shot Learning (GZSL): The image features of unseen classes are not available during training. The goal of GZSL is to predict the label $y \in \mathcal{Y}_s \cup \mathcal{Y}_u$ of an image instance using its visual feature $x$.

  • Few-shot Learning (FSL): Only a few (or one) randomly chosen image instances from the unseen classes are available with label information during training; the goal of FSL is the same as in the ZSL and GZSL settings above.

3.1.2. Class-level graph construction

Similar to (Gao et al., 2019), we exploit ConceptNet 5.5 (Speer et al., 2017) for class-level graph construction, which is an off-the-shelf knowledge graph connecting words and phrases of natural language with labeled edges. Note that we treat the CUB dataset (Wah et al., 2011) (see Section 4) as a special case, as its class labels are proper nouns of fine-grained birds, which are hard to connect semantically via ConceptNet. Instead, we build the class-level graph of CUB by computing the Hadamard product over part-level attributes. The resulting graph $\mathcal{G}_c$ is densely connected with normalized edge weights, which denote similarities among different classes. The class-level graphs of the two smaller datasets are shown as examples in Figure 2.

3.1.3. Dummy visual feature synthesis

For ZSL and GZSL, we synthesize dummy visual features for unseen classes, using the recently emerged generative adversarial learning (Goodfellow et al., 2014). Specifically, we use a conditional GAN (Mirza and Osindero, 2014) to perform synthesis conditioned on the associated side information, and adopt the WGAN-GP (Gulrajani et al., 2017) training setting. To stabilize GAN training, similar to (Zhang and Peng, 2018; Huang et al., 2018), a dual learning mechanism is applied with semantic feature regression. We use visual feature synthesis as a pre-processing step rather than directly learning relations over it, as we believe that several issues with such feature synthesis methods for ZSL/GZSL remain unsolved. First, the generated features cannot fit the true distribution very well and are thus suboptimal for GZSL. Second, instance-level relations cannot be captured by the generated features in such feature mapping learning, where intra-class variance is ignored. Our TGG revises them into a node embedding space via explicit relation modeling.

3.2. Graph Generation

3.2.1. Attention-based aggregate network

As shown in Figure 1, the graph generation module of our TGG takes the class-level graph and visual features as inputs, where the synthesized dummy features are used for unseen classes in ZSL/GZSL, and the few provided unseen-class features are used repeatedly in FSL. Our goal of graph generation is to generate implicit instance representations as node embeddings and explicit relations as edges, by incorporating proximity information from each node's neighborhood at both the class level and the instance level.

We draw inspiration from GraphSAGE (Hamilton et al., 2017), an inductive variant of GCN (Kipf and Welling, 2016), to develop our aggregate network. The core operations of GraphSAGE can be formulated as follows:

(1)  $h_{\mathcal{N}(v)}^{k} = \mathrm{AGGREGATE}_{k}\left(\left\{h_{u}^{k-1},\ \forall u \in \mathcal{N}(v)\right\}\right)$

(2)  $h_{v}^{k} = \sigma\left(W^{k} \cdot \mathrm{CONCAT}\left(h_{v}^{k-1},\ h_{\mathcal{N}(v)}^{k}\right)\right)$

where $\mathrm{AGGREGATE}_{k}$ denotes the aggregation function at hop $k$, which aggregates neighbor information for the subsequent node-embedding update. $u$ and $v$ are nodes in the graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ and $\mathcal{E}$ denote the node and edge sets of $\mathcal{G}$, respectively. $h_{v}^{k}$ is the node embedding of source node $v$ at the $k$-th propagation, and $\mathcal{N}(\cdot)$ denotes the neighbor sampling function $\mathcal{N}: v \mapsto 2^{\mathcal{V}}$. After information aggregation, the node embeddings of $v$ and its neighbors are concatenated via the CONCAT operation and activated by the non-linearity $\sigma$, in which the trainable weights $W^{k}$ are learned.

As shown in Eq. (1) and Eq. (2), neighbor sampling and aggregation are the two main components of GraphSAGE. In terms of sampling, GraphSAGE uniformly samples a fixed number of neighbors. As for aggregation, GraphSAGE explores three kinds of aggregation functions, namely mean, LSTM and pooling. Mean aggregation simply averages over all neighbor node features, while the LSTM and pooling workarounds integrate node features via an LSTM architecture or a pooling operation. However, we argue that these mechanisms are insufficient for our graph generation situation in ZSL/FSL, as the generated graph should integrate proximity information more precisely, to cope with the noise of dummy features and to prevent negative knowledge transfer. Furthermore, our TGG performs graph learning over graphs of different granularity, namely the class-level prototype graph $\mathcal{G}_c$ and the instance-level graph $\mathcal{G}_I$; thus, uniform operations might lose discriminative information in such a graph translation procedure.
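For reference, a minimal sketch of one GraphSAGE hop with a mean aggregator (Eqs. (1)-(2)) is given below; the tensor layout, with a fixed number S of pre-sampled neighbor indices per node, is an assumption made for compactness.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAGELayer(nn.Module):
    """One GraphSAGE hop with a mean aggregator, cf. Eqs. (1)-(2)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(2 * in_dim, out_dim)

    def forward(self, h, neigh_idx):
        # h: (N, in_dim) node embeddings; neigh_idx: (N, S) sampled neighbor ids
        h_neigh = h[neigh_idx].mean(dim=1)                      # Eq. (1), mean AGGREGATE
        h_new = F.relu(self.W(torch.cat([h, h_neigh], dim=1)))  # Eq. (2), CONCAT + nonlinearity
        return F.normalize(h_new, dim=1)                        # unit-norm, as in GraphSAGE
```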

To solve the above issues, we propose to enhance the GraphSAGE algorithm with a multi-head attention mechanism (Veličković et al., 2018). Concretely, we design class-level and instance-level attention during aggregation, and combine them analogously to multiple channels in a ConvNet. The instance-level attention is defined as follows:

(3)  $z_{i}^{k} = W^{k} h_{i}^{k-1}$

(4)  $e_{ij} = \mathrm{LeakyReLU}\left(a^{\top} \cdot \mathrm{CONCAT}\left(z_{i}^{k}, z_{j}^{k}\right)\right)$

(5)  $\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{l \in \mathcal{N}(i)} \exp(e_{il})}$

(6)  $h_{i}^{k} = \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} z_{j}^{k}\right)$

where $z_{i}^{k}$ is first obtained by performing a linear transformation on the node embedding $h_{i}^{k-1}$ from the last aggregation; then a pair-wise additive attention score $e_{ij}$ between two neighbors is computed as in Eq. (4), which concatenates $z_{i}^{k}$ and $z_{j}^{k}$ first, then takes the dot product between the concatenation and a trainable weight vector $a$, followed by a LeakyReLU non-linearity. Next, Eq. (5) locally normalizes the attention scores over each node's neighbors. Finally, in Eq. (6), aggregation similar to Eq. (2) is performed over neighbor embeddings according to the attention scores.

In another vein, the class-level attention scores can be derived directly from the edge weights of $\mathcal{G}_c$ (see Section 3.1.2), and we simply normalize them within each local aggregation as in Eq. (5). Instance-level and class-level attention have independent parameters, and we combine them in a multi-head attention form by:

(7)  $h_{i}^{k} = \Big\Vert_{t \in \{c, I\}}\ \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{t} z_{j}^{k}\right)$

here $t$ denotes the attention type, with candidates of class-level attention $c$ and instance-level attention $I$, and $\Vert$ denotes concatenation of the attention heads. The motivation behind Eq. (7) is that weighting neighbor features with multi-level attention helps aggregate proximity information more precisely and efficiently, which deals with information dilution (Kampffmeyer et al., 2018) and is vital for ZSL/FSL generalization when faced with scarce or noisy data. Moreover, such node embedding revision is performed among the seen and unseen classes, in which the distributions of the two domains tend to become consistent via neighbor information integration, and thus the domain gap can be reduced significantly to alleviate the domain shift issue.
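A minimal sketch of the instance-level attention head (Eqs. (3)-(6)) follows, again assuming pre-sampled neighbor indices; per Eq. (7), a class-level head with the same interface (its scores taken from the $\mathcal{G}_c$ edge weights) would run in parallel, with the two head outputs concatenated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstanceAttention(nn.Module):
    """Instance-level additive attention over sampled neighbors, cf. Eqs. (3)-(6)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # Eq. (3): linear transform
        self.a = nn.Parameter(torch.randn(2 * out_dim))   # trainable attention vector

    def forward(self, h, neigh_idx):
        z = self.W(h)                                              # (N, d)
        z_i = z.unsqueeze(1).expand(-1, neigh_idx.size(1), -1)     # (N, S, d) source copies
        z_j = z[neigh_idx]                                         # (N, S, d) neighbors
        e = F.leaky_relu(torch.cat([z_i, z_j], dim=-1) @ self.a)   # Eq. (4): attention scores
        alpha = F.softmax(e, dim=1)                                # Eq. (5): local normalization
        return torch.tanh((alpha.unsqueeze(-1) * z_j).sum(dim=1))  # Eq. (6); tanh is one choice of sigma
```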

3.2.2. Relation kernel

The aggregate network above revises node embeddings over both seen and unseen classes, by integrating proximity knowledge in an implicit manner. Based on this revised node embedding space, we further generate relations explicitly, to exploit graph manifold for better seen-to-unseen generalization. To this end, we propose a relation kernel module (Figure 3), to explicitly learn edge features and thus generate instance-level graphs.

Taking permutation invariance and distance properties (e.g., identity) into account, we first design the edge feature learning function as:

(8)  $A_{ij} = \exp\left(-\frac{\left\| f_{\phi}(h_i) - f_{\phi}(h_j) \right\|_{1}}{\gamma}\right)$

where $A_{ij}$ denotes the generated edge between nodes $i$ and $j$ in the adjacency matrix $A$ of $\mathcal{G}_I$, $f_{\phi}$ is a neural network parameterized with $\phi$, and $\gamma$ is a bandwidth hyperparameter. Mathematically, Eq. (8) is an instantiation of a Gaussian similarity function with Manhattan distance, yielding learnable edge features with $A_{ij} \in (0, 1]$. Once $A$ is obtained, it will be fed into stacked GCN modules for graph generation:

(9)  $H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$

here $\tilde{A} = A + I_N$ is obtained by adding self-connections to $A$, $I_N$ is the identity matrix, $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$, and $W^{(l)}$ is the trainable filter in the $l$-th layer of the GCN.
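The two steps of the relation kernel can be sketched as follows: a pairwise edge-feature function per Eq. (8) and a single dense GCN propagation per Eq. (9). Treating `f_phi` as a small MLP applied to each node before a Manhattan distance is one reading of Eq. (8), stated here as an assumption.

```python
import torch
import torch.nn.functional as F

def edge_features(h, f_phi, gamma=1.0):
    """Eq. (8): Gaussian similarity of learned node projections under L1 distance."""
    z = f_phi(h)                          # (N, d') learned projection
    d = torch.cdist(z, z, p=1)            # pairwise Manhattan distances, (N, N)
    return torch.exp(-d / gamma)          # A_ij in (0, 1], with A_ii = 1

def gcn_layer(A, H, W):
    """Eq. (9): one GCN propagation step over the generated adjacency A."""
    A_hat = A + torch.eye(A.size(0), device=A.device)      # add self-connections
    d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
    S = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]  # D^{-1/2} A_hat D^{-1/2}
    return F.relu(S @ H @ W)
```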

Figure 3. Relation kernel of our TGG. $f_{\phi}$ is the edge feature learning function, $\mathcal{K}$ denotes the graph kernel.

Furthermore, an additional graph regularization term is designed in our relation kernel, which is optimized jointly with the downstream classification task:

(10)  $\mathcal{L}_{gk} = \left\| \mathcal{K}(\mathcal{G}_I) - \mathcal{K}(\hat{\mathcal{G}}_c) \right\|_{2}^{2}$

here $\mathcal{G}_I$ is the final graph learned in the $L$-th GCN layer, and $\hat{\mathcal{G}}_c$ denotes the normalized subgraph of $\mathcal{G}_c$ that shares its node set with $\mathcal{G}_I$. $\mathcal{K}$ is the graph kernel that measures graph similarities via computing global graph representations. In this work, we use graph2vec (Narayanan et al., 2017) as the graph kernel, which is task-agnostic and can be learned in an unsupervised manner. Eq. (10) ensures that the generated local relations in $\mathcal{G}_I$ are consistent with the similarities derived from $\mathcal{G}_c$, which aids zero-shot relation generation as a priori information and combats overfitting.

3.3. Relation Propagation

Once the instance-level graph $\mathcal{G}_I$ is generated (Section 3.2), its node embeddings and relations can be utilized for ZSL/FSL classification. To make full use of the knowledge within $\mathcal{G}_I$, we propose to explicitly perform relation inference with a novel dual relation propagation and meta-learning, as presented below.

3.3.1. Dual relation propagation

To explicitly utilize the learned relations for improving generalization and further alleviating domain shift, we propose a dual relation propagation between the seen and unseen subgraphs in $\mathcal{G}_I$. Specifically, we evolve the standard label propagation algorithm (Zhu and Ghahramani, 2002) with inter-domain dual learning. To keep this paper self-contained, we briefly review the standard label propagation algorithm, then elaborate our dual relation propagation between the seen and unseen subgraphs.

Label propagation (LP) is a classic algorithm for semi-supervised learning. Suppose $\{(x_1, y_1), \dots, (x_l, y_l)\}$ is the labeled data, $y_i \in \{1, \dots, C\}$, and $\{x_{l+1}, \dots, x_{l+u}\}$ is the unlabeled data. Let $\mathcal{F}$ denote the set of $(l+u) \times C$ matrices. LP defines a label matrix $Y \in \mathcal{F}$ with $Y_{ij} = 1$ if $x_i$ is a labeled instance with label $y_i = j$, and $Y_{ij} = 0$ otherwise. The goal of LP is to propagate the labels through pre-computed edges, to determine the unknown labels of the unlabeled instances. LP has been proven (Zhu and Ghahramani, 2002) to have the closed-form solution:

(11)  $F^{*} = (I - \alpha S)^{-1} Y_L$

where $I$ is the identity matrix, $Y_L$ is the labeled sub-matrix of $Y$ (zero-padded on unlabeled rows), $S$ is the normalized edge weight matrix, and $\alpha$ is a hyperparameter that controls the amount of propagated information.
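Eq. (11) translates directly into a linear solve. A minimal sketch, assuming a dense adjacency matrix and a zero-padded one-hot label matrix:

```python
import torch

def label_propagation(A, Y, alpha=0.5):
    """Closed-form label propagation, cf. Eq. (11).
    A: (N, N) edge weights; Y: (N, C) one-hot labels, zero rows for unlabeled nodes."""
    d = A.sum(dim=1).clamp(min=1e-12).pow(-0.5)
    S = d[:, None] * A * d[None, :]              # symmetrically normalized weights
    I = torch.eye(A.size(0), device=A.device)
    return torch.linalg.solve(I - alpha * S, Y)  # F* = (I - alpha * S)^{-1} Y_L
```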

As we introduce unseen-domain information in two ways, namely unseen prototypes in the class-level graph and dummy feature inputs (for the ZSL setting), all nodes in the generated graph are actually labeled. Based on such supervised-setting advantages provided by graph generation, we propose dual relation propagation between the seen and unseen domains. More concretely, we separately use seen and unseen instances as the labeled data for label propagation, and make sure that the resulting label matrices are consistent. The constraint of our dual relation propagation is defined as:

(12)  $\mathcal{L}_{dp} = \left\| (I - \alpha S)^{-1} Y_s - (I - \alpha S)^{-1} Y_u \right\|_F$

where $Y_s$ and $Y_u$ denote the labeled sub-matrices of seen and unseen instances, respectively, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. Label propagations starting from the seen and unseen subgraphs in $\mathcal{G}_I$ can be regarded as two propagation learners with reverse learning directions, and minimizing Eq. (12) encourages them to learn from each other 'how to propagate'.
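Building on the `label_propagation` sketch above, the dual constraint of Eq. (12) amounts to running the propagation twice with complementary label matrices and penalizing their disagreement:

```python
import torch

def dual_propagation_loss(A, Y_seen, Y_unseen, alpha=0.5):
    """Eq. (12): propagate separately from seen and unseen subgraphs, align results.
    Y_seen/Y_unseen: (N, C) one-hot labels, zeroed outside the respective domain."""
    F_s = label_propagation(A, Y_seen, alpha)    # propagation seeded by seen nodes
    F_u = label_propagation(A, Y_unseen, alpha)  # propagation seeded by unseen nodes
    return torch.norm(F_s - F_u, p='fro')        # Frobenius-norm disagreement
```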

3.3.2. Meta-learning based training strategy

We now present how our TGG framework unifies FSL, ZSL and GZSL with meta-learning, where graph generation, relation propagation and final classification are jointly optimized in an end-to-end manner. For FSL, there are three datasets, namely the training, testing and support sets. The testing and support sets share the same label space (i.e., the unseen space), which is disjoint from the seen space of the training set. Suppose the support set has $K$ labeled instances for each of $N$ unique classes; the FSL task is then called $N$-way, $K$-shot. As for ZSL, we borrow the power of conditional GANs (Mirza and Osindero, 2014) to build a dummy support set for unseen classes, with side information as the condition.

In the traditional meta-learning paradigm with episodic training, each episode simulates the few-shot setting with a subset of the training set. In this paper, we follow the episodic training of meta-learning, but extend its label space during graph generation. Specifically, we involve the unseen prototypes in $\mathcal{G}_c$, and input unseen-class instances from the dummy/real support set in ZSL/FSL, so that graph generation can pick up neighbor information from both seen and unseen classes. A further difference is that we also extend the label space to the union of the seen and unseen domains in episodic task simulation, towards fully-supervised meta-learning for performance improvement. Thanks to the introduction of dummy unseen instances and the use of graph learning for revising them, ZSL, GZSL and FSL can be solved uniformly in TGG, where graph generation, relation propagation and classification are jointly optimized end-to-end with episodic training. As a result, domain shift and classifier bias towards the seen domain can be reduced significantly (as shown in Section 4).

In each episode, we obtain the final predictions by normalizing the propagation results to probabilistic values with softmax:

(13)  $\hat{y}_{i} = \arg\max_{j \leq C} \frac{\exp(F_{ij}^{*})}{\sum_{c=1}^{C} \exp(F_{ic}^{*})}$

where $\hat{y}_{i}$ is the predicted label for the test instance $x_i$, and $C$ is the number of classes to be classified in an episode. Then, we use cross-entropy for the final classification:

(14)  $\mathcal{L}_{cls} = -\sum_{i=1}^{N_e} \sum_{j=1}^{C} \mathbb{I}(y_i = j) \log P\left(\hat{y}_{i} = j \mid x_i\right)$

where $\mathbb{I}(\cdot)$ is the indicator function and $y_i$ is the ground-truth label for instance $x_i$. $N_e$ is the number of instances in an $N$-way $K$-shot episode with $T$ test instances. Comprehensively, the objective of our TGG is summarized as follows:

(15)  $\mathcal{L} = \mathcal{L}_{cls} + \lambda \mathcal{L}_{gk} + \mu \mathcal{L}_{dp}$

Essentially, we utilize the generated relations to learn a metric on the graph manifold. That is, TGG learns a graph-manifold metric in a revised node embedding space, rather than pre-defining a fixed metric (e.g., Euclidean) in a projection space. The reasons for applying meta-learning are threefold. First, traditional graph architectures such as GCN and GraphSAGE can hardly solve ZSL and GZSL simultaneously end-to-end, as the class number must be pre-defined as the output dimension of the last layer. Second, meta-learning actually performs as an adaptation method, which moves test-time adaptation to the training stage via episodic task simulation. Third, such meta-learning settings can be utilized to further alleviate domain shift, since they ensure that the test and training environments of our TGG are consistent.
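Putting the pieces together, one meta-learning episode could look like the sketch below. The `model` attributes (`aggregate`, `relation_kernel`, `graph_kernel_loss`) and the `episode` fields are hypothetical names standing in for the components described above, not the released API.

```python
import torch.nn.functional as F

def episode_step(model, optimizer, proto_graph, episode, lam=0.5, mu=0.5):
    """One episodic update jointly optimizing Eq. (15)."""
    h = model.aggregate(proto_graph, episode.features)    # revised node embeddings (Sec. 3.2.1)
    A = model.relation_kernel(h)                          # generated edges, Eqs. (8)-(9)
    F_star = label_propagation(A, episode.labels_onehot)  # Eq. (11)
    loss = (F.cross_entropy(F_star[episode.test_idx], episode.test_labels)      # Eqs. (13)-(14)
            + lam * model.graph_kernel_loss(A, proto_graph)                     # Eq. (10)
            + mu * dual_propagation_loss(A, episode.Y_seen, episode.Y_unseen))  # Eq. (12)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```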

4. Experiments

Dataset  #att  Seen classes (train+val)  Unseen classes  Total images  #Train  #Test (unseen/seen)
aPY      64    15+5                      12              15339         5932    7924/1483
AwA2     85    27+13                     10              37332         23527   7913/5882
CUB      312   100+50                    50              11788         7057    2679/1764
SUN      102   580+65                    72              14340         10320   1440/2580
Table 1. Statistics of the datasets and the data splits used in this paper.
Dataset                 aPY                 AwA2                CUB                 SUN
Methods                 ZSL  U    S    HM   ZSL  U    S    HM   ZSL  U    S    HM   ZSL  U    S    HM
SSE (Zhang and Saligrama, 2015) 34.0 0.2 78.9 0.4 61.0 8.1 82.6 14.8 43.9 8.5 46.9 14.4 51.5 2.1 36.4 4.0
LATEM (Xian et al., 2016) 35.2 0.1 73.0 0.2 55.8 11.5 77.3 20.0 49.3 15.2 57.3 24.0 55.3 14.7 28.8 19.5
ALE (Akata et al., 2013) 39.7 4.6 73.7 8.7 62.5 14.0 81.8 23.9 54.9 27.3 62.8 34.4 58.1 21.8 33.1 26.3
DEVISE (Frome et al., 2013) 39.8 4.9 76.9 9.2 59.7 17.1 74.7 27.8 52.0 23.8 53.0 32.8 56.5 16.9 27.4 20.9
SJE (Akata et al., 2015) 32.9 3.7 55.7 6.9 61.9 8.0 73.9 14.4 53.9 23.5 52.9 33.6 53.7 14.7 30.5 19.8
ESZSL (Romera-Paredes and Torr, 2015) 38.3 2.4 70.1 4.6 58.6 5.9 77.8 11.0 53.9 12.6 63.8 21.0 54.5 11.0 27.9 15.8
SYNC (Changpinyo et al., 2016) 23.9 7.4 66.3 13.3 46.6 10.0 90.5 18.0 55.6 11.5 70.9 19.8 56.3 7.9 43.3 13.4
SAE (Kodirov et al., 2017) 34.0 0.4 80.9 0.9 61.0 1.1 82.2 2.2 43.9 7.8 54.0 13.6 51.5 8.8 18.0 11.8
DEM (Zhang et al., 2017) 35.0 11.1 79.4 19.4 67.1 30.5 86.4 45.1 51.7 19.6 57.9 29.2 40.3 34.3 20.5 25.6
RelationNet (Sung et al., 2018) - - - - 64.2 30.0 93.4 45.3 55.6 38.1 61.1 47.0 - - - -
PSR-ZSL (Annadani and Biswas, 2018) 38.4 13.5 51.4 21.4 63.8 20.7 73.8 32.3 56.0 24.6 54.3 33.9 61.4 20.8 37.2 26.7
SP-AEN (Chen et al., 2018) - 13.7 63.4 22.6 - 23.3 90.9 31.1 - 34.7 70.6 46.6 - 24.9 38.2 30.3
CAPD (Rahman et al., 2018) 39.3 26.8 59.5 37.0 52.6 45.2 68.6 54.5 53.8 41.7 44.9 43.3 49.7 27.8 35.8 31.3
GDAN (Huang et al., 2018) - 30.4 75.0 43.4 - 33.2 67.5 44.6 - 39.3 66.7 49.5 - 38.1 89.9 53.4
Our TGG 63.5 58.3 89.6 70.6 77.2 69.8 90.1 78.7 64.1 53.8 77.2 63.4 68.9 65.8 88.2 75.4
Table 2. Accuracy (%) results for ZSL and GZSL on four benchmark datasets. U and S denote accuracy on unseen and seen classes under the GZSL setting, and HM is their harmonic mean.

4.1. Benchmark datasets

Following the recently proposed experimental settings (Xian et al., 2017) for ZSL, we evaluate our TGG on four benchmark datasets: aPY (Farhadi et al., 2009), AwA2 (Xian et al., 2017), CUB (Wah et al., 2011) and SUN (Patterson and Hays, 2012). Among them, aPY and AwA2 contain coarse-grained classes and are of small and medium size, respectively, while both CUB and SUN are medium-size datasets with fine-grained classes. Their statistics and the associated data splits applied in this paper are provided in Table 1.

4.2. Implementation details

4.2.1. Image features and side information

For a fair comparison, we use the 2048-dim image features from the top-layer pooling units of the 101-layer ResNet (He et al., 2016) provided by (Sung et al., 2018). As for side information, we use the continuous-valued semantic attributes provided by (Sung et al., 2018), whose dimensions are shown in Table 1. Note that our graph generation algorithm is feature-agnostic with respect to both visual features and side information.

4.2.2. Network architecture and training settings

Our aggregate network applies a search depth of 2 (i.e., 2 hops) with output dimensions of 1024 and 512, respectively. We perform batch normalization after each output layer, followed by a ReLU activation function. As for the multi-head attention module, two dense layers respectively followed by tanh and LeakyReLU (Xu et al., 2015) activations are developed for both class-level and instance-level attention. In the relation kernel module, we use a two-layer MLP with batch normalization and ReLU activation for adjacency matrix building, whose input and output dimensions are consistent with the output of the aggregate network and the adjacency matrix size, respectively. The GCN module is composed of 2 graph convolutional layers with output channel dimensionalities of 512 and 128, respectively. Our whole TGG model is trained end-to-end via the ADAM (Kingma and Ba, 2014) optimizer with learning rate 0.001 and weight decay 0.0005. The batch size is set to 128 for all datasets and we use the validation sets for early stopping. Both $\lambda$ and $\mu$ in Eq. (15) are set to 0.5. We implement our TGG in PyTorch (https://pytorch.org/), and the source code of our work is available at: https://github.com/zcrwind/tgg-pytorch.

4.3. Evaluation metrics

We follow the standard evaluation metrics used in the literature. For ZSL and FSL, we evaluate classification performance by top-1 accuracy, which equals the percentage of predicted labels that match the ground-truth labels. For the GZSL setting, we use the Harmonic Mean (HM) of the separately computed accuracies on seen and unseen classes ($Acc_s$ and $Acc_u$, respectively), as proposed in (Xian et al., 2017):

$HM = \frac{2 \times Acc_s \times Acc_u}{Acc_s + Acc_u}$

The main motivation behind HM is that it can estimate the inherent bias of GZSL methods towards seen classes. That is, for classification methods biased towards seen classes, $Acc_s$ will be much higher than $Acc_u$, and thus the HM value drops significantly. For fair comparison, we report the average results of 10 random trials for ZSL, GZSL and FSL.
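The HM metric itself is a one-liner:

```python
def harmonic_mean(acc_s, acc_u):
    """Harmonic mean of seen/unseen accuracies for GZSL (Xian et al., 2017)."""
    return 2 * acc_s * acc_u / (acc_s + acc_u)

# e.g., harmonic_mean(90.1, 69.8) ~= 78.7, the AwA2 GZSL result in Table 2
```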

4.4. Results and analysis

4.4.1. ZSL and GZSL

We compare our TGG with recent state-of-the-art methods on ZSL and GZSL, and the results are reported in Table 2. It is clear that our TGG consistently yields substantial improvements on all datasets for both ZSL and GZSL. More impressively, with respect to unseen classes in the GZSL setting, the accuracy of our TGG is almost twice that of the second-place methods on several datasets. For example, we achieve the highest unseen-class accuracies of 69.8%/65.8% on AwA2/SUN in the GZSL setting, roughly doubling the second-place GDAN (Huang et al., 2018), whose associated accuracies are 33.2% and 38.1%, respectively. Although our accuracies for seen classes are slightly lower than RelationNet (Sung et al., 2018) on AwA2 and GDAN (Huang et al., 2018) on SUN, we still obtain 78.7% and 75.4% Harmonic Mean (HM) on these two datasets for GZSL, which are respectively 33.4% and 24.0% higher than those of the two compared methods. This indicates that our TGG can reduce classifier bias towards seen classes, and thus manage the trade-off between the seen and unseen domains. We attribute this to the introduction of explicit relations and proximity structure modeling, which is vital for alleviating domain shift. Moreover, our TGG surpasses recent generative methods such as SP-AEN (Chen et al., 2018), PSR-ZSL (Annadani and Biswas, 2018) and GDAN (Huang et al., 2018) in both unseen-class accuracy and HM score, as our TGG explicitly generates the relations of the graph topology and is robust to noise in the synthesized dummy features, while retaining the advantages of supervised training.

k-shot Methods aPY AwA2 CUB SUN
1-shot DeViSE (Frome et al., 2013) - 81.1 54.9 -
CMT (Socher et al., 2013) - 85.6 57.3 -
CAPD (Rahman et al., 2018) 71.2 81.4 46.3 53.7
Our TGG 73.9 86.8 65.5 66.0
3-shot DeViSE (Frome et al., 2013) - 83.8 55.7 -
CMT (Socher et al., 2013) - 86.9 58.4 -
CAPD (Rahman et al., 2018) 83.6 86.9 56.9 66.3
Our TGG 84.7 88.1 69.6 70.2
Table 3. FSL results evaluated on four benchmark datasets.

4.4.2. FSL

Our TGG can be seamlessly extended to FSL by replacing the dummy data with real support data from the unseen classes. In the few/one-shot settings, we follow CAPD (Rahman et al., 2018) and randomly choose three/one instances per unseen class as labeled examples in training. The comparison results are provided in Table 3. Again, our TGG outperforms all the compared methods on all datasets by a quite large margin. Comparing the results of Table 3 and Table 2, we can observe that adding real image features of unseen classes is always beneficial, and our TGG gains considerable improvement even though the given support data is scarce. More interestingly, the 3-shot performance on the AwA2 dataset (88.1%) approaches that of the seen classes in GZSL (90.1%). This phenomenon indicates that our graph generation approach copes well with data imbalance, with the aid of the class-level graph and attention-based aggregation.

Methods aPY AwA2 CUB SUN
TGG w/o aggregation 35.6 43.3 31.5 29.4
TGG w/o attention 58.9 70.6 59.2 61.8
TGG w/o GCNs 57.4 70.7 58.5 60.2
TGG w/o graph kernel 60.3 74.6 60.4 61.1
TGG w/o dual relation prop 62.7 75.1 62.9 63.0
Our TGG 63.5 77.2 64.1 68.9
Table 4. Ablation studies with ZSL setting on four datasets.

4.5. Ablation studies

We conduct ablation studies on ZSL to further evaluate the effect of the different components of our TGG approach; the results are exhibited in Table 4. We design ablative experiments from two perspectives, namely architecture and constraints. In terms of architecture, we independently remove the aggregate network, the multi-head attention and the GCNs from the TGG framework, corresponding to the first three rows of Table 4. In terms of constraints, we remove the graph kernel ($\mathcal{L}_{gk}$) or replace the dual relation propagation ($\mathcal{L}_{dp}$) with the standard label propagation algorithm, corresponding to the fourth and fifth rows of Table 4, respectively.

From the experimental results, we can observe that the accuracies drop drastically when the aggregation module is removed entirely. This is mainly because the aggregation operation captures proximity structure from the class-level prototype graph: it integrates neighborhood information to revise node embeddings and alleviate domain shift, and is thus more robust than direct graph generation over the original image features. Similarly, if we simply use a mean aggregator without the multi-head attention in the aggregation module, the performance is also impaired significantly (by roughly 4% to 7% on the four datasets). This illustrates that attention is vital in such a graph translation procedure, as neighborhood information should be finely screened to tackle information dilution and negative knowledge transfer, as well as to cope with noise in the dummy instances for ZSL. Moreover, as shown in the third row of Table 4, the GCNs in our relation kernel module also play a crucial role, as they further refine the topology of the generated graph at the instance level and increase nonlinearity.

From the constraints perspective, the results in the fourth and fifth rows of Table 4 demonstrate the effects of $\mathcal{L}_{gk}$ and $\mathcal{L}_{dp}$, respectively. As a regularization, the graph kernel constraint ($\mathcal{L}_{gk}$) encourages the instance-level graph to be consistent with the class-level graph in local structure, which combats overfitting and is especially vital for datasets with fine-grained classes (such as CUB and SUN). Moreover, the dual relation propagation ($\mathcal{L}_{dp}$) is also beneficial for zero-shot generalization, consistently gaining around 2% improvement on all four datasets.

Figure 4. Sensitivity experiments on the size of $\mathcal{G}_c$.

4.6. Sensitivity experiments

As stated in Section 3.1.2, the initial $\mathcal{G}_c$ is densely connected with normalized edge weights. In order to figure out the effect of the size of $\mathcal{G}_c$ on the performance of our TGG, we set different thresholds on the edge weights to crop $\mathcal{G}_c$, i.e., an edge is removed if its weight is smaller than the pre-defined threshold. The experiments are conducted in the ZSL setting and the results are shown in Figure 4. We can observe that the size of $\mathcal{G}_c$ is crucial for the subsequent graph generation and classification. Our conclusions are twofold: (1) Using all edges can be suboptimal, as some neighbor information with sloppy relations will be involved in graph generation, resulting in negative knowledge transfer. (2) If edges are removed at a large scale or even entirely (when the threshold is set to 1.0), the performance drops drastically. This indicates that the prototype relations in $\mathcal{G}_c$ play a vital role for our instance-level graph generation.
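The cropping used in this experiment amounts to zeroing out weak prototype edges; a minimal sketch:

```python
def crop_prototype_graph(A, threshold):
    """Drop class-level edges of G_c whose normalized weight is below threshold."""
    return A * (A >= threshold).float()
```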

5. Conclusion

In this paper, we have proposed TGG, a unified and flexible framework for ZSL, GZSL and FSL via graph generation, towards comprehensive relation modeling and utilization in an explicit manner. Our TGG not only accounts for the structural matching between the semantic space and the visual feature space, but also enriches it with instance-level relation modeling, which captures intra-class variance for better decision boundary learning. Extensive experiments performed on widely-used zero-shot and few-shot datasets attest to the superiority of our approach. In future work, we will attempt to model relations with more advanced graph generation techniques, as well as reduce the computational complexity of our TGG for larger-scale transfer learning situations.

Acknowledgements.
This work was supported by the National Natural Science Foundation of China under Grant 61876003. It is a research achievement of the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).

References

  • Z. Akata, F. Perronnin, Z. Harchaoui, and C. Schmid (2013) Label-embedding for attribute-based classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 819–826.
  • Z. Akata, S. Reed, D. Walter, H. Lee, and B. Schiele (2015) Evaluation of output embeddings for fine-grained image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2927–2936.
  • Y. Annadani and S. Biswas (2018) Preserving semantic relations for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7603–7612.
  • A. Bojchevski, O. Shchur, D. Zügner, and S. Günnemann (2018) NetGAN: generating graphs via random walks. arXiv preprint arXiv:1803.00816.
  • J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun (2013) Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.
  • S. Changpinyo, W. Chao, B. Gong, and F. Sha (2016) Synthesized classifiers for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5327–5336.
  • L. Chen, H. Zhang, J. Xiao, W. Liu, and S. Chang (2018) Zero-shot visual recognition using semantics-preserving adversarial embedding networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1043–1052.
  • N. De Cao and T. Kipf (2018) MolGAN: an implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973.
  • M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852.
  • A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth (2009) Describing objects by their attributes. In IEEE Conference on Computer Vision and Pattern Recognition.
  • S. Franco, G. Marco, T. Ah Chung, H. Markus, and M. Gabriele (2009) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80.
  • A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, T. Mikolov, et al. (2013) DeViSE: a deep visual-semantic embedding model. In Advances in Neural Information Processing Systems, pp. 2121–2129.
  • J. Gao, T. Zhang, and C. Xu (2019) I know the relationships: zero-shot action recognition via two-stream graph convolutional networks and knowledge graphs. In Thirty-Third AAAI Conference on Artificial Intelligence.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
  • M. Gori, G. Monfardini, and F. Scarselli (2005) A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Vol. 2, pp. 729–734.
  • I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville (2017) Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pp. 5767–5777.
  • W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
  • H. Huang, C. Wang, P. S. Yu, and C. Wang (2018) Generative dual adversarial network for generalized zero-shot learning. arXiv preprint arXiv:1811.04857.
  • M. Kampffmeyer, Y. Chen, X. Liang, H. Wang, Y. Zhang, and E. P. Xing (2018) Rethinking knowledge graph propagation for zero-shot learning. arXiv preprint arXiv:1805.11724.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
  • G. Koch, R. Zemel, and R. Salakhutdinov (2015) Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop, Vol. 2.
  • E. Kodirov, T. Xiang, and S. Gong (2017) Semantic autoencoder for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3174–3183.
  • Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel (2015) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493.
  • Y. Li, O. Vinyals, C. Dyer, R. Pascanu, and P. Battaglia (2018) Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324.
  • M. Mirza and S. Osindero (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.
  • A. Narayanan, M. Chandramohan, R. Venkatesan, L. Chen, Y. Liu, and S. Jaiswal (2017) graph2vec: learning distributed representations of graphs. arXiv preprint arXiv:1707.05005.
  • G. Patterson and J. Hays (2012) SUN attribute database: discovering, annotating, and recognizing scene attributes. In IEEE Conference on Computer Vision and Pattern Recognition.
  • S. Rahman, S. Khan, and F. Porikli (2018) A unified approach for conventional zero-shot, generalized zero-shot, and few-shot learning. IEEE Transactions on Image Processing 27 (11), pp. 5652–5667.
  • B. Romera-Paredes and P. Torr (2015) An embarrassingly simple approach to zero-shot learning. In International Conference on Machine Learning, pp. 2152–2161.
  • E. Schönfeld, S. Ebrahimi, S. Sinha, T. Darrell, and Z. Akata (2019) Generalized zero- and few-shot learning via aligned variational autoencoders. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8247–8255.
  • J. Snell, K. Swersky, and R. Zemel (2017) Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, pp. 4077–4087.
  • R. Socher, M. Ganjoo, C. D. Manning, and A. Ng (2013) Zero-shot learning through cross-modal transfer. In Advances in Neural Information Processing Systems, pp. 935–943.
  • R. Speer, J. Chin, and C. Havasi (2017) ConceptNet 5.5: an open multilingual graph of general knowledge. In Thirty-First AAAI Conference on Artificial Intelligence.
  • F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales (2018) Learning to compare: relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1199–1208.
  • P. Veličković, G. Cucurull, A. Casanova, A. Romero, and Y. Bengio (2018) Graph attention networks. In International Conference on Learning Representations.
  • O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al. (2016) Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pp. 3630–3638.
  • C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie (2011) The Caltech-UCSD Birds-200-2011 dataset.
  • W. Wang, Y. Pu, V. K. Verma, K. Fan, Y. Zhang, C. Chen, P. Rai, and L. Carin (2018a) Zero-shot learning via class-conditioned deep generative models. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • X. Wang, Y. Ye, and A. Gupta (2018b) Zero-shot recognition via semantic embeddings and knowledge graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6857–6866.
  • Y. Xian, Z. Akata, G. Sharma, Q. Nguyen, M. Hein, and B. Schiele (2016) Latent embeddings for zero-shot classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 69–77.
  • Y. Xian, B. Schiele, and Z. Akata (2017) Zero-shot learning - the good, the bad and the ugly. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4582–4591.
  • Y. Xian (2019) f-VAEGAN-D2: a feature generating framework for any-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10275–10284.
  • B. Xu, N. Wang, T. Chen, and M. Li (2015) Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853.
  • C. Zhang and Y. Peng (2018) Visual data synthesis via GAN for zero-shot video classification. arXiv preprint arXiv:1804.10073.
  • L. Zhang, T. Xiang, and S. Gong (2017) Learning a deep embedding model for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2021–2030.
  • Z. Zhang and V. Saligrama (2015) Zero-shot learning via semantic similarity embedding. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4166–4174.
  • X. Zhu and Z. Ghahramani (2002) Learning from labeled and unlabeled data with label propagation. Technical report, Carnegie Mellon University.