Evolving Metric Learning for Incremental and Decremental Features

06/27/2020 · by Jiahua Dong, et al. · University of Arkansas at Little Rock

Online metric learning has been widely exploited for large-scale data classification due to its low computational cost. However, in practical online scenarios where features evolve (e.g., some features vanish and new features are augmented), most metric learning models cannot be applied successfully, even though they can handle evolving instances efficiently. To address this challenge, we propose a new online Evolving Metric Learning (EML) model for incremental and decremental features, which can handle instance and feature evolution simultaneously by incorporating a smoothed Wasserstein metric distance. Specifically, our model contains two essential stages: the Transforming stage (T-stage) and the Inheriting stage (I-stage). In the T-stage, we propose to extract important information from vanished features while neglecting non-informative knowledge, and forward it into the survived features by transforming them into a low-rank discriminative metric space. This stage further explores the intrinsic low-rank structure of heterogeneous samples to reduce the computation and memory burden, especially for high-dimensional large-scale data. In the I-stage, we inherit the metric performance of the survived features from the T-stage and then expand it to include the newly augmented features. Moreover, the smoothed Wasserstein distance is utilized to characterize the similarity relations among the complex and heterogeneous data, since the evolving features in different stages are not strictly aligned. In addition to tackling these challenges in the one-shot case, we also extend our model to the multi-shot scenario. After deriving an efficient optimization method for both the T-stage and I-stage, extensive experiments on several benchmark datasets verify the superiority of our model.

I Introduction

Metric learning has been successfully applied in many fields, e.g., face identification [10.1007/978-3-642-19309-5_55], object recognition [Xu:2018:BDM:3327144.3327333] and medical diagnosis [Boukouvalas2011DistanceML]. To efficiently handle large-scale streaming data, learning a discriminative metric in an online manner (i.e., online metric learning [Chechik:2010:LSO:1756006.1756042, NIPS2008_3446]) has attracted considerable attention. Generally, most online metric learning models focus on fast metric updating strategies [Weinberger2009, LI2018302, NIPS2009_3703, NIPS2011_4392] or fast similarity search methods [NIPS2008_3446, Davis2007IML1273496, NIPS2009_3703] for large-scale streaming data.

However, these existing online metric learning methods [8392504, 8552662] only focus on instance evolution and ignore the feature evolution that arises in many real applications, where some features vanish and new features are augmented. Take human motion recognition [DBLP:journals/corr/abs-1904-12602] as an example (i.e., Fig. 1): the emergence of a new Kinect sensor and the sudden damage of a depth camera lead, respectively, to an increase and a decrease in the feature dimensionality of the input data, which heavily cripples the performance of the pre-trained model [DBLP:journals/corr/abs-1904-12602]. Another interesting example is dynamic environment monitoring, where different kinds of sensors (e.g., radioisotope, trace metal and biological sensors [s5010004]) are deployed to cover all aspects of the environment; some sensors (features) expire whereas new sensors (features) are deployed as electrochemical conditions and lifespans change. A fixed or static online metric learning model fails to make full use of sensors (features) evolved in this way. Therefore, how to establish a novel metric learning model that simultaneously handles instance evolution and incremental and decremental features in such practical online systems is the focus of this paper.

Fig. 1: Illustration of feature evolution on the human motion recognition task, where the blue, red and green colors respectively indicate different kinds of features collected from depth, RGB and Kinect sensors with different lifespans. The features collected from the different sensors are decremental in the T-stage and incremental in the I-stage.

To address the above challenges, as shown in Fig. 2, we present a new online Evolving Metric Learning (EML) model for incremental and decremental features, which can exploit streaming data with both instance and feature evolution in an online manner. To be specific, the proposed EML model consists of two significant stages, i.e., the Transforming stage (T-stage) and the Inheriting stage (I-stage). 1) In the T-stage, where features are decremental, we propose to explore the important information and data structure of the vanished features and transform them into a low-rank discriminative metric space of the survived features, which can be used to assist the learning of subsequent metric tasks. Moreover, this stage explores the intrinsic low-rank structure of the streaming data, which efficiently reduces the computation and memory costs, especially for high-dimensional large-scale samples. 2) In the I-stage, where features are incremental, based on the discriminative metric space learned in the T-stage, we inherit the metric performance of the survived features and then expand it to include the newly augmented features. Furthermore, to better explore the similarity relations among heterogeneous data, a smoothed Wasserstein distance is applied in both the T-stage and I-stage, where the evolving features in different stages are strictly unaligned and heterogeneous. For model optimization, we derive an efficient optimization method to solve both the T-stage and I-stage. Besides, our model can be successfully extended from the one-shot scenario to the multi-shot scenario. Comprehensive experimental results on several benchmark datasets strongly support the effectiveness of our proposed EML model.

The main contributions of this paper are summarized as follows:

  • We propose an online Evolving Metric Learning (EML) model for incremental and decremental features to tackle both instance and feature evolution simultaneously. To the best of our knowledge, this is the first attempt to tackle this crucial but rarely-researched challenge in the metric learning field.

  • We present two stages for both feature and instance evolution, i.e., the Transforming stage (T-stage) and the Inheriting stage (I-stage), which not only make full use of the vanished features in the T-stage, but also take advantage of streaming data with newly augmented features in the I-stage.

  • A smoothed Wasserstein distance is incorporated into metric learning to characterize the similarity relations for heterogeneous evolving features in different stages. After deriving an alternating direction optimization algorithm to optimize our EML model, extensive experiments on benchmark datasets validate the effectiveness of our proposed model.

II Related Work

This section first provides a brief overview of online metric learning for instance evolution. Then some representative works on feature evolution are introduced.

II-A Online Metric Learning

Online metric learning has been widely explored for instance evolution on large-scale streaming data, and existing methods can mainly be divided into two categories: Mahalanobis distance-based and bilinear similarity-based methods. Among the Mahalanobis distance-based methods, POLA [ShalevShwartz2004] is the first attempt to learn the optimal metric in an online way. Several variants [NIPS2008_3446, Davis2007IML1273496, 8617698] extend this idea with fast similarity searching strategies; e.g., [NIPS2009_3703] proposes a regularized online metric learning model with a provable regret bound. Besides, pairwise constraints [NIPS2009_3703] and triplet constraints [NIPS2011_4392] are adopted to learn a discriminative metric function; generally, triplet constraints are more effective than pairwise constraints [NIPS2011_4392, Qian2015]. Among the bilinear similarity-based models, OASIS [Chechik:2010:LSO:1756006.1756042] is proposed to learn a similarity measure for images, and SOML [Gao2014SOMLSO] aims to learn a diagonal matrix for high-dimensional cases under a setting similar to OASIS. In addition, [6579606] presents an Online Multiple Kernel Similarity method to tackle multi-modal tasks. However, these recently-proposed Mahalanobis distance-based and bilinear similarity-based methods cannot exploit the discriminative similarity relations of strictly unaligned heterogeneous data in different evolution stages.

II-B Feature Evolution

For feature evolution, under the assumption that there exist samples from both the vanished feature space and the augmented feature space in an overlapping period, [Hou:2017:LFE:3294771.3294906] develops an evolvable feature streaming learning model that reconstructs the vanished features and exploits them along with the newly emerging features for large-scale streaming data. [Hou2018OnePassLW] proposes a one-pass incremental and decremental learning model for streaming data, which consists of a compressing stage and an expanding stage; different from [Hou:2017:LFE:3294771.3294906], it assumes that there are overlapping features instead of an overlapping period. Similar to [Hou2018OnePassLW], [Ye2018RectifyHM] focuses on learning the mapping function between two different feature spaces using optimal transport. [Zhang2015TowardsMT, 7465766] deal with trapezoidal data streams where both instances and features can increase, although the newly emerging data always have overlapping features with the previously existing data. [8410016] develops a feature incremental random forest model to improve performance on a small amount of data with newly incremental features, which enables the model to generalize well to the emergence of incremental features.

Among the works discussed above, no feature evolution model is highly related to ours except OPID (OPIDe) [Hou2018OnePassLW]. However, there are several key differences between [Hou2018OnePassLW] and our EML model: 1) Compared with [Hou2018OnePassLW], our work is the first attempt to explore both instance and feature evolution simultaneously via the T-stage and I-stage in the metric learning field. 2) Since the evolving features in different stages are strictly unaligned, we utilize the smoothed Wasserstein distance to characterize the similarity relations among the complex and heterogeneous data, rather than the Euclidean distance used in [Hou2018OnePassLW]. 3) Compared with [Hou2018OnePassLW], the low-rank regularizer on the distance matrix efficiently learns a discriminative low-rank metric space while neglecting non-informative knowledge for heterogeneous data in different feature evolution stages.

III Evolving Metric Learning (EML)

In this section, we first review online metric learning, and then introduce in detail how to tackle both instance and feature evolution via our proposed EML model.

III-A Revisit Online Metric Learning

Metric learning aims to learn an optimal distance according to different measure functions, e.g., the Mahalanobis distance function: $d_M(x_i, x_j) = (x_i - x_j)^\top M (x_i - x_j)$, where $x_i$ and $x_j$ are the $i$-th and $j$-th samples, respectively. $M$ is a symmetric positive semi-definite matrix, which can be decomposed as $M = L^\top L$ [NIPS2008_3446], where $L \in \mathbb{R}^{r \times d}$ ($r$ is the rank of $M$) is the transformation matrix. Then the Mahalanobis distance function between $x_i$ and $x_j$ can be rewritten as $d_M(x_i, x_j) = \|L x_i - L x_j\|_2^2$. Given an online constructed triplet $(x_t, x_t^+, x_t^-)$, $M$ can be updated in an online manner via the Passive-Aggressive algorithm [Crammer:2006:OPA:1248547.1248566], i.e.,

$$M^{t+1} = \arg\min_{M \succeq 0} \ \frac{1}{2}\|M - M^t\|_F^2 + \gamma\, \ell_M(x_t, x_t^+, x_t^-), \qquad (1)$$

where $\ell_M(x_t, x_t^+, x_t^-) = \max\big(0,\, 1 + d_M(x_t, x_t^+) - d_M(x_t, x_t^-)\big)$ is a hinge loss function, $x_t$ and $x_t^+$ belong to the same class, $x_t$ and $x_t^-$ belong to different classes, and $\gamma$ is the regularization parameter.
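As a concrete illustration of this Passive-Aggressive triplet update, the following minimal NumPy sketch performs one hinge-driven metric update followed by a projection onto the positive semi-definite cone; the function names, step rule and projection choice are illustrative assumptions rather than the authors' released implementation.

```python
import numpy as np

def mahalanobis_sq(M, x, y):
    """Squared Mahalanobis distance d_M(x, y) = (x - y)^T M (x - y)."""
    diff = x - y
    return float(diff @ M @ diff)

def pa_triplet_update(M, x, x_pos, x_neg, gamma=0.1):
    """One Passive-Aggressive-style step on a triplet (x, x_pos, x_neg):
    if the hinge loss 1 + d_M(x, x_pos) - d_M(x, x_neg) is violated, take a
    gradient step scaled by gamma and project back onto the PSD cone.
    (Illustrative update rule; Eq. (1) may use a different step-size rule.)"""
    loss = 1.0 + mahalanobis_sq(M, x, x_pos) - mahalanobis_sq(M, x, x_neg)
    if loss <= 0:                                   # passive: triplet satisfied
        return M
    dp, dn = x - x_pos, x - x_neg
    grad = np.outer(dp, dp) - np.outer(dn, dn)      # gradient of the hinge term
    M = M - gamma * grad                            # aggressive correction
    w, V = np.linalg.eigh(M)                        # project onto the PSD cone
    return (V * np.clip(w, 0.0, None)) @ V.T

# Usage: stream triplets and update the metric online.
rng = np.random.default_rng(0)
M = np.eye(5)
for _ in range(100):
    x, xp, xn = rng.normal(size=(3, 5))
    M = pa_triplet_update(M, x, xp, xn)
```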

However, most existing online metric learning models (e.g., Eq. (1)) only focus on instance evolution with a fixed feature dimensionality, and thus cannot be used in the feature evolution scenario, i.e., streaming data with incremental and decremental features. Furthermore, the distance matrix learned by Eq. (1) is not discriminative enough to explore the similarity relations of complex and heterogeneous samples, whose evolving features in different evolution stages are not strictly aligned [xu2018multi].

III-B The Proposed EML Model

In this subsection, we first describe how a smoothed Wasserstein distance is integrated into the online metric formulation (i.e., Eq. (1)) to characterize the similarity relations of heterogeneous data whose features evolve across different stages. We then elaborate how feature evolution is tackled via the T-stage and I-stage in the one-shot scenario, followed by the extension to the multi-shot case.

III-B1 Online Wasserstein Metric Learning

The Wasserstein distance [5703094] measures the optimal transportation needed to move all the earth from a source to a destination with the minimum amount of effort. Formally, given two signatures $P = \{(x_i, p_i)\}_{i=1}^{n}$ and $Q = \{(y_j, q_j)\}_{j=1}^{m}$, the smoothed Wasserstein distance [Cuturi:2013:SDL:2999792.2999868] between $P$ and $Q$ is:

$$W(P, Q) = \min_{F \ge 0}\ \langle D, F \rangle - \lambda E(F), \quad \text{s.t.}\ F\mathbf{1} = p,\ F^\top\mathbf{1} = q, \qquad (2)$$

where $D$ is a distance matrix, and each $D_{ij}$ denotes the cost of moving one unit of earth from the source sample $x_i$ to the target sample $y_j$. $F$ is the flow network matrix, and each $F_{ij}$ represents the amount of earth moved from $x_i$ to $y_j$. $p$ and $q$ are the normalized marginal probability mass vectors. $\lambda$ is a balance parameter, and $E(F)$ is the strictly concave entropic function.

In Eq. (2), the Mahalanobis distance is employed as the ground distance to construct the smoothed Wasserstein distance. Thus, each element $D_{ij}$ in Eq. (2) represents the squared Mahalanobis distance between the source sample $x_i$ of $P$ and the target sample $y_j$ of $Q$, i.e., $D_{ij} = (x_i - y_j)^\top M (x_i - y_j)$. An online triplet of signatures $(P_t, P_t^+, P_t^-)$ is constructed via [pmlr-v51-rolet16], where the samples of $P_t$ and $P_t^+$ belong to the same class, and the samples of $P_t$ and $P_t^-$ belong to different classes. After substituting the Mahalanobis distance in Eq. (1) with the smoothed Wasserstein distance defined in Eq. (2), online Wasserstein metric learning becomes:

$$M^{t+1} = \arg\min_{M \succeq 0} \ \frac{1}{2}\|M - M^t\|_F^2 + \gamma\, \max\big(0,\, 1 + W_M(P_t, P_t^+) - W_M(P_t, P_t^-)\big), \qquad (3)$$

where $W_M(\cdot,\cdot)$ denotes the smoothed Wasserstein distance in Eq. (2) computed with the Mahalanobis ground distance parameterized by $M$. When compared with the triplet $(x_t, x_t^+, x_t^-)$, each signature in $(P_t, P_t^+, P_t^-)$ consists of several samples belonging to the same class rather than only one sample.
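To make the smoothed Wasserstein term concrete, the sketch below computes an entropy-regularized optimal-transport distance between two signatures with a Mahalanobis ground cost via Sinkhorn scaling; the regularization convention (`reg`), the uniform marginals and the helper names are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def mahalanobis_cost(M, X_src, X_tgt):
    """Ground-cost matrix D with D[i, j] = (x_i - y_j)^T M (x_i - y_j)."""
    diff = X_src[:, None, :] - X_tgt[None, :, :]            # shape (n, m, d)
    return np.einsum('nmd,de,nme->nm', diff, M, diff)

def smoothed_wasserstein(M, X_src, X_tgt, reg=1.0, n_iter=200):
    """Entropy-smoothed Wasserstein distance between two signatures via
    Sinkhorn scaling, using the Mahalanobis ground cost and uniform marginals.
    Returns the transport cost <D, F> and the flow-network matrix F."""
    n, m = len(X_src), len(X_tgt)
    p, q = np.full(n, 1.0 / n), np.full(m, 1.0 / m)          # marginal masses
    D = mahalanobis_cost(M, X_src, X_tgt)
    K = np.exp(-D / reg)                                     # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iter):                                  # Sinkhorn iterations
        v = q / (K.T @ u)
        u = p / (K @ v)
    F = u[:, None] * K * v[None, :]                          # flow-network matrix
    return float(np.sum(F * D)), F

# Usage: distance between two signatures (sets of same-class samples).
rng = np.random.default_rng(1)
M = np.eye(4)
dist, flow = smoothed_wasserstein(M, rng.normal(size=(6, 4)), rng.normal(size=(5, 4)))
```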

Fig. 2: The illustration of our EML model in the one-shot scenario, which handles feature and instance evolution simultaneously via the T-stage and I-stage. Different colors denote different kinds of features, e.g., blue, red and green denote the vanished, survived and augmented features, respectively. The purple color indicates the labels and the number of corresponding samples.

III-B2 Transforming Stage and Inheriting Stage

The two essential stages (i.e., the T-stage and I-stage) of our proposed EML model for streaming data with feature evolution are elaborated below.

I. Transforming Stage (T-stage): As shown in Fig. 2, the T-stage receives a stream of training batches, where each batch contains samples together with their labels. Each instance in the T-stage contains two kinds of features, i.e., vanished features and survived features, with their corresponding dimensionalities.

If we directly combined both vanished and survived features to learn a unified metric function, the resulting metric could not be used in the I-stage, where some features have vanished and other features are augmented. We thus propose to extract important information from the vanished features and forward it into the survived features by transforming them into a common discriminative metric space. In other words, we want to train a model using only the survived features to represent the important information of both vanished and survived features.

In each batch of the T-stage, inspired by [pmlr-v51-rolet16], a triplet of signatures is constructed in an online manner for the survived features, where the samples of the first two signatures belong to the same class while the samples of the first and third signatures belong to different classes, and each signature contains a fixed number of samples. Similarly, we construct a triplet of signatures for all features (including both vanished and survived features) under the same class constraints.

Let $M_s$ and $M_a$ denote the distance matrices trained on the survived features and on all features in the T-stage, respectively. Since the dimensions of $M_s$ and $M_a$ are different, it is reasonable to add a consistency constraint on the two optimal distance matrices in order to extract important information from the vanished features and forward it into the survived features. Based on the smoothed Wasserstein metric learning in Eq. (3), the formulation for each batch of the T-stage can be expressed as follows:

$$\min_{M_s,\, M_a}\ \mathcal{L}_s(M_s) + \mathcal{L}_a(M_a) + \eta\, \mathcal{L}_c(M_s, M_a) + \tau\,\big(r(M_s) + r(M_a)\big), \qquad (4)$$

where $\mathcal{L}_s(M_s)$ and $\mathcal{L}_a(M_a)$ denote the triplet losses of smoothed Wasserstein metric learning on the survived features and on all features, respectively. $r(\cdot)$ denotes the regularization term, which explores the intrinsic low-rank structure of heterogeneous samples. $\eta$ and $\tau$ are balance parameters. $\mathcal{L}_c(M_s, M_a)$ in Eq. (4) is designed to enforce the consistency constraint between $M_s$ and $M_a$, which aims to use only the survived features to represent the important information of both vanished and survived features.

Specifically, $\mathcal{L}_c(M_s, M_a)$ constructs an essential triplet loss of smoothed Wasserstein metric learning across different feature spaces, i.e., all features and survived features. We compute the smoothed Wasserstein distance between heterogeneous distributions drawn from these two spaces. For example, one term denotes the smoothed Wasserstein distance between a signature represented by all features and a signature represented by the survived features, where each entry of the ground distance matrix indicates the Mahalanobis distance between the $i$-th source sample of the former and the $j$-th target sample of the latter; the remaining cross-space terms are defined analogously. Formally, the consistency constraint is expressed as:

(5)
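Although the exact form of Eq. (5) is not reproduced here, the sketch below illustrates one way such a cross-space term could be evaluated: both signatures are projected into a shared low-rank space through factors of $M_a$ and $M_s$ (using $M = L^\top L$ from Section III-A), and the smoothed Wasserstein distance is computed on the resulting cross-space ground cost. The projection-based ground cost, uniform marginals and function names are assumptions, not the paper's definition.

```python
import numpy as np

def cross_space_cost(L_all, L_srv, X_all, Z_srv):
    """Ground cost between samples described by all features (X_all) and by
    survived features only (Z_srv), compared in a shared low-rank space:
    D[i, j] = ||L_all @ x_i - L_srv @ z_j||^2, using M = L^T L from Sec. III-A.
    (Assumed realization of the cross-space Mahalanobis ground distance.)"""
    A = X_all @ L_all.T                       # project all-feature samples
    B = Z_srv @ L_srv.T                       # project survived-feature samples
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff ** 2, axis=2)

def cross_space_wasserstein(L_all, L_srv, X_all, Z_srv, reg=1.0, n_iter=100):
    """Smoothed Wasserstein distance between the two heterogeneous signatures,
    obtained by Sinkhorn scaling on the cross-space ground cost."""
    D = cross_space_cost(L_all, L_srv, X_all, Z_srv)
    n, m = D.shape
    p, q = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-D / reg)
    u = np.ones(n)
    for _ in range(n_iter):
        v = q / (K.T @ u)
        u = p / (K @ v)
    F = u[:, None] * K * v[None, :]
    return float(np.sum(F * D))

# A consistency penalty could, e.g., keep this cross-space distance small for
# same-class signatures and large for different-class ones (hypothetical Eq. (5)).
rng = np.random.default_rng(4)
L_all, L_srv = rng.normal(size=(5, 12)), rng.normal(size=(5, 8))
d = cross_space_wasserstein(L_all, L_srv, rng.normal(size=(6, 12)), rng.normal(size=(7, 8)))
```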

II. Inheriting Stage (I-stage): As shown in Fig. 2, each batch in the I-stage consists of samples and their corresponding labels, where every instance contains the survived features together with the newly augmented features of a certain dimensionality. The goal of this stage is to use the current I-stage batch for training and to make predictions on the subsequent batch, which contains the same number of samples.

To predict the labels of this subsequent batch, we propose to inherit the metric performance of the optimal distance matrix $M_s$ learned on the survived features in the T-stage, since a set of common survived features exists in both the T-stage and I-stage. Although we could construct triplets directly from the I-stage training batch, this simple strategy has two shortcomings: 1) the trained metric model is difficult to extend to the multi-shot scenario; 2) a metric model learned only with the I-stage training batch performs worse because it cannot make full use of the data in the T-stage.

The stacking strategy [Breiman1996, Zhou:2012:EMF:2381019] is employed to better inherit the metric performance learned in the T-stage. Concretely, the survived features are mapped into the discriminative metric space induced by $M_s$, which can be regarded as a new representation for stacking; this representation is then combined with the newly augmented features for both the training and test batches of the I-stage. We learn an optimal distance matrix $\widetilde{M}$ on the stacked training representation with online constructed triplets of signatures, and test the performance on the stacked test representation, where the samples of the first two signatures belong to the same class while the samples of the first and third signatures belong to different classes. Formally, at the $t$-th iterative step, the objective function for learning $\widetilde{M}$ in the I-stage can be formulated as:

(6)

where the two balance parameters are set to the same values as in Eq. (4) in our experiments for simplification, and $r(\widetilde{M})$ denotes the regularization term, which aims to explore the intrinsic low-rank structure of heterogeneous samples in the I-stage.
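The stacking step described above can be sketched as follows: the survived features are projected through a factor of the T-stage metric $M_s = L^\top L$ and concatenated with the augmented features to form the I-stage representation. The eigen-decomposition-based factorization and the plain concatenation are illustrative assumptions about how the stacked representation is built, not necessarily the paper's exact construction.

```python
import numpy as np

def stack_representation(M_srv, X_srv, X_aug):
    """Stacked I-stage representation: survived features projected into the
    metric space learned in the T-stage (via a factor L with M_srv = L^T L),
    concatenated with the newly augmented features."""
    w, V = np.linalg.eigh(M_srv)                 # M_srv is symmetric PSD
    L = (V * np.sqrt(np.clip(w, 0.0, None))).T   # L such that L^T L = M_srv
    return np.hstack([X_srv @ L.T, X_aug])       # new representation for stacking

# Usage: the I-stage metric is then learned on these stacked vectors.
rng = np.random.default_rng(2)
M_srv = np.eye(8)
Z = stack_representation(M_srv, rng.normal(size=(10, 8)), rng.normal(size=(10, 3)))
```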

Fig. 3: The illustration of our EML model in the multi-shot scenario with three stages, where Stage 1 and Stage 2 share the survived features shown in red, and Stage 2 and Stage 3 share the augmented features shown in green.

III-B3 Multi-shot Scenario

This subsection extends our model to the multi-shot scenario. Suppose that there are multiple stages for training, i.e., multiple alternating T-stages and I-stages, where each stage contains several batches and the features again consist of two parts, i.e., survived features and augmented features. Notice that the augmented features of one stage become the survived features of the next stage. An illustration of the multi-shot scenario with three stages is depicted in Fig. 3. Generally, we have two tasks in the multi-shot scenario: 1) Task I: similar to the task in the one-shot case, we aim to classify the testing data of the final stage with its training data and the batches of the earlier stages; 2) Task II: different from the one-shot scenario, we attempt to make predictions in any batch of any training stage.
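The chained relationship between consecutive stages can be written down directly; the toy helper below (hypothetical names, not from the paper) arranges feature blocks so that the augmented block of stage k reappears as the survived block of stage k+1, as in Fig. 3.

```python
def multi_shot_stages(feature_blocks):
    """Arrange consecutive feature blocks into stages: stage k uses block k as
    its survived features and block k+1 as its augmented features, so that the
    augmented features of stage k become the survived features of stage k+1."""
    return [{"stage": k + 1, "survived": feature_blocks[k], "augmented": feature_blocks[k + 1]}
            for k in range(len(feature_blocks) - 1)]

# Three stages as in Fig. 3 (hypothetical block names inspired by the EV-Action sensors).
stages = multi_shot_stages(["depth", "rgb", "skeleton", "new-sensor"])
```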

IV Model Optimization

This section presents an alternating optimization strategy to update our EML model in both stages, i.e., the T-stage and I-stage, followed by a computational complexity analysis of our model. The whole optimization procedure of the proposed EML model is summarized in Algorithm 1.

Note that the low-rank minimization in Eq. (4) and Eq. (6) is a well-known NP-hard problem. Taking the I-stage as an example, the rank of $\widetilde{M}$ in Eq. (6) can be efficiently surrogated by the trace norm $\|\widetilde{M}\|_*$. Different from traditional Singular Value Thresholding (SVT) [doi:10.1137/080738970], we develop a regularization term that guarantees the low-rank property by introducing an auxiliary matrix, so that $r(\widetilde{M})$ in Eq. (6) becomes a smooth function of $\widetilde{M}$ for the fixed auxiliary matrix, which is re-estimated during optimization. The low-rank optimization of $M_s$ and $M_a$ follows the same strategy: their ranks are respectively surrogated through the trace norms $\|M_s\|_*$ and $\|M_a\|_*$ with corresponding auxiliary matrices.
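One common way to realize such a differentiable trace-norm surrogate is the reweighting scheme sketched below, where $r(M) = \mathrm{tr}(M^\top W M)$ with $W = (MM^\top + \epsilon I)^{-1/2}$ recovers $\|M\|_*$ when $W$ is re-estimated at the current iterate; whether this matches the paper's exact regularizer is an assumption.

```python
import numpy as np

def reweight_matrix(M, eps=1e-6):
    """Auxiliary matrix W = (M M^T + eps I)^{-1/2}.  With W fixed at the
    current iterate, tr(M^T W M) approximates the trace norm ||M||_*,
    giving a differentiable low-rank surrogate that can be re-estimated
    alternately. (An assumed reweighting scheme, not necessarily the paper's.)"""
    S = M @ M.T + eps * np.eye(M.shape[0])
    w, V = np.linalg.eigh(S)                # S is symmetric positive definite
    return (V / np.sqrt(w)) @ V.T           # V diag(w^{-1/2}) V^T

def lowrank_surrogate(M, W):
    """Value of the surrogate r(M) = tr(M^T W M) for a fixed reweighting W."""
    return float(np.trace(M.T @ W @ M))

# Sanity check: with W computed at M, the surrogate matches ||M||_* closely.
M = np.diag([3.0, 1.0, 0.0])
assert abs(lowrank_surrogate(M, reweight_matrix(M))
           - np.linalg.svd(M, compute_uv=False).sum()) < 1e-2
```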

IV-A Optimizing the T-stage via an Alternating Strategy

IV-A1 Updating $M_s$ by fixing $M_a$ and the flow matrices

With $M_a$ and the flow matrices fixed, the optimization problem in Eq. (4) for the variable $M_s$ can be concretely expressed as:

(7)

The optimal solution of $M_s$ can be obtained in a relaxed manner by setting the gradient of Eq. (7) to zero:

(8)

where .

IV-A2 Updating $M_a$ by fixing $M_s$ and the flow matrices

With the obtained distance matrix $M_s$ and the flow matrices fixed, the optimization problem for the variable $M_a$ in Eq. (4) can be written as:

(9)

Similarly, the updating operator for $M_a$ can be given as:

(10)

where .

IV-A3 Updating the flow matrices by fixing $M_s$ and $M_a$

When the distance matrices $M_s$ and $M_a$ are fixed, Eq. (4) can be split into several independent smoothed Wasserstein distance subproblems, which can be solved by the method of [pmlr-v51-rolet16]. We omit the detailed solving process for simplicity.

0:  Input: the training data of the T-stage and I-stage, and the balance parameters;
0:  Output: the distance matrices $M_s$, $M_a$ and $\widetilde{M}$;
1:  Initialize $M_s$, $M_a$ and $\widetilde{M}$;
2:   Transforming Stage (T-stage):
3:  for each training batch in the T-stage do
4:     Calculate the smoothed Wasserstein distances for the batch, and construct the triplets of signatures for training;
5:     repeat
6:        Fix $M_s$ and $M_a$, solve for the flow-network matrices;
7:        Update $M_s$ via Eq. (8);
8:        Update $M_a$ via Eq. (10);
9:        Update the auxiliary matrices of the low-rank surrogates for $M_s$ and $M_a$;
10:     until Converge
11:  end for
12:   Inheriting Stage (I-stage):
13:  Build the stacked representation of the I-stage batch, calculate the smoothed Wasserstein distances, and construct the training triplets;
14:  repeat
15:     Fix $\widetilde{M}$, solve for the flow-network matrices;
16:     Update $\widetilde{M}$ via Eq. (12);
17:     Update the auxiliary matrix of the low-rank surrogate for $\widetilde{M}$;
18:  until Converge
Algorithm 1 The Optimization of Our Proposed EML Model

IV-B Optimizing the I-stage via an Alternating Strategy

IV-B1 Updating $\widetilde{M}$ by fixing the flow matrices

With the other variables fixed, the formulation for $\widetilde{M}$ in Eq. (6) can be rewritten as:

(11)

By setting the gradient of Eq. (11) to zero, the optimal solution of Eq. (11) for $\widetilde{M}$ can be given as:

(12)

where .

IV-B2 Updating the flow matrices by fixing $\widetilde{M}$

The optimization of the flow matrices in the I-stage is the same as that in the T-stage: with $\widetilde{M}$ fixed, we split the formulation in Eq. (6) into several independent smoothed Wasserstein distance subproblems and solve them via [pmlr-v51-rolet16].

IV-C Computational Complexity Analysis

The main computational cost of our EML model lies in the updating operations of both the T-stage and the I-stage, i.e., updating the distance matrices $M_s$ and $M_a$ in the T-stage, updating $\widetilde{M}$ in the I-stage, and solving the flow-network matrices in both stages. Since the sizes of the signatures involved in the flow computations are usually small compared with the number of features and samples, our proposed model is computationally efficient in an online manner.

V Experiments

This section first introduces the detailed experimental configurations and competing methods. Then the experimental results, along with analyses of our EML model in both one-shot and multi-shot scenarios, are provided.

V-A Configurations and Competing Methods

The experimental configurations of our EML model in the one-shot scenario and the competing methods are introduced in detail in this subsection.

V-A1 Experimental Configurations

As shown in Table I, we conduct extensive comparisons on one real-world human motion recognition dataset [DBLP:journals/corr/abs-1904-12602] (i.e., EV-Action) and five synthetic benchmark datasets¹ (http://archive.ics.uci.edu/ml/): three digit datasets (i.e., Mnist, Gisette and USPS), one DNA dataset (i.e., Splice) and one image dataset (i.e., Satimage). Specifically, the EV-Action dataset [DBLP:journals/corr/abs-1904-12602] is a large-scale human action dataset with 5300 samples, which are collected from three sensors, i.e., a depth camera, an RGB camera and skeleton sensors. EV-Action consists of 20 common action classes, where 10 actions are performed by a single subject and the others are performed by the same subjects interacting with other objects. This dataset is a typical real-world application of feature evolution, where the features collected from the depth camera, RGB camera and skeleton sensors are respectively regarded as the vanished, survived and augmented features. Some samples of human actions are visualized in Fig. 4.

Fig. 4: Examples of human motions in the EV-Action dataset, where the first, second and third rows denote the samples collected from the depth camera, RGB camera and Kinect sensor, respectively.
Datasets    c    n    d_v    d_s    d_a
EV-Action 20 4200 1024 1024 75
Mnist0vs5 2 3200 114 228 113
Mnist0vs3vs5 3 4800 123 245 121
Splice 2 2240 10 40 10
Gisette 2 6000 1239 2478 1238
USPS0vs5 2 960 64 128 64
USPS0vs3vs5 3 1440 64 128 64
Satimage 3 1080 10 18 8
TABLE I: The experimental settings in the one-shot scenario, where c is the number of classes, n is the number of samples, and d_v, d_s and d_a denote the dimensions of the vanished, survived and augmented features, respectively.
Dataset Pegasos [Shalev-Shwartz2011] OPMV [Zhu2015OnePassML] TCA [5640675] BDML [Xu:2018:BDM:3327144.3327333] OPML [LI2018302] OPIDe [Hou2018OnePassLW] OPID [Hou2018OnePassLW] Ours
500 57.38±1.51 56.37±1.91 53.88±2.04 56.42±0.71 54.10±1.71 57.84±1.06 57.57±1.08 58.87±0.68
EV-Action 600 57.46±1.60 56.94±1.82 54.61±1.73 56.81±0.65 55.37±1.64 57.22±0.95 56.71±1.40 58.65±0.84
700 57.22±1.34 56.68±1.87 54.37±1.69 56.63±0.77 55.82±1.62 57.09±1.13 56.85±1.27 58.32±0.82
Mnist 80 97.74±0.73 97.39±0.92 96.53±1.75 97.00±1.66 96.45±1.72 98.68±0.88 98.88±0.99 99.85±0.91
0vs5 160 98.11±1.03 95.82±1.84 93.08±2.94 98.25±0.80 96.83±1.38 97.94±0.97 98.75±0.90 99.78±0.57
320 97.68±0.79 96.47±1.79 92.43±3.82 98.24±0.75 96.98±1.03 97.38±0.58 97.21±0.66 99.27±0.37
Mnist 120 91.47±3.92 95.87±1.82 91.26±3.87 92.23±2.86 92.42±2.22 94.58±1.78 94.97±1.30 96.91±1.38
0vs3vs5 240 89.95±3.08 93.96±1.18 90.85±1.74 92.87±1.40 91.99±1.64 93.45±1.41 93.48±1.35 95.37±0.92
480 90.12±1.93 93.28±1.69 91.14±3.95 93.21±1.06 92.74±1.17 93.30±0.86 93.37±0.79 95.54±0.87
80 79.65±4.13 80.13±3.86 76.93±4.52 65.65±5.53 69.60±4.38 81.22±3.73 80.50±3.53 82.65±3.32
Splice 160 82.25±3.26 81.95±2.84 80.93±3.47 71.55±4.07 78.21±2.53 84.00±2.03 83.91±2.05 85.25±2.06
320 82.32±3.18 78.72±4.37 81.53±3.38 72.16±3.40 80.86±2.01 85.55±1.32 85.94±1.38 87.03±1.52
100 97.53±1.33 95.27±2.85 94.11±3.35 90.25±3.13 94.17±3.02 97.14±1.28 97.56±1.26 97.29±1.25
Gisette 200 95.14±2.97 94.05±3.36 93.03±3.16 91.50±1.25 93.61±3.19 95.59±0.95 95.39±1.06 96.82±0.91
300 96.84±1.35 93.71±3.11 94.37±3.72 93.83±2.12 93.77±2.96 96.36±0.69 95.33±0.93 97.89±0.43
USPS 120 98.52±0.67 95.27±2.67 96.42±1.81 95.90±1.65 93.72±2.32 96.17±1.44 96.51±1.25 97.23±1.64
0vs5 160 97.84±0.82 95.65±1.72 95.46±2.13 96.38±1.23 93.04±4.05 96.78±1.31 96.93±1.00 98.91±0.67
240 97.93±0.72 96.17±1.28 95.85±2.07 96.78±1.18 93.62±3.01 94.93±1.28 95.06±1.10 98.94±0.70
USPS 180 94.68±1.20 92.46±1.07 93.88±1.37 90.62±2.48 92.06±1.64 94.47±1.77 94.13±1.92 95.73±0.88
0vs3vs5 240 94.39±1.09 91.69±2.31 92.94±1.58 91.48±1.68 91.23±1.73 92.08±1.93 92.50±1.66 95.52±1.26
300 95.47±0.94 92.25±1.60 93.26±1.44 92.13±1.09 91.60±1.71 92.95±1.12 92.67±1.46 94.05±1.46
60 94.25±2.56 96.48±1.47 97.25±1.08 97.14±1.59 97.47±1.59 98.17±2.19 97.60±2.31 99.20±0.91
Satimage 90 96.49±1.49 96.83±1.18 96.52±1.32 97.62±1.52 97.69±1.16 98.58±1.12 97.29±2.08 99.71±1.06
120 98.03±1.13 97.38±1.94 97.12±1.87 97.12±1.48 97.15±1.49 98.45±1.14 96.85±1.94 99.52±1.07
TABLE II: Comparisons between our model and state-of-the-art methods in terms of accuracy (%) on seven datasets: mean and standard errors averaged over fifty random runs in the one-shot scenario. Models with the best performance are bolded.

For a fair comparison with [Hou2018OnePassLW], we adopt the same experimental settings as [Hou2018OnePassLW] in both one-shot and multi-shot scenarios, which are elaborated as follows: 1) The number of samples in each batch is the same, and the number of samples per class is equal for all training and test batches; 2) In the T-stage, the total number of training samples is fixed while the number of samples in each batch is varied; accordingly, the number of training and test samples also changes in the last testing phase; 3) We assign the first block of features as the vanished features, the following block as the survived features and the rest as the newly augmented features; in our experiments, the first quarter and the last quarter of the features serve as the vanished and augmented features, respectively (a sketch of this split is given below); 4) All the experimental results are averaged over fifty random repetitions.
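The quarter-based split in setting 3) can be expressed as a small helper; the function name and the exact slicing are hypothetical conveniences for reproducing the protocol, not code released with the paper.

```python
import numpy as np

def split_evolving_features(X):
    """Split a feature matrix column-wise following the protocol above: the
    first quarter of the features are treated as vanished, the last quarter as
    newly augmented, and the middle half as survived features shared by the
    T-stage and I-stage."""
    d = X.shape[1]
    q = d // 4
    vanished, survived, augmented = X[:, :q], X[:, q:d - q], X[:, d - q:]
    return vanished, survived, augmented

# Usage: the T-stage sees [vanished | survived]; the I-stage sees [survived | augmented].
X = np.random.default_rng(3).normal(size=(5, 12))
van, srv, aug = split_evolving_features(X)
```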

V-A2 Competing Methods

We validate the superiority of our model by comparing it with the following methods: One-pass Pegasos [Shalev-Shwartz2011], which assumes that all vanished features are known in the I-stage and all augmented features are known in the T-stage; OPMV [Zhu2015OnePassML], which regards the vanished and survived features as the first view and the survived and augmented features as the second view; TCA [5640675], which assumes that the data samples in the T-stage and I-stage come from the source and target domains, respectively; BDML [Xu:2018:BDM:3327144.3327333] and OPML [LI2018302], which are metric learning methods that only utilize the samples with the augmented features and ignore the previously vanished features; and OPID and OPIDe [Hou2018OnePassLW], which are one-pass incremental and decremental learning models for feature evolution.

Dataset Ours-woT Ours-woI Ours-woW Ours
500 56.68±1.74 54.36±1.61 57.93±0.85 58.33±0.76
EV-Action 600 56.23±1.81 55.70±1.49 57.70±1.04 57.94±0.88
700 57.02±1.56 55.93±1.76 57.83±0.92 58.12±0.86
Mnist 80 97.85±1.24 96.70±1.71 98.90±0.97 99.07±0.94
0vs5 160 97.54±1.46 96.84±1.85 98.87±1.06 99.22±0.61
320 97.23±3.34 96.88±0.96 98.95±0.83 99.27±0.37
Mnist 120 94.55±1.48 92.78±2.11 96.02±1.85 96.53±1.49
0vs3vs5 240 93.49±1.07 92.88±1.31 94.88±1.37 95.37±0.92
480 94.32±0.81 93.37±1.13 95.13±1.22 95.54±0.87
80 81.58±3.10 70.83±4.47 82.45±3.38 82.65±3.32
Splice 160 84.07±2.51 78.87±3.01 84.87±2.19 85.25±2.06
320 84.85±2.38 81.56±1.99 85.94±1.61 86.40±1.59
100 95.22±1.30 92.47±1.68 96.84±1.40 97.29±1.25
Gisette 200 94.38±1.52 92.96±1.75 96.27±1.53 96.82±0.91
300 96.11±0.95 95.08±1.19 97.14±0.87 97.79±0.46
USPS 120 95.42±1.82 94.82±2.02 96.26±1.33 97.23±1.64
0vs5 160 96.04±1.33 94.95±1.70 97.03±1.47 98.31±0.82
240 96.35±1.06 95.17±1.16 97.24±0.96 98.87±0.74
USPS 180 93.36±1.77 91.97±2.00 94.86±1.17 95.28±0.96
0vs3vs5 240 93.13±1.38 92.01±1.45 94.33±1.54 94.96±1.37
300 92.99±1.35 91.81±1.67 93.47±1.83 94.05±1.46
60 96.50±1.59 97.43±1.36 98.31±1.10 98.97±0.95
Satimage 90 96.78±2.72 97.31±1.10 98.19±1.16 98.71±1.13
120 96.22±1.91 97.23±1.22 98.02±1.22 98.53±1.20
TABLE III: Ablation study of our EML model in the one-shot scenario.

V-B Experiments in One-shot Scenario

In this subsection, we present comprehensive experimental analysis, ablation studies, effects of hyper-parameters and convergence investigations of our EML model in the one-shot scenario, followed by the computational costs of model optimization.

V-B1 Experimental Analysis

The experimental results for the one-shot scenario are shown in Table II. From the presented results, we have the following observations: 1) Although our model can no longer access the vanished features in the I-stage and cannot foresee the augmented features in the T-stage, the transforming and inheriting strategies efficiently exploit useful information from the vanished features and extend it to the augmented features. 2) Our model can be successfully applied to both high-dimensional (e.g., EV-Action and Gisette) and low-dimensional (e.g., Satimage) feature evolution, which are challenging tasks for exploring the data structure with the existing features. 3) When we use the distance matrix learned in the T-stage to assist training in the I-stage, the testing performance of our model increases significantly, even though the training samples in the I-stage are relatively scarce, i.e., the I-stage contains only a small number of training samples. 4) Our model performs better than OPID and OPIDe [Hou2018OnePassLW], since the T-stage explores important information from the vanished features, and the I-stage efficiently inherits the metric performance from the T-stage to take advantage of the augmented features.

Fig. 5: Effects of the balance parameters on the USPS0vs5 dataset (left and right panels correspond to the two groups of parameters).
Fig. 6: The convergence analysis of our model on the USPS (left) and Mnist (right) datasets in the one-shot scenario.
Pegasos [Shalev-Shwartz2011] OPMV [Zhu2015OnePassML] TCA [5640675] BDML [Xu:2018:BDM:3327144.3327333] OPML [LI2018302] OPIDe [Hou2018OnePassLW] OPID [Hou2018OnePassLW] Ours
EV-Action () 27.11±0.04 28.58±0.04 37.72±0.09 36.18±0.18 22.95±0.07 26.47±0.09 26.33±0.14 25.48±0.06
Mnist0vs5 () 6.18±0.07 7.45±0.06 14.96±0.12 16.27±0.10 3.84±0.04 5.13±0.11 4.95±0.07 4.68±0.05
USPS0vs5 (