
Data Cleansing for Models Trained with SGD
Data cleansing is a typical approach used to improve the accuracy of machine learning models; however, it requires extensive domain knowledge to identify the influential instances that affect the models. In this paper, we propose an algorithm that can suggest influential instances without using any domain knowledge. With the proposed method, users need only inspect the instances suggested by the algorithm; because no extensive knowledge is required for this procedure, even non-experts can conduct data cleansing and improve the model. Existing methods require the loss function to be convex and an optimal model to be obtained, which is not always the case in modern machine learning. To overcome these limitations, we propose a novel approach specifically designed for models trained with stochastic gradient descent (SGD). The proposed method infers the influential instances by retracing the steps of SGD while incorporating the intermediate models computed at each step. Through experiments, we demonstrate that the proposed method can accurately infer the influential instances. Moreover, using MNIST and CIFAR10, we show that the models can be effectively improved by removing the influential instances suggested by the proposed method.
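The retracing idea can be sketched in a few lines: record every intermediate model and mini-batch during SGD, then accumulate each instance's first-order contribution to the validation loss at the final model. This is a simplified sketch, not the paper's exact estimator: it assumes a logistic-regression model on synthetic data and drops the Hessian terms that propagate influence across steps.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) > 0).astype(float)

def grad(w, Xb, yb):
    """Gradient of the mean logistic loss on a batch."""
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
    return Xb.T @ (p - yb) / len(yb)

# Train with SGD, recording every intermediate model and mini-batch.
w = np.zeros(d)
eta, steps, bsize = 0.5, 300, 20
history = []
for _ in range(steps):
    idx = rng.choice(n, size=bsize, replace=False)
    history.append((w.copy(), idx))
    w -= eta * grad(w, X[idx], y[idx])

# Retrace the steps: accumulate each instance's first-order effect on
# the validation gradient at the final model (Hessian terms dropped).
g_val = grad(w, X, y)  # here the training set doubles as validation
influence = np.zeros(n)
for w_t, idx in history:
    for i in idx:
        influence[i] += (eta / bsize) * (grad(w_t, X[i:i+1], y[i:i+1]) @ g_val)
```

Instances with large-magnitude scores are candidates for inspection; the sign convention (whether removal helps or hurts) depends on the chosen validation loss.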
06/20/2019 ∙ by Satoshi Hara, et al.

On Tensor Train Rank Minimization: Statistical Efficiency and Scalable Algorithm
Tensor train (TT) decomposition provides a space-efficient representation for higher-order tensors. Despite its advantage, we face two crucial limitations when we apply the TT decomposition to machine learning problems: the lack of statistical theory and of scalable algorithms. In this paper, we address both limitations. First, we introduce a convex relaxation of the TT decomposition problem and derive its error bound for the tensor completion task. Next, we develop an alternating optimization method with a randomization technique whose time complexity is as efficient as its space complexity. In experiments, we numerically confirm the derived bounds and empirically demonstrate the performance of our method on a real higher-order tensor.
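As background for the TT format the abstract builds on, the following is a minimal TT-SVD sketch (sequential truncated SVDs). This is the classical decomposition algorithm, not the paper's convex relaxation or randomized alternating method; the shapes and ranks are illustrative.

```python
import numpy as np

def tt_svd(T, max_rank):
    """Decompose tensor T into TT cores by sequential truncated SVD."""
    dims = T.shape
    cores, r = [], 1
    M = T.reshape(r * dims[0], -1)
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        rk = min(max_rank, len(s))
        cores.append(U[:, :rk].reshape(r, dims[k], rk))
        M = (s[:rk, None] * Vt[:rk]).reshape(rk * dims[k + 1], -1)
        r = rk
    cores.append(M.reshape(r, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract TT cores back into a dense tensor."""
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([-1], [0]))
    return out[0, ..., 0]

# Build a tensor with TT-ranks (2, 2) and check exact recovery.
rng = np.random.default_rng(0)
G1 = rng.normal(size=(1, 4, 2))
G2 = rng.normal(size=(2, 5, 2))
G3 = rng.normal(size=(2, 6, 1))
T = tt_reconstruct([G1, G2, G3])
cores = tt_svd(T, max_rank=4)
err = np.linalg.norm(tt_reconstruct(cores) - T) / np.linalg.norm(T)
```

The space savings come from storing small third-order cores instead of the full tensor, which is what makes the statistical and computational questions in the abstract worth asking.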
08/01/2017 ∙ by Masaaki Imaizumi, et al.

Finding Alternate Features in Lasso
We propose a method for finding alternate features missing in the Lasso optimal solution. In the ordinary Lasso problem, one global optimum is obtained and the resulting features are interpreted as task-relevant features. However, this can overlook possibly relevant features not selected by the Lasso. With the proposed method, we can provide not only the Lasso optimal solution but also possible alternate features to it. We show that such alternate features can be computed efficiently by avoiding redundant computations. We also demonstrate how the proposed method works on the 20 Newsgroups data, showing that reasonable features are found as alternates.
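A minimal sketch of the idea, under assumptions not in the abstract: fit the Lasso by plain coordinate descent, then propose, for each selected feature, the unselected features strongly correlated with it as candidate alternates. The 0.9 correlation threshold and the synthetic data are illustrative; the paper computes alternates by a more careful enumeration that avoids redundant work.

```python
import numpy as np

def lasso_cd(X, y, lam, iters=300):
    """Coordinate-descent Lasso: (1/2n)||y - Xw||^2 + lam * ||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(iters):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]        # residual without feature j
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam * n, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
n = 100
z = rng.normal(size=n)
X = np.column_stack([z + 0.05 * rng.normal(size=n),   # two near-duplicate features
                     z + 0.30 * rng.normal(size=n),
                     rng.normal(size=n)])             # irrelevant feature
X = (X - X.mean(0)) / X.std(0)
y = X[:, 0] + 0.1 * rng.normal(size=n)

w = lasso_cd(X, y, lam=0.5)
selected = [j for j in range(X.shape[1]) if abs(w[j]) > 1e-6]
corr = np.corrcoef(X, rowvar=False)
alternates = {j: [k for k in range(X.shape[1])
                  if k not in selected and abs(corr[j, k]) > 0.9]
              for j in selected}
```

With near-duplicate features, the Lasso typically keeps one and drops the other; the dropped duplicate is exactly the kind of alternate feature the method surfaces.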
11/18/2016 ∙ by Satoshi Hara, et al.

Joint Word Representation Learning using a Corpus and a Semantic Lexicon
Methods for learning word representations using large text corpora have received much attention lately due to their impressive performance in numerous natural language processing (NLP) tasks such as semantic similarity measurement and word analogy detection. Despite their success, these data-driven word representation learning methods do not consider the rich semantic relational structure between words in a co-occurring context. On the other hand, much manual effort has already gone into the construction of semantic lexicons such as WordNet, which represent the meanings of words by defining the various relationships that exist among the words in a language. We consider the question: can we improve the word representations learnt from a corpus by integrating the knowledge in semantic lexicons? For this purpose, we propose a joint word representation learning method that simultaneously predicts the co-occurrences of two words in a sentence subject to the relational constraints given by the semantic lexicon. We use relations that exist between words in the lexicon to regularize the word representations learnt from the corpus. Our proposed method statistically significantly outperforms previously proposed methods for incorporating semantic lexicons into word representations on several benchmark datasets for semantic similarity and word analogy.
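The joint objective can be sketched as follows, with every modelling choice here being an assumption rather than the paper's exact formulation: fit word vectors to log co-occurrence counts while a penalty λ‖w_i − w_j‖² pulls together the pairs that the lexicon links.

```python
import numpy as np

rng = np.random.default_rng(0)
V, dim, lam, lr = 6, 4, 0.5, 0.005
C = rng.integers(1, 20, size=(V, V)).astype(float)
C = (C + C.T) / 2.0                    # symmetric co-occurrence counts (toy)
target = np.log(C)
lexicon = [(0, 1), (2, 3)]             # word pairs linked by the lexicon

W = 0.1 * rng.normal(size=(V, dim))

def loss(W):
    fit = np.sum((W @ W.T - target) ** 2)
    reg = sum(np.sum((W[i] - W[j]) ** 2) for i, j in lexicon)
    return fit + lam * reg

init_loss = loss(W)
for _ in range(2000):
    err = W @ W.T - target             # co-occurrence prediction error
    g = 4 * err @ W                    # gradient of the (symmetric) fit term
    for i, j in lexicon:               # lexicon regularizer gradient
        g[i] += 2 * lam * (W[i] - W[j])
        g[j] -= 2 * lam * (W[i] - W[j])
    W -= lr * g
final_loss = loss(W)
```

The regularizer is the "joint" part: corpus statistics shape the vectors globally while the lexicon nudges known synonym pairs toward each other.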
11/19/2015 ∙ by Danushka Bollegala, et al.

Embedding Semantic Relations into Word Representations
Learning representations for semantic relations is important for various tasks such as analogy detection, relational search, and relation classification. Although there have been several proposals for learning representations for individual words, learning word representations that explicitly capture the semantic relations between words remains underdeveloped. We propose an unsupervised method for learning vector representations for words such that the learnt representations are sensitive to the semantic relations that exist between two words. First, we extract lexical patterns from the co-occurrence contexts of two words in a corpus to represent the semantic relations that exist between those two words. Second, we represent a lexical pattern as the weighted sum of the representations of the words that co-occur with that lexical pattern. Third, we train a binary classifier to detect relationally similar vs. non-similar lexical pattern pairs. The proposed method is unsupervised in the sense that the lexical pattern pairs we use as training data are automatically sampled from a corpus, without requiring any manual intervention. Our proposed method statistically significantly outperforms the current state-of-the-art word representations on three benchmark datasets for proportional analogy detection, demonstrating its ability to accurately capture the semantic relations among words.
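The second step, representing a lexical pattern as a weighted sum of word vectors, can be sketched as below; the vocabulary, the patterns, and the count-based weights are made up for illustration, and pattern extraction and classifier training are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=5) for w in ["ostrich", "bird", "lion", "cat"]}

# counts[p][w]: how often word w fills pattern p in the corpus (toy numbers)
counts = {"X is a large Y": {"ostrich": 3, "lion": 2},
          "X is a Y": {"ostrich": 5, "bird": 1, "cat": 4}}

def pattern_vector(p):
    """Weighted sum of the vectors of words co-occurring with pattern p."""
    total = sum(counts[p].values())
    return sum(c * vecs[w] for w, c in counts[p].items()) / total

v1 = pattern_vector("X is a large Y")
v2 = pattern_vector("X is a Y")
sim = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))  # pattern similarity
```

Pairs of pattern vectors like (v1, v2), labelled relationally similar or not, would then feed the binary classifier in the third step.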
05/01/2015 ∙ by Danushka Bollegala, et al.

Learning Word Representations from Relational Graphs
Attributes of words and relations between two words are central to numerous tasks in Artificial Intelligence such as knowledge representation, similarity measurement, and analogy detection. Often, when two words share one or more attributes in common, they are connected by some semantic relations. On the other hand, if there are numerous semantic relations between two words, we can expect some of the attributes of one word to be inherited by the other. Motivated by this close connection between attributes and relations, given a relational graph in which words are interconnected via numerous semantic relations, we propose a method to learn a latent representation for the individual words. The proposed method considers not only the co-occurrences of words, as done by existing approaches for word representation learning, but also the semantic relations in which two words co-occur. To evaluate the accuracy of the word representations learnt using the proposed method, we use the learnt word representations to solve semantic word analogy problems. Our experimental results show that it is possible to learn better word representations by using the semantic relations between words.
12/07/2014 ∙ by Danushka Bollegala, et al.

ClassiNet - Predicting Missing Features for Short-Text Classification
The fundamental problem in short-text classification is feature sparseness: the lack of feature overlap between a trained model and a test instance to be classified. We propose ClassiNet, a network of classifiers trained to predict missing features in a given instance, to overcome the feature sparseness problem. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors for predicting whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex v_i in the ClassiNet, where a one-to-one correspondence exists between feature predictors and vertices. The weight of the directed edge e_ij connecting a vertex v_i to a vertex v_j represents the conditional probability that, given v_i exists in an instance, v_j also exists in the same instance. We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance x⃗, we find similar features from the ClassiNet that did not appear in x⃗, and append those features to the representation of x⃗. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short-text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that by using ClassiNet, we can statistically significantly improve the accuracy in short-text classification tasks, without having to use any external resources such as thesauri for finding related features.
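The edge-weighting idea can be sketched with raw conditional co-occurrence probabilities standing in for the trained feature predictors that the real ClassiNet uses; the data and the 0.9 expansion threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
Xb = (rng.random((500, 8)) < 0.3).astype(float)   # binary instance-feature matrix
Xb[:, 1] = Xb[:, 0]                               # feature 1 always follows feature 0

counts = Xb.T @ Xb                                # feature co-occurrence counts
occ = np.diag(counts).copy()                      # per-feature occurrence counts
P = counts / np.maximum(occ[:, None], 1.0)        # P[i, j] ~ P(feature j | feature i)
np.fill_diagonal(P, 0.0)

def expand(x, threshold=0.9):
    """Append features j whose P(j | i) exceeds threshold for some present i."""
    present = np.flatnonzero(x)
    if present.size == 0:
        return x.copy()
    scores = P[present].max(axis=0)
    out = x.copy()
    out[scores > threshold] = 1.0
    return out

x = np.zeros(8)
x[0] = 1.0
x_expanded = expand(x)                            # gains feature 1 via the edge 0 -> 1
```

A sparse instance containing only feature 0 picks up feature 1 through the high-weight edge, which is the kind of densification that mitigates feature sparseness.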
04/14/2018 ∙ by Danushka Bollegala, et al.

Neural Photometric Stereo Reconstruction for General Reflectance Surfaces
We present a novel convolutional neural network architecture for photometric stereo (Woodham, 1980), the problem of recovering 3D object surface normals from multiple images observed under varying illuminations. Despite its long history in computer vision, the problem still shows fundamental challenges for surfaces with unknown general reflectance properties (BRDFs). Leveraging deep neural networks to learn complicated reflectance models is promising, but studies in this direction are very limited due to difficulties in acquiring accurate ground truth for training and in designing networks invariant to permutation of input images. To address these challenges, we propose a reconstruction-based unsupervised learning framework where surface normals and BRDFs are predicted by the network and fed into the rendering equation to synthesize observed images. The network is trained at test time by minimizing the reconstruction loss between observed and synthesized images. Thus, our learning process does not require ground-truth normals or even pre-training on external images. Our method is shown to achieve state-of-the-art performance on a challenging real-world scene benchmark.
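For context, the classical Lambertian special case (Woodham, 1980) that the network generalizes can be solved in closed form: with known light directions and shadows ignored, per-pixel least squares recovers albedo-scaled normals. This sketch is the classical baseline on synthetic data, not the proposed network or its learned BRDF.

```python
import numpy as np

rng = np.random.default_rng(0)
num_lights, num_pix = 8, 50
L = rng.normal(size=(num_lights, 3))
L /= np.linalg.norm(L, axis=1, keepdims=True)     # unit light directions
N_true = rng.normal(size=(num_pix, 3))
N_true /= np.linalg.norm(N_true, axis=1, keepdims=True)
albedo = rng.uniform(0.5, 1.0, size=num_pix)

# Lambertian rendering equation; attached shadows ignored in this sketch.
I = (L @ N_true.T) * albedo                       # (num_lights, num_pix)

# Per-pixel least squares: each column of G is albedo * normal.
G, *_ = np.linalg.lstsq(L, I, rcond=None)
albedo_est = np.linalg.norm(G, axis=0)
N_est = (G / albedo_est).T
```

The proposed method replaces this fixed Lambertian model with a network-predicted BRDF and minimizes the same kind of reconstruction loss at test time.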
02/28/2018 ∙ by Tatsunori Taniai, et al.

Maximally Invariant Data Perturbation as Explanation
While several feature scoring methods have been proposed to explain the output of complex machine learning models, most of them lack formal mathematical definitions. In this study, we propose a novel definition of the feature score using the maximally invariant data perturbation, which is inspired by the idea of adversarial examples. An adversarial example is the smallest data perturbation that changes the model's output. In our proposed approach, we consider the opposite: we seek the maximally invariant data perturbation that does not change the model's output. In this way, we can identify important input features as the ones with small allowable data perturbations. To find the maximally invariant data perturbation, we formulate the problem as a linear program. Experiments on image classification with VGG16 show that the proposed method can identify relevant parts of images effectively.
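For a plain linear classifier the idea has a closed form that makes the intuition concrete: feature i tolerates a perturbation of up to margin/|w_i| before the sign of w·x + b flips, so features with small allowable perturbations are important. This closed form is our simplification; the paper solves a linear program for general models and perturbation classes.

```python
import numpy as np

w = np.array([2.0, -0.5, 0.0, 1.0])   # linear model weights (illustrative)
b = 0.1
x = np.array([1.0, 2.0, -1.0, 0.5])   # input to be explained

margin = abs(w @ x + b)               # distance of the score from the decision boundary
with np.errstate(divide="ignore"):
    # Largest per-feature change that leaves sign(w . x + b) unchanged.
    tolerance = np.where(np.abs(w) > 0, margin / np.abs(w), np.inf)
importance = 1.0 / tolerance          # small allowable perturbation => important
```

A zero-weight feature tolerates any perturbation (infinite tolerance, zero importance), while the largest-weight feature is the most constrained and hence the most important.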
06/19/2018 ∙ by Satoshi Hara, et al.

Subspace Selection via DR-Submodular Maximization on Lattices
The subspace selection problem seeks a subspace that maximizes an objective function under some constraint. This problem includes several important machine learning problems such as principal component analysis and the sparse dictionary selection problem. Often, these problems can be solved by greedy algorithms. Here, we are interested in why these problems can be solved by greedy algorithms, and in which classes of objective functions and constraints admit this property. To answer this question, we formulate the problems as optimization problems on lattices. Then, we introduce a new class of functions, directional DR-submodular functions, to characterize the approximability of problems. We see that principal component analysis, the sparse dictionary selection problem, and their generalizations have directional DR-submodularity. We show that, under several constraints, the directional DR-submodular function maximization problem can be solved efficiently with provable approximation factors.
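A concrete instance of the greedy algorithms the abstract studies is greedy column selection for sparse dictionary selection: repeatedly add the column with the largest marginal gain in explained variance. The data and objective below are illustrative, not the paper's lattice formulation.

```python
import numpy as np

def explained(X, y, S):
    """Sum of squares of y explained by the columns indexed by S."""
    if not S:
        return 0.0
    A = X[:, S]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y @ y - np.sum((y - A @ coef) ** 2)

def greedy_select(X, y, k):
    """Greedily add the column with the largest marginal gain."""
    S = []
    for _ in range(k):
        best = max((j for j in range(X.shape[1]) if j not in S),
                   key=lambda j: explained(X, y, S + [j]))
        S.append(best)
    return S

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 2] + 0.5 * X[:, 7] + 0.01 * rng.normal(size=100)
S = greedy_select(X, y, 2)
```

Directional DR-submodularity is what guarantees that this kind of myopic gain-chasing still achieves a provable approximation factor.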
05/18/2018 ∙ by So Nakashima, et al.

Algorithmic Meta-Theorems for Monotone Submodular Maximization
We consider a monotone submodular maximization problem whose constraint is described by a logic formula on a graph. Formally, we prove the following three "algorithmic meta-theorems." (1) If the constraint is specified by a monadic second-order logic on a graph of bounded treewidth, the problem is solved in n^O(1) time with an approximation factor of O(log n). (2) If the constraint is specified by a first-order logic on a graph of low degree, the problem is solved in O(n^1 + ϵ) time for any ϵ > 0 with an approximation factor of 2. (3) If the constraint is specified by a first-order logic on a graph of bounded expansion, the problem is solved in n^O(log k) time with an approximation factor of O(log k), where k is the number of variables and O(·) suppresses only constants independent of k.
07/12/2018 ∙ by Masakazu Ishihata, et al.
Takanori Maehara