
Learning with Changing Features
In this paper we study the setting where features are added or change interpretation over time, which has applications in multiple domains such as retail, manufacturing, and finance. In particular, we propose an approach to provably determine the time instant from which the new or changed features start becoming relevant with respect to an output variable in an agnostic (supervised) learning setting. We also suggest an efficient version of our approach with the same asymptotic performance. Moreover, our theory also applies when there is more than one such change point. Independent post-hoc analysis of a change point identified by our method for a large retailer revealed that it coincided with certain unflattering news stories about a brand that resulted in a change in customer behavior. We also applied our method to data from an advanced manufacturing plant, identifying the time instant from which downstream features became relevant. To the best of our knowledge, this is the first work that formally studies change point detection in a distribution-independent agnostic setting, where the change point is based on the changing relationship between input and output.
04/29/2017 ∙ by Amit Dhurandhar, et al.
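The abstract does not spell out the provable procedure, but the intuition can be sketched: slide a window over the stream and score each candidate instant by how much a regression model improves when the new feature is allowed in. The following toy (hypothetical data and thresholds, not the authors' method) illustrates the idea:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
t_star = 120  # true change point in the synthetic stream

# One old feature is always relevant; a new feature only affects the
# output from t_star onward.
x_old = rng.normal(size=T)
x_new = rng.normal(size=T)
y = (2.0 * x_old
     + np.where(np.arange(T) >= t_star, 1.5 * x_new, 0.0)
     + 0.1 * rng.normal(size=T))

def window_gain(t, w=40):
    """Drop in squared error from adding x_new, on the window [t, t+w)."""
    idx = slice(t, t + w)
    X_old_only = x_old[idx].reshape(-1, 1)
    X_both = np.column_stack([x_old[idx], x_new[idx]])
    def sse(X):
        beta, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
        r = y[idx] - X @ beta
        return float(r @ r)
    return sse(X_old_only) - sse(X_both)

# Estimated change point: first window where the new feature clearly helps.
gains = np.array([window_gain(t) for t in range(0, T - 40)])
t_hat = int(np.argmax(gains > 0.5 * gains.max()))
print(t_hat)
```

With a window of 40 points the detected instant lags the true change slightly, since the window must contain enough post-change samples for the gain to dominate noise.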

Dynamic matrix factorization with social influence
Matrix factorization is a key component of collaborative-filtering-based recommendation systems because it allows us to complete sparse user-by-item ratings matrices under a low-rank assumption that encodes the belief that similar users give similar ratings and that similar items garner similar ratings. This paradigm has had immeasurable practical success, but it is not the complete story for understanding and inferring the preferences of people. First, people's preferences and their observable manifestations as ratings evolve over time along general patterns of trajectories. Second, an individual person's preferences evolve over time through the influence of their social connections. In this paper, we develop a unified process model for both types of dynamics within a state space approach, together with an efficient optimization scheme for estimation within that model. The model combines elements from recent developments in dynamic matrix factorization, opinion dynamics and social learning, and trust-based recommendation. The estimation builds upon recent advances in numerical nonlinear optimization. Empirical results on a large-scale data set from the Epinions website demonstrate a consistent reduction in root mean squared error from consideration of the two types of dynamics.
04/21/2016 ∙ by Aleksandr Y. Aravkin, et al.
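As a reminder of the static building block this paper extends, here is a minimal low-rank completion of a masked ratings matrix by gradient descent on the observed entries only (all data and hyperparameters are synthetic; the paper's model layers temporal and social dynamics on top of this):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, rank = 30, 20, 3

# Synthetic exactly-low-rank ratings with an observed-entry mask.
U_true = rng.normal(size=(n_users, rank))
V_true = rng.normal(size=(n_items, rank))
R = U_true @ V_true.T
mask = rng.random((n_users, n_items)) < 0.5  # ~50% of ratings observed

# Factorize the observed entries by gradient descent on squared error
# with a small ridge penalty.
U = 0.1 * rng.normal(size=(n_users, rank))
V = 0.1 * rng.normal(size=(n_items, rank))
lr, lam = 0.02, 0.01
for _ in range(2000):
    E = mask * (U @ V.T - R)          # error on observed entries only
    U -= lr * (E @ V + lam * U)
    V -= lr * (E.T @ U + lam * V)

# Held-out error on the entries the model never saw.
rmse_heldout = np.sqrt(np.mean((U @ V.T - R)[~mask] ** 2))
print(rmse_heldout)
```

Because the data are exactly rank 3 and half the entries are observed, the unobserved entries are recovered nearly exactly; with noisy real ratings the held-out error is of course larger.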

Statistical Learning under Nonstationary Mixing Processes
We study a special case of the problem of statistical learning without the i.i.d. assumption. Specifically, we suppose a learning method is presented with a sequence of data points, and required to make a prediction (e.g., a classification) for each one, and can then observe the loss incurred by this prediction. We go beyond traditional analyses, which have focused on stationary mixing processes or nonstationary product processes, by combining these two relaxations to allow nonstationary mixing processes. We are particularly interested in the case of β-mixing processes, with the sum of changes in marginal distributions growing sublinearly in the number of samples. Under these conditions, we propose a learning method, and establish that for bounded VC subgraph classes, the cumulative excess risk grows sublinearly in the number of predictions, at a quantified rate.
12/26/2015 ∙ by Steve Hanneke, et al.

Minimax Analysis of Active Learning
This work establishes distribution-free upper and lower bounds on the minimax label complexity of active learning with general hypothesis classes, under various noise models. The results reveal a number of surprising facts. In particular, under the noise model of Tsybakov (2004), the minimax label complexity of active learning with a VC class is always asymptotically smaller than that of passive learning, and is typically significantly smaller than the best previously published upper bounds in the active learning literature. In high-noise regimes, it turns out that all active learning problems of a given VC dimension have roughly the same minimax label complexity, which contrasts with well-known results for bounded noise. In low-noise regimes, we find that the label complexity is well-characterized by a simple combinatorial complexity measure we call the star number. Interestingly, we find that almost all of the complexity measures previously explored in the active learning literature have worst-case values exactly equal to the star number. We also propose new active learning strategies that nearly achieve these minimax label complexities.
10/03/2014 ∙ by Steve Hanneke, et al.

Surrogate Losses in Passive and Active Learning
Active learning is a type of sequential design for supervised machine learning, in which the learning algorithm sequentially requests the labels of selected instances from a large pool of unlabeled data points. The objective is to produce a classifier of relatively low risk, as measured under the 0-1 loss, ideally using fewer label requests than the number of random labeled data points sufficient to achieve the same. This work investigates the potential uses of surrogate loss functions in the context of active learning. Specifically, it presents an active learning algorithm based on an arbitrary classification-calibrated surrogate loss function, along with an analysis of the number of label requests sufficient for the classifier returned by the algorithm to achieve a given risk under the 0-1 loss. Interestingly, these results cannot be obtained by simply optimizing the surrogate risk via active learning to an extent sufficient to provide a guarantee on the 0-1 loss, as is common practice in the analysis of surrogate losses for passive learning. Some of the results have additional implications for the use of surrogate losses in passive learning.
07/16/2012 ∙ by Steve Hanneke, et al.

Bayesian Active Distance Metric Learning
Distance metric learning is an important component for many tasks, such as statistical classification and content-based image retrieval. Existing approaches for learning distance metrics from pairwise constraints typically suffer from two major problems. First, most algorithms only offer point estimation of the distance metric and can therefore be unreliable when the number of training examples is small. Second, since these algorithms generally select their training examples at random, they can be inefficient if labeling effort is limited. This paper presents a Bayesian framework for distance metric learning that estimates a posterior distribution for the distance metric from labeled pairwise constraints. We describe an efficient algorithm based on the variational method for the proposed Bayesian approach. Furthermore, we apply the proposed Bayesian framework to active distance metric learning by selecting those unlabeled example pairs with the greatest uncertainty in relative distance. Experiments in classification demonstrate that the proposed framework achieves higher classification accuracy and identifies more informative training examples than the non-Bayesian approach and state-of-the-art distance metric learning algorithms.
06/20/2012 ∙ by Liu Yang, et al.
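The active-selection criterion, querying the pair whose distance is most uncertain under the posterior, can be sketched with a crude stand-in for the posterior (random samples of a diagonal Mahalanobis metric rather than the paper's variational posterior; all data here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 2-d points from two clusters.
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])

# Surrogate "posterior" over diagonal Mahalanobis metrics: 50 random
# samples of the diagonal of M (a placeholder for a learned posterior).
metrics = rng.uniform(0.2, 2.0, size=(50, 2))

def dist(pair, m):
    """Mahalanobis distance between a pair under diagonal metric m."""
    d = X[pair[0]] - X[pair[1]]
    return float(np.sqrt((m * d * d).sum()))

# Query the unlabeled pair whose distance varies most across posterior
# samples, i.e., the pair the current metric is least sure about.
pairs = [(i, j) for i in range(len(X)) for j in range(i + 1, len(X))]
uncert = [np.std([dist(p, m) for m in metrics]) for p in pairs]
query = pairs[int(np.argmax(uncert))]
print(query)
```

With this particular surrogate, uncertainty in the distance scales with the pair's separation, so a far-apart cross-cluster pair gets queried; a real posterior concentrates differently as constraints accumulate.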

A Note on Property Testing Sum of Squares and Multivariate Polynomial Interpolation
In this paper, we investigate property testing of whether a degree-d multivariate polynomial is a sum of squares or is far from a sum of squares. We show that if we require that the property tester always accepts YES instances and uses random samples, n^Ω(d) samples are required, which is not much fewer than it would take to completely determine the polynomial. To prove this lower bound, we show that with high probability, multivariate polynomial interpolation matches arbitrary values on random points and the resulting polynomial has small norm. We then consider a particular polynomial which is nonnegative yet not a sum of squares and use pseudo-expectation values to prove it is far from being a sum of squares.
09/10/2017 ∙ by Aaron Potechin, et al.
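To ground the terminology: a polynomial is a sum of squares exactly when it admits a positive semidefinite Gram matrix in a monomial basis (in general such a certificate is found by semidefinite programming). A minimal check for one fixed toy polynomial, chosen here for illustration:

```python
import numpy as np

# Toy SOS certificate for p(x, y) = x^4 + 2 x^2 y^2 + y^4.
# In the monomial basis z = (x^2, xy, y^2), p = z^T Q z for this Q;
# p is a sum of squares iff some such Gram matrix is PSD.
Q = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])

eigs = np.linalg.eigvalsh(Q)
is_sos = bool(eigs.min() >= -1e-9)

# A Cholesky factor Q = L L^T reads off the squares: p = sum_k ((L^T z)_k)^2,
# here recovering p = (x^2)^2 + 2 (xy)^2 + (y^2)^2.
L = np.linalg.cholesky(Q)
print(is_sos)  # True
```

The famous Motzkin polynomial x⁴y² + x²y⁴ − 3x²y² + 1 is the classic example that is nonnegative yet admits no PSD Gram matrix, the situation the abstract's lower-bound instance exploits.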

A Prediction Model of the Project Lifespan in Open Source Software Ecosystem
In natural ecosystems, animal lifespans are determined by genes and other biological characteristics. Similarly, software project lifespans are related to certain internal or external characteristics. Analyzing the relations between these characteristics and the project lifespan may help developers, investors, and contributors control the development cycle of a software project. This paper provides insight into project lifespans in a free open source software ecosystem. We present a statistical analysis of project characteristics on GitHub, and find that the choice of programming languages, the number of files, the label format of the project, and the relevant membership expressions can impact the lifespan of a project. Based on these discovered characteristics, we also propose a prediction model to estimate project lifespans in open source software ecosystems. These results may help developers reschedule projects in open source software ecosystems.
10/31/2017 ∙ by Zhifang Liao, et al.

Response Ranking with Deep Matching Networks and External Knowledge in Information-seeking Conversation Systems
Intelligent personal assistant systems with either text-based or voice-based conversational interfaces are becoming increasingly popular around the world. Retrieval-based conversation models have the advantage of returning fluent and informative responses. Most existing studies in this area address open-domain "chit-chat" conversations or task/transaction-oriented conversations; more research is needed for information-seeking conversations. Current conversational models also lack modeling of external knowledge beyond the dialog utterances. In this paper, we propose a learning framework on top of deep neural matching networks that leverages external knowledge for response ranking in information-seeking conversation systems. We incorporate external knowledge into deep neural models with pseudo-relevance feedback and QA correspondence knowledge distillation. Extensive experiments with three information-seeking conversation data sets, including both open benchmarks and commercial data, show that our methods outperform various baseline methods, including several deep text matching models and the state-of-the-art method for response selection in multi-turn conversations. We also perform analysis over different response types, model variations, and ranking examples. Our models and research findings provide new insights on how to utilize external knowledge with deep neural models for response selection, and have implications for the design of the next generation of information-seeking conversation systems.
05/01/2018 ∙ by Liu Yang, et al.
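Pseudo-relevance feedback, one of the two knowledge-incorporation mechanisms mentioned, can be sketched in its classic retrieval form (a toy term-overlap illustration with made-up documents, not the paper's neural variant): treat the top results of a first pass as relevant and expand the query with their salient terms.

```python
from collections import Counter

# Toy corpus and query for a first retrieval pass.
corpus = [
    "reset the router then check the firmware version",
    "firmware update fixes the router reset loop",
    "my cat likes the warm router",
]
query = "router reset".split()
stop = {"the", "then", "my"}

def overlap(doc):
    """First-pass score: number of query terms present in the document."""
    return sum(t in doc.split() for t in query)

# Pseudo-relevance feedback: assume the top-ranked documents are relevant
# and add their most frequent non-stopword, non-query terms to the query.
top = sorted(corpus, key=overlap, reverse=True)[:2]
counts = Counter(t for doc in top for t in doc.split()
                 if t not in query and t not in stop)
expanded = query + [t for t, _ in counts.most_common(2)]
print(expanded)
```

In the paper's setting the "documents" are candidate responses and the expansion feeds a neural matching network, but the assumption, that top-ranked results are probably relevant, is the same.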

aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model
As an alternative to question answering methods based on feature engineering, deep learning approaches such as convolutional neural networks (CNNs) and long short-term memory models (LSTMs) have recently been proposed for semantic matching of questions and answers. To achieve good results, however, these models have been combined with additional features such as word overlap or BM25 scores. Without this combination, these models perform significantly worse than methods based on linguistic feature engineering. In this paper, we propose an attention-based neural matching model for ranking short answer text. We adopt a value-shared weighting scheme instead of a position-shared weighting scheme for combining different matching signals, and incorporate question term importance learning using a question attention network. Using the popular benchmark TREC QA data, we show that the relatively simple aNMM model can significantly outperform other neural network models that have been used for the question answering task, and is competitive with models that are combined with additional features. When aNMM is combined with additional features, it outperforms all baselines.
01/05/2018 ∙ by Liu Yang, et al.
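The value-shared weighting idea can be illustrated directly: instead of giving each (question-position, answer-position) cell its own weight, all matching-matrix cells whose similarity VALUE falls in the same bin share one weight. A minimal sketch with random similarities and random (untrained) bin weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Cosine-similarity matching matrix between question terms (rows) and
# answer terms (cols); values in [-1, 1].
sim = rng.uniform(-1, 1, size=(5, 12))

# Value-shared weighting: bucket each similarity by its value, so one
# weight is shared by every cell in the same value bin, regardless of
# the terms' positions.
n_bins = 10
edges = np.linspace(-1, 1, n_bins + 1)
w = rng.normal(size=n_bins)            # learnable bin weights (random here)

bin_idx = np.clip(np.digitize(sim, edges) - 1, 0, n_bins - 1)
# Per-question-term score: weighted histogram of its answer-term matches.
counts = np.stack([np.bincount(row, minlength=n_bins) for row in bin_idx])
scores = counts @ w                    # one score per question term
print(scores.shape)
```

Position-shared weighting (as in a CNN over the matrix) would instead tie weights to cell locations; value sharing is what lets aNMM handle variable-length answers without pooling tricks.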

Neural-net-induced Gaussian process regression for function approximation and PDE solution
Neural-net-induced Gaussian process (NNGP) regression inherits both the high expressivity of deep neural networks (deep NNs) and the uncertainty quantification property of Gaussian processes (GPs). We generalize the current NNGP to first include a larger number of hyperparameters and subsequently train the model by maximum likelihood estimation. Unlike previous works on NNGP that targeted classification, here we apply the generalized NNGP to function approximation and to solving partial differential equations (PDEs). Specifically, we develop an analytical iteration formula to compute the covariance function of the GP induced by a deep NN with an error-function nonlinearity. We compare the performance of the generalized NNGP for function approximation and PDE solution with that of GPs and fully connected NNs. We observe that for smooth functions the generalized NNGP can yield the same order of accuracy as a GP, while both NNGP and GP outperform the deep NN. For nonsmooth functions, the generalized NNGP is superior to the GP and comparable or superior to the deep NN.
06/22/2018 ∙ by Guofei Pang, et al.
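The layer-by-layer covariance iteration mentioned in the abstract can be sketched using the classic closed-form arcsine expectation for the erf nonlinearity (Williams, 1998); the hyperparameters below are placeholders and this standard recursion is not necessarily the paper's exact generalization:

```python
import numpy as np

def nngp_kernel(X, depth=3, sigma_w=1.0, sigma_b=0.1):
    """Covariance of a deep erf network in the infinite-width limit.

    Applies the closed-form arcsine expectation for the erf
    nonlinearity layer by layer. Hyperparameter values are illustrative.
    """
    # Layer-0 covariance: affine map of the inputs.
    K = sigma_b**2 + sigma_w**2 * (X @ X.T) / X.shape[1]
    for _ in range(depth):
        d = np.sqrt(1.0 + 2.0 * np.diag(K))
        S = 2.0 * K / np.outer(d, d)   # normalized pre-activation covariance
        K = sigma_b**2 + sigma_w**2 * (2.0 / np.pi) * np.arcsin(
            np.clip(S, -1.0, 1.0))
    return K

X = np.linspace(-1, 1, 5).reshape(-1, 1)
K = nngp_kernel(X)
print(K.shape)  # (5, 5)
```

The resulting K is then used exactly like any GP kernel matrix for regression; the "larger number of hyperparameters" generalization would add per-layer weight and bias variances to be fit by maximum likelihood.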