Liu Yang

is this you? claim profile


  • Learning with Changing Features

    In this paper we study the setting where features are added or change interpretation over time, which has applications in multiple domains such as retail, manufacturing, finance. In particular, we propose an approach to provably determine the time instant from which the new/changed features start becoming relevant with respect to an output variable in an agnostic (supervised) learning setting. We also suggest an efficient version of our approach which has the same asymptotic performance. Moreover, our theory also applies when we have more than one such change point. Independent post analysis of a change point identified by our method for a large retailer revealed that it corresponded in time with certain unflattering news stories about a brand that resulted in the change in customer behavior. We also applied our method to data from an advanced manufacturing plant identifying the time instant from which downstream features became relevant. To the best of our knowledge this is the first work that formally studies change point detection in a distribution independent agnostic setting, where the change point is based on the changing relationship between input and output.

    04/29/2017 ∙ by Amit Dhurandhar, et al. ∙ 0 share

    read it

  • Dynamic matrix factorization with social influence

    Matrix factorization is a key component of collaborative filtering-based recommendation systems because it allows us to complete sparse user-by-item ratings matrices under a low-rank assumption that encodes the belief that similar users give similar ratings and that similar items garner similar ratings. This paradigm has had immeasurable practical success, but it is not the complete story for understanding and inferring the preferences of people. First, peoples' preferences and their observable manifestations as ratings evolve over time along general patterns of trajectories. Second, an individual person's preferences evolve over time through influence of their social connections. In this paper, we develop a unified process model for both types of dynamics within a state space approach, together with an efficient optimization scheme for estimation within that model. The model combines elements from recent developments in dynamic matrix factorization, opinion dynamics and social learning, and trust-based recommendation. The estimation builds upon recent advances in numerical nonlinear optimization. Empirical results on a large-scale data set from the Epinions website demonstrate consistent reduction in root mean squared error by consideration of the two types of dynamics.

    04/21/2016 ∙ by Aleksandr Y. Aravkin, et al. ∙ 0 share

    read it

  • Statistical Learning under Nonstationary Mixing Processes

    We study a special case of the problem of statistical learning without the i.i.d. assumption. Specifically, we suppose a learning method is presented with a sequence of data points, and required to make a prediction (e.g., a classification) for each one, and can then observe the loss incurred by this prediction. We go beyond traditional analyses, which have focused on stationary mixing processes or nonstationary product processes, by combining these two relaxations to allow nonstationary mixing processes. We are particularly interested in the case of β-mixing processes, with the sum of changes in marginal distributions growing sublinearly in the number of samples. Under these conditions, we propose a learning method, and establish that for bounded VC subgraph classes, the cumulative excess risk grows sublinearly in the number of predictions, at a quantified rate.

    12/26/2015 ∙ by Steve Hanneke, et al. ∙ 0 share

    read it

  • Minimax Analysis of Active Learning

    This work establishes distribution-free upper and lower bounds on the minimax label complexity of active learning with general hypothesis classes, under various noise models. The results reveal a number of surprising facts. In particular, under the noise model of Tsybakov (2004), the minimax label complexity of active learning with a VC class is always asymptotically smaller than that of passive learning, and is typically significantly smaller than the best previously-published upper bounds in the active learning literature. In high-noise regimes, it turns out that all active learning problems of a given VC dimension have roughly the same minimax label complexity, which contrasts with well-known results for bounded noise. In low-noise regimes, we find that the label complexity is well-characterized by a simple combinatorial complexity measure we call the star number. Interestingly, we find that almost all of the complexity measures previously explored in the active learning literature have worst-case values exactly equal to the star number. We also propose new active learning strategies that nearly achieve these minimax label complexities.

    10/03/2014 ∙ by Steve Hanneke, et al. ∙ 0 share

    read it

  • Surrogate Losses in Passive and Active Learning

    Active learning is a type of sequential design for supervised machine learning, in which the learning algorithm sequentially requests the labels of selected instances from a large pool of unlabeled data points. The objective is to produce a classifier of relatively low risk, as measured under the 0-1 loss, ideally using fewer label requests than the number of random labeled data points sufficient to achieve the same. This work investigates the potential uses of surrogate loss functions in the context of active learning. Specifically, it presents an active learning algorithm based on an arbitrary classification-calibrated surrogate loss function, along with an analysis of the number of label requests sufficient for the classifier returned by the algorithm to achieve a given risk under the 0-1 loss. Interestingly, these results cannot be obtained by simply optimizing the surrogate risk via active learning to an extent sufficient to provide a guarantee on the 0-1 loss, as is common practice in the analysis of surrogate losses for passive learning. Some of the results have additional implications for the use of surrogate losses in passive learning.

    07/16/2012 ∙ by Steve Hanneke, et al. ∙ 0 share

    read it

  • Bayesian Active Distance Metric Learning

    Distance metric learning is an important component for many tasks, such as statistical classification and content-based image retrieval. Existing approaches for learning distance metrics from pairwise constraints typically suffer from two major problems. First, most algorithms only offer point estimation of the distance metric and can therefore be unreliable when the number of training examples is small. Second, since these algorithms generally select their training examples at random, they can be inefficient if labeling effort is limited. This paper presents a Bayesian framework for distance metric learning that estimates a posterior distribution for the distance metric from labeled pairwise constraints. We describe an efficient algorithm based on the variational method for the proposed Bayesian approach. Furthermore, we apply the proposed Bayesian framework to active distance metric learning by selecting those unlabeled example pairs with the greatest uncertainty in relative distance. Experiments in classification demonstrate that the proposed framework achieves higher classification accuracy and identifies more informative training examples than the non-Bayesian approach and state-of-the-art distance metric learning algorithms.

    06/20/2012 ∙ by Liu Yang, et al. ∙ 0 share

    read it

  • A Note on Property Testing Sum of Squares and Multivariate Polynomial Interpolation

    In this paper, we investigate property testing whether or not a degree d multivariate poly- nomial is a sum of squares or is far from a sum of squares. We show that if we require that the property tester always accepts YES instances and uses random samples, n^Ω(d) samples are required, which is not much fewer than it would take to completely determine the polynomial. To prove this lower bound, we show that with high probability, multivariate polynomial in- terpolation matches arbitrary values on random points and the resulting polynomial has small norm. We then consider a particular polynomial which is non-negative yet not a sum of squares and use pseudo-expectation values to prove it is far from being a sum of squares.

    09/10/2017 ∙ by Aaron Potechin, et al. ∙ 0 share

    read it

  • A Prediction Model of the Project Life-span in Open Source Software Ecosystem

    In nature ecosystems, animal life-spans are determined by genes and some other biological characteristics. Similarly, the software project life-spans are related to some internal or external characteristics. Analyzing the relations between these characteristics and the project life-span, may help developers, investors, and contributors to control the development cycle of the software project. The paper provides an insight on the project life-span for a free open source software ecosystem. The statistical analysis of some project characteristics in GitHub is presented, and we find that the choices of programming languages, the number of files, the label format of the project, and the relevant membership expressions can impact the life-span of a project. Based on these discovered characteristics, we also propose a prediction model to estimate the project life-span in open source software ecosystems. These results may help developers reschedule the project in open source software ecosystem.

    10/31/2017 ∙ by Zhifang Liao, et al. ∙ 0 share

    read it

  • Response Ranking with Deep Matching Networks and External Knowledge in Information-seeking Conversation Systems

    Intelligent personal assistant systems with either text-based or voice-based conversational interfaces are becoming increasingly popular around the world. Retrieval-based conversation models have the advantages of returning fluent and informative responses. Most existing studies in this area are on open domain "chit-chat" conversations or task / transaction oriented conversations. More research is needed for information-seeking conversations. There is also a lack of modeling external knowledge beyond the dialog utterances among current conversational models. In this paper, we propose a learning framework on the top of deep neural matching networks that leverages external knowledge for response ranking in information-seeking conversation systems. We incorporate external knowledge into deep neural models with pseudo-relevance feedback and QA correspondence knowledge distillation. Extensive experiments with three information-seeking conversation data sets including both open benchmarks and commercial data show that, our methods outperform various baseline methods including several deep text matching models and the state-of-the-art method on response selection in multi-turn conversations. We also perform analysis over different response types, model variations and ranking examples. Our models and research findings provide new insights on how to utilize external knowledge with deep neural models for response selection and have implications for the design of the next generation of information-seeking conversation systems.

    05/01/2018 ∙ by Liu Yang, et al. ∙ 0 share

    read it

  • aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model

    As an alternative to question answering methods based on feature engineering, deep learning approaches such as convolutional neural networks (CNNs) and Long Short-Term Memory Models (LSTMs) have recently been proposed for semantic matching of questions and answers. To achieve good results, however, these models have been combined with additional features such as word overlap or BM25 scores. Without this combination, these models perform significantly worse than methods based on linguistic feature engineering. In this paper, we propose an attention based neural matching model for ranking short answer text. We adopt value-shared weighting scheme instead of position-shared weighting scheme for combining different matching signals and incorporate question term importance learning using question attention network. Using the popular benchmark TREC QA data, we show that the relatively simple aNMM model can significantly outperform other neural network models that have been used for the question answering task, and is competitive with models that are combined with additional features. When aNMM is combined with additional features, it outperforms all baselines.

    01/05/2018 ∙ by Liu Yang, et al. ∙ 0 share

    read it

  • Neural-net-induced Gaussian process regression for function approximation and PDE solution

    Neural-net-induced Gaussian process (NNGP) regression inherits both the high expressivity of deep neural networks (deep NNs) as well as the uncertainty quantification property of Gaussian processes (GPs). We generalize the current NNGP to first include a larger number of hyperparameters and subsequently train the model by maximum likelihood estimation. Unlike previous works on NNGP that targeted classification, here we apply the generalized NNGP to function approximation and to solving partial differential equations (PDEs). Specifically, we develop an analytical iteration formula to compute the covariance function of GP induced by deep NN with an error-function nonlinearity. We compare the performance of the generalized NNGP for function approximations and PDE solutions with those of GPs and fully-connected NNs. We observe that for smooth functions the generalized NNGP can yield the same order of accuracy with GP, while both NNGP and GP outperform deep NN. For non-smooth functions, the generalized NNGP is superior to GP and comparable or superior to deep NN.

    06/22/2018 ∙ by Guofei Pang, et al. ∙ 0 share

    read it