
Stochastic Variational Inference for Hidden Markov Models
Variational inference algorithms have proven successful for Bayesian analysis in large data settings, with recent advances using stochastic variational inference (SVI). However, such methods have largely been studied in independent or exchangeable data settings. We develop an SVI algorithm to learn the parameters of hidden Markov models (HMMs) in a time-dependent data setting. The challenge in applying stochastic optimization in this setting arises from dependencies in the chain, which must be broken to consider minibatches of observations. We propose an algorithm that harnesses the memory decay of the chain to adaptively bound errors arising from edge effects. We demonstrate the effectiveness of our algorithm on synthetic experiments and a large genomics dataset where a batch algorithm is computationally infeasible.
11/06/2014 ∙ by Nicholas J. Foti, et al.

Generalized Linear Model Regression under Distance-to-set Penalties
Estimation in generalized linear models (GLM) is complicated by the presence of constraints. One can handle constraints by maximizing a penalized log-likelihood. Penalties such as the lasso are effective in high dimensions, but often lead to unwanted shrinkage. This paper explores instead penalizing the squared distance to constraint sets. Distance penalties are more flexible than algebraic and regularization penalties, and avoid the drawback of shrinkage. To optimize distance penalized objectives, we make use of the majorization-minimization principle. Resulting algorithms constructed within this framework are amenable to acceleration and come with global convergence guarantees. Applications to shape constraints, sparse regression, and rank-restricted matrix regression on synthetic and real data showcase strong empirical performance, even under nonconvex constraints.
11/03/2017 ∙ by Jason Xu, et al.
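
The majorization-minimization idea in this abstract can be illustrated on the simplest Gaussian case. The sketch below is not the paper's algorithm: it assumes an identity design (so the loss is 0.5 * ||y - beta||^2), a fixed penalty weight rho, and a sparsity constraint set, and all function names are hypothetical. Each step majorizes the squared distance to the set by the squared distance to the projection of the current iterate, which gives a closed-form update.

```python
def project_sparse(beta, k):
    """Project onto the set of vectors with at most k nonzero entries."""
    order = sorted(range(len(beta)), key=lambda i: abs(beta[i]), reverse=True)
    keep = set(order[:k])
    return [b if i in keep else 0.0 for i, b in enumerate(beta)]


def objective(beta, y, rho, k):
    """0.5 * ||y - beta||^2 + (rho / 2) * dist(beta, C)^2 for the sparse set C."""
    proj = project_sparse(beta, k)
    loss = 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, beta))
    dist2 = sum((bi - pi) ** 2 for bi, pi in zip(beta, proj))
    return loss + 0.5 * rho * dist2


def mm_distance_penalty(y, k, rho=10.0, iters=50):
    """MM iterations for the distance-penalized objective (assumed toy setup)."""
    beta = list(y)
    history = [objective(beta, y, rho, k)]
    for _ in range(iters):
        # Majorize dist(beta, C)^2 by ||beta - P_C(beta_k)||^2 ...
        proj = project_sparse(beta, k)
        # ... then minimize the quadratic surrogate in closed form.
        beta = [(yi + rho * pi) / (1.0 + rho) for yi, pi in zip(y, proj)]
        history.append(objective(beta, y, rho, k))
    return beta, history
```

Because the surrogate touches the objective at the current iterate and lies above it elsewhere, the recorded objective values never increase, even though the sparse set is nonconvex. Entries outside the kept support are shrunk toward zero rather than set exactly to zero; a larger rho pushes them closer to the constraint set.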

Combination of Hyperband and Bayesian Optimization for Hyperparameter Optimization in Deep Learning
Deep learning has achieved impressive results on many problems. However, it requires a high degree of expertise or a lot of experience to tune the hyperparameters well, and such a manual tuning process is likely to be biased. Moreover, it is not practical to try out as many different hyperparameter configurations in deep learning as in other machine learning scenarios, because evaluating a single hyperparameter configuration means training a deep neural network, which usually takes a long time. The Hyperband algorithm achieves state-of-the-art performance on various hyperparameter optimization problems in deep learning. However, Hyperband does not use the history of previously explored hyperparameter configurations, so the solution it finds is suboptimal. We propose to combine Hyperband with Bayesian optimization, which does use history when sampling the next trial configuration. Experimental results show that our combined approach is superior to other hyperparameter optimization approaches, including Hyperband.
01/05/2018 ∙ by Jiazhuo Wang, et al.
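
For readers unfamiliar with Hyperband, the sketch below shows successive halving, the resource-allocation routine that Hyperband repeats over several brackets. It is a minimal illustration rather than the authors' method: the function names, the toy loss, and the default values are all assumptions. The `sample_config` argument is where a history-aware sampler, such as the Bayesian optimization step the abstract describes, would replace random sampling.

```python
import random


def successive_halving(sample_config, evaluate, n_configs=27, min_budget=1, eta=3):
    """Keep the best 1/eta of the configurations each round, with eta times more budget."""
    configs = [sample_config() for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        # Lower loss is better; evaluate(config, budget) is assumed to return a loss.
        ranked = sorted(configs, key=lambda c: evaluate(c, budget))
        configs = ranked[:max(1, len(configs) // eta)]
        budget *= eta
    return configs[0]


def toy_loss(config, budget):
    """Hypothetical loss: best at config = 0.3, improving as the budget grows."""
    return (config - 0.3) ** 2 + 1.0 / budget


if __name__ == "__main__":
    rng = random.Random(0)
    best = successive_halving(rng.random, toy_loss)
    print("selected configuration:", best)
```

With the defaults, 27 sampled configurations are cut to 9, then 3, then 1, while the per-configuration budget grows from 1 to 3 to 9. Most of the total budget is therefore spent on the few configurations that looked promising at low cost.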

Automatic Conflict Detection in Police Body-Worn Video
Automatic conflict detection has grown in relevance with the advent of body-worn technology, but existing metrics such as turn-taking and overlap are poor indicators of conflict in police-public interactions. Moreover, standard techniques to compute them fall short when applied to such diversified and noisy contexts. We develop a pipeline catered to this task combining adaptive noise removal, non-speech filtering and new measures of conflict based on the repetition and intensity of phrases in speech. We demonstrate the effectiveness of our approach on body-worn audio data collected by the Los Angeles Police Department.
11/14/2017 ∙ by Alistair Letcher, et al.

Structural Risk Minimization for $C^{1,1}(\mathbb{R}^d)$ Regression
One means of fitting functions to high-dimensional data is by providing smoothness constraints. Recently, the following smooth function approximation problem was proposed by herbert2014computing: given a finite set $E \subset \mathbb{R}^d$ and a function $f: E \to \mathbb{R}$, interpolate the given information with a function $\hat{f} \in \dot{C}^{1,1}(\mathbb{R}^d)$ (the class of first-order differentiable functions with Lipschitz gradients) such that $\hat{f}(a) = f(a)$ for all $a \in E$, and the value of $\mathrm{Lip}(\nabla \hat{f})$ is minimal. An algorithm is provided that constructs such an approximating function $\hat{f}$ and estimates the optimal Lipschitz constant $\mathrm{Lip}(\nabla \hat{f})$ in the noiseless setting. We address statistical aspects of reconstructing the approximating function $\hat{f}$ from a closely-related class $C^{1,1}(\mathbb{R}^d)$ given samples from noisy data. We observe independent and identically distributed samples $y(a) = f(a) + \xi(a)$ for $a \in E$, where $\xi(a)$ is a noise term and the set $E \subset \mathbb{R}^d$ is fixed and known. We obtain uniform bounds relating the empirical risk and true risk over the class $\mathcal{F}_M = \{f \in C^{1,1}(\mathbb{R}^d) \mid \mathrm{Lip}(\nabla f) \le M\}$, where the quantity $M$ grows with the number of samples at a rate governed by the metric entropy of the class $C^{1,1}(\mathbb{R}^d)$. Finally, we provide an implementation using Vaidya's algorithm, supporting our results via numerical experiments on simulated data.
03/29/2018 ∙ by Adam Gustafson, et al.

Automatic Conflict Detection in Police Body-Worn Audio
Automatic conflict detection has grown in relevance with the advent of body-worn technology, but existing metrics such as turn-taking and overlap are poor indicators of conflict in police-public interactions. Moreover, standard techniques to compute them fall short when applied to such diversified and noisy contexts. We develop a pipeline catered to this task combining adaptive noise removal, non-speech filtering and new measures of conflict based on the repetition and intensity of phrases in speech. We demonstrate the effectiveness of our approach on body-worn audio data collected by the Los Angeles Police Department.
11/14/2017 ∙ by Alistair Letcher, et al.
Jason Xu