Alexander Jung

Assistant Professor of Computer Science at Aalto University

  • Learning Networked Exponential Families with Network Lasso

    The data arising in many important big-data applications, ranging from social networks to network medicine, consist of high-dimensional data points related by an intrinsic (complex) network structure. In order to jointly leverage the information conveyed in the network structure as well as the statistical power contained in high-dimensional data points, we propose networked exponential families. We apply the network Lasso to learn networked exponential families as a probabilistic model for heterogeneous datasets with intrinsic network structure. In order to allow for accurate learning from high-dimensional data, we borrow statistical strength, via the intrinsic network structure, across the dataset. The resulting method amounts to regularized empirical risk minimization with the total variation of the model parameters as the regularizer. This minimization problem is a non-smooth convex optimization problem, which we solve using a primal-dual splitting method. This method is appealing for big data applications as it can be implemented as a highly scalable message passing algorithm.
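To make the objective concrete, here is a minimal sketch (not the authors' implementation) of the TV-regularized empirical risk on a hypothetical toy graph, using scalar node parameters and a squared-error local loss in place of a generic exponential-family loss:

```python
import numpy as np

# Toy empirical graph: 4 nodes, weighted edges (i, j, w_ij).
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.5)]

# Node-wise model parameters w_i (scalar here for simplicity).
w = np.array([0.9, 1.1, -0.8, -1.0])

# Labeled nodes and their targets; squared error stands in for the local loss.
labeled = {0: 1.0, 3: -1.0}

def nlasso_objective(w, edges, labeled, lam=0.1):
    """Regularized empirical risk: local loss + lambda * TV of the parameters."""
    loss = sum((w[i] - y) ** 2 for i, y in labeled.items())
    tv = sum(wij * abs(w[i] - w[j]) for i, j, wij in edges)
    return loss + lam * tv

print(round(nlasso_objective(w, edges, labeled, lam=0.1), 4))
```

Minimizing this objective couples the labeled nodes to their neighbors through the TV term, which is what "borrowing statistical strength across the network" amounts to.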

    05/22/2019 ∙ by Alexander Jung, et al.

  • Semi-supervised Learning in Network-Structured Data via Total Variation Minimization

    We propose and analyze a method for semi-supervised learning from partially-labeled network-structured data. Our approach is based on a graph signal recovery interpretation under a clustering hypothesis that labels of data points belonging to the same well-connected subset (cluster) have similar values. This naturally leads to learning the labels by total variation (TV) minimization, which we solve by applying a recently proposed primal-dual method for non-smooth convex optimization. The resulting algorithm allows for a highly scalable implementation using message passing over the underlying empirical graph, which renders the algorithm suitable for big data applications. By applying tools of compressed sensing, we derive a sufficient condition on the underlying network structure such that TV minimization recovers clusters in the empirical graph of the data. In particular, we show that the proposed primal-dual method amounts to maximizing network flows over the empirical graph of the dataset. Moreover, the learning accuracy of the proposed algorithm is linked to the set of network flows between data points having known labels. The effectiveness and scalability of our approach are verified by numerical experiments.
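As a hedged illustration of the TV-minimization idea (a generic Chambolle-Pock-style primal-dual sketch, not the authors' implementation), consider a toy chain graph with a weak edge marking the cluster boundary and one labeled node per cluster:

```python
import numpy as np

n = 10
w_edge = np.ones(n - 1)
w_edge[4] = 0.1                      # weak edge = hypothetical cluster boundary

# Weighted incidence matrix of the chain graph.
D = np.zeros((n - 1, n))
for e in range(n - 1):
    D[e, e], D[e, e + 1] = -w_edge[e], w_edge[e]

labels = {0: 0.0, 9: 1.0}            # one labeled node per cluster
lam, tau, sigma = 1.0, 0.4, 0.4      # tau * sigma * ||D||^2 < 1

x = np.zeros(n)
xbar = x.copy()
u = np.zeros(n - 1)
for _ in range(3000):
    # Dual step: prox of the conjugate of lam * ||.||_1 is clipping.
    u = np.clip(u + sigma * D @ xbar, -lam, lam)
    # Primal step: prox of the squared-error data term on labeled nodes.
    v = x - tau * D.T @ u
    x_new = v.copy()
    for i, y in labels.items():
        x_new[i] = (v[i] + 2 * tau * y) / (1 + 2 * tau)
    xbar = 2 * x_new - x
    x = x_new

print(np.round(x, 2))
```

With the label information sitting on both sides of the weak edge, the recovered signal is nearly constant on each cluster, with the jump concentrated at the boundary; each iteration only exchanges values along edges, which is the message-passing structure the abstract refers to.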

    01/28/2019 ∙ by Alexander Jung, et al.

  • Localized Linear Regression in Networked Data

    The network Lasso (nLasso) has been proposed recently as an efficient learning algorithm for massive networked data sets (big data over networks). It extends the well-known least absolute shrinkage and selection operator (Lasso) from learning sparse (generalized) linear models to network models. Efficient implementations of the nLasso have been obtained using convex optimization methods. These implementations lend themselves naturally to highly scalable message passing methods. In this paper, we analyze the performance of nLasso when applied to localized linear regression problems involving networked data. Our main result is a sufficient condition on the network structure and available label information such that nLasso accurately learns a localized linear regression model from few labeled data points.

    03/26/2019 ∙ by Alexander Jung, et al.

  • Classifying Partially Labeled Networked Data via Logistic Network Lasso

    We apply the network Lasso to classify partially labeled data points which are characterized by high-dimensional feature vectors. In order to learn an accurate classifier from limited amounts of labeled data, we borrow statistical strength, via an intrinsic network structure, across the dataset. The resulting logistic network Lasso amounts to a regularized empirical risk minimization problem using the total variation of a classifier as a regularizer. This minimization problem is a non-smooth convex optimization problem which we solve using a primal-dual splitting method. This method is appealing for big data applications as it can be implemented as a highly scalable message passing algorithm.

    03/26/2019 ∙ by Nguyen Tran, et al.

  • Analysis of Network Lasso For Semi-Supervised Regression

    We characterize the statistical properties of network Lasso for semi-supervised regression problems involving network-structured data. This characterization is based on the connectivity properties of the empirical graph which encodes the similarities between individual data points. Loosely speaking, network Lasso is accurate if the available label information is well connected with the boundaries between clusters of the network-structured datasets. We make this property precise using the notion of network flows. In particular, the existence of a sufficiently large network flow over the empirical graph implies a network compatibility condition which, in turn, ensures accuracy of network Lasso.

    08/22/2018 ∙ by Alexander Jung, et al.

  • Classifying Process Instances Using Recurrent Neural Networks

    Process mining consists of techniques where logs created by operative systems are transformed into process models. In process mining tools it is often desirable to be able to classify ongoing process instances, e.g., to predict how long the process will still require to complete, or to classify process instances into different classes based only on the activities that have occurred in the process instance thus far. Recurrent neural networks and their subclasses, such as the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM), have been demonstrated to be able to learn relevant temporal features for subsequent classification tasks. In this paper we apply recurrent neural networks to classifying process instances. The proposed model is trained in a supervised fashion using labeled process instances extracted from event log traces. To the best of our knowledge, this is the first time GRUs have been used for classifying business process instances. Our main experimental result shows that GRU markedly outperforms LSTM in training time while giving almost identical accuracy. An additional contribution of our paper is an improvement in classification model training time achieved by filtering out infrequent activities, a technique commonly used, e.g., in natural language processing (NLP).
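The GRU recurrence underlying this approach can be sketched as follows (a from-scratch forward pass in NumPy for illustration only; the activity alphabet, trace, and read-out are hypothetical, and a real model would be trained with a deep-learning framework):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GRUCell:
    """Minimal GRU cell (forward pass only), input size d, hidden size h."""
    def __init__(self, d, h):
        s = 1.0 / np.sqrt(h)
        def mat(rows, cols):
            return rng.uniform(-s, s, (rows, cols))
        self.Wz, self.Uz, self.bz = mat(h, d), mat(h, h), np.zeros(h)
        self.Wr, self.Ur, self.br = mat(h, d), mat(h, h), np.zeros(h)
        self.Wh, self.Uh, self.bh = mat(h, d), mat(h, h), np.zeros(h)

    def step(self, x, h_prev):
        z = sigmoid(self.Wz @ x + self.Uz @ h_prev + self.bz)       # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h_prev + self.br)       # reset gate
        h_til = np.tanh(self.Wh @ x + self.Uh @ (r * h_prev) + self.bh)
        return (1.0 - z) * h_prev + z * h_til

# A "process instance" is the sequence of activities seen so far, encoded as
# one-hot vectors over a hypothetical alphabet of 5 activities.
n_act, h_dim = 5, 8
cell = GRUCell(n_act, h_dim)
state = np.zeros(h_dim)
for idx in (0, 2, 2, 4):                     # activity indices of the trace prefix
    state = cell.step(np.eye(n_act)[idx], state)

W_out = rng.uniform(-0.5, 0.5, (2, h_dim))   # linear read-out over 2 classes
logits = W_out @ state
print(logits.shape)
```

The GRU has two gates instead of the LSTM's three and no separate cell state, which is consistent with the faster training times reported in the abstract.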

    09/16/2018 ∙ by Markku Hinkka, et al.

  • On The Complexity of Sparse Label Propagation

    This paper investigates the computational complexity of sparse label propagation which has been proposed recently for processing network structured data. Sparse label propagation amounts to a convex optimization problem and might be considered as an extension of basis pursuit from sparse vectors to network structured datasets. Using a standard first-order oracle model, we characterize the number of iterations for sparse label propagation to achieve a prescribed accuracy. In particular, we derive an upper bound on the number of iterations required to achieve a certain accuracy and show that this upper bound is sharp for datasets having a chain structure (e.g., time series).

    04/25/2018 ∙ by Alexander Jung, et al.

  • Classifying Big Data over Networks via the Logistic Network Lasso

    We apply the network Lasso to solve binary classification (clustering) problems on network structured data. To this end, we generalize ordinary logistic regression to non-Euclidean data defined over a complex network structure. The resulting logistic network Lasso classifier amounts to solving a non-smooth convex optimization problem. A scalable classification algorithm is obtained by applying the alternating direction method of multipliers (ADMM) to solve this optimization problem.

    05/07/2018 ∙ by Henrik Ambos, et al.

  • Graph Signal Sampling via Reinforcement Learning

    We formulate the problem of sampling and recovering a clustered graph signal as a multi-armed bandit (MAB) problem. This formulation leads naturally to learning sampling strategies using the well-known gradient MAB algorithm. In particular, the sampling strategy is represented as a probability distribution over the individual arms of the MAB and optimized using gradient ascent. Some illustrative numerical experiments indicate that the sampling strategies based on the gradient MAB algorithm outperform existing sampling methods.
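The gradient MAB algorithm referred to here can be sketched as follows (a generic gradient-bandit loop with hypothetical per-node rewards, not the paper's experimental setup): each arm is a candidate node to sample, arm preferences are mapped to a sampling distribution by softmax, and the preferences are updated by gradient ascent on the expected reward.

```python
import numpy as np

rng = np.random.default_rng(1)

n_arms = 4
true_reward = np.array([0.1, 0.2, 0.9, 0.3])  # hypothetical recovery gain per node
H = np.zeros(n_arms)                          # arm preferences
baseline, alpha = 0.0, 0.1                    # reward baseline, step size

for t in range(1, 3001):
    p = np.exp(H - H.max())
    p /= p.sum()                              # softmax sampling distribution
    a = rng.choice(n_arms, p=p)               # sample a node (arm)
    r = true_reward[a] + 0.1 * rng.standard_normal()
    baseline += (r - baseline) / t            # running average of rewards
    grad = -p * (r - baseline)                # gradient-bandit preference update
    grad[a] += (r - baseline)
    H += alpha * grad

p = np.exp(H - H.max())
p /= p.sum()
print(int(np.argmax(p)))
```

After enough rounds the distribution concentrates on the most informative arm, i.e., the node whose label contributes most to recovering the signal.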

    05/15/2018 ∙ by Oleksii Abramenko, et al.

  • Predicting Electricity Outages Caused by Convective Storms

    We consider the problem of predicting power outages in an electrical power grid due to hazards produced by convective storms. These storms produce extreme weather phenomena such as intense wind, tornadoes and lightning over a small area. In this paper, we discuss the application of state-of-the-art machine learning techniques, such as random forest classifiers and deep neural networks, to predict the amount of damage caused by storms. We cast this application as a classification problem where the goal is to classify storm cells into a finite number of classes, each corresponding to a certain amount of expected damage. The classification method uses as input features estimates of storm cell location and movement, which have to be extracted from the raw data. A main challenge of this application is that the training data are heavily imbalanced, as the occurrence of extreme weather events is rare. In order to address this issue, we applied the synthetic minority oversampling technique (SMOTE).
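The SMOTE idea can be sketched as follows (a from-scratch variant on hypothetical minority-class feature vectors, not the exact implementation used in the paper): synthetic minority samples are created by interpolating between a minority point and one of its nearest minority-class neighbours.

```python
import numpy as np

rng = np.random.default_rng(0)

def smote(X_min, k=3, n_new=10):
    """Minimal SMOTE sketch: interpolate between a minority point and one of
    its k nearest minority-class neighbours."""
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]          # k nearest neighbours (exclude self)
        j = rng.choice(nn)
        gap = rng.random()                   # interpolation factor in [0, 1)
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)

# Rare "severe damage" storm cells in a 2-D feature space (hypothetical features).
X_rare = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2]])
X_new = smote(X_rare, k=2, n_new=5)
print(X_new.shape)
```

Because the synthetic points lie on line segments between real minority samples, they enlarge the minority class without simply duplicating existing points.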

    05/21/2018 ∙ by Roope Tervo, et al.

  • Recovery Conditions and Sampling Strategies for Network Lasso

    The network Lasso is a recently proposed convex optimization method for machine learning from massive network structured datasets, i.e., big data over networks. It is a variant of the well-known least absolute shrinkage and selection operator (Lasso), which underlies many methods in learning and signal processing involving sparse models. Highly scalable implementations of the network Lasso can be obtained by state-of-the-art proximal methods, e.g., the alternating direction method of multipliers (ADMM). By generalizing the concept of the compatibility condition put forward by van de Geer and Buehlmann as a powerful tool for the analysis of plain Lasso, we derive a sufficient condition, i.e., the network compatibility condition (NCC), on the underlying network topology such that network Lasso accurately learns a clustered underlying graph signal. This network compatibility condition relates the location of the sampled nodes to the clustering structure of the network. In particular, the NCC informs the choice of which nodes to sample or, in machine learning terms, which data points provide the most information if labeled.

    09/03/2017 ∙ by Alexandru Mara, et al.