Reinhard Furrer



  • varrank: an R package for variable ranking based on mutual information with applications to observed systemic datasets

    This article describes the R package varrank. It provides a flexible implementation of heuristic approaches that perform variable ranking based on mutual information. The package is particularly suitable for exploring multivariate datasets requiring a holistic analysis. The core functionality is a general implementation of the minimum redundancy maximum relevance (mRMRe) model. This approach is based on information theory metrics. It is compatible with discrete and continuous data, which are discretised using a wide choice of rules. The two main problems that this package addresses are the selection of the most representative variables for modeling a collection of variables of interest, i.e., dimension reduction, and variable ranking with respect to a set of variables of interest.

    04/19/2018 ∙ by Gilles Kratzer, et al.


  • eggCounts: a Bayesian hierarchical toolkit to model faecal egg count reductions

    This is a vignette for the R package eggCounts version 2.0. The package implements a suite of Bayesian hierarchical models dealing with faecal egg count reductions. The models are designed for a variety of practical situations, including individual treatment efficacy, zero inflation, small sample sizes (fewer than 10) and potential outliers. The functions are intuitive to use and their output is easy to interpret, so that users are shielded from complex Bayesian hierarchical modelling tasks. In addition, the package includes plotting functions to display data and results in a visually appealing manner. The models are implemented in the Stan modelling language, which provides efficient sampling techniques to obtain posterior samples. This vignette briefly introduces the different models and provides a short walk-through analysis with example data.

    04/30/2018 ∙ by Craig Wang, et al.


  • optimParallel: an R Package Providing Parallel Versions of the Gradient-Based Optimization Methods of optim()

    The R package optimParallel provides a parallel version of the gradient-based optimization methods of optim(). The main function of the package is optimParallel(), which has the same usage and output as optim(). Using optimParallel() can significantly reduce optimization times. We introduce the R package and illustrate its implementation, which takes advantage of the lexical scoping mechanism of R.

    04/30/2018 ∙ by Florian Gerber, et al.


  • Comparison between Suitable Priors for Additive Bayesian Networks

    Additive Bayesian networks are types of graphical models that extend the usual Bayesian generalized linear model to multiple dependent variables through the factorisation of the joint probability distribution of the underlying variables. When fitting an ABN model, the choice of the prior for the parameters is of crucial importance. If an inadequate prior - like a too weakly informative one - is used, data separation and data sparsity lead to issues in the model selection process. In this work, a simulation study comparing two weakly informative priors and one strongly informative prior is presented. As the first weakly informative prior we use a zero-mean Gaussian prior with a large variance, currently implemented in the R package abn. The second prior is a Student's t-distribution, specifically designed for logistic regressions. Finally, the strongly informative prior is again Gaussian, with mean equal to the true parameter value and a small variance. We compare the impact of these priors on the accuracy of the learned additive Bayesian network as a function of different parameters. We create a simulation study to illustrate Lindley's paradox based on the prior choice. We then conclude by highlighting the good performance of the informative Student's t-prior and the limited impact of Lindley's paradox. Finally, suggestions for further developments are provided.

    09/18/2018 ∙ by Gilles Kratzer, et al.


  • Information-Theoretic Scoring Rules to Learn Additive Bayesian Network Applied to Epidemiology

    Bayesian network modelling is a well-adapted approach for studying messy and highly correlated datasets, which are very common in, e.g., systems epidemiology. A popular approach to learning a Bayesian network from an observational dataset is to identify the maximum a posteriori network in a search-and-score approach. Many scores have been proposed, both Bayesian and frequentist. From an applied perspective, a suitable approach would allow multiple distributions for the data and be robust enough to run autonomously. Generalized linear models are a promising framework for computing scores. Indeed, there exist fast algorithms for estimation and many tailored solutions to common epidemiological issues. The purpose of this paper is to present an R package, abn, that implements multiple frequentist scores, together with some realistic simulations that show its usability and performance. It includes features to deal efficiently with data separation and adjustment, which are very common in systems epidemiology.

    08/03/2018 ∙ by Gilles Kratzer, et al.


  • Is a single unique Bayesian network enough to accurately represent your data?

    Bayesian network (BN) modelling is extensively used in systems epidemiology. Usually it consists of selecting and reporting the best-fitting structure conditional on the data. A major practical concern is avoiding overfitting, given the extreme flexibility and modelling richness of BNs. Many approaches have been proposed to control for overfitting. Unfortunately, they essentially all rely on very crude decisions that result in overly simplistic approaches for such complex systems. In practice, with limited data sampled from a complex system, reporting a single best structure seems too simplistic. An alternative would be to use the Monte Carlo Markov chain model choice (MC3) over the network to learn the landscape of reasonably supported networks, and then to present all possible arcs with their MCMC support. This paper presents an R implementation, called mcmcabn, of a flexible structural MC3 that is accessible to non-specialists.

    02/18/2019 ∙ by Gilles Kratzer, et al.


  • Combining Heterogeneous Spatial Datasets with Process-based Spatial Fusion Models: A Unifying Framework

    In modern spatial statistics, the structure of the data collected has become more heterogeneous. Depending on the type of spatial data, different modeling strategies are used: for example, a kriging approach for geostatistical data, a Gaussian Markov random field model for lattice data, or a log Gaussian Cox process for point-pattern data. Despite these different modeling choices, the nature of the underlying scientific data-generating (latent) processes is often the same, and can be represented by some continuous spatial surfaces. In this paper, we introduce a unifying framework for process-based multivariate spatial fusion models. The framework can jointly analyze all three aforementioned types of spatial data (or any combination thereof). Moreover, the framework accommodates different conditional distributions for geostatistical and lattice data. We show that some established approaches, such as linear models of coregionalization, can be viewed as special cases of our proposed framework. We offer flexible and scalable implementations in R using Stan and INLA. Simulation studies confirm that the predictive performance of latent processes improves as we move from univariate spatial models to multivariate spatial fusion models. The introduced framework is illustrated using a cross-sectional study linked with a national cohort dataset in Switzerland, in which we examine differences in underlying spatial risk patterns between respiratory disease and lung cancer.

    06/02/2019 ∙ by Craig Wang, et al.
