
varrank: an R package for variable ranking based on mutual information with applications to observed systemic datasets
This article describes the R package varrank. It has a flexible implementation of heuristic approaches which perform variable ranking based on mutual information. The package is particularly suitable for exploring multivariate datasets requiring a holistic analysis. The core functionality is a general implementation of the minimum redundancy maximum relevance (mRMRe) model. This approach is based on information theory metrics. It is compatible with discrete data and with continuous data, which are discretised using one of a large choice of possible rules. The two main problems that can be addressed by this package are the selection of the most representative variables for modeling a collection of variables of interest, i.e., dimension reduction, and variable ranking with respect to a set of variables of interest.
04/19/2018 ∙ by Gilles Kratzer, et al.
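
The minimum redundancy maximum relevance idea mentioned in the abstract can be illustrated with a small greedy ranking. The following is a minimal Python sketch under stated assumptions, not the varrank implementation: the function names `mutual_information` and `mrmr_rank`, the difference-based score (relevance minus mean redundancy), and the toy data are all hypothetical and chosen only for illustration. All variables are assumed to be already discretised.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr_rank(features, target):
    """Greedy ranking (illustrative scoring choice, not varrank's API).

    At each step, pick the feature with the largest relevance to the
    target minus its mean redundancy with the features already ranked.
    """
    remaining = list(features)
    ranked = []
    while remaining:
        def score(name):
            relevance = mutual_information(features[name], target)
            if not ranked:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in ranked
            ) / len(ranked)
            return relevance - redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy data: the target is determined by two independent binary variables.
u = [0, 0, 1, 1, 0, 0, 1, 1]
v = [0, 1, 0, 1, 0, 1, 0, 1]
target = [2 * a + b for a, b in zip(u, v)]
# "b" duplicates "a", so it is fully redundant once "a" is selected.
ranking = mrmr_rank({"a": u, "b": list(u), "c": v}, target)
```

In this toy example the duplicated variable is ranked last, because its relevance is cancelled by its redundancy with the variable it copies.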

eggCounts: a Bayesian hierarchical toolkit to model faecal egg count reductions
This is a vignette for the R package eggCounts version 2.0. The package implements a suite of Bayesian hierarchical models dealing with faecal egg count reductions. The models are designed for a variety of practical situations, including individual treatment efficacy, zero inflation, small sample size (less than 10) and potential outliers. The functions are intuitive to use and their output is easy to interpret, so that users are shielded from complex Bayesian hierarchical modelling tasks. In addition, the package includes plotting functions to display data and results in a visually appealing manner. The models are implemented in the Stan modelling language, which provides an efficient sampling technique to obtain posterior samples. This vignette briefly introduces the different models and provides a short walkthrough analysis with example data.
04/30/2018 ∙ by Craig Wang, et al.

optimParallel: an R Package Providing Parallel Versions of the Gradient-Based Optimization Methods of optim()
The R package optimParallel provides a parallel version of the gradient-based optimization methods of optim(). The main function of the package is optimParallel(), which has the same usage and output as optim(). Using optimParallel() can significantly reduce optimization times. We introduce the R package and illustrate its implementation, which takes advantage of the lexical scoping mechanism of R.
04/30/2018 ∙ by Florian Gerber, et al.
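
A gradient-based optimizer that approximates the gradient numerically needs several independent function evaluations per step, which is one place where parallelism can help. The Python sketch below illustrates that general idea only; it is an assumption made for illustration and not the optimParallel code. The names `parallel_gradient` and `sphere` are hypothetical, and a thread pool is used purely to keep the example short (for CPU-bound pure-Python functions, a process pool would be needed to obtain a real speed-up).

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_gradient(fn, x, eps=1e-6, max_workers=4):
    """Central-difference gradient of fn at x.

    The 2 * len(x) function evaluations are independent of each other,
    so they are submitted to a worker pool and run concurrently.
    """
    points = []
    for i in range(len(x)):
        for sign in (+1, -1):
            shifted = list(x)
            shifted[i] += sign * eps
            points.append(shifted)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(fn, points))  # results keep the input order
    return [
        (values[2 * i] - values[2 * i + 1]) / (2 * eps)
        for i in range(len(x))
    ]


def sphere(x):
    """Toy objective: sum of squares, with exact gradient 2 * x."""
    return sum(v * v for v in x)


grad = parallel_gradient(sphere, [1.0, -2.0, 0.5])
```

For the toy objective the result should be close to the exact gradient, which is twice the input vector.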

Comparison between Suitable Priors for Additive Bayesian Networks
Additive Bayesian networks (ABNs) are types of graphical models that extend the usual Bayesian generalized linear model to multiple dependent variables through the factorisation of the joint probability distribution of the underlying variables. When fitting an ABN model, the choice of the prior of the parameters is of crucial importance. If an inadequate prior, such as one that is too weakly informative, is used, data separation and data sparsity lead to issues in the model selection process. In this work, a simulation study comparing two weakly informative priors and one strongly informative prior is presented. The first weakly informative prior is a zero-mean Gaussian prior with a large variance, currently implemented in the R package abn. The second is a Student's t-distribution prior, specifically designed for logistic regressions. Finally, the strongly informative prior is again Gaussian, with mean equal to the true parameter value and a small variance. We compare the impact of these priors on the accuracy of the learned additive Bayesian network as a function of different parameters. We create a simulation study to illustrate Lindley's paradox based on the prior choice. We then conclude by highlighting the good performance of the informative Student's t-prior and the limited impact of Lindley's paradox. Finally, suggestions for further developments are provided.
09/18/2018 ∙ by Gilles Kratzer, et al.

Information-Theoretic Scoring Rules to Learn Additive Bayesian Network Applied to Epidemiology
Bayesian network modelling is a well-adapted approach to study messy and highly correlated datasets, which are very common in, e.g., systems epidemiology. A popular approach to learn a Bayesian network from an observational dataset is to identify the maximum a posteriori network in a search-and-score approach. Many scores have been proposed, both Bayesian and frequentist. From an applied perspective, a suitable approach would allow multiple distributions for the data and be robust enough to run autonomously. Generalized linear models are a promising framework for computing scores. Indeed, there exist fast algorithms for estimation and many tailored solutions to common epidemiological issues. The purpose of this paper is to present the R package abn, which implements multiple frequentist scores, together with some realistic simulations that show its usability and performance. It includes features to deal efficiently with data separation and adjustment, which are very common in systems epidemiology.
08/03/2018 ∙ by Gilles Kratzer, et al.

Is a single unique Bayesian network enough to accurately represent your data?
Bayesian network (BN) modelling is extensively used in systems epidemiology. Usually it consists of selecting and reporting the best-fitting structure conditional on the data. A major practical concern is avoiding overfitting, given the extreme flexibility and modelling richness of BNs. Many approaches have been proposed to control for overfitting. Unfortunately, they essentially all rely on very crude decisions that result in too simplistic approaches for such complex systems. In practice, with limited data sampled from a complex system, reporting a single best-fitting structure seems too simplistic. An alternative would be to use the Monte Carlo Markov chain model choice (MC3) over the network to learn the landscape of reasonably supported networks, and then to present all possible arcs with their MCMC support. This paper presents an R implementation, called mcmcabn, of a flexible structural MC3 that is accessible to non-specialists.
02/18/2019 ∙ by Gilles Kratzer, et al.
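
Presenting arcs with their MCMC support, as described in the abstract, amounts to reporting how often each arc appears among the sampled network structures. The Python sketch below illustrates that summary step under stated assumptions and is not the mcmcabn interface: the function name `arc_support`, the representation of a network as a set of (parent, child) pairs, and the four sampled structures are hypothetical.

```python
from collections import Counter


def arc_support(sampled_networks):
    """Fraction of sampled structures in which each directed arc appears.

    Each network is given as a collection of (parent, child) pairs;
    this representation is an assumption made for the sketch.
    """
    n = len(sampled_networks)
    counts = Counter(arc for net in sampled_networks for arc in set(net))
    return {arc: c / n for arc, c in counts.items()}


# Hypothetical output of a structural sampler: four sampled structures.
samples = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "B"), ("A", "C")},
    {("B", "C")},
]
support = arc_support(samples)
```

Arcs that never appear in any sampled structure are simply absent from the result, which corresponds to a support of zero.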

Combining Heterogeneous Spatial Datasets with Process-based Spatial Fusion Models: A Unifying Framework
In modern spatial statistics, the structure of collected data has become more heterogeneous. Depending on the type of spatial data, different modeling strategies are used. For example, a kriging approach is used for geostatistical data, a Gaussian Markov random field model for lattice data, and a log Gaussian Cox process for point-pattern data. Despite these different modeling choices, the nature of the underlying scientific data-generating (latent) processes is often the same and can be represented by some continuous spatial surfaces. In this paper, we introduce a unifying framework for process-based multivariate spatial fusion models. The framework can jointly analyze all three aforementioned types of spatial data (or any combinations thereof). Moreover, the framework accommodates different conditional distributions for geostatistical and lattice data. We show that some established approaches, such as linear models of coregionalization, can be viewed as special cases of our proposed framework. We offer flexible and scalable implementations in R using Stan and INLA. Simulation studies confirm that the predictive performance of latent processes improves as we move from univariate spatial models to multivariate spatial fusion models. The introduced framework is illustrated using a cross-sectional study linked with a national cohort dataset in Switzerland, in which we examine differences in underlying spatial risk patterns between respiratory disease and lung cancer.
06/02/2019 ∙ by Craig Wang, et al.
Reinhard Furrer