Deep Hurdle Networks for Zero-Inflated Multi-Target Regression: Application to Multiple Species Abundance Estimation

by Shufeng Kong et al.

A key problem in computational sustainability is to understand the distribution of species across landscapes over time. This question gives rise to challenging large-scale prediction problems since (i) hundreds of species have to be modeled simultaneously and (ii) the survey data are usually inflated with zeros because species are absent at a large number of sites. The problem of tackling both issues simultaneously, which we refer to as the zero-inflated multi-target regression problem, has not been addressed by previous methods in statistics and machine learning. In this paper, we propose a novel deep model for the zero-inflated multi-target regression problem. To this end, we first model the joint distribution of multiple response variables as a multivariate probit model and then couple the positive outcomes with a multivariate log-normal distribution. By penalizing the difference between the two distributions' covariance matrices, a link between the two distributions is established. The whole model is cast as an end-to-end learning framework, and we provide an efficient learning algorithm for our model that can be fully implemented on GPUs. We show that our model outperforms the existing state-of-the-art baselines on two challenging real-world species distribution datasets concerning bird and fish populations.
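To make the hurdle structure described in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplification: each species is scored independently through its marginal probit probability of presence and a univariate log-normal density for positive counts, whereas the paper couples species through full multivariate distributions. The function names `hurdle_nll` and `covariance_penalty`, and the choice of a squared Frobenius norm for the covariance penalty, are illustrative assumptions; the abstract does not specify the exact form of the penalty.

```python
import math
import numpy as np


def normal_cdf(z):
    """Standard normal CDF, applied elementwise."""
    z = np.asarray(z, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))


def hurdle_nll(y, probit_mean, log_mean, log_std):
    """Simplified per-species hurdle negative log-likelihood (assumption:
    species are treated independently, unlike the multivariate model).

    y           : observed abundances, zeros mean "absent"
    probit_mean : latent mean of the probit presence model
    log_mean    : mean of log-abundance, used only where y > 0
    log_std     : standard deviation of log-abundance (> 0)
    """
    y = np.asarray(y, dtype=float)
    log_mean = np.asarray(log_mean, dtype=float)
    log_std = np.asarray(log_std, dtype=float)
    present = y > 0

    # Hurdle part: Bernoulli likelihood of presence vs. absence.
    p = np.clip(normal_cdf(probit_mean), 1e-12, 1.0 - 1e-12)
    nll = -np.where(present, np.log(p), np.log1p(-p))

    # Positive part: log-normal log-density, counted only for y > 0.
    safe_y = np.where(present, y, 1.0)  # avoid log(0) on absent entries
    z = (np.log(safe_y) - log_mean) / log_std
    log_pdf = -np.log(safe_y * log_std * math.sqrt(2.0 * math.pi)) - 0.5 * z**2
    nll = nll - np.where(present, log_pdf, 0.0)
    return float(nll.sum())


def covariance_penalty(sigma_probit, sigma_lognormal):
    """Squared Frobenius norm of the difference between the two covariance
    matrices (an assumed form of the penalty that links the two parts)."""
    diff = np.asarray(sigma_probit, float) - np.asarray(sigma_lognormal, float)
    return float(np.sum(diff**2))


if __name__ == "__main__":
    # Two species at one site: the first absent, the second with abundance 2.
    loss = hurdle_nll(y=[0, 2], probit_mean=[0.0, 0.0],
                      log_mean=[0.0, math.log(2.0)], log_std=[1.0, 1.0])
    penalty = covariance_penalty(np.eye(2), 2.0 * np.eye(2))
    print(loss + 0.1 * penalty)  # 0.1 is an arbitrary illustrative weight
```

In a training loop, the two terms would typically be summed with a tunable weight, so that the presence model and the abundance model are encouraged to share a similar correlation structure across species.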



