Robust Similarity and Distance Learning via Decision Forests

07/27/2020
by Tyler M. Tomita, et al.

Canonical distances such as Euclidean distance often fail to capture the appropriate relationships between items, which in turn leads to subpar inference and prediction. Many algorithms have been proposed for automated learning of suitable distances, most of which employ linear methods to learn a global metric over the feature space. While such methods offer attractive theoretical properties, interpretability, and computationally efficient implementations, they are limited in expressive capacity. Methods designed to improve expressiveness sacrifice one or more of these properties. To bridge this gap, we propose a highly expressive, novel decision forest algorithm for distance learning, which we call Similarity and Metric Random Forests (SMERF). We show that the tree construction procedure in SMERF is a proper generalization of standard classification and regression trees. The mathematical driving forces of SMERF are therefore examined via its direct connection to regression forests, for which theory has been developed. Its ability to approximate arbitrary distances and identify important features is demonstrated empirically on simulated data sets. Finally, we demonstrate that it accurately predicts links in networks.
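
To make the contrast between Euclidean distance and a global linear metric concrete, here is a minimal sketch. It is not SMERF and not taken from the paper: it only illustrates the standard form of a global linear metric, d_M(x, y) = sqrt((x - y)^T M (x - y)) with M positive semidefinite. The toy points `a`, `b`, `c` and the hand-picked matrix `M` are hypothetical; a real distance-learning method would estimate M from data.

```python
import math


def euclidean(x, y):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def linear_metric(x, y, M):
    """Global linear metric sqrt((x - y)^T M (x - y)).

    M is assumed to be a positive semidefinite matrix given as a list of rows.
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    # Matrix-vector product M d, then the quadratic form d^T (M d).
    Md = [sum(M[i][j] * d[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(d[i] * Md[i] for i in range(n)))


# Hypothetical toy data: feature 0 is informative, feature 1 is pure noise.
a = [0.0, 0.0]
b = [0.1, 5.0]  # close to a in the informative feature, far in the noise feature
c = [1.0, 0.0]  # far from a in the informative feature

# Hand-picked (not learned) weights that ignore the noise feature.
M = [[1.0, 0.0],
     [0.0, 0.0]]

# Euclidean distance ranks c as closer to a than b, driven by the noise feature.
print(euclidean(a, b), euclidean(a, c))

# The linear metric ranks b as closer to a than c.
print(linear_metric(a, b, M), linear_metric(a, c, M))
```

Because M is applied identically everywhere in the feature space, a metric of this form cannot adapt to relationships that change from one region to another, which is the limit on expressive capacity that the abstract refers to.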
