On the Unreported-Profile-is-Negative Assumption for Predictive Cheminformatics

04/03/2017
by Chao Lan, et al.

In cheminformatics, compound-target binding profiles have been a main source of data for research. For data repositories that only provide positive profiles, a popular assumption is that all unreported profiles are negative. In this paper, we caution readers not to take this assumption for granted, and present empirical evidence of its ineffectiveness from a machine learning perspective. Our examination is based on a setting where binding profiles are used as features to train predictive models; we show that (1) prediction performance degrades when the assumption fails and (2) explicit recovery of unreported profiles improves prediction performance. In particular, we propose a framework that jointly recovers profiles and learns a predictive model, and show that it achieves further performance improvement. The presented study not only suggests applying matrix recovery methods to recover unreported profiles, but also introduces a new missing-feature problem, which we call Learning with Positive and Unknown Features.
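The abstract does not specify which matrix recovery method is used. As a minimal sketch of the general idea only, the example below contrasts the "unreported is negative" zero-filled matrix with a recovered one, assuming a plain truncated-SVD low-rank approximation as a stand-in for matrix recovery. This is not the authors' joint framework; the function names `zero_fill` and `low_rank_recover`, the rank, and the toy data are all hypothetical.

```python
import numpy as np


def zero_fill(positive_pairs, n_compounds, n_targets):
    """Compound-by-target matrix under the unreported-is-negative assumption:
    every reported (compound, target) pair becomes 1, everything else 0."""
    X = np.zeros((n_compounds, n_targets))
    for i, j in positive_pairs:
        X[i, j] = 1.0
    return X


def low_rank_recover(X, rank):
    """Hypothetical recovery step (a stand-in, not the paper's method):
    replace the zero-filled matrix by its best rank-`rank` approximation
    (truncated SVD), so unreported entries get scores in [0, 1] instead
    of a hard 0. Reported positives are kept at 1."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    X_hat = np.clip(X_hat, 0.0, 1.0)
    X_hat[X == 1.0] = 1.0
    return X_hat


# Toy data: compounds 0-2 bind targets 0-2, compounds 3-5 bind targets 3-5,
# but the pair (2, 2) was never reported.
pairs = [(i, j) for i in range(3) for j in range(3) if (i, j) != (2, 2)]
pairs += [(i, j) for i in range(3, 6) for j in range(3, 6)]

X = zero_fill(pairs, n_compounds=6, n_targets=6)
X_hat = low_rank_recover(X, rank=2)

print(X[2, 2])                # 0.0: treated as a confirmed negative
print(round(X_hat[2, 2], 3))  # 0.577: recovered as a likely positive
print(round(X_hat[0, 3], 3))  # 0.0: cross-block pair stays near zero
```

Under the zero-fill assumption, the unreported pair (2, 2) enters any downstream model as a negative feature value. The low-rank reconstruction instead assigns it a score of 1/sqrt(3), because compound 2 shares its other binding partners with compounds 0 and 1. The rank is a tuning choice in this sketch. Learning the predictive model jointly with the recovery, as the paper proposes, is not reproduced here.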

research
03/12/2018

Classifying Online Dating Profiles on Tinder using FaceNet Facial Embeddings

A method to produce personalized classification models to automatically ...
research
07/11/2022

A blended distance to define "people-like-me"

Curve matching is a prediction technique that relies on predictive mean ...
research
06/13/2012

Observation Subset Selection as Local Compilation of Performance Profiles

Deciding what to sense is a crucial task, made harder by dependencies an...
research
11/03/2020

Comparison of pharmacist evaluation of medication orders with predictions of a machine learning model

The objective of this work was to assess the clinical performance of an ...
research
02/20/2022

Deconstructing Distributions: A Pointwise Framework of Learning

In machine learning, we traditionally evaluate the performance of a sing...
research
06/04/2020

Unsupervised clustering of Roman pottery profiles from their SSAE representation

In this paper we introduce the ROman COmmonware POTtery (ROCOPOT) databa...
research
12/01/2020

Scalable Data Discovery Using Profiles

We study the problem of discovering joinable datasets at scale. This is,...
