DrugDBEmbed : Semantic Queries on Relational Database using Supervised Column Encodings

07/05/2020
by   Bortik Bandyopadhyay, et al.
0

Traditional relational databases contain a lot of latent semantic information that have largely remained untapped due to the difficulty involved in automatically extracting such information. Recent works have proposed unsupervised machine learning approaches to extract such hidden information by textifying the database columns and then projecting the text tokens onto a fixed dimensional semantic vector space. However, in certain databases, task-specific class labels may be available, which unsupervised approaches are unable to lever in a principled manner. Also, when embeddings are generated at individual token level, then column encoding of multi-token text column has to be computed by taking the average of the vectors of the tokens present in that column for any given row. Such averaging approach may not produce the best semantic vector representation of the multi-token text column, as observed while encoding paragraphs or documents in natural language processing domain. With these shortcomings in mind, we propose a supervised machine learning approach using a Bi-LSTM based sequence encoder to directly generate column encodings for multi-token text columns of the DrugBank database, which contains gold standard drug-drug interaction (DDI) labels. Our text data driven encoding approach achieves very high Accuracy on the supervised DDI prediction task for some columns and we use those supervised column encodings to simulate and evaluate the Analogy SQL queries on relational data to demonstrate the efficacy of our technique.

READ FULL TEXT

page 10

page 11

research
03/23/2016

Enabling Cognitive Intelligence Queries in Relational Databases using Low-dimensional Word Embeddings

We apply distributed language embedding methods from Natural Language Pr...
research
12/19/2017

Cognitive Database: A Step towards Endowing Relational Databases with Artificial Intelligence Capabilities

We propose Cognitive Databases, an approach for transparently enabling A...
research
11/01/2018

Embedding Individual Table Columns for Resilient SQL Chatbots

Most of the world's data is stored in relational databases. Accessing th...
research
08/11/2020

Hybrid Ranking Network for Text-to-SQL

In this paper, we study how to leverage pre-trained language models in T...
research
03/20/2019

Column2Vec: Structural Understanding via Distributed Representations of Database Schemas

We present Column2Vec, a distributed representation of database columns ...
research
05/25/2019

Sherlock: A Deep Learning Approach to Semantic Data Type Detection

Correctly detecting the semantic type of data columns is crucial for dat...
research
10/31/2017

Extracting Syntactic Patterns from Databases

Many database columns contain string or numerical data that conforms to ...

Please sign up or login with your details

Forgot password? Click here to reset