Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks

01/24/2019
by Paul Azunre, et al.

A character-level convolutional neural network (CNN) motivated by applications in "automated machine learning" (AutoML) is proposed to semantically classify columns in tabular data. Simulated data containing a set of base classes is first used to learn an initial set of weights. Hand-labeled data from the CKAN repository is then used in a transfer-learning paradigm to adapt the initial weights to a more sophisticated representation of the problem (e.g., including more classes). In doing so, realistic data imperfections are learned and the set of classes handled can be expanded from the base set with reduced labeled data and computing power requirements. Results show the effectiveness and flexibility of this approach in three diverse domains: semantic classification of tabular data, age prediction from social media posts, and email spam classification. In addition to providing further evidence of the effectiveness of transfer learning in natural language processing (NLP), our experiments suggest that analyzing the semantic structure of language at the character level without additional metadata (e.g., network structure or headers) can produce competitive accuracy for type classification, spam classification, and social media age prediction. We present our open-source toolkit SIMON, an acronym for Semantic Inference for the Modeling of ONtologies, which implements this approach in a user-friendly and scalable/parallelizable fashion.
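To illustrate what "character level" means for tabular input, the following minimal sketch shows one way a column of cells could be turned into a fixed-size grid of character indices that a CNN could then consume. It is not taken from SIMON: the alphabet, the function names `encode_cell` and `encode_column`, and the limits `max_cells` and `max_len` are all hypothetical choices made for illustration.

```python
import string

# Hypothetical alphabet (an assumption; the toolkit's actual character set may differ).
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "
# Index 0 is reserved for padding and for characters outside the alphabet.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_cell(cell, max_len=20):
    """Map one cell to a fixed-length list of character indices."""
    text = str(cell).lower()[:max_len]
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in text]
    # Right-pad with zeros so every cell has the same length.
    return indices + [0] * (max_len - len(indices))


def encode_column(cells, max_cells=5, max_len=20):
    """Encode a column as a max_cells x max_len grid of character indices."""
    rows = [encode_cell(c, max_len) for c in cells[:max_cells]]
    # Pad short columns with all-zero rows.
    while len(rows) < max_cells:
        rows.append([0] * max_len)
    return rows


# Example: a date-like column, encoded without using its header.
grid = encode_column(["2019-01-24", "2018-06-26"], max_cells=3, max_len=12)
```

Because the encoding uses only the characters in the cells, it needs no header or other metadata, which matches the setting the abstract describes.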


research
05/24/2019

Using Deep Networks and Transfer Learning to Address Disinformation

We apply an ensemble pipeline composed of a character-level convolutiona...
research
07/04/2019

Transfer Learning for Risk Classification of Social Media Posts: Model Evaluation Study

Mental illness affects a significant portion of the worldwide population...
research
06/26/2018

EmbNum: Semantic labeling for numerical values with deep metric learning

Semantic labeling is a task of matching unknown data source to labeled d...
research
05/11/2016

Tweet2Vec: Character-Based Distributed Representations for Social Media

Text from social media provides a set of challenges that can cause tradi...
research
07/04/2019

Application of Transfer Learning for Automatic Triage of Social Media Posts

Mental illness affects a significant portion of the worldwide population...
research
02/12/2018

Deep Neural Networks for Bot Detection

The problem of detecting bots, automated social media accounts governed ...
