Categorical Metadata Representation for Customized Text Classification

02/14/2019
by   Jihyeok Kim, et al.
0

The performance of text classification has improved tremendously using intelligently engineered neural-based models, especially those injecting categorical metadata as additional information, e.g., using user/product information for sentiment classification. These information have been used to modify parts of the model (e.g., word embeddings, attention mechanisms) such that results can be customized according to the metadata. We observe that current representation methods for categorical metadata, which are devised for human consumption, are not as effective as claimed in popular classification methods, outperformed even by simple concatenation of categorical features in the final layer of the sentence encoder. We conjecture that categorical features are harder to represent for machine use, as available context only indirectly describes the category, and even such context is often scarce (for tail category). To this end, we propose to use basis vectors to effectively incorporate categorical metadata on various parts of a neural-based model. This additionally decreases the number of parameters dramatically, especially when the number of categorical features is large. Extensive experiments on various datasets with different properties are performed and show that through our method, we can represent categorical metadata more effectively to customize parts of the model, including unexplored ones, and increase the performance of the model greatly.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/15/2021

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Multi-label text classification refers to the problem of assigning each ...
research
09/22/2021

Exploring Heterogeneous Metadata for Video Recommendation with Two-tower Model

Online video services acquire new content on a daily basis to increase e...
research
11/07/2021

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information

We study the problem of weakly supervised text classification, which aim...
research
06/08/2018

Text Classification based on Word Subspace with Term-Frequency

Text classification has become indispensable due to the rapid increase o...
research
03/16/2022

CUE Vectors: Modular Training of Language Models Conditioned on Diverse Contextual Signals

We propose a framework to modularize the training of neural language mod...
research
03/08/2023

A Categorical Framework of General Intelligence

Can machines think? Since Alan Turing asked this question in 1950, nobod...
research
11/04/2021

Benchmarking Multimodal AutoML for Tabular Data with Text Fields

We consider the use of automated supervised learning systems for data ta...

Please sign up or login with your details

Forgot password? Click here to reset