DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging

07/14/2017
by   Sheng Chen, et al.

In this work, we use the term document tagging for the task of labeling news articles or blog posts with relevant tags drawn from a predefined collection. Accurate tagging of articles can benefit several downstream applications, such as recommendation and search. We propose a novel yet simple approach called DocTag2Vec to accomplish this task. It substantially extends Word2Vec and Doc2Vec, two popular models for learning distributed representations of words and documents. In DocTag2Vec, we simultaneously learn the representations of words, documents, and tags in a joint vector space during training, and employ a simple k-nearest neighbor search to predict tags for unseen documents. In contrast to previous multi-label learning methods, DocTag2Vec works directly on raw text rather than on provided feature vectors. It also has further advantages, such as learning tag representations and the ability to handle newly created tags. To demonstrate the effectiveness of our approach, we conduct experiments on several datasets and show promising results against state-of-the-art methods.
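The prediction step described above, a k-nearest neighbor search in the joint vector space, can be illustrated with a minimal sketch. It assumes that a document vector and the tag vectors have already been learned, and that closeness is measured by cosine similarity; the abstract specifies neither the distance measure nor how an unseen document is embedded. The names `cosine`, `predict_tags`, `tag_vecs`, and `doc_vec` are hypothetical, and the vectors are toy values rather than learned embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)

def predict_tags(doc_vec, tag_vecs, k=2):
    """Return the k tags whose vectors lie closest to the document vector."""
    ranked = sorted(
        tag_vecs,
        key=lambda tag: cosine(doc_vec, tag_vecs[tag]),
        reverse=True,
    )
    return ranked[:k]

# Toy 3-dimensional vectors standing in for learned tag embeddings.
tag_vecs = {
    "sports": [0.9, 0.1, 0.0],
    "politics": [0.0, 0.9, 0.2],
    "finance": [0.1, 0.3, 0.9],
}

# Toy vector standing in for the embedding of an unseen document.
doc_vec = [0.8, 0.3, 0.1]

top_tags = predict_tags(doc_vec, tag_vecs, k=2)
print(top_tags)
```

In this sketch, a newly created tag only needs its own entry in `tag_vecs` to become a prediction candidate; how DocTag2Vec learns that tag's vector is not covered here.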

research
04/30/2020

Method for Customizable Automated Tagging: Addressing the Problem of Over-tagging and Under-tagging Text Documents

Using author provided tags to predict tags for a new document often resu...
research
09/15/2021

Cross-Register Projection for Headline Part of Speech Tagging

Part of speech (POS) tagging is a familiar NLP task. State of the art ta...
research
12/13/2016

Learning to Hash-tag Videos with Tag2Vec

User-given tags or labels are valuable resources for semantic understand...
research
05/30/2023

Cross Encoding as Augmentation: Towards Effective Educational Text Classification

Text classification in education, usually called auto-tagging, is the au...
research
05/31/2016

Fast Zero-Shot Image Tagging

The well-known word analogy experiments show that the recent word vector...
research
11/27/2019

Multi-label Classification for Automatic Tag Prediction in the Context of Programming Challenges

One of the best ways for developers to test and improve their skills in ...
research
06/15/2022

Born for Auto-Tagging: Faster and better with new objective functions

Keyword extraction is a task of text mining. It is applied to increase s...
