Learning Document Embeddings With CNNs

11/11/2017
by Chundi Liu, et al.

We propose a new model for unsupervised document embedding. Existing approaches either require complex inference or use recurrent neural networks that are difficult to parallelize. We take a different route and use recent advances in language modelling to develop a convolutional neural network embedding model. This allows us to train deeper architectures that are fully parallelizable. Stacking layers together increases the receptive field, allowing each successive layer to model increasingly long-range semantic dependencies within the document. Empirically, we demonstrate superior results on two publicly available benchmarks.
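
To make the stacking idea concrete, here is a minimal sketch of such an encoder in PyTorch: stacked dilated 1D convolutions over word embeddings, mean-pooled into a fixed-size document vector. The layer widths, dilation schedule, and pooling readout are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of a stacked 1D-CNN document encoder (PyTorch).
# Hyperparameters and the mean-pooling readout are illustrative
# assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class CNNDocEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden=256,
                 num_layers=4, kernel_size=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        layers = []
        in_ch = embed_dim
        for i in range(num_layers):
            # Doubling the dilation at each layer widens the receptive
            # field, so deeper layers see longer-range context.
            dilation = 2 ** i
            layers.append(nn.Conv1d(in_ch, hidden, kernel_size,
                                    padding=dilation * (kernel_size - 1) // 2,
                                    dilation=dilation))
            layers.append(nn.ReLU())
            in_ch = hidden
        self.convs = nn.Sequential(*layers)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        h = self.convs(x)                          # (batch, hidden, seq_len)
        return h.mean(dim=2)                       # fixed-size document embedding

doc = torch.randint(0, 10000, (2, 200))  # two toy documents, 200 tokens each
emb = CNNDocEncoder(vocab_size=10000)(doc)
print(emb.shape)                         # torch.Size([2, 256])
```

Dilated filters are one way to grow the receptive field exponentially with depth; a plain stack of undilated convolutions grows it only linearly, so many more layers would be needed to cover a full document.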

Related research

11/18/2016
Word and Document Embeddings based on Neural Network Approaches
Data representation is a fundamental task in machine learning. The repre...

06/23/2021
Should You Go Deeper? Optimizing Convolutional Neural Network Architectures without Training by Receptive Field Analysis
Applying artificial neural networks (ANN) to specific tasks, researchers...

09/13/2015
Learning Contextual Dependencies with Convolutional Hierarchical Recurrent Neural Networks
Existing deep convolutional neural networks (CNNs) have shown their grea...

06/10/2020
Training with Multi-Layer Embeddings for Model Reduction
Modern recommendation systems rely on real-valued embeddings of categori...

09/11/2019
BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding
For understanding generic documents, information like font sizes, column...

06/27/2017
Training a Fully Convolutional Neural Network to Route Integrated Circuits
We present a deep, fully convolutional neural network that learns to rou...

06/28/2018
Exploring Architectures for CNN-Based Word Spotting
The goal in word spotting is to retrieve parts of document images which ...
