Deep Learning for Technical Document Classification

06/27/2021
by   Shuo Jiang, et al.
0

In large technology companies, the requirements for managing and organizing technical documents created by engineers and managers in supporting relevant decision making have increased dramatically in recent years, which has led to a higher demand for more scalable, accurate, and automated document classification. Prior studies have primarily focused on processing text for classification and small-scale databases. This paper describes a novel multimodal deep learning architecture, called TechDoc, for technical document classification, which utilizes both natural language and descriptive images to train hierarchical classifiers. The architecture synthesizes convolutional neural networks and recurrent neural networks through an integrated training process. We applied the architecture to a large multimodal technical document database and trained the model for classifying documents based on the hierarchical International Patent Classification system. Our results show that the trained neural network presents a greater classification accuracy than those using a single modality and several earlier text classification methods. The trained model can potentially be scaled to millions of real-world technical documents with both text and figures, which is useful for data and knowledge management in large technology companies and organizations.

READ FULL TEXT
09/24/2017

HDLTex: Hierarchical Deep Learning for Text Classification

The continually increasing number of documents produced each year necess...
01/06/2021

On-Device Document Classification using multimodal features

From small screenshots to large videos, documents take up a bulk of spac...
07/15/2019

Multimodal deep networks for text and image-based document classification

Classification of document images is a critical step for archival of old...
05/14/2016

Rationale-Augmented Convolutional Neural Networks for Text Classification

We present a new Convolutional Neural Network (CNN) model for text class...
08/31/2018

Seeing Colors: Learning Semantic Text Encoding for Classification

The question we answer with this work is: can we convert a text document...
04/13/2005

Learning from Web: Review of Approaches

Knowledge discovery is defined as non-trivial extraction of implicit, pr...
04/15/2022

Email Spam Detection Using Hierarchical Attention Hybrid Deep Learning Method

Email is one of the most widely used ways to communicate, with millions ...