A Multilingual Encoding Method for Text Classification and Dialect Identification Using Convolutional Neural Network

03/18/2019
by Amr Adel Helmy, et al.

This thesis presents a language-independent text classification model that introduces two new encoding methods, "BUNOW" and "BUNOC". These methods feed raw text data into a new spatial CNN architecture that applies vertical and horizontal convolution. They replace the commonly used one-hot vectors or word representations (e.g., word2vec) fed to a temporal CNN architecture. In its methodology, the proposed model can be classified as a hybrid word-character model. Like character-level representations, it consumes less memory because it uses fewer neural network parameters. Like word-level representations, it computes much faster because it needs fewer network layers. The model achieved promising results compared to state-of-the-art models on two benchmark datasets with different morphology, one for Arabic and one for English.

