Do Neural Nets Learn Statistical Laws behind Natural Language?

07/16/2017
by Shuntaro Takahashi, et al.

The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of both the effectiveness and a limitation of neural networks for language engineering. Specifically, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf's law and Heaps' law, two representative statistical properties underlying natural language. We discuss how faithfully these laws are reproduced and how they emerge as training progresses. We also point out that the neural language model is limited in its ability to reproduce long-range correlation, another statistical property of natural language. This understanding could suggest directions for improving neural network architectures.
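For readers unfamiliar with the two laws, here is a minimal sketch of how they can be measured on any token sequence. Zipf's law says the frequency f of the word of rank r falls roughly as a power law, f(r) ∝ r^(-α) with α near 1; Heaps' law says the vocabulary size V(n) after n tokens grows sublinearly, V(n) ∝ n^β with 0 < β < 1. The function names (`zipf_rank_frequency`, `heaps_curve`, `loglog_slope`) are hypothetical, and estimating exponents with a plain least-squares fit in log-log space is an assumption for illustration; this is not the authors' code, and the paper may fit exponents differently.

```python
from collections import Counter
import math


def zipf_rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent word first (rank 1)."""
    counts = Counter(tokens)
    freqs = sorted(counts.values(), reverse=True)
    return [(rank, f) for rank, f in enumerate(freqs, start=1)]


def heaps_curve(tokens):
    """Return V(n): the number of distinct words seen after each token."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve


def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs) (a simple exponent estimate)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


if __name__ == "__main__":
    tokens = "the cat sat on the mat the end".split()
    print(zipf_rank_frequency(tokens))  # → [(1, 3), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1)]
    print(heaps_curve(tokens))          # → [1, 2, 3, 4, 4, 5, 5, 6]

    # On a real corpus, the Zipf exponent is -slope of log f vs. log r,
    # and the Heaps exponent is the slope of log V(n) vs. log n.
    ranks, freqs = zip(*zipf_rank_frequency(tokens))
    print(-loglog_slope(ranks, freqs))
    n = range(1, len(tokens) + 1)
    print(loglog_slope(n, heaps_curve(tokens)))
```

In a study like this one, the same measurements would be applied both to the training corpus and to text sampled from the trained LSTM, and the resulting exponents compared; the toy sentence above is far too short to show either law and only illustrates the mechanics.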


research
06/22/2019

Evaluating Computational Language Models with Scaling Properties of Natural Language

In this article, we evaluate computational models of natural language wi...
research
08/28/2018

Disfluency Detection using a Noisy Channel Model and a Deep Neural Language Model

This paper presents a model for disfluency detection in spontaneous spee...
research
09/18/2023

Do learned speech symbols follow Zipf's law?

In this study, we investigate whether speech symbols, learned through de...
research
12/11/2017

Long-Range Correlation Underlying Childhood Language and Generative Models

Long-range correlation, a property of time series exhibiting long-term m...
research
07/09/2018

A deep learning approach for understanding natural language commands for mobile service robots

Using natural language to give instructions to robots is challenging, si...
research
04/10/2018

Natural Language Statistical Features of LSTM-generated Texts

Long Short-Term Memory (LSTM) networks have recently shown remarkable pe...
research
07/31/2020

A Study on Effects of Implicit and Explicit Language Model Information for DBLSTM-CTC Based Handwriting Recognition

Deep Bidirectional Long Short-Term Memory (D-BLSTM) with a Connectionist...
