Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks

07/21/2017
by Nils Reimers, et al.

Selecting optimal hyperparameters for a neural network architecture can often make the difference between mediocre and state-of-the-art performance. However, little is published about which parameters and design choices should be evaluated or selected, which often makes hyperparameter optimization a "black art that requires expert experiences" (Snoek et al., 2012). In this paper, we evaluate the importance of different network design choices and hyperparameters for five common linguistic sequence tagging tasks (POS tagging, chunking, NER, entity recognition, and event detection). We evaluated over 50,000 different setups and found that some parameters, such as the pre-trained word embeddings or the last layer of the network, have a large impact on performance, while other parameters, such as the number of LSTM layers or the number of recurrent units, are of minor importance. We give a recommendation for a configuration that performs well across the different tasks.
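To illustrate the kind of recommendation the paper gives, such a setup can be captured as a plain configuration dictionary for a BiLSTM sequence tagger. This is only a sketch: the concrete values below are assumptions for illustration, not figures quoted from the abstract, so consult the paper for the settings it actually recommends.

```python
# Illustrative hyperparameter configuration for a BiLSTM sequence tagger,
# in the spirit of the paper's recommendation. All concrete values are
# assumptions for illustration only.

def build_config(task="ner"):
    """Return a hyperparameter dictionary for a BiLSTM sequence tagger."""
    return {
        # Choices the paper found to have a large impact:
        "word_embeddings": "pretrained",  # pre-trained embeddings vs. random init
        "classifier": "crf",              # last layer: CRF vs. plain softmax
        # Choices found to be of minor importance:
        "lstm_layers": 2,                 # number of stacked BiLSTM layers
        "recurrent_units": 100,           # units per LSTM direction
        # Common additional knobs in such setups (assumed, not from the abstract):
        "dropout": (0.25, 0.25),
        "optimizer": "nadam",
        "gradient_clipping": 1.0,
        "task": task,
    }

cfg = build_config("pos")
print(cfg["classifier"], cfg["lstm_layers"])  # prints: crf 2
```

Keeping the configuration in one flat dictionary like this makes it easy to sweep individual parameters while holding the rest fixed, which mirrors the kind of large-scale evaluation (50,000+ setups) the paper describes.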


Related research:

- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF (03/04/2016)
  State-of-the-art sequence labeling systems traditionally require large a...

- Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging (07/31/2017)
  In this paper we show that reporting a single performance score is insuf...

- Semi-supervised sequence tagging with bidirectional language models (04/29/2017)
  Pre-trained word embeddings learned from unlabeled text have become a st...

- Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling (12/21/2016)
  In this paper we propose and carefully evaluate a sequence labeling fram...

- Artificial intelligence prediction of stock prices using social media (01/22/2021)
  The primary objective of this work is to develop a Neural Network based ...

- Word2Vec applied to Recommendation: Hyperparameters Matter (04/11/2018)
  Skip-gram with negative sampling, a popular variant of Word2vec original...

- Assessment of the Relative Importance of different hyper-parameters of LSTM for an IDS (12/26/2020)
  Recurrent deep learning language models like the LSTM are often used to ...
