Exploiting the Logits: Joint Sign Language Recognition and Spell-Correction

07/01/2020
by Christina Runkel, et al.

Machine learning techniques have excelled in the automatic semantic analysis of images, reaching human-level performance on challenging benchmarks. Yet the semantic analysis of videos remains challenging because of the significantly higher dimensionality of the input data and the correspondingly higher need for annotated training examples. By studying the automatic recognition of German sign language videos, we demonstrate that on the relatively scarce training data of 2,800 videos, modern deep learning architectures for video analysis (such as ResNeXt), along with transfer learning on large gesture recognition tasks, can achieve about 75% character accuracy. Considering that this leaves us with a probability of under 25% that a 5-letter word is spelled correctly, spell-correction systems are crucial for producing readable outputs. The contribution of this paper is to propose a convolutional neural network for spell-correction that expects the softmax outputs of the character recognition network (instead of a misspelled word) as an input. We demonstrate that purely learning on softmax inputs in combination with scarce training data yields overfitting, as the network learns the inputs by heart. In contrast, training the network on several variants of the logits of the classification output, i.e. scaling by a constant factor, adding random noise, mixing softmax and hardmax inputs, or purely training on hardmax inputs, leads to better generalization while benefiting from the significant information hidden in these outputs (that have 98% top-5 accuracy), yielding a readable text despite the comparably low character accuracy.
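The logit variants named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the scale factor, the noise level, and the per-character 50/50 mixing rule are all assumptions chosen for illustration, and the abstract does not say how the variants are parameterized or combined.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over one vector of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hardmax(logits):
    """One-hot vector with a 1 at the arg-max class."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == best else 0.0 for i in range(len(logits))]


def scaled_softmax(logits, scale=2.0):
    """Variant 1: scale the logits by a constant factor (value assumed)."""
    return softmax([scale * x for x in logits])


def noisy_softmax(logits, rng, noise_std=0.5):
    """Variant 2: add random noise to the logits (Gaussian is an assumption)."""
    return softmax([x + rng.gauss(0.0, noise_std) for x in logits])


def mixed_word(word_logits, rng, hard_prob=0.5):
    """Variant 3: mix softmax and hardmax inputs.

    Assumption: each character of the word independently becomes a
    hardmax vector with probability hard_prob, otherwise a softmax vector.
    """
    return [
        hardmax(l) if rng.random() < hard_prob else softmax(l)
        for l in word_logits
    ]


def hardmax_word(word_logits):
    """Variant 4: train purely on hardmax (one-hot) inputs."""
    return [hardmax(l) for l in word_logits]
```

Here a word is represented as a list of per-character logit vectors, so a spell-correction network trained on these variants would see a differently perturbed version of the same word in each epoch rather than one fixed softmax pattern it could memorize.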

