Online Gesture Recognition using Transformer and Natural Language Processing

05/05/2023
by G. C. M. Silvestre, et al.

The Transformer architecture is shown to provide a powerful machine transduction framework for online handwritten gestures corresponding to glyph strokes of natural language sentences. The attention mechanism is successfully used to create latent representations in an end-to-end encoder-decoder model, solving multi-level segmentation while also learning some language features and syntax rules. The additional use of a large decoding space with learned Byte-Pair Encoding (BPE) is shown to provide robustness to ablated inputs and syntax rules. The encoder stack is fed directly with spatio-temporal data tokens, which potentially form an infinitely large input vocabulary, an approach that finds applications beyond this work. The encoder's transfer learning capabilities are also demonstrated on several languages, resulting in faster optimisation and shared parameters. A new supervised dataset of online handwriting gestures, suitable for generic handwriting recognition tasks, was used to successfully train a small Transformer model to an average normalised Levenshtein accuracy of 96
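
The abstract reports its result as an average normalised Levenshtein accuracy but does not define the normalisation. The following is a minimal sketch under one common assumption: accuracy is one minus the character-level edit distance divided by the length of the longer string, averaged over all sentence pairs. The function names `levenshtein`, `normalised_accuracy` and `mean_accuracy` are illustrative and are not taken from the paper, whose exact definition may differ.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def normalised_accuracy(prediction: str, reference: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 1.0  # two empty strings are treated as a perfect match
    return 1.0 - levenshtein(prediction, reference) / longest


def mean_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average the per-sentence accuracy over (prediction, reference) pairs."""
    scores = [normalised_accuracy(p, r) for p, r in pairs]
    if not scores:
        raise ValueError("at least one (prediction, reference) pair is required")
    return sum(scores) / len(scores)
```

Dividing by the longer string keeps each per-sentence score within the range 0 to 1; normalising by the reference length instead is another common choice and can give a negative score when the prediction is much longer than the reference.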

