Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach

08/17/2023
by Ziyin Zhang, et al.

Text recognition methods have been developing rapidly. Advanced techniques, e.g., powerful modules, language models, and un- and semi-supervised learning schemes, have successively pushed performance on public benchmarks forward. However, the problem of how to better optimize a text recognition model from the perspective of loss functions has been largely overlooked. CTC-based methods, widely used in practice due to their good balance between performance and inference speed, still grapple with accuracy degradation. This is because the CTC loss emphasizes optimization of the entire sequence target while neglecting the learning of individual characters. We propose a self-distillation scheme for CTC-based models to address this issue. It incorporates a framewise regularization term into the CTC loss to emphasize individual character supervision, and leverages the maximum-a-posteriori estimate of the latent alignment to resolve the inconsistency that arises in distillation between CTC-based models. We refer to the regularized CTC loss as Distillation Connectionist Temporal Classification (DCTC) loss. DCTC loss is module-free, requiring no extra parameters, no additional inference latency, and no extra training data or phases. Extensive experiments on public benchmarks demonstrate that DCTC can boost text recognition model accuracy by up to 2.6%.
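
To make the idea concrete, the following is a minimal PyTorch-style sketch of a distillation-regularized CTC loss under stated assumptions: the function name dctc_loss, the weight lam, and the use of the model's own greedy per-frame argmax as the teacher signal are illustrative choices, not the paper's exact formulation, which builds the framewise targets from the maximum-a-posteriori latent alignment of the CTC lattice.

    import torch
    import torch.nn.functional as F

    def dctc_loss(log_probs, targets, input_lengths, target_lengths, lam=0.1):
        # log_probs: (T, N, C) log-softmax outputs; targets: (N, S) label ids.
        # Standard sequence-level CTC objective.
        ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                         blank=0, zero_infinity=True)

        # Framewise self-distillation term: the model's own per-frame
        # predictions serve as the teacher signal here (a simplification of
        # the MAP latent alignment described in the abstract).
        with torch.no_grad():
            teacher = log_probs.argmax(dim=-1)                # (T, N) pseudo-labels
        frame_reg = F.nll_loss(log_probs.permute(1, 2, 0),    # (N, C, T)
                               teacher.permute(1, 0))         # (N, T)

        return ctc + lam * frame_reg

In training, such a function would simply replace the plain CTC loss call, with the regularization weight lam tuned on a validation set.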


