Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs

07/27/2023
by   Or Sharir, et al.

Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs. For example, an AI writing assistant must update its suggestions in real time as a document is edited. Re-running the model on each change is expensive, even with compression techniques like knowledge distillation, pruning, or quantization. Instead, we take an incremental computing approach, seeking to reuse calculations as the inputs change. However, the dense connectivity of conventional architectures poses a major obstacle to incremental computation: even minor input changes cascade through the network and prevent information reuse. To address this, we use vector quantization to discretize intermediate values in the network, which filters out noisy and unnecessary modifications to hidden neurons and facilitates the reuse of their values. We apply this approach to the transformer architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of modified inputs. Our experiments with adapting the OPT-125M pre-trained language model demonstrate comparable accuracy on document classification while requiring 12.1X (median) fewer operations for processing sequences of atomic edits.
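The core idea can be illustrated with a minimal sketch (not the authors' implementation): quantize each token's hidden vector to its nearest codebook entry, cache per-token outputs, and on the next call recompute only tokens whose quantized code changed. The `IncrementalLayer` class, codebook, and weights below are illustrative placeholders, and the layer is a simple token-wise linear map rather than a full transformer block.

```python
import numpy as np

def quantize(x, codebook):
    """Map each row of x to its nearest codebook vector (vector quantization)."""
    # Squared distances between every token vector and every codebook entry.
    d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = d.argmin(axis=1)
    return codes, codebook[codes]

class IncrementalLayer:
    """Toy incremental layer: because the output depends only on the quantized
    input, tokens whose code is unchanged can reuse their cached output."""

    def __init__(self, weight, codebook):
        self.weight = weight
        self.codebook = codebook
        self.codes = None   # quantized codes from the previous call
        self.cache = None   # outputs from the previous call
        self.ops = 0        # number of token-level recomputations so far

    def __call__(self, x):
        codes, xq = quantize(x, self.codebook)
        if self.codes is None:
            out = xq @ self.weight  # first call: full pass over all tokens
            self.ops += len(codes)
        else:
            out = self.cache.copy()
            changed = np.nonzero(codes != self.codes)[0]
            # Recompute only the tokens whose quantized code changed.
            out[changed] = xq[changed] @ self.weight
            self.ops += len(changed)
        self.codes, self.cache = codes, out
        return out
```

A small edit to one token then costs roughly one token's worth of compute instead of a full forward pass, which is the effect the paper scales up to full transformer layers.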


