Neural Network Compression Framework for fast model inference

02/20/2020 ∙ by Alexander Kozlov, et al. ∙ Intel

In this work we present a new framework for neural network compression with fine-tuning, which we call Neural Network Compression Framework (NNCF). It leverages recent advances in network compression methods and implements some of them, such as sparsity, quantization, and binarization. These methods make it possible to obtain more hardware-friendly models that can be run efficiently on general-purpose hardware computation units (CPU, GPU) or on specialized Deep Learning accelerators. We show that the developed methods can be successfully applied to a wide range of models to accelerate inference while preserving the original accuracy. The framework can be used within the training samples supplied with it, or as a standalone package that can be seamlessly integrated into existing training code with minimal adaptations. Currently, a PyTorch <cit.> version of NNCF is available as a part of OpenVINO Training Extensions.




1 Introduction

Deep Neural Networks are perhaps the most important breakthrough in machine learning in the last ten years [AlexNet, VGG, GNMT, WAVENet]. They brought new opportunities to improve the accuracy of algorithms for almost all ML problems by introducing models with millions of parameters. However, such models dramatically affected the performance of algorithms, because most of them require billions of operations to make accurate predictions. The analysis of these models [zeiler2014, rodriguez2016, CReLU] has shown that many of them have a high level of redundancy, basically caused by the fact that most networks were created to achieve the highest accuracy in an academic environment, while deployment has the performance/accuracy trade-off as a guiding principle. This observation induced the development of new methods that create more computation-efficient Deep Learning (DL) models, so that these models can be used in real-world applications with constrained resources, such as edge inference.

Most of these methods can be roughly divided into two categories. The first category contains the so-called Neural Architecture Search (NAS) algorithms [NASNet, MNASNet, PNASNet], which allow constructing efficient neural networks for a particular dataset and the specific hardware that will be used for model inference. The second category of methods aims to improve the performance of existing, usually hand-crafted DL models without an impact on their architecture design. Moreover, as we show in our research, these methods can be successfully applied to models obtained by NAS algorithms. One example of such methods is quantization [QuantizationGoogle, PACT], which is used to transform the model from a floating-point to a fixed-point representation and thus make more effective use of hardware supporting fixed-point arithmetic. The extreme case of quantized networks is binary networks [BNN, XNOR, DOREFA], where the weights and/or activations are represented by one of two available values, so that the original convolution can be equivalently replaced by XNOR and POPCOUNT operations, leading to a dramatic decrease in inference time on suitable hardware. Another method belonging to this group is introducing sparsity into the model weights [SparseCNN, FasterCNNSparsity, SparsityL0], which can be further exploited to reduce the data transfer rate at inference time, or even bring a performance speed-up through the use of sparse arithmetic, provided it is supported by the hardware.

In general, any method from the second group can be applied either during or after training, which adds a further distinction of these methods into post-training methods and methods applied with fine-tuning. Our framework contains methods that use fine-tuning when compressing a model.

In short, our contribution is the new NNCF framework, which has the following important features:

  • Support of quantization, binarization, and sparsity algorithms with fine-tuning.

  • Automatic model graph transformation - the model is wrapped and additional layers are inserted in the model graph.

  • Ability to mix compression methods and apply them at the same time.

  • Training samples for Image Classification, Object Detection and Semantic Segmentation, as well as configuration files for compressing various models.

  • HW-accelerated layers for fast model fine-tuning and multi-GPU training support.

  • Compatibility with OpenVINO Toolkit [OpenVINO].

It is worth noting that we put an emphasis on production use cases in our work in order to provide a simple yet powerful solution for accelerating the inference of neural networks solving problems in various domains.

2 Related Work

Currently, there are multiple efforts to bring compression algorithms not only to the research community but also to a wider range of users who are interested in real-world DL applications. Almost all DL frameworks, in one way or another, provide support for compression features. For example, quantizing a model to INT8 precision is now becoming a mainstream approach to accelerating inference with minimum effort.

One of the influential works here is [QuantizationGoogle], which introduced the so-called Quantization-Aware Training (QAT) for TensorFlow. This work highlights problems of algorithmic aspects of uniform quantization for CNNs with fine-tuning, and also proposes an efficient inference pipeline based on the instructions available in specific hardware. QAT is based on the Fake Quantization operation, which, in turn, can be represented as a pair of Quantize/Dequantize operations. An important feature of the proposed software solution is the automatic insertion of Fake Quantization operations, which makes model optimization more straightforward for the user. However, this approach has significant drawbacks, namely increased training time and memory consumption. Another concern is that the quantization method of [QuantizationGoogle] is based on the naive min/max approach and may potentially achieve worse results than more sophisticated quantization range selection strategies. The latter problem is solved by the methods proposed in [PACT], where quantization parameters are learned using gradient descent. In our framework we use a similar quantization method, along with other quantization schemes, while also providing the ability to automatically insert Fake Quantization operations into the model graph.

Another TensorFlow-based framework, Graffitist, which also leverages the training of quantization thresholds [TQT], aims to improve upon the QAT techniques by providing range-precision balancing of the resultant per-tensor quantization parameters via training them jointly with the network weights. This scheme is similar to ours but is limited to symmetric quantization with power-of-two quantization scales and only allows for 4/8-bit quantization widths, while our framework imposes no such restrictions, making it more flexible for end users. Furthermore, NNCF does not perform additional network graph transformations during the quantization process, such as batch normalization folding, which requires double computation of the convolutional operation, and is therefore less demanding of memory and computational resources.

Among the PyTorch-based tools available for model compression, the Neural Network Distiller [distiller] is the most well-known one. It contains implementations of various compression methods, such as quantization, binarization, filter pruning, and others. However, this solution mostly focuses on research tasks rather than on applying the methods to real use cases. The most critical drawback of Distiller is the lack of a ready-to-use pipeline from model compression to inference.

The main feature of existing compression frameworks is usually the ability to quantize the weights and/or activations of the model from 32-bit floating point into lower bit-width representations without sacrificing much of the model's accuracy. However, as is now commonly known [SongHan_sparsity], deep neural networks can also typically tolerate high levels of sparsity, that is, a large proportion of weights or neurons in the network can be zeroed out without much harm to model performance. NNCF allows producing compressed models that are both quantized and sparsified. The algorithms implemented in NNCF constitute non-structured network sparsification approaches, i.e. methods that result in sparse weight matrices of convolutional and fully-connected layers with zeros randomly distributed inside the weight tensors. This is in contrast to the so-called structured sparsity methods, which aim to prune away whole neurons or convolutional filters [ChannelPruning]. The non-structured sparsity algorithms generally range from relatively straightforward magnitude-based weight pruning schemes [SongHan_sparsity, MagnitudeSparsity] to more complex approaches such as variational and targeted dropout [VariationalDropout, TargetedDropout] and regularization-based methods [RBSparsity].

3 Framework Architecture

NNCF is built on top of the popular PyTorch framework. Conceptually, NNCF consists of an integral core part with a set of compression methods, which forms the NNCF Python package, and a set of training samples which demonstrate the capabilities of the compression methods implemented in the package. Each compression method has three basic components with a defined interface:

  • Compression Method itself, responsible for the correct model transformation, initialization, and exporting the model for use outside PyTorch.

  • Compression Loss, representing an additional loss function introduced by the compression algorithm to facilitate compression.

  • Compression Scheduler, controlling the parameters of the compression method during the training process.

We assume that potentially any compression method can be implemented using these three abstractions. For example, the Regularization-Based (RB) sparsity method implemented in NNCF introduces a weight mask for Convolutional and Fully-Connected layers, which is an additional training parameter. This mask is added when the model is being wrapped and is multiplied by the weights during export. This sparsity method exploits a regularization loss and implements its own scheduler, which gradually increases the sparsity rate.
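As an illustration, the three abstractions can be sketched as plain Python classes. This is a schematic sketch only; the class and method names below are simplified and do not reproduce the actual NNCF API.

```python
# Schematic sketch of the three NNCF abstractions; names are illustrative,
# not the real NNCF API.

class CompressionLoss:
    """Additional loss term introduced by a compression algorithm."""
    def __call__(self) -> float:
        # A real implementation would compute e.g. an RB-sparsity penalty.
        return 0.0

class CompressionScheduler:
    """Controls a compression parameter (e.g. target sparsity) over training."""
    def __init__(self, initial: float, final: float, total_steps: int):
        self.initial, self.final, self.total_steps = initial, final, total_steps
        self.current_step = 0

    def step(self) -> None:
        self.current_step = min(self.current_step + 1, self.total_steps)

    @property
    def target(self) -> float:
        # Linear ramp from the initial to the final level.
        frac = self.current_step / self.total_steps
        return self.initial + (self.final - self.initial) * frac

class CompressionMethod:
    """Wraps a model and owns a Compression Loss and a Compression Scheduler."""
    def __init__(self, model):
        self.model = model
        self.loss = CompressionLoss()
        self.scheduler = CompressionScheduler(0.0, 0.6, total_steps=100)
```

During fine-tuning, the scheduler's step() would be called once per iteration and its target read by the method to update masks or quantizer parameters.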

As mentioned before, one of the important features of the framework is automatic model transformation, i.e. the insertion of the auxiliary layers and operations required for a particular compression algorithm. This requires access to the PyTorch model graph, which the PyTorch framework does not readily expose. To overcome this problem, we patch the PyTorch module in order to get access to all of its operations at any time.

Another important novelty of NNCF is support for mixing algorithms, whereby the user can build their own compression pipeline from several compression methods. An example is models that are trained to be sparse and quantized at the same time in order to efficiently utilize the sparse fixed-point arithmetic of the target hardware. The mixing feature implemented inside the framework does not require any adaptations on the user's side; to enable it, one only needs to specify the set of desired compression methods in the configuration file.
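To illustrate, a configuration enabling two methods at once might look as follows. The field names here are illustrative assumptions on our part, not the exact NNCF configuration schema.

```python
# Illustrative configuration for mixing sparsity and quantization;
# the exact key names in NNCF's configuration files may differ.
config = {
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": [
        {"algorithm": "rb_sparsity", "params": {"sparsity_target": 0.6}},
        {"algorithm": "quantization",
         "weights": {"mode": "symmetric", "bits": 8},
         "activations": {"mode": "asymmetric", "bits": 8}},
    ],
}

# The framework would instantiate one compression method per list entry.
enabled = [entry["algorithm"] for entry in config["compression"]]
```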

Fig. 1 shows the common training pipeline for model compression. At the initial step, a particular compression algorithm is instantiated and the model is wrapped with additional compression layers. After that, the wrapped model is fine-tuned on the target dataset using a modified training pipeline. The modification consists of a call to the Compression Loss computation, whose result is added to the main loss (e.g. the cross-entropy loss in the case of a classification task), after which the Compression Scheduler step is called. As we show in Appendix A, any existing training pipeline written in PyTorch can be easily adapted to support model compression using NNCF. After the compressed model is trained, we can export it to the ONNX format for further use with the OpenVINO [OpenVINO] inference toolkit.

Figure 1:

A common model compression pipeline with NNCF. At the first step, an original full-precision model is automatically wrapped by the compression algorithm and then fine-tuned for a pre-defined number of epochs. After the training is done the model is exported to ONNX format and can be inferred with OpenVINO.

4 Compression Methods Overview

In this section we give an overview of the compression methods implemented in the NNCF framework.

4.1 Quantization

The first and most common compression method is quantization. Our quantization approach combines the ideas of QAT [QuantizationGoogle] and PACT [PACT] and is very close to TQT [TQT]: we train quantization parameters jointly with the network weights using so-called "fake" quantization operations inside the model graph. In contrast to TQT, however, NNCF supports symmetric and asymmetric schemes for both activations and weights, as well as per-channel quantization of weights, which helps to quantize even lightweight models produced by NAS, such as EfficientNet-B0.

For all supported schemes, quantization is represented by the affine mapping of integers q to real numbers r:

r = s * (q - z)     (1)

where s and z are quantization parameters. The constant s ("scale factor") is a positive real number; z ("zero-point") has the same type as the quantized value q and maps to the real value r = 0. The zero-point is used for asymmetric quantization and provides proper handling of zero paddings. For symmetric quantization it is equal to 0.
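A minimal NumPy sketch of this affine mapping (our own illustration, not NNCF code) is:

```python
import numpy as np

def affine_quantize(r, s, z, q_min, q_max):
    """Quantize real values r to integers q so that r is approximately s * (q - z)."""
    q = np.clip(np.rint(r / s) + z, q_min, q_max)
    return q.astype(np.int64)

def affine_dequantize(q, s, z):
    """Map integers q back to real values via the affine mapping."""
    return s * (q - z)
```

Note that np.rint rounds half to even, i.e. the "bankers" rounding mode.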

Symmetric quantization. During training we optimize the parameter scale, which represents the range of the original signal:

[low, high] = [scale * (q_min / q_max), scale]

where [q_min, q_max] defines the quantization range. The zero-point is always equal to zero in this case. Quantization ranges for activations and weights are tailored toward the hardware options available in the OpenVINO Toolkit (see Table 1). Three point-wise operations are sequentially applied to quantize r to q: scaling, clamping and rounding:

q = round( clamp( r / s ; q_min, q_max ) ),   s = scale / q_max     (2)

where round(.) denotes the "bankers" rounding operation.
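A sketch of symmetric fake quantization under these definitions (our illustration with an 8-bit signed range assumed by default; not the NNCF implementation):

```python
import numpy as np

def symmetric_fake_quantize(r, scale, bits=8, signed=True):
    """Scale, clamp and round r, then dequantize back ("fake" quantization)."""
    if signed:
        q_min, q_max = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    else:
        q_min, q_max = 0, 2 ** bits - 1
    s = scale / q_max
    # scaling, clamping and rounding (np.rint is round-half-to-even)
    q = np.rint(np.clip(r / s, q_min, q_max))
    return s * q
```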

                      q_min          q_max
Signed activation    -2^(b-1)      2^(b-1) - 1
Unsigned activation      0          2^b - 1
Table 1: Quantization ranges for symmetric mode for different bit-widths b of the integer representation.

Asymmetric quantization. Unlike symmetric quantization, in the asymmetric case we optimize the boundaries of the floating-point range (r_min, r_max) and use the zero-point z from (1):

q = round( clamp( r ; r_min, r_max ) / s ) + z,   s = (r_max - r_min) / (2^b - 1),   z = -round( r_min / s )     (3)

In addition, we add a constraint to the quantization scheme: the floating-point zero should be exactly mapped to an integer within the quantization range. This constraint allows an efficient implementation of layers with padding. Therefore, we "tune" the ranges before quantization with the following scheme: the range is first extended to include zero (r_min <- min(r_min, 0), r_max <- max(r_max, 0)) and then shifted so that r = 0 maps exactly to the integer z.
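The range tuning can be sketched as follows (our own illustration of the constraint, assuming b-bit quantization with 2^b - 1 steps; not the NNCF implementation):

```python
def tune_range(r_min, r_max, bits=8):
    """Adjust (r_min, r_max) so that the real value 0.0 maps exactly to an integer."""
    r_min, r_max = min(r_min, 0.0), max(r_max, 0.0)  # range must contain zero
    levels = 2 ** bits - 1
    s = (r_max - r_min) / levels      # scale factor
    z = int(round(-r_min / s))        # integer zero-point
    r_min = -z * s                    # shift so that 0.0 sits exactly on level z
    r_max = r_min + levels * s
    return r_min, r_max, s, z
```

With these tuned boundaries, quantizing r = 0 yields exactly q = z, so zero paddings incur no quantization error.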

Comparing quantization modes. The main advantage of symmetric quantization is its simplicity: it does not have a zero-point, which would introduce additional logic in hardware. Asymmetric mode, however, allows fully utilizing the quantization ranges, which may potentially lead to better accuracy, especially for lower-than-8-bit quantization, as we show in Table 2.

Scheme W8/A8 W4/A8 W4/A4
Symmetric 65.93 64.42 62.9
Asymmetric 66.1 65.87 64.7
Table 2: Quantization scheme comparison for different bit-width for weights and activations using MobileNet v2 and CIFAR-100 dataset. Top-1 accuracy is used as a metric. 32-bit floating point model has 65.53%.

Training and inference. As mentioned above, quantization is simulated on the forward pass during training by means of FakeQuantization operations, which perform quantization according to (2) or (3) and dequantization (4) at the same time:

r' = s * (q - z)     (4)

FakeQuantization layers are automatically inserted into the model graph. Weights are "fake"-quantized before the corresponding operations. Activations are quantized when the preceding layer changes the data type of the tensor, except for basic fusion patterns which correspond to a single operation at inference time, such as Conv + ReLU or Conv + BatchNorm + ReLU.

Unlike QAT [QuantizationGoogle] and TQT [TQT], we do not perform BatchNorm folding, in order to avoid the double computation of convolutions and the additional memory consumption which significantly slow down training. However, to avoid a misalignment between BatchNorm statistics at training and at inference time, we need to use a large batch size (256 or more).

Model Dataset Metric type Acc. FP32 Acc. compressed
ResNet-50 ImageNet top-1 acc. 76.1 76.03
Inception-v3 ImageNet top-1 acc. 77.32 78.36
MobileNet-v1 ImageNet top-1 acc. 69.6 69.75
MobileNet-v2 ImageNet top-1 acc. 71.8 71.8
MobileNet-v3 Small ImageNet top-1 acc. 67.1 66.77
SqueezeNet v1.1 ImageNet top-1 acc. 58.19 58.16
SSD300-BN VOC07+12 mAP 78.28 78.18
SSD512-BN VOC07+12 mAP 80.26 80.32
UNet Camvid mIoU 72.5 73.0
UNet Mapillary Vistas mIoU 56.23 56.16
ICNet Camvid mIoU 67.89 67.78
BERT-base-chinese XNLI (test, Chinese) top-1 acc. 77.68 77.02
BERT-large-uncased-wwm* SQuAD v1.1 (dev) F1/EM 93.21/87.2 92.48/85.95

* Whole word masking.

Table 3: Accuracy results of INT8 quantization measured in the training framework in FP32 precision.

4.2 Binarization

Model Dataset Weight / activation bin type % ops binarized Acc. FP32 Acc. compressed
ResNet-18 ImageNet XNOR / scale-threshold 92.4 69.75 61.71
ResNet-18 ImageNet DoReFa / scale-threshold 92.4 69.75 61.58
Table 4: Binarization results measured in the training framework
Model Dataset Acc. metric type Acc. FP32 Acc. compressed
ResNet-50 INT8 w/ 60% of sparsity (RB) ImageNet top-1 acc. 76.13 75.2
Inception v3 INT8 w/ 60% of sparsity (RB) ImageNet top-1 acc. 77.32 76.8
MobileNet v2 INT8 w/ 51% of sparsity (RB) ImageNet top-1 acc. 71.8 70.9
MobileNet v2 INT8 w/ 70% of sparsity (RB) ImageNet top-1 acc. 71.8 70.1
SSD300-BN INT8 w/ 70% of sparsity (Magnitude) VOC07+12 mAP 78.28 77.94
SSD512-BN INT8 w/ 70% of sparsity (Magnitude) VOC07+12 mAP 80.26 80.11
UNet INT8 w/ 60% of sparsity (Magnitude) CamVid mIoU 72.5 73.27
UNet INT8 w/ 60% of sparsity (Magnitude) Mapillary mIoU 56.23 54.30
ICNet INT8 w/ 60% of sparsity (Magnitude) CamVid mIoU 67.89 67.53
Table 5: Sparsification+quantization results measured in the training framework
Model Accuracy drop
All per-tensor symmetric 0.75
All per-tensor asymmetric 0.21
Per-channel weights asymmetric 0.17
All per-tensor asymmetric
w/ 31% of sparsity 0.35
Table 6: Accuracy top-1 results for INT8 quantization of EfficientNet-B0 model on ImageNet measured in the training framework

Currently, NNCF supports binarizing the weights and activations of 2D convolutional PyTorch layers (Conv2d).

Weight binarization can be done either via the XNOR [rastegari2016xnor] or the DoReFa [zhou2016dorefa] binarization scheme. For DoReFa binarization, the scale of the binarized weights for each convolution operation is calculated as the mean of the absolute values of the non-binarized convolutional filter weights, while for XNOR binarization each convolution operation has scales that are calculated in the same manner, but per input channel of the convolutional filter.
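The two weight-scale computations can be sketched with NumPy as follows (an illustration, assuming a weight tensor of shape (out_channels, in_channels, kH, kW); not NNCF code):

```python
import numpy as np

def dorefa_weight_scale(w):
    """Single scale per convolution: mean absolute value of all filter weights."""
    return float(np.mean(np.abs(w)))

def xnor_weight_scales(w):
    """Per-input-channel scales: mean absolute value over all remaining axes."""
    return np.mean(np.abs(w), axis=(0, 2, 3))
```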

Activation binarization is implemented by binarizing the inputs of the convolutional layers in the following way:

a_b = s * H(a - s * t)     (5)

where a are the non-binarized activation values, a_b the binarized activation values, H is the Heaviside step function, and s and t are trainable parameters corresponding to the binarization scale and threshold, respectively. The thresholds are trained separately for each output activation channel dimension.
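A sketch of this activation binarization in NumPy (illustrative; scalar s and t are assumed here, whereas per-channel thresholds are trained in practice):

```python
import numpy as np

def binarize_activations(a, s, t):
    """Compute a_b = s * H(a - s*t), with H the Heaviside step function."""
    return s * np.heaviside(a - s * t, 0.0)
```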

It is usually not recommended to binarize certain layers of a CNN, for instance, the input convolutional layer, the fully-connected layer and the convolutional layer directly preceding it, or the ResNet "downsample" layers. NNCF allows picking the exact subset of layers to be binarized via the layer blacklist/whitelist mechanism, as in other NNCF compression methods.

Finally, training binarized networks requires special scheduling of the training process, tailored specifically to each model architecture. The NNCF samples demonstrate binarization of a ResNet-18 architecture pre-trained on ImageNet using a four-stage process, with each stage taking a certain number of fine-tuning epochs:

  • Stage 1: the network is trained without any binarization,

  • Stage 2: the training continues with binarization enabled for activations only,

  • Stage 3: binarization is enabled both for activations and weights,

  • Stage 4: the optimizer learning rate, which had been kept constant at previous stages, is decreased according to a polynomial law, while weight decay parameter of the optimizer is set to 0.

The configuration files for the NNCF binarization algorithm allow controlling the stage durations of this training schedule. Table 4 presents the results of binarizing ResNet-18 with either XNOR or DoReFa weight binarization and scale-threshold activation binarization (5).

4.3 Sparsity

NNCF supports two non-structured sparsity algorithms: i) a simple magnitude-based sparsity training scheme, and ii) regularization-based training, which is a modification of the method proposed in [RBSparsity]. It has been argued [SparsityBenchmark] that complex approaches to sparsification, such as regularization-based ones, produce inconsistent results when applied to large benchmark datasets (e.g. ImageNet for classification, as opposed to e.g. CIFAR-100), and that magnitude-based sparsity algorithms provide comparable or better results in these cases. However, we found in our experiments that the regularization-based (RB) approach to sparsity outperforms the simple magnitude-based method for several classification models trained on ImageNet, achieving higher accuracy at the same sparsity level (e.g. for MobileNetV2). Hence, both methods can be useful in different contexts, with RB sparsity requiring tuning of the training schedule and a longer training procedure, but ultimately producing better results for certain tasks. We briefly describe the details of both network sparsification algorithms implemented in NNCF below.

Magnitude-based sparsity. In the magnitude-based weight pruning algorithm, the magnitude of each weight is used as a measure of its importance. In the NNCF implementation of magnitude-based sparsity, a schedule for the desired sparsity rate (equal to 1 - SL, where SL is the sparsity level) over the training process is defined, and a threshold value is calculated each time the sparsity rate is changed by the compression scheduler. Weights with magnitudes lower than the calculated threshold value are then zeroed out. The compression scheduler can be set to increase the sparsity rate from an initial to a final level over a certain number of training epochs. The dynamics of the sparsity level increase during training are adjustable, with polynomial, exponential, and adaptive modes supported.
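The threshold computation can be sketched as follows (our own illustration of a magnitude criterion, not the NNCF scheduler itself):

```python
import numpy as np

def magnitude_threshold(weights, sparsity_level):
    """Smallest magnitude such that zeroing weights below it reaches sparsity_level."""
    magnitudes = np.sort(np.abs(np.concatenate([w.ravel() for w in weights])))
    k = int(sparsity_level * magnitudes.size)
    return magnitudes[k - 1] if k > 0 else 0.0

def apply_sparsity_mask(w, threshold):
    """Zero out weights whose magnitude does not exceed the threshold."""
    return w * (np.abs(w) > threshold)
```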

Regularization-based sparsity. In our formulation of the RB sparsity algorithm, a complexity loss term is added to the total loss function during training, defined as

L_reg = ( (1/N) * sum_i [w_i != 0] - SL )^2

where N is the number of network parameters and SL is the desired sparsity level (percentage of non-zero weights) the network is set out to achieve. Note that the above regularization loss term penalizes networks with sparsity levels both lower and higher than the defined level. Following the derivations in [RBSparsity], in order to make the loss term differentiable, the model weights are reparametrized as follows:

w_i = theta_i * g_i,   g_i = H(sigmoid(s_i) - u_i),   u_i ~ U(0, 1)

where g_i is the stochastic binary gate, H is the Heaviside step function and sigmoid the logistic function. It can be shown that the above formulation is equivalent to g_i being sampled from the Bernoulli distribution with probability parameter p_i = sigmoid(s_i). Hence, the s_i are the trainable parameters which control whether the weight w_i is going to be zeroed out at test time (which is done for sigmoid(s_i) < 0.5).

On each training iteration, the set of binary gate values is sampled once from the above distribution and multiplied with the network weights. In the Monte Carlo approximation of the loss function in [RBSparsity], the mask of binary gates is generally sampled and applied several times per training iteration, but single mask sampling is sufficient in practice (as shown in [RBSparsity]). The expected loss term was shown to be proportional to the sum of the probabilities of the gates being non-zero [RBSparsity], which in our case results in the following expression:

E[L_reg] = ( (1/N) * sum_i sigmoid(s_i) - SL )^2

To make the error loss term (e.g. cross-entropy for classification) differentiable w.r.t. the s_i, we treat the threshold function g_i = H(sigmoid(s_i) - u_i) as a straight-through estimator (i.e. its gradient is passed through unchanged on the backward pass).

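A sketch of the stochastic gate sampling and an expected complexity loss of this kind (our illustration under the assumptions above, not NNCF code; target_nonzero denotes the desired fraction of non-zero weights):

```python
import numpy as np

def sample_gates(s, rng):
    """Sample binary gates g = H(sigmoid(s) - u), u ~ U(0, 1)."""
    p = 1.0 / (1.0 + np.exp(-s))   # probability of each gate being 1 (weight kept)
    u = rng.uniform(size=s.shape)
    return (p > u).astype(np.float64), p

def expected_complexity_loss(p, target_nonzero):
    """Penalize deviation of the expected non-zero fraction from the target."""
    return float((p.mean() - target_nonzero) ** 2)
```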
5 Results

Some results for the different compression methods were already presented in the corresponding sections. Table 6 reports compression results for EfficientNet-B0, which gives the best combination of accuracy and performance on ImageNet. We compare the accuracy of the floating-point model (76.84% top-1) with the accuracy of the compressed ones.

Table 7 shows performance results for the compressed models inferred with the OpenVINO toolkit.

Model Accuracy drop (%) Speed up
MobileNet v2 INT8 0.44 1.82x
ResNet-50 v1 INT8 -0.34 3.05x
Inception v3 INT8 -0.62 3.11x
SSD-300 INT8 -0.12 3.31x
UNet INT8 -0.5 3.14x
ResNet-18 XNOR 7.25 2.56x
Table 7: Relative performance/accuracy results with OpenVINO on Intel Xeon Gold 6230 Processor

To extend the scope of trainable models and to validate that NNCF can be easily combined with existing PyTorch-based training pipelines, we integrated NNCF with the popular mmdetection object detection toolbox [MMDet]. As a result, we were able to train INT8-quantized and INT8-quantized+sparse object detection models available in mmdetection on the challenging COCO dataset and achieve a less than 1 mAP point drop for the COCO-based mAP evaluation metric. Specific results for compressed RetinaNet-FPN-based detection models are shown in Table 8.
Model                                            FP32   Compressed
RetinaNet-ResNet50-FPN INT8                      35.6   35.3
RetinaNet-ResNeXt101-64x4d-FPN INT8              39.6   39.1
RetinaNet-ResNet50-FPN INT8 + 50% sparsity       35.6   34.7
Table 8: Validation set metrics for original and compressed models. Shown are mAP values for models trained and tested on the COCO dataset.

6 Conclusions

In this work we presented the new NNCF framework for model compression with fine-tuning. It supports various compression methods and allows combining them to obtain more lightweight neural networks. We paid special attention to usability aspects, simplified the compression process setup, and validated the framework on a wide range of models. Models obtained with NNCF show state-of-the-art results in terms of the accuracy-performance trade-off. The framework is compatible with the OpenVINO inference toolkit, which makes it attractive for applying compression in real-world applications. We are constantly working on developing new features, improving the current ones, and adding support for new models.


Appendix A Appendix

Described below are the steps required to modify an existing PyTorch training pipeline in order to integrate it with NNCF. The described use case implies that there exists a PyTorch pipeline which reproduces model training in floating-point precision, together with a pre-trained model snapshot. The objective of NNCF is to compress this model in order to accelerate inference. Once the NNCF package is installed, the user needs to revise the training code and introduce minor changes to enable model compression. Below are the steps needed to modify the training pipeline code in PyTorch:

  • Add the following imports at the beginning of the training sample, right after importing PyTorch:

    from nncf.dynamic_graph \
        import patch_torch_operators
    from nncf.algo_selector \
        import create_compression_algorithm \
        as create_cm_algo
  • Once a model instance is created and the pre-trained weights are loaded, a compression algorithm should be created and the model should be wrapped:

    cm_algo = create_cm_algo(model, config)
    model = cm_algo.model

    where config is a dictionary in which all the options and hyperparameters of the compression methods are specified.

  • Then the model can be wrapped with the DataParallel or DistributedDataParallel classes for multi-GPU training. In the case of distributed training, you also need to call the cm_algo.distributed() method at this stage.

  • You should call the cm_algo.initialize() method before the start of your training loop to initialize model parameters related to compression (e.g. the parameters of FakeQuantize layers). Some compression algorithms (e.g. quantization) require arguments (e.g. the train_loader for your training dataset) to be supplied to the initialize() method.

  • The following changes have to be applied to the training loop code: after model inference is done on the current training iteration, the compression loss should be added (using the + operator) to the common loss, e.g. the cross-entropy loss:

    compress_loss = cm_algo.loss()
    loss = cross_entropy_loss + compress_loss

    Call the scheduler step() after each training iteration:

        cm_algo.scheduler.step()

    Call the scheduler epoch_step() after each training epoch:

        cm_algo.scheduler.epoch_step()