Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks

03/07/2023
by Jierun Chen, et al.

To design fast neural networks, many works have focused on reducing the number of floating-point operations (FLOPs). We observe, however, that such a reduction in FLOPs does not necessarily lead to a similar reduction in latency. This mainly stems from inefficiently low floating-point operations per second (FLOPS). To achieve faster networks, we revisit popular operators and demonstrate that such low FLOPS is mainly due to the operators' frequent memory access, especially in depthwise convolution. We hence propose a novel partial convolution (PConv) that extracts spatial features more efficiently by cutting down redundant computation and memory access simultaneously. Building upon PConv, we further propose FasterNet, a new family of neural networks that attains substantially higher running speed than others on a wide range of devices, without compromising accuracy on various vision tasks. For example, on ImageNet-1k, our tiny FasterNet-T0 is 3.1×, 3.1×, and 2.5× faster than MobileViT-XXS on GPU, CPU, and ARM processors, respectively, while being 2.9% more accurate. Our large FasterNet-L achieves an impressive 83.5% top-1 accuracy, on par with the emerging Swin-B, while having 49% higher inference throughput on GPU and saving 42% compute time on CPU. Code is available at <https://github.com/JierunChen/FasterNet>.
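
The gap between FLOPs and latency described above can be illustrated with a short sketch. It assumes the usual relation latency = FLOPs / FLOPS, counts one multiply-accumulate as one operation, and ignores bias terms. The abstract does not spell out how PConv works; the sketch assumes, following the full paper, that PConv applies a regular k×k convolution to only a fraction of the channels and leaves the rest untouched. The function names and the default `ratio=0.25` are illustrative choices, not taken from the authors' code.

```python
def latency_seconds(flops: int, flops_per_second: int) -> float:
    """Latency under the simple model: latency = FLOPs / FLOPS."""
    return flops / flops_per_second


def conv_flops(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """Operation count of a regular k x k convolution on an h x w output map."""
    return h * w * k * k * c_in * c_out


def depthwise_conv_flops(h: int, w: int, k: int, c: int) -> int:
    """Depthwise convolution: one k x k filter per channel."""
    return h * w * k * k * c


def partial_conv_flops(h: int, w: int, k: int, c: int, ratio: float = 0.25) -> int:
    """Assumed PConv cost: a regular convolution on c_p = ratio * c channels only."""
    c_p = int(c * ratio)
    return conv_flops(h, w, k, c_p, c_p)


# Halving FLOPs does not help if the achieved FLOPS also halves.
baseline = latency_seconds(2_000_000_000, 1_000_000_000_000)
fewer_flops_lower_rate = latency_seconds(1_000_000_000, 500_000_000_000)

# Operation counts for a 56 x 56 feature map, 3 x 3 kernel, 64 channels.
regular = conv_flops(56, 56, 3, 64, 64)
depthwise = depthwise_conv_flops(56, 56, 3, 64)
partial = partial_conv_flops(56, 56, 3, 64)

print(baseline, fewer_flops_lower_rate)
print(regular, depthwise, partial)
```

Under these assumptions, a ratio of 1/4 gives PConv 1/16 of the operation count of a regular convolution over all channels. Depthwise convolution has an even lower count here, which is consistent with the abstract's point that a low FLOPs count alone does not determine speed: the achieved FLOPS, limited by memory access, matters as well.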

