TResNet: High Performance GPU-Dedicated Architecture

03/30/2020
by Tal Ridnik, et al.

Many deep learning models developed in recent years reach higher ImageNet accuracy than ResNet50 with a comparable or lower FLOPs count. Although FLOPs are often treated as a proxy for network efficiency, measurements of actual GPU training and inference throughput show that vanilla ResNet50 is usually significantly faster than its recent competitors and offers a better throughput-accuracy trade-off. In this work, we introduce a series of architecture modifications that aim to boost neural networks' accuracy while retaining their GPU training and inference efficiency. We first demonstrate and discuss the bottlenecks induced by FLOPs-oriented optimizations. We then suggest alternative designs that better utilize GPU structure and assets. Finally, we introduce a new family of GPU-dedicated models, called TResNet, which achieve better accuracy and efficiency than previous ConvNets. Using a TResNet model with GPU throughput similar to ResNet50, we reach 80.7% top-1 accuracy on ImageNet. Our models also transfer well and achieve state-of-the-art accuracy on competitive datasets such as Stanford Cars (96.0%) and Oxford-Flowers (99.1%). Implementation is available at: https://github.com/mrT23/TResNet
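
The abstract's distinction between FLOPs and measured throughput can be illustrated with a small timing harness. The following is a minimal sketch and not the authors' benchmarking code: `measure_throughput`, its parameters, and the `dummy_model` stand-in are hypothetical names introduced here for illustration. It reports samples per second from wall-clock time, which is the quantity a FLOPs count only approximates.

```python
import time
from typing import Callable, Sequence


def measure_throughput(
    run_batch: Callable[[Sequence], object],
    batch: Sequence,
    warmup_iters: int = 3,
    timed_iters: int = 10,
) -> float:
    """Return measured throughput of `run_batch` in samples per second."""
    # Warm-up runs are excluded from timing (caches, lazy initialization).
    for _ in range(warmup_iters):
        run_batch(batch)

    # Time a fixed number of iterations with a monotonic, high-resolution clock.
    start = time.perf_counter()
    for _ in range(timed_iters):
        run_batch(batch)
    elapsed = time.perf_counter() - start

    return len(batch) * timed_iters / elapsed


if __name__ == "__main__":
    # Hypothetical stand-in workload; replace with a real model's forward pass.
    def dummy_model(batch):
        return [x * x for x in batch]

    rate = measure_throughput(dummy_model, list(range(64)))
    print(f"{rate:.0f} samples/sec")
```

When timing a real model on a GPU, kernel launches are asynchronous, so the device would need to be synchronized before each clock read (for example with `torch.cuda.synchronize()` in PyTorch); otherwise the measurement reflects launch overhead rather than compute time.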


research
01/13/2023

YOLOv6 v3.0: A Full-Scale Reloading

The YOLO community has been in high spirits since our first two releases...
research
05/28/2019

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Convolutional Neural Networks (ConvNets) are commonly developed at a fix...
research
04/01/2021

EfficientNetV2: Smaller Models and Faster Training

This paper introduces EfficientNetV2, a new family of convolutional netw...
research
09/14/2020

AutoML for Multilayer Perceptron and FPGA Co-design

State-of-the-art Neural Network Architectures (NNAs) are challenging to ...
research
03/07/2023

Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks

To design fast neural networks, many works have been focusing on reducin...
research
07/26/2023

YOLOBench: Benchmarking Efficient Object Detectors on Embedded Systems

We present YOLOBench, a benchmark comprised of 550+ YOLO-based object de...
research
06/11/2020

JIT-Masker: Efficient Online Distillation for Background Matting

We design a real-time portrait matting pipeline for everyday use, partic...
