TResNet: High Performance GPU-Dedicated Architecture

03/30/2020 ∙ by Tal Ridnik, et al.

Many deep learning models developed in recent years reach higher ImageNet accuracy than ResNet50 with a lower or comparable FLOPs count. While FLOPs are often seen as a proxy for network efficiency, when measuring actual GPU training and inference throughput, vanilla ResNet50 is usually significantly faster than its recent competitors, offering a better throughput-accuracy trade-off. In this work, we introduce a series of architecture modifications that aim to boost neural networks' accuracy while retaining their GPU training and inference efficiency. We first demonstrate and discuss the bottlenecks induced by FLOPs-oriented optimizations. We then suggest alternative designs that better utilize GPU structure and assets. Finally, we introduce a new family of GPU-dedicated models, called TResNet, which achieve better accuracy and efficiency than previous ConvNets. Using a TResNet model with GPU throughput similar to ResNet50, we reach 80.7% top-1 accuracy on ImageNet. Our models also transfer well and achieve state-of-the-art accuracy on competitive datasets such as Stanford Cars (96.0%) and Oxford-Flowers (99.1%).
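The distinction between FLOPs and measured throughput can be made concrete with a small timing harness. The sketch below is illustrative only and is not the authors' benchmarking code: `measure_throughput`, `step`, and `sync` are hypothetical names, and the warm-up and iteration counts are arbitrary assumptions. It reports images per second for any callable that processes one batch.

```python
import time
from typing import Callable, Optional


def measure_throughput(
    step: Callable[[], None],
    batch_size: int,
    warmup: int = 5,
    iters: int = 20,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return images per second for `step`, which processes one batch.

    `sync` is an optional hook (an assumption of this sketch) that blocks
    until queued device work has finished. GPU kernels usually run
    asynchronously, so the clock should only be read after such a barrier.
    """
    if batch_size <= 0 or iters <= 0:
        raise ValueError("batch_size and iters must be positive")

    # Warm-up iterations are excluded from timing (caches, lazy init).
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start

    # Guard against a zero reading on very coarse clocks.
    return batch_size * iters / max(elapsed, 1e-12)


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would run a model forward pass here.
    def dummy_step() -> None:
        sum(i * i for i in range(10_000))

    print(f"{measure_throughput(dummy_step, batch_size=64):.1f} img/s")
```

Two models with the same FLOPs count can yield very different numbers from a harness like this, which is the gap the abstract refers to. With PyTorch on a GPU, `torch.cuda.synchronize` is one function that could be passed as `sync`.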


