Collage Inference: Achieving low tail latency during distributed image classification using coded redundancy models

06/05/2019
by   Krishna Narra, et al.

Reducing latency variance in machine learning inference is a key requirement in many applications, but variance is harder to control in a cloud deployment because of stragglers. Despite this challenge, inference is increasingly being done in the cloud, driven by the advent of affordable machine learning as a service (MLaaS) platforms. Existing approaches reduce variance through replication, which is expensive and partially negates the affordability of MLaaS. In this work, we argue that MLaaS platforms also provide unique opportunities to cut the cost of redundancy: a load balancer that receives many concurrent inference requests can create a more cost-efficient redundancy coding across a larger collection of images. We propose a novel convolutional neural network model, Collage-CNN, to provide a low-cost redundancy framework. A Collage-CNN model takes a collage formed by combining multiple images and performs multi-image classification in one shot, albeit at slightly lower accuracy. We then augment a collection of traditional single-image classifiers with a single Collage-CNN classifier that acts as a low-cost redundant backup, providing backup classification results whenever a single-image classification straggles. Deploying the Collage-CNN models in the cloud, we demonstrate that the 99th-percentile tail latency of inference can be reduced by 1.47X compared to replication-based approaches while maintaining high accuracy. Moreover, variation in inference latency can be reduced by 9X with only a slight increase in average inference latency.
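The abstract describes two mechanisms: tiling several concurrent requests into one collage image for a single backup model, and falling back to the collage model's per-cell prediction when an individual single-image classifier straggles. The sketch below illustrates both under stated assumptions: `make_collage` and `resolve_with_backup` are hypothetical helper names (not from the paper), and NumPy arrays stand in for images and model outputs.

```python
import numpy as np

def make_collage(images, grid=(2, 2)):
    """Tile grid[0] x grid[1] equally sized images into one collage.

    Illustrates the Collage-CNN input: the load balancer batches N
    concurrent single-image requests into one image that a single
    backup model classifies in one shot.
    """
    rows, cols = grid
    if len(images) != rows * cols:
        raise ValueError("need exactly rows * cols images")
    # Build each row by joining images side by side, then stack the rows.
    row_strips = [np.concatenate(images[r * cols:(r + 1) * cols], axis=1)
                  for r in range(rows)]
    return np.concatenate(row_strips, axis=0)

def resolve_with_backup(single_preds, collage_preds, deadline_missed):
    """Per request, keep the single-image classifier's prediction unless
    it missed its deadline, in which case substitute the collage model's
    prediction for that cell (the low-cost redundant backup)."""
    return [collage_preds[i] if deadline_missed[i] else single_preds[i]
            for i in range(len(single_preds))]
```

For example, with four 4x4 RGB images the collage is a single 8x8 image, and a straggling second request is answered from the collage model's second cell while the other three keep their primary results.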


Related research

04/27/2019 · Collage Inference: Tolerating Stragglers in Distributed Neural Network Inference using Coding
MLaaS (ML-as-a-Service) offerings by cloud computing platforms are becom...

10/02/2017 · Effective Straggler Mitigation: Which Clones Should Attack and When?
Redundancy for straggler mitigation, originally in data download and mor...

10/16/2020 · Performance evaluation and application of computation based low-cost homogeneous machine learning model algorithm for image classification
The image classification machine learning model was trained with the int...

10/01/2017 · Straggler Mitigation by Delayed Relaunch of Tasks
Redundancy for straggler mitigation, originally in data download and mor...

05/02/2019 · Parity Models: A General Framework for Coding-Based Resilience in ML Inference
Machine learning models are becoming the primary workhorses for many app...

06/25/2019 · Straggler Mitigation at Scale
Runtime performance variability at the servers has been a major issue, h...

03/04/2019 · CodeNet: Training Large Scale Neural Networks in Presence of Soft-Errors
This work proposes the first strategy to make distributed training of ne...
