One Size Does Not Fit All: Quantifying and Exposing the Accuracy-Latency Trade-off in Machine Learning Cloud Service APIs via Tolerance Tiers

06/26/2019
by Matthew Halpern, et al.

Today's cloud service architectures follow a "one size fits all" deployment strategy, in which the same service version instantiation is provided to all end users. However, consumers are diverse, and different applications have different accuracy and responsiveness requirements, which, as we demonstrate, renders the "one size fits all" approach inefficient in practice. We use a production-grade speech recognition engine that serves several thousand users, together with an open-source computer-vision-based system, to illustrate this point. To overcome the limitations of the "one size fits all" approach, we propose Tolerance Tiers, in which each machine learning as a service (MLaaS) tier exposes an accuracy/responsiveness characteristic and consumers can programmatically select a tier. We evaluate our proposal on the CPU-based automatic speech recognition (ASR) engine and on cutting-edge neural networks for image classification deployed on both CPUs and GPUs. The results show that our approach yields an MLaaS cloud service architecture that the end API user or consumer can tune to outperform the conventional "one size fits all" approach.
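To make the idea of programmatically selecting a tier concrete, here is a minimal sketch, assuming each tier advertises an expected error rate and an expected latency, and that the consumer states a tolerance on one or both. The `Tier` class, the `select_tier` function, the tier names, and all numbers are hypothetical placeholders chosen for illustration; they are not the paper's API or measured results.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Tier:
    """A hypothetical service tier and the characteristics it advertises."""
    name: str
    expected_error: float       # e.g. word error rate as a fraction (illustrative)
    expected_latency_ms: float  # expected response time in ms (illustrative)


def select_tier(tiers: Sequence[Tier],
                max_error: Optional[float] = None,
                max_latency_ms: Optional[float] = None) -> Optional[Tier]:
    """Return a tier that satisfies the consumer's tolerances, or None.

    If an error tolerance is given, pick the fastest tier that is accurate
    enough. Otherwise pick the most accurate tier that is fast enough
    (or simply the most accurate tier if no tolerance is given).
    """
    # Keep only tiers that meet every bound the consumer specified.
    candidates = [
        t for t in tiers
        if (max_error is None or t.expected_error <= max_error)
        and (max_latency_ms is None or t.expected_latency_ms <= max_latency_ms)
    ]
    if not candidates:
        return None
    if max_error is not None:
        # Accuracy-constrained request: minimize latency.
        return min(candidates, key=lambda t: t.expected_latency_ms)
    # Latency-constrained or unconstrained request: minimize error.
    return min(candidates, key=lambda t: t.expected_error)


# Hypothetical tiers; the values are placeholders, not measurements.
TIERS = [
    Tier("fast", expected_error=0.12, expected_latency_ms=40.0),
    Tier("balanced", expected_error=0.09, expected_latency_ms=90.0),
    Tier("accurate", expected_error=0.07, expected_latency_ms=200.0),
]

if __name__ == "__main__":
    print(select_tier(TIERS, max_error=0.10).name)      # balanced
    print(select_tier(TIERS, max_latency_ms=50).name)   # fast
    print(select_tier(TIERS, max_error=0.05))           # None
```

In contrast to a "one size fits all" deployment, which would always return the same configuration, this selection lets an accuracy-sensitive consumer and a latency-sensitive consumer receive different tiers from the same service.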

research
09/14/2020

EasyASR: A Distributed Machine Learning Platform for End-to-end Automatic Speech Recognition

We present EasyASR, a distributed machine learning platform for training...
research
08/12/2020

Online Automatic Speech Recognition with Listen, Attend and Spell Model

The Listen, Attend and Spell (LAS) model and other attention-based autom...
research
04/07/2015

Voice based self help System: User Experience Vs Accuracy

In general, self help systems are being increasingly deployed by service...
research
10/19/2021

Speech Pattern based Black-box Model Watermarking for Automatic Speech Recognition

As an effective method for intellectual property (IP) protection, model ...
research
05/16/2020

Dynamic Sparsity Neural Networks for Automatic Speech Recognition

In automatic speech recognition (ASR), model pruning is a widely adopted...
research
02/17/2022

MLP-ASR: Sequence-length agnostic all-MLP architectures for speech recognition

We propose multi-layer perceptron (MLP)-based architectures suitable for...
research
01/02/2018

M2: Malleable Metal as a Service

Existing bare-metal cloud services that provide users with physical node...