InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference

09/28/2022
by Mu Yuan, et al.

Mobile-centric AI applications have high requirements for resource-efficiency of model inference. Input filtering is a promising approach to eliminate redundancy and thereby reduce the cost of inference. Previous efforts have tailored effective solutions for many applications, but left two essential questions unanswered: (1) the theoretical filterability of an inference workload, which would guide the application of input filtering techniques and avoid trial-and-error cost for resource-constrained mobile applications; (2) the robust discriminability of feature embedding, which would allow input filtering to be widely effective for diverse inference tasks and input content. To answer them, we first formalize the input filtering problem and theoretically compare the hypothesis complexity of inference models and input filters to understand the optimization potential. Then we propose the first end-to-end learnable input filtering framework that covers most state-of-the-art methods and surpasses them in feature embedding with robust discriminability. We design and implement InFi, which supports six input modalities and multiple mobile-centric deployments. Comprehensive evaluations confirm our theoretical results and show that InFi outperforms strong baselines in applicability, accuracy, and efficiency. InFi achieves 8.5x throughput and saves 95 over 90
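As a rough illustration of the general idea of input filtering described in the abstract, the sketch below runs an expensive model only on inputs that a cheap filter does not flag as redundant. It is not InFi's learned filter: `filtered_inference`, `toy_score`, `toy_model`, and the `threshold` parameter are hypothetical names chosen for this example, and the scoring rule is hand-written rather than trained end to end.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def filtered_inference(
    inputs: Sequence[float],
    redundancy_score: Callable[[float], float],
    model: Callable[[float], str],
    threshold: float = 0.5,
) -> Tuple[List[Optional[str]], int]:
    """Run `model` only on inputs the filter does not flag as redundant.

    Returns the per-input outputs (None where inference was skipped)
    and the number of times the model was actually called.
    """
    outputs: List[Optional[str]] = []
    model_calls = 0
    for x in inputs:
        if redundancy_score(x) >= threshold:
            # Filter predicts the result would be redundant: skip inference.
            outputs.append(None)
        else:
            outputs.append(model(x))
            model_calls += 1
    return outputs, model_calls


# Hypothetical stand-ins: a hand-written filter and a toy "expensive" model.
def toy_score(x: float) -> float:
    # Treat near-zero inputs as redundant (assumption for illustration only).
    return 1.0 if abs(x) < 0.1 else 0.0


def toy_model(x: float) -> str:
    return "positive" if x > 0 else "negative"


outputs, calls = filtered_inference([0.0, 0.5, -0.05, -2.0], toy_score, toy_model)
print(outputs, calls)
```

In this toy run the model is invoked on two of the four inputs. In a real deployment the saving depends on how cheap the filter is relative to the model and on how often it can skip inference without hurting accuracy.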
