Enabling Serverless Deployment of Large-Scale AI Workloads

10/04/2022
by Sotiris Moschoyiannis, et al.

We propose a set of optimization techniques for transforming a generic AI codebase so that it can be deployed to a restricted serverless environment without compromising capability or performance. These involve (1) slimming the libraries and frameworks used (e.g., PyTorch) down to the pieces pertaining to the solution; (2) dynamically loading pre-trained AI/ML models into local temporary storage during serverless function invocation; (3) using separate frameworks for training and inference, with models in the ONNX format; and (4) performance-oriented tuning of data storage and lookup. The techniques are illustrated via worked examples that have been deployed live on geospatial data from the transportation domain, drawing upon a real-world case study in intelligent transportation: on-demand, real-time prediction of flows of train movements across the UK rail network. Evaluation of the proposed techniques shows that the response time, for varying volumes of queries involving prediction, remains almost constant (at 50 ms), even as the database scales up to 250M entries. Query response time is important in this context because the target is predicting train delays. It is even more important in a serverless environment because of the stringent limits on a serverless function's runtime before timeout. The similarities between a serverless environment and other resource-constrained environments (e.g., IoT, telecoms) mean the techniques can be applied to a range of use cases.
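As an illustration of technique (2), the following is a minimal sketch of loading a pre-trained model into local temporary storage on first invocation and reusing it on later (warm) invocations. It is not the authors' implementation: the names `ensure_local_model` and `handler`, the `model_key` event field, and the pluggable `fetch` callable (standing in for a download from object storage) are all hypothetical.

```python
import os
import tempfile
from typing import Callable, Optional


def ensure_local_model(
    model_key: str,
    fetch: Callable[[str], bytes],
    tmp_dir: Optional[str] = None,
) -> str:
    """Return a local path to the model, downloading it only if it is missing.

    `fetch` is a hypothetical callable that returns the raw model bytes for
    `model_key` (for example, a thin wrapper around an object-store client).
    """
    tmp_dir = tmp_dir or tempfile.gettempdir()
    local_path = os.path.join(tmp_dir, os.path.basename(model_key))
    if not os.path.exists(local_path):
        data = fetch(model_key)
        # Write under a temporary name, then rename, so that an interrupted
        # download never leaves a half-written model at the final path.
        partial_path = local_path + ".part"
        with open(partial_path, "wb") as fh:
            fh.write(data)
        os.replace(partial_path, local_path)
    return local_path


def handler(event: dict, fetch: Callable[[str], bytes],
            tmp_dir: Optional[str] = None) -> dict:
    """Hypothetical serverless entry point: make the model available locally."""
    path = ensure_local_model(event["model_key"], fetch, tmp_dir)
    # A real function would now open `path` with an inference runtime and
    # run the prediction; here we only report where the model was placed.
    return {"model_path": path, "size_bytes": os.path.getsize(path)}
```

Because the file persists in temporary storage for as long as the function instance stays warm, only the first invocation pays the download cost. In line with technique (3), the stored file could be an ONNX model opened by a lightweight inference runtime instead of the full training framework, though that step is left out of this sketch.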

research
06/28/2023

BLEND: Efficient and blended IoT data storage and communication with application layer security

Many IoT use cases demand both secure storage and secure communication. ...
research
08/18/2021

Deployment of Deep Neural Networks for Object Detection on Edge AI Devices with Runtime Optimization

Deep neural networks have proven increasingly important for automotive s...
research
06/07/2020

Kafka-ML: connecting the data stream with ML/AI frameworks

Machine Learning (ML) and Artificial Intelligence (AI) have a dependency...
research
05/18/2021

Transformers à Grande Vitesse

Robust travel time predictions are of prime importance in managing any t...
research
06/09/2023

EfficientBioAI: Making Bioimaging AI Models Efficient in Energy, Latency and Representation

Artificial intelligence (AI) has been widely used in bioimage image anal...
research
11/17/2020

Preventing Repeated Real World AI Failures by Cataloging Incidents: The AI Incident Database

Mature industrial sectors (e.g., aviation) collect their real world fail...
research
02/20/2023

Solving Recurrent MIPs with Semi-supervised Graph Neural Networks

We propose an ML-based model that automates and expedites the solution o...
