BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster

04/03/2022
by Jason Dai, et al.

Most AI projects start with a Python notebook running on a single laptop; however, scaling it to handle larger datasets (for both experimentation and production deployment) is usually a painful process. Scaling typically entails many manual and error-prone steps before data scientists can take full advantage of the available hardware resources (e.g., SIMD instructions, multi-processing, quantization, memory allocation optimization, data partitioning, and distributed computing). To address this challenge, we have open sourced BigDL 2.0 at https://github.com/intel-analytics/BigDL/ under the Apache 2.0 license (combining the original BigDL and Analytics Zoo projects). Using BigDL 2.0, users can simply build conventional Python notebooks on their laptops (with possible AutoML support), which can then be transparently accelerated on a single node (with up to 9.6x speedup in our experiments) and seamlessly scaled out to a large cluster (across several hundred servers in real-world use cases). BigDL 2.0 has already been adopted in production by many real-world users (such as Mastercard, Burger King, and Inspur).


