BigDL: A Distributed Deep Learning Framework for Big Data

04/16/2018
by Jason, et al.

In this paper, we present BigDL, a distributed deep learning framework for Big Data platforms and workflows. It is implemented on top of Apache Spark and allows users to write their deep learning applications as standard Spark programs that run directly on large-scale big data clusters in a distributed fashion. It provides an expressive, "data-analytics integrated" deep learning programming model, so that users can easily build end-to-end analytics + AI pipelines under a unified programming paradigm. By implementing an AllReduce-like operation using existing primitives in Spark (e.g., shuffle, broadcast, and in-memory data persistence), it also provides a highly efficient "parameter server" style architecture for highly scalable, data-parallel distributed training. Since its initial open source release, BigDL users have built many analytics and deep learning applications on Spark (e.g., object detection, sequence-to-sequence generation, neural recommendations, and fraud detection).
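
To make the AllReduce-like, "parameter server" style synchronization more concrete, the following is a minimal single-process sketch in plain Python. It is not BigDL's implementation and does not use Spark. The function names (`split_into_slices`, `allreduce_step`), the averaging of gradients, and the plain SGD update are all assumptions chosen for illustration. The nested lists stand in for what a shuffle (gradient slices sent to their owning worker) and a broadcast (updated slices sent back to every worker) would move across a cluster.

```python
from typing import List


def split_into_slices(vec: List[float], n: int) -> List[List[float]]:
    """Split vec into n contiguous slices of near-equal size."""
    base, extra = divmod(len(vec), n)
    slices, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        slices.append(vec[start:start + size])
        start += size
    return slices


def allreduce_step(weights: List[float],
                   local_grads: List[List[float]],
                   lr: float) -> List[float]:
    """One simulated synchronization step across len(local_grads) workers.

    Assumption: worker j owns slice j of the parameters, averages that
    slice of every worker's gradient, and applies a plain SGD update.
    """
    n = len(local_grads)

    # "Shuffle" stand-in: each worker cuts its local gradient into n
    # slices; slice j is destined for worker j.
    sliced_grads = [split_into_slices(g, n) for g in local_grads]
    weight_slices = split_into_slices(weights, n)

    updated_slices = []
    for j in range(n):
        size = len(weight_slices[j])
        # Worker j averages slice j of the gradients from all workers.
        avg = [sum(sliced_grads[w][j][k] for w in range(n)) / n
               for k in range(size)]
        # Worker j updates only the parameter slice it owns.
        updated_slices.append(
            [wt - lr * g for wt, g in zip(weight_slices[j], avg)]
        )

    # "Broadcast" stand-in: every worker receives all updated slices
    # and concatenates them into the full parameter vector.
    return [x for s in updated_slices for x in s]


if __name__ == "__main__":
    weights = [1.0, 2.0, 3.0, 4.0, 5.0]
    grads = [[1.0] * 5, [3.0] * 5]  # two hypothetical workers
    print(allreduce_step(weights, grads, lr=0.5))
```

In this sketch each worker aggregates and updates only one slice of the parameters, so the aggregation work is spread evenly across workers instead of being concentrated on a single node, which is the property the "parameter server" style description points at.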

research
04/11/2022

The Principle of Least Sensing: A Privacy-Friendly Sensing Paradigm for Urban Big Data Analytics

With the worldwide emergence of data protection regulations, how to cond...
research
07/19/2020

High Performance Data Engineering Everywhere

The amazing advances being made in the fields of machine and deep learni...
research
09/23/2019

Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics

The effective utilization at scale of complex machine learning (ML) tech...
research
12/12/2018

STEP : A Distributed Multi-threading Framework Towards Efficient Data Analytics

Various general-purpose distributed systems have been proposed to cope w...
research
02/23/2016

Mobile Big Data Analytics Using Deep Learning and Apache Spark

The proliferation of mobile devices, such as smartphones and Internet of...
research
03/21/2020

Translation of Array-Based Loops to Distributed Data-Parallel Programs

Large volumes of data generated by scientific experiments and simulation...
