Deep Lake: a Lakehouse for Deep Learning

09/22/2022
by Sasun Hambardzumyan, et al.

Traditional data lakes provide critical data infrastructure for analytical workloads by enabling time travel, running SQL queries, ingesting data with ACID transactions, and visualizing petabyte-scale datasets on cloud storage. They allow organizations to break down data silos, unlock data-driven decision-making, improve operational efficiency, and reduce costs. However, as deep learning usage increases, traditional data lakes are not well designed for applications such as natural language processing (NLP), audio processing, computer vision, and other applications involving non-tabular datasets. This paper presents Deep Lake, an open-source lakehouse for deep learning applications developed at Activeloop. Deep Lake maintains the benefits of a vanilla data lake with one key difference: it stores complex data, such as images, videos, and annotations, as well as tabular data, in the form of tensors and rapidly streams the data over the network to (a) Tensor Query Language, (b) an in-browser visualization engine, or (c) deep learning frameworks without sacrificing GPU utilization. Datasets stored in Deep Lake can be accessed from PyTorch, TensorFlow, and JAX, and integrate with numerous MLOps tools.
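
To make the "data as tensors, streamed in batches" idea concrete, here is a toy sketch in plain Python. It is not Deep Lake's API or storage format: the class `ToyTensorDataset` and its methods `append` and `stream` are hypothetical names invented for illustration, and everything lives in memory rather than on cloud storage. It only shows the columnar layout the abstract describes, where each named tensor holds one entry per sample and a consumer iterates over fixed-size batches.

```python
from typing import Dict, Iterator, List


class ToyTensorDataset:
    """Hypothetical in-memory stand-in: one named column ("tensor") per field."""

    def __init__(self) -> None:
        self.tensors: Dict[str, List[object]] = {}

    def append(self, sample: Dict[str, object]) -> None:
        # After the first sample, every sample must provide the same tensor names.
        if self.tensors and set(sample) != set(self.tensors):
            raise ValueError("sample keys must match existing tensors")
        for name, value in sample.items():
            self.tensors.setdefault(name, []).append(value)

    def __len__(self) -> int:
        # All columns have equal length, so any one of them gives the sample count.
        return len(next(iter(self.tensors.values()), []))

    def stream(self, batch_size: int) -> Iterator[Dict[str, List[object]]]:
        # Yield consecutive slices of every column, as a training loop would consume them.
        for start in range(0, len(self), batch_size):
            yield {
                name: column[start:start + batch_size]
                for name, column in self.tensors.items()
            }


ds = ToyTensorDataset()
for label in range(3):
    # A 2x2 nested list stands in for image pixels; real data would be arrays.
    ds.append({"images": [[label, label], [label, label]], "labels": label})

batches = list(ds.stream(batch_size=2))
```

With three samples and a batch size of 2, the stream yields one full batch and one partial batch. A real system would additionally chunk and compress each tensor and fetch chunks over the network, which this sketch leaves out.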

