Automatic Full Compilation of Julia Programs and ML Models to Cloud TPUs

10/23/2018
by Keno Fischer et al.

Google's Cloud TPUs are a promising new hardware architecture for machine learning workloads, and they have powered many of Google's milestone machine learning achievements in recent years. Google has now made TPUs available for general use on its cloud platform, and has recently opened them up further to allow use by non-TensorFlow frontends. We describe a method and implementation for offloading suitable sections of Julia programs to TPUs via this new API and the Google XLA compiler. Our method is able to completely fuse the forward pass of a VGG19 model expressed as a Julia program into a single TPU executable that can be offloaded to the device. Our method composes well with existing compiler-based automatic differentiation techniques on Julia code, so we are also able to automatically obtain the VGG19 backward pass and similarly offload it to the TPU. Targeting TPUs with our compiler, we are able to evaluate the VGG19 forward pass on a batch of 100 images in 0.23s, which compares favorably to the 52.4s required for the original model on the CPU. Our implementation is less than 1000 lines of Julia, with no TPU-specific changes made to the core Julia compiler or to any other Julia packages.
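To give a flavor of the workflow this enables, the sketch below shows how the VGG19 forward pass might be compiled and offloaded from ordinary Julia code. This is a minimal sketch only: the XLA.jl package name matches the paper's implementation, but the connection helper (XLA.connect), the @tpu_compile macro, and the run call shown here are assumptions for illustration, not a verified public API.

    using XLA, Flux, Metalhead

    # Hypothetical helper: open an XRT session against a Cloud TPU worker
    # (the gRPC address below is a placeholder, not a real endpoint).
    sess = XLA.connect("grpc://tpu-worker:8470")

    # VGG19 from Metalhead; its forward pass is ordinary Julia code,
    # which is what gets traced and fused into a single XLA executable.
    model = Metalhead.VGG19()

    # A batch of 100 RGB images at 224x224, matching the timing experiment.
    batch = randn(Float32, 224, 224, 3, 100)

    # Hypothetical entry point: compile the forward pass to one TPU
    # executable and load it onto the device behind the session.
    compiled = @tpu_compile sess model(batch)
    result = run(compiled, batch)

    # Because the approach composes with compiler-based AD (Zygote.jl),
    # the backward pass could in principle be compiled the same way, e.g.
    #   grads = @tpu_compile sess gradient(m -> loss(m(batch), labels), model)

The point of the sketch is that model(batch) is plain Julia; the compiler traces it into an XLA computation, which is why the implementation needs no TPU-specific changes to the core Julia compiler.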

