Heterogeneous Integration of In-Memory Analog Computing Architectures with Tensor Processing Units

04/18/2023
by Mohammed E. Elbtity, et al.

Tensor processing units (TPUs), specialized hardware accelerators for machine learning tasks, have shown significant performance improvements when executing convolutional layers in convolutional neural networks (CNNs). However, they struggle to maintain the same efficiency in fully connected (FC) layers, leading to suboptimal hardware utilization. In-memory analog computing (IMAC) architectures, on the other hand, have demonstrated notable speedups when executing FC layers. This paper introduces a novel heterogeneous, mixed-signal, mixed-precision architecture that integrates an IMAC unit with an edge TPU to enhance mobile CNN performance. To leverage the strengths of TPUs for convolutional layers and IMAC circuits for dense layers, we propose a unified learning algorithm that incorporates mixed-precision training techniques to mitigate the accuracy drop that can arise when deploying models on the TPU-IMAC architecture. Our simulations show that the TPU-IMAC configuration achieves up to 2.59× performance improvement and 88% memory reduction compared with conventional TPU architectures across various CNN models, while maintaining comparable accuracy. The TPU-IMAC architecture shows potential for applications where energy efficiency and high performance are essential, such as edge computing and real-time processing on mobile devices. The unified training algorithm and the integration of IMAC and TPU architectures contribute to the potential impact of this work on the broader machine learning landscape.
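To make the partitioning concrete, the following is a minimal PyTorch sketch, not the authors' implementation, of the kind of mixed-precision split the abstract describes: convolutional layers carry int8 fake-quantized weights as a stand-in for the digital edge TPU, while the dense layers use a hypothetical IMACLinear module with ternary weights and injected output noise as a rough proxy for analog crossbar behavior. The module names, bit widths, and noise model are assumptions made purely for illustration.

    # Illustrative sketch (not the paper's code): partition a small CNN so that
    # conv layers would map to a digital TPU-like unit (int8 weights) while the
    # fully connected layers emulate low-precision analog IMAC crossbars.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def fake_quant(x, bits):
        # Uniform symmetric fake quantization with a straight-through
        # estimator, so gradients flow through the rounding during training.
        qmax = 2 ** (bits - 1) - 1
        scale = x.detach().abs().max().clamp(min=1e-8) / qmax
        q = torch.clamp(torch.round(x / scale), -qmax, qmax) * scale
        return x + (q - x).detach()

    class QuantConv2d(nn.Conv2d):
        # Convolution with int8 fake-quantized weights, standing in for the
        # digital edge-TPU side of the architecture.
        def forward(self, x):
            w = fake_quant(self.weight, bits=8)
            return F.conv2d(x, w, self.bias, self.stride,
                            self.padding, self.dilation, self.groups)

    class IMACLinear(nn.Linear):
        # Hypothetical dense layer emulating an analog IMAC crossbar:
        # ternary weights plus additive noise on the analog output.
        def __init__(self, in_features, out_features, noise_std=0.02):
            super().__init__(in_features, out_features)
            self.noise_std = noise_std

        def forward(self, x):
            w = fake_quant(self.weight, bits=2)  # maps to {-1, 0, +1} * scale
            y = F.linear(x, w, self.bias)
            if self.training and self.noise_std > 0:
                y = y + self.noise_std * y.detach().abs().mean() * torch.randn_like(y)
            return y

    # Toy CNN split across the two units: conv layers -> TPU (int8),
    # fully connected layers -> IMAC (ternary, noisy analog).
    model = nn.Sequential(
        QuantConv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        QuantConv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        IMACLinear(32 * 7 * 7, 128), nn.ReLU(),
        IMACLinear(128, 10),
    )

    x = torch.randn(4, 1, 28, 28)  # MNIST-sized input batch
    print(model(x).shape)          # torch.Size([4, 10])

Because both fake-quantized paths train with straight-through gradients, such a model can be trained end to end with a standard optimizer, which is one plausible reading of the unified mixed-precision learning algorithm described above.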
