Chiplet Cloud: Building AI Supercomputers for Serving Large Generative Language Models

07/05/2023
by   Huwan Peng, et al.

Large language models (LLMs) such as ChatGPT have demonstrated unprecedented capabilities across multiple AI tasks. However, hardware inefficiencies have become a significant factor limiting the democratization of LLMs. We propose Chiplet Cloud, an ASIC supercomputer architecture that optimizes total cost of ownership (TCO) per token for serving generative LLMs. Chiplet Cloud fits all model parameters inside on-chip SRAMs to eliminate bandwidth limitations, moderates die size to improve system cost, and leverages software mappings to overcome data communication overhead. We propose a comprehensive design methodology that accurately explores a spectrum of major design trade-offs in the joint hardware-software space and generates a detailed performance-cost analysis for all valid design points. We evaluate Chiplet Cloud on four popular LLMs. Compared to GPUs and TPUs, our architecture achieves up to 94x and 15x improvements in TCO/Token, respectively, significantly reducing the cost of realistically serving modern LLMs.
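The core trade-off the abstract describes can be made concrete with a small sketch: amortize capital and operating cost over a deployment lifetime and divide by total tokens served, then rank candidate design points by that metric. All numbers, names, and the cost model below are illustrative assumptions for exposition, not values or methodology from the paper.

```python
# Hypothetical sketch of a TCO-per-token comparison across design points.
# The design points, costs, and throughputs are invented for illustration;
# the paper's actual model covers far more variables (yield, SRAM capacity,
# packaging, power, software mapping, etc.).

def tco_per_token(capex_usd, opex_usd_per_year, lifetime_years, tokens_per_s):
    """Amortized total cost of ownership per generated token."""
    total_cost = capex_usd + opex_usd_per_year * lifetime_years
    total_tokens = tokens_per_s * 3600 * 24 * 365 * lifetime_years
    return total_cost / total_tokens

# Toy design space: a smaller die tends to cost less per chip (better yield),
# but more chips may be needed to hold all parameters in on-chip SRAM.
design_points = [
    {"name": "large-die", "capex": 8_000_000,
     "opex": 1_000_000, "tokens_per_s": 50_000},
    {"name": "small-die", "capex": 5_000_000,
     "opex": 1_200_000, "tokens_per_s": 45_000},
]

best = min(
    design_points,
    key=lambda d: tco_per_token(d["capex"], d["opex"], 3, d["tokens_per_s"]),
)
print(best["name"])
```

Under these made-up numbers the smaller-die point wins despite its lower throughput, illustrating why a joint performance-cost search over all valid design points, rather than optimizing throughput alone, is the objective the paper targets.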

