Joint Foundation Model Caching and Inference of Generative AI Services for Edge Intelligence

05/20/2023
by Minrui Xu, et al.

With the rapid development of artificial general intelligence (AGI), various multimedia services based on pretrained foundation models (PFMs) need to be deployed effectively. With edge servers that have cloud-level computing power, edge intelligence can extend the capabilities of AGI to mobile edge networks. However, compared with cloud data centers, resource-limited edge servers can cache and execute only a small number of PFMs, which typically consist of billions of parameters and require intensive computing power and GPU memory during inference. To address this challenge, in this paper we propose a joint foundation model caching and inference framework that aims to balance the tradeoff among inference latency, accuracy, and resource consumption by efficiently managing cached PFMs and user requests during the provisioning of generative AI services. Specifically, considering the in-context learning ability of PFMs, we propose a new metric, the Age of Context (AoC), to model the freshness and relevance of the examples in past demonstrations with respect to current service requests. Based on the AoC, we propose a least context caching algorithm that manages the PFMs cached at edge servers using historical prompts and inference results. Numerical results demonstrate that, by effectively utilizing contextual information, the proposed algorithm can reduce system costs compared with existing baselines.
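The abstract does not give the AoC formula or the exact eviction rule, so the following is only a minimal sketch of the idea under stated assumptions. It assumes that each cached PFM stores past demonstrations (prompt and result pairs) with a timestamp and a relevance score in [0, 1], and that a model's usable context is the sum of relevance scores discounted exponentially by age. Under a GPU-memory budget, the cache evicts the model holding the least such context. The names `LeastContextCache`, `context_value`, `decay`, `relevance`, and `gpu_memory_gb` are hypothetical illustrations, not the paper's definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    timestamp: float  # time the demonstration (prompt + result) was produced
    relevance: float  # hypothetical relevance to current requests, in [0, 1]


@dataclass
class CachedModel:
    name: str
    gpu_memory_gb: float  # memory the PFM occupies on the edge server
    examples: list[Example] = field(default_factory=list)


def context_value(model: CachedModel, now: float, decay: float = 0.9) -> float:
    """Assumed AoC-style score: relevance discounted by the age of each example.

    Older demonstrations (larger now - timestamp) contribute less, modeling
    freshness; the relevance term models how useful each example is.
    """
    return sum(ex.relevance * decay ** (now - ex.timestamp) for ex in model.examples)


class LeastContextCache:
    """Hypothetical least-context eviction policy under a GPU-memory budget."""

    def __init__(self, capacity_gb: float, decay: float = 0.9):
        self.capacity_gb = capacity_gb
        self.decay = decay
        self.models: dict[str, CachedModel] = {}

    def used_gb(self) -> float:
        return sum(m.gpu_memory_gb for m in self.models.values())

    def record(self, name: str, timestamp: float, relevance: float) -> None:
        # Store a finished inference as a future in-context demonstration.
        self.models[name].examples.append(Example(timestamp, relevance))

    def admit(self, model: CachedModel, now: float) -> list[str]:
        """Cache `model`, evicting least-context models until it fits.

        Returns the names of evicted models.
        """
        if model.name in self.models:
            return []
        if model.gpu_memory_gb > self.capacity_gb:
            raise ValueError("model is larger than the edge server's GPU memory")
        evicted = []
        while self.used_gb() + model.gpu_memory_gb > self.capacity_gb:
            # Evict the cached model whose discounted context is smallest.
            victim = min(
                self.models.values(),
                key=lambda m: context_value(m, now, self.decay),
            )
            del self.models[victim.name]
            evicted.append(victim.name)
        self.models[model.name] = model
        return evicted
```

For example, with a 40 GB budget holding two 20 GB models, admitting a third 20 GB model at `now=6` evicts whichever cached model has the smaller discounted context. A model with one recent, highly relevant demonstration outlives one whose only demonstration is old and weakly relevant. Unlike least-recently-used eviction, which looks only at access time, this sketch lets both freshness and relevance decide which PFM stays cached.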
