i-Code Studio: A Configurable and Composable Framework for Integrative AI

05/23/2023
by   Yuwei Fang, et al.
0

Artificial General Intelligence (AGI) requires comprehensive understanding and generation capabilities for a variety of tasks spanning different modalities and functionalities. Integrative AI is one important direction to approach AGI, through combining multiple models to tackle complex multimodal tasks. However, there is a lack of a flexible and composable platform to facilitate efficient and effective model composition and coordination. In this paper, we propose the i-Code Studio, a configurable and composable framework for Integrative AI. The i-Code Studio orchestrates multiple pre-trained models in a finetuning-free fashion to conduct complex multimodal tasks. Instead of simple model composition, the i-Code Studio provides an integrative, flexible, and composable setting for developers to quickly and easily compose cutting-edge services and technologies tailored to their specific requirements. The i-Code Studio achieves impressive results on a variety of zero-shot multimodal tasks, such as video-to-text retrieval, speech-to-speech translation, and visual question answering. We also demonstrate how to quickly build a multimodal agent based on the i-Code Studio that can communicate and personalize for users.

READ FULL TEXT

page 5

page 6

research
02/27/2023

Language Is Not All You Need: Aligning Perception with Language Models

A big convergence of language, multimodal perception, action, and world ...
research
10/20/2022

Composing Ensembles of Pre-trained Models via Iterative Consensus

Large pre-trained models exhibit distinct and complementary capabilities...
research
04/01/2022

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

Large foundation models can exhibit unique capabilities depending on the...
research
06/30/2023

Multimodal Prompt Retrieval for Generative Visual Question Answering

Recent years have witnessed impressive results of pre-trained vision-lan...
research
05/13/2023

On the Hidden Mystery of OCR in Large Multimodal Models

Large models have recently played a dominant role in natural language pr...
research
08/28/2020

AllenAct: A Framework for Embodied AI Research

The domain of Embodied AI, in which agents learn to complete tasks throu...
research
03/29/2021

Platform for Situated Intelligence

We introduce Platform for Situated Intelligence, an open-source framewor...

Please sign up or login with your details

Forgot password? Click here to reset