IteraTTA: An interface for exploring both text prompts and audio priors in generating music with text-to-audio models

07/24/2023
by Hiromu Yakura, et al.

Recent text-to-audio generation techniques have the potential to let novice users freely generate music audio. Even without musical knowledge, such as familiarity with chord progressions or instruments, users can try various text prompts to generate audio. However, compared to the image domain, it is difficult to gain a clear understanding of the space of possible music audio because users cannot listen to multiple generated variations simultaneously. We therefore help users explore not only text prompts but also audio priors that constrain the text-to-audio music generation process. This dual-sided exploration enables users to discern how different text prompts and audio priors affect the generation results by iteratively comparing them. Our interface, IteraTTA, is specifically designed to help users refine text prompts and select favorable audio priors from the generated audio. With it, users can progressively reach their loosely specified goals while understanding and exploring the space of possible results. Our implementation and discussion highlight design considerations specific to text-to-audio models and show how interaction techniques can contribute to their effectiveness.
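
The iterative cycle described in the abstract (generate several candidates from a text prompt and an optional audio prior, compare them, pick one as the next prior, refine the prompt) can be illustrated with a minimal sketch. This is not the authors' implementation: `Clip`, `generate`, and `explore` are hypothetical names introduced here, and `generate` is a stub that stands in for a real text-to-audio model call, returning placeholder objects instead of audio.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Clip:
    """Placeholder for one generated result; `seed` stands in for audio samples."""
    prompt: str
    prior: Optional["Clip"]
    seed: int


def generate(prompt: str, prior: Optional[Clip], n: int,
             rng: random.Random) -> List[Clip]:
    """Hypothetical stub for a text-to-audio model (assumption, not a real API).

    A real system would condition generation on both the prompt and the prior.
    """
    return [Clip(prompt, prior, rng.randrange(10**6)) for _ in range(n)]


def explore(prompts: List[str],
            choose: Callable[[List[Clip]], Clip],
            n: int = 4,
            seed: int = 0) -> Tuple[Optional[Clip], List[List[Clip]]]:
    """Run one generate-compare-select round per prompt.

    `choose` models the user listening to the candidates and picking a
    favorable one, which becomes the audio prior for the next round.
    """
    rng = random.Random(seed)
    prior: Optional[Clip] = None
    history: List[List[Clip]] = []
    for prompt in prompts:
        candidates = generate(prompt, prior, n, rng)
        history.append(candidates)
        prior = choose(candidates)
    return prior, history


if __name__ == "__main__":
    final, history = explore(
        ["calm piano", "calm piano with strings"],
        choose=lambda clips: clips[0],  # stand-in for the user's selection
        n=3,
    )
    print(final.prompt, "<- built on prior:", final.prior.prompt)
```

In this sketch, each selected clip keeps a reference to the prior it was generated from, so the chain of choices that led to the final result remains inspectable.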
