AI based Presentation Creator With Customized Audio Content Delivery

06/27/2021
by   Muvazima Mansoor, et al.

In this paper, we propose an architecture for a novel problem that has become more pressing as the COVID-19 pandemic increased demand for virtual content delivery. Educational institutions, workplaces, and research centers are trying to bridge the communication gap during these socially distanced times through online content delivery: almost all learning has shifted online, professionals are working from home, and teachers and professionals alike rely on presentations to impart information. The current practice is to create a presentation and then deliver it over a virtual meeting platform. We aim to reduce the considerable time spent on both creating and delivering presentations by using Machine Learning (ML) algorithms and Natural Language Processing (NLP) modules to automate the creation of a slide-based presentation from a document, and then using state-of-the-art voice cloning models to deliver the content in the desired author's voice. We take a structured document, such as a research paper, as the content to be presented. The research paper is first summarized using BERT summarization techniques and condensed into bullet points that go into the slides. A Tacotron-inspired architecture with an encoder, a synthesizer, and a Generative Adversarial Network (GAN) based vocoder then conveys the contents of the slides in the author's voice (or any customized voice), using a content delivery mechanism that can clone any voice from a short audio clip.
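The document-to-slides stage described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the paper uses BERT summarization, whereas the sketch substitutes a simple word-frequency sentence ranker so that it runs without any model download. The names `Slide`, `top_sentences`, `build_slides`, and `narration_script` are hypothetical, and the voice-cloning stage (encoder, synthesizer, vocoder) is represented only by the text it would receive.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Slide:
    title: str
    bullets: List[str]


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def top_sentences(text: str, k: int = 3) -> List[str]:
    """Stand-in extractive summarizer (NOT BERT): rank sentences by
    the average frequency of their words in the section."""
    sentences = split_sentences(text)
    words = re.findall(r"[a-z']+", text.lower())
    # Ignore very short words as a crude stop-word filter.
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:k])  # restore original document order
    return [sentences[i] for i in keep]


def build_slides(sections: Dict[str, str], bullets_per_slide: int = 3) -> List[Slide]:
    # One slide per section: the section heading becomes the slide title.
    return [Slide(title, top_sentences(body, bullets_per_slide))
            for title, body in sections.items()]


def narration_script(slide: Slide) -> str:
    # Text that a voice-cloning text-to-speech stage would be asked to speak.
    return f"{slide.title}. " + " ".join(slide.bullets)
```

In a fuller pipeline, `top_sentences` would be replaced by a BERT-based summarizer, and each `narration_script` string would be passed to the speech synthesis stage along with a short reference clip of the target voice.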


