
Synthesising Expressiveness in Peking Opera via Duration Informed Attention Network

by Yusong Wu, et al.

This paper presents a method for generating expressive Peking opera singing voice. Synthesising expressive opera singing usually requires pitch contours as training data; these must be obtained with extraction techniques and cannot be labeled manually. With the Duration Informed Attention Network (DurIAN), this paper uses musical notes instead of pitch contours for expressive opera singing synthesis. The proposed method allows human annotation to be combined with automatically extracted features as training data, which gives extra flexibility in data collection for Peking opera singing synthesis. Compared with expressive Peking opera singing synthesised by a pitch-contour-based system, the proposed musical-note-based system produces comparable singing voice, with expressiveness in various aspects.



Peking Opera Synthesis via Duration Informed Attention Network

Peking Opera has been the most dominant form of Chinese performing art s...

XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System

This paper presents XiaoiceSing, a high-quality singing voice synthesis ...

Expressive Singing Synthesis Using Local Style Token and Dual-path Pitch Encoder

This paper proposes a controllable singing voice synthesis system capabl...

Singing voice phoneme segmentation by hierarchically inferring syllable and phoneme onset positions

In this paper, we tackle the singing voice phoneme segmentation problem ...

Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism

This paper proposes a novel sequence-to-sequence (seq2seq) model with a ...

Singing Synthesis: with a little help from my attention

We present a novel system for singing synthesis, based on attention. Sta...

Renaissance canons with asymmetric schemes

By a "scheme" of a musical canon, we mean the order of voice entry with ...