Incorporating Music Knowledge in Continual Dataset Augmentation for Music Generation

06/23/2020 ∙ by Alisa Liu, et al.

Deep learning has rapidly become the state-of-the-art approach for music generation. However, training a deep model typically requires a large training set, which is often not available for specific musical styles. In this paper, we present augmentative generation (Aug-Gen), a method of dataset augmentation for any music generation system trained on a resource-constrained domain. The key intuition of this method is that the training data for a generative system can be augmented by examples the system produces during the course of training, provided these examples are of sufficiently high quality and variety. We apply Aug-Gen to Transformer-based chorale generation in the style of J.S. Bach, and show that this allows for longer training and results in better generative output.


1 Introduction

Deep learning has rapidly become the state-of-the-art approach for music generation (Briot et al., 2017). Training a deep model typically requires a large training set, and it is well known that the performance of deep learning systems scales with more data, even when the data is noisy. However, when training a model to emulate the style of a particular composer, the size of the dataset is inherently limited to the number of compositions by that musician.

In this paper, we present augmentative generation (Aug-Gen), a method of dataset augmentation for any music generation system trained on a resource-constrained domain. The key intuition of this method is that the training data for a generative system can be augmented by examples the system produces during the course of training, provided these examples are of sufficiently high quality and variety. To the best of our knowledge, our paper is the first to introduce a framework in which a generative system continuously generates output and adds it to its training dataset.

We apply Aug-Gen to chorale generation in the style of J.S. Bach. We perform experiments using a Transformer model (Vaswani et al., 2017) as the music generation system. To select model output to include in the training data, we use the grading function in (Fang et al., 2020), which evaluates a given chorale along nine musical features important for Bach chorales. This allows us to incorporate non-differentiable domain knowledge (e.g. 18th century counterpoint rules) into the training procedure.
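A grading function of this kind can be sketched as a weighted distance between a chorale's measured feature values and the statistics of the Bach corpus. The feature names, corpus statistics, and weights below are illustrative assumptions, not the actual function of Fang et al. (2020):

```python
# Illustrative sketch of a chorale grading function: a weighted sum of
# per-feature distances from the Bach corpus. Lower grades are better.
# Feature names, corpus statistics, and weights are hypothetical.

BACH_FEATURE_MEANS = {           # assumed corpus statistics
    "parallel_fifths_per_measure": 0.0,
    "voice_range_violations": 0.1,
    "dissonance_ratio": 0.25,
}
FEATURE_WEIGHTS = {              # assumed relative importance
    "parallel_fifths_per_measure": 3.0,
    "voice_range_violations": 2.0,
    "dissonance_ratio": 1.0,
}

def grade(features: dict) -> float:
    """Distance of a chorale's features from the Bach corpus (lower = better)."""
    return sum(
        FEATURE_WEIGHTS[name] * abs(features[name] - BACH_FEATURE_MEANS[name])
        for name in BACH_FEATURE_MEANS
    )
```

Because the grade is computed from extracted musical features rather than model gradients, rule-based knowledge such as 18th-century counterpoint conventions can enter the training procedure without needing to be differentiable.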

There has been recent work using generative models to synthesize training examples for image classification (Kong et al., 2019) and text classification (Anaby-Tavor et al., 2020). However, these works aim to train a classifier rather than a generative system like ours. Moreover, in those works the informativeness of candidate training examples is measured relative to the model, whereas in our work, selection of training examples is done by an external critic that does not depend on the model's current performance.

2 Augmentative Generation

1:  input: true training dataset D
2:  initialize epoch e ← 0
3:  while stopping criterion is false do
4:     for i = 1 to k do
5:        generate chorale c
6:        grade c to obtain f(c)
7:        if f(c) < t and c is sufficiently diverse then
8:           add c to the training dataset
9:        end if
10:     end for
11:     train on n batches of size b selected from the dataset
12:     e ← e + 1
13:  end while
Algorithm 1 Aug-Gen training algorithm

In Aug-Gen, the true dataset is first split into training and validation data. During model training, the training dataset is continuously augmented by model output, while the validation data is fixed and is used to determine when to terminate training. Each epoch of training includes a generation step and a training step. In the generation step, k examples are generated and graded by the grading function f. We add to the training dataset all generated output that (1) passes a pre-determined quality threshold t, and (2) is sufficiently diverse. In this work, we use a simple uniqueness criterion (i.e., the chorale must not have been seen before) as the diversity requirement. Next, in the training step of the epoch, we train the model on the augmented dataset. This consists of training on n batches of size b, so that the amount of training in each epoch is independent of the size of the entire training dataset. These steps continue until the validation loss ceases to improve. Algorithm 1 shows the Aug-Gen algorithm in pseudocode. Note that the grading function returns a measure of distance from Bach chorales, so lower grades represent better chorales.
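The training loop above can be sketched in Python. Here `model.generate`, `model.train_step`, `model.validate`, the grading function, and all hyperparameter defaults are placeholders, not the paper's actual implementation:

```python
# Minimal sketch of the Aug-Gen training loop. All component names and
# default hyperparameters are illustrative placeholders.
import random

def aug_gen(model, train_data, val_data, grade, t,
            n_generate=50, n_batches=100, batch_size=8, patience=3):
    dataset = list(train_data)      # true training data, to be augmented
    seen = set(dataset)             # uniqueness criterion for diversity
    best_val, stale = float("inf"), 0
    while stale < patience:         # stop when validation loss plateaus
        # Generation step: keep output that passes the quality threshold
        # (lower grade = better) and has not been seen before.
        for _ in range(n_generate):
            chorale = model.generate()
            if grade(chorale) < t and chorale not in seen:
                dataset.append(chorale)
                seen.add(chorale)
        # Training step: a fixed number of batches, so the per-epoch
        # training cost is independent of the (growing) dataset size.
        for _ in range(n_batches):
            batch = random.sample(dataset, min(batch_size, len(dataset)))
            model.train_step(batch)
        val_loss = model.validate(val_data)
        if val_loss < best_val:
            best_val, stale = val_loss, 0
        else:
            stale += 1
    return model
```

Because validation examples are never augmented, the stopping criterion remains an unbiased signal even as generated chorales come to dominate the training set.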

3 Experiments

Figure 1: Boxplots of the grade distribution of 50 chorales generated at each epoch of training in Aug-Gen. Lower grades represent better chorales; the epoch achieving the lowest validation loss is discussed in §3.2.

In our experiments (code available at https://github.com/asdfang/constraint-transformer-bach), we evaluated the effectiveness of Aug-Gen in improving the output quality of a Transformer model trained to generate Bach-style chorales. We parsed Bach chorales in XML notation using music21 (Cuthbert and Ariza, 2010), and used the same data representation as Hadjeres et al. (2017). Our generative model consists of a Transformer network with relative attention (Huang et al., 2019).

3.1 Comparison of Training Methods

We compared three training methods on the same model architecture. In each method, we initialized the dataset to the set of 351 Bach chorales, and split the dataset into training and validation sets. In the generation step of each training epoch, we generated a fixed number of chorales, and included a generated example in the training dataset only if it passed the grading threshold t and was unique. In the training step, we trained on randomly selected batches of size b from the full training dataset. Each model was trained for the same fixed number of epochs.

The three models have equivalent hyperparameters and differ only in the threshold t used for including generated chorales in the training dataset: (1) Aug-Gen, which sets t to the third quartile of the Bach chorales' grades, and so includes only generated chorales whose grade is below the third quartile of Bach grades; (2) Baseline-none, which includes no generated chorales, equivalent to training a model on only Bach chorales; (3) Baseline-all, which includes all generated chorales, regardless of quality.

Figure 2: The grade distribution of the 351 Bach chorales, and of 351 outputs generated by each model. The distributions are truncated due to a long tail from the Baseline-all model. Aug-Gen tends to produce chorales with better grades than either baseline.

3.2 Analysis of Training

Figure 1 shows the grade distribution of output generated by Aug-Gen during each epoch of training. We see that model improvement is clearly reflected in the grading function, suggesting that the grading function can be used to assess model quality independently of the loss function. A generated chorale is first added to the dataset once the model's output passes the threshold; as training proceeds, generated examples come to make up a growing fraction of the training dataset. Baseline-none achieves its lowest validation loss at an earlier epoch than Aug-Gen, suggesting that Aug-Gen allows a model to be trained for longer without overfitting.

3.3 Grade Distribution

To compare the three methods, we used each model at the epoch that achieved its lowest validation loss. Figure 2 shows the grade distribution of the 351 Bach chorales and 351 generations from each method. We see that a threshold that selects only high-quality generations results in a tighter grade distribution that more closely resembles Bach's, compared to thresholds that select all generations or none.

4 Conclusion

Our experimental evidence suggests Aug-Gen, paired with an effective grading function, allows for longer training and results in better generative output. Listening to generated chorales indicates that remaining errors tend to be along dimensions not measured by the grading function, including excessive modulation, weak metric structure, and unmusical repetition. Future work includes improving the grading function to account for these issues, exploring richer measures of diversity within a dataset, applying Aug-Gen to different models and musical domains, and devising other training methods that utilize generated music data.

References

  • A. Anaby-Tavor, B. Carmeli, E. Goldbraich, A. Kantor, G. Kour, S. Shlomov, N. Tepper, and N. Zwerdling (2020) Not enough data? Deep learning to the rescue!. In Association for the Advancement of Artificial Intelligence, Cited by: §1.
  • J. Briot, G. Hadjeres, and F. Pachet (2017) Deep learning techniques for music generation - a survey. ArXiv abs/1709.01620. Cited by: §1.
  • M. S. Cuthbert and C. Ariza (2010) music21: A toolkit for computer-aided musicology and symbolic music data. In Conference of the International Society of Music Information Retrieval (ISMIR), pp. 637–642. Cited by: §3.
  • A. Fang, A. Liu, P. Seetharaman, and B. Pardo (2020) Bach or mock? A grading function for chorales in the style of J.S. Bach. In Machine Learning for Media Discovery (ML4MD) Workshop at the International Conference on Machine Learning (ICML), Cited by: §1.
  • G. Hadjeres and L. Crestel (2020) Vector quantized contrastive predictive coding for template-based music generation. arXiv preprint arXiv:2004.10120. Cited by: §4.
  • G. Hadjeres, F. Pachet, and F. Nielsen (2017) DeepBach: a steerable model for Bach chorales generation. In Proceedings of the 34th International Conference on Machine Learning, pp. 1362–1371. Cited by: §3.
  • C. A. Huang, A. Vaswani, J. Uszkoreit, N. Shazeer, C. Hawthorne, A. M. Dai, M. D. Hoffman, and D. Eck (2019) An improved relative self-attention mechanism for transformer with application to music generation. In Proceedings of the 35th International Conference on Machine Learning (ICML), Cited by: §3.
  • Q. Kong, B. Tong, M. Klinkigt, Y. Watanabe, N. Akira, and T. Murakami (2019) Active generative adversarial network for image classification. In Association for the Advancement of Artificial Intelligence, Cited by: §1.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Cited by: §1.