Deep Predictive Models in Interactive Music

January 31, 2018 ∙ by Charles P. Martin et al. ∙ Universitetet i Oslo

Automatic music generation is a compelling task where much recent progress has been made with deep learning models. In this paper, we ask how these models can be integrated into interactive music systems: how can they encourage or enhance the music making of human users? Musical performance requires prediction, both to operate instruments and to perform in groups. We argue that predictive models could help interactive systems understand their temporal context and ensemble behaviour. Deep learning allows data-driven models with a long memory of past states. We advocate for predictive musical interaction, where a predictive model is embedded in a musical interface, assisting users by predicting unknown states of musical processes. We propose a framework for incorporating such predictive models into the sensing, processing, and result architecture that is often used in musical interface design. We show that our framework accommodates deep generative models as well as models that predict gestural states or other high-level musical information. We motivate the framework with two examples from our recent work and with systems from the literature, and we suggest musical use cases where prediction is a necessary component.
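To make the idea of predictive musical interaction more concrete, the sketch below shows one possible shape of a sensing, processing, and result loop with a predictive model embedded in the processing stage. This is an illustration only, not code from the paper: the gesture labels, the sense and respond functions, and the simple Markov-chain predictor (a minimal stand-in for the deep sequence models discussed here) are all hypothetical.

    # Minimal sketch of a predictive interaction loop (illustrative only).
    # A stand-in Markov predictor fills in the next, unknown state of a
    # musical process; a deep sequence model could occupy the same slot.
    import random
    from collections import defaultdict

    class MarkovPredictor:
        """Predicts the next symbolic gesture from the current one."""
        def __init__(self):
            self.transitions = defaultdict(lambda: defaultdict(int))

        def update(self, previous, current):
            # Learn from each observed transition in the user's playing.
            self.transitions[previous][current] += 1

        def predict(self, current):
            # Sample a likely next gesture; fall back to repeating the input.
            options = self.transitions[current]
            if not options:
                return current
            gestures, counts = zip(*options.items())
            return random.choices(gestures, weights=counts, k=1)[0]

    def sense(event):
        """Sensing: map a raw input event to a high-level gesture label."""
        return "tap" if event < 0.5 else "swirl"

    def respond(gesture):
        """Result: turn a (predicted) gesture into a sound-producing action."""
        print("play response for:", gesture)

    # Processing: interleave the user's gestures with model predictions,
    # e.g. to propose a continuation of the user's current gesture.
    model = MarkovPredictor()
    previous = None
    for event in [0.2, 0.7, 0.1, 0.9, 0.8]:   # stand-in sensor stream
        gesture = sense(event)
        if previous is not None:
            model.update(previous, gesture)
        respond(model.predict(gesture))        # predicted continuation
        previous = gesture

In a full system, the stand-in predictor would be replaced by a trained recurrent neural network or other deep generative model operating on the same sensing and result components.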


1 Introduction

2 Prediction and Music

3 Predictive Interactive Music Designs

4 Conclusion

References
