Popular social media apps for mobile devices have allowed millions of users to engage with creative production of images and text. These devices’ cameras, touch-screens, powerful processors, and portability suggest on-the-go creativity, and it would appear that straightforward sharing with friends, or a wider network of followers, is a key factor in encouraging users to create content of all forms. Given the many affordances of mobile devices, it has been well noted that they are suitable platforms for mobile music making (Tanaka, 2010). Despite many creative mobile digital musical instruments (DMIs) appearing in recent years, we have yet to see the widespread adoption of musical creation as an integrated element of social media. Furthermore, few musical apps have attempted to emphasise ensemble, rather than individual, performance, even though group music-making is often seen as a valuable social activity.
In this article, we present the design for MicroJam, a collaborative and social mobile music-making app, and an analysis of more than 1600 touchscreen performances that have been created so far. Source code and further information about MicroJam are available online (Martin, 2018). The design of MicroJam emphasises casual, frequent, and social performance. As shown in Figure 1, the app features a very simple touch-screen interface for making electronic music where skill is not a necessary prerequisite for interaction. MicroJam departs from other touchscreen instruments by imposing limits on the musical compositions that are possible; most importantly, performances are limited to five seconds in length. These “tiny” performances are uploaded automatically to encourage improvisation and creation rather than editing. Users can reply to others’ performances by recording a new layer; this combines social interaction with ensemble music making.
In Section 2 we will motivate MicroJam’s design with a discussion of music-making in social media, and the possibilities for asynchronous and distributed collaborations with mobile musical interfaces. In Section 3 we will describe the app’s design and the interactive music mappings of the eight synthesised instruments that are made available in its interface. We also formalise the concept of “tiny” touchscreen performances. In Section 4 we examine a dataset of more than 1600 tiny performances saved on the app’s cloud database by testers and early users. We consider this dataset from several levels of abstraction: individual touchscreen interaction events, aggregated touchscreen gestures, and whole performances. This analysis allows us to draw conclusions about how these users perform in MicroJam, and to characterise the musical behaviour in the tiny performance format. While our app design draws on themes introduced by other authors, this is the first time that a systematic and data-driven analysis of a large dataset of touchscreen performances has been published for such a system. Our findings, then, will be useful for revisions of MicroJam, and could also inform the design of other touchscreen music applications, as well as studies of other types of DMIs.
This article is a revised and extended version of a previous conference paper “Exploring Social Mobile Music with Tiny Touch-Screen Performances” (Martin and Torresen, 2017). That paper introduced the prototype design of MicroJam and motivated the idea of social music making. In this new research, we present the fully developed MicroJam app, and empirical research based on examination of more than 1600 touchscreen performances made using the app.
Commodity mobile devices such as smartphones and tablets have often been reframed as DMIs for research, artistic exploration, and entertainment, forming the field of mobile music (Gaye et al., 2006). The sensors and multitouch screens of smartphones provide many affordances for new kinds of musical software (Essl and Rohs, 2009; Tanaka, 2010) and the ubiquity of these devices increases the possibility of musical participation by a wide audience (Essl and Lee, 2018). Most mobile music DMIs have used the touchscreen as an expressive musical controller. Some DMIs, such as Magic Fiddle (Wang et al., 2011), imitate existing musical instruments, but others, such as Crackle (Reus, 2011), or TC-11 (Schlei, 2012), have defined new ways to connect interaction on the touchscreen to sound synthesis algorithms.
While the design and evaluation of mobile music DMIs have been reported in the academic literature (see, e.g., (Essl and Lee, 2018; John, 2013; Gaye et al., 2006) for surveys), few analyses exist of the music created with these systems. Evaluations of these systems tend to focus on either the design and gesture-to-sound mappings (e.g., (d’Alessandro et al., 2012)), or on the experience of the musicians using the apps (e.g., (Martin et al., 2016)). It has previously been argued that archives of touchscreen control data can go beyond audio in terms of analysis of tablet musical performances (Martin and Gardner, 2016). Similar data from motion capture systems has previously been used as a basis for analysis of human interactions between movement and sound (Kelkar and Jensenius, 2018). In this work, we perform an analysis on mobile app touchscreen data to understand how users interact musically with MicroJam.
2.1. Social Music-Making and Constraints
Many social media platforms emphasise the value of constrained contributions by users. Twitter famously limited written notes to 140 characters (Gligorić et al., 2018), while Instagram constrained images to square format, and the (now defunct) Vine platform only allowed six second micro-videos (Redi et al., 2014). Constraints such as these are often thought to lead to increased variability and creativity in the arts (Stokes, 2008) as well as in DMIs (Gurevich et al., 2012). Posts in these services are intended to be frequent, casual, and ephemeral, and it could be that these constrained formats have helped these apps to attract millions of users and encourage their creativity in the written word or photography. While social media is often used to promote music (Dewan and Ramaprasad, 2014), music making has yet to become an important creative part of the social media landscape.
While music is often seen as an activity where accomplishment takes practice and concerted effort, casual musical experiences are well-known to be valuable and rewarding creative activities. Accessible music making, such as percussion drum circles, can be used for music therapy (Scheffel and Matney, 2014). DMIs such as augmented reality instruments (Correa et al., 2009) and touch-screen instruments (Favilla and Pedell, 2013) have also been used for therapeutic and casual music-making. In the case of DMIs, the interface can be designed to support creativity of those without musical experience or with limitations on motor control. Apps such as Ocarina (Wang, 2014) and Pyxis Minor (Barraclough et al., 2015) have shown that simple touch-screen interfaces can be successful for exploration by novice users as well as supporting sophisticated expressions by practised performers.
Some mobile music apps have included aspects of social music-making. Smule’s Leaf Trombone app introduced the idea of a “world stage” (Wang et al., 2015). In this app, users would perform renditions of well-known tunes on a unique trombone-like DMI. Users from around the world were then invited to critique renditions with emoticons and short text comments. World stage emphasised the idea that while the accuracy of a rendition could be rated by the computer, only a human critic could tell if it was ironic or funny. Indeed, Leaf Trombone, and other Smule apps have made much progress in integrating musical creation with social media.
2.2. Jamming through Space and Time
| ||Same Time||Different Time|
|Same Location||Mobile Ensembles: Mobile Phone Orchestra (Oh et al., 2010); Ensemble Metatone (Martin et al., 2015)||Locative Performance: AuRal (Allison and Dell, 2012); Tactical Sound Garden Toolkit (Shepard, 2007)|
|Different Location||Networked Performance: Magic Piano (Wang, 2016)||Glee Karaoke (Hamilton et al., 2011); MicroJam Performances|
While performance and criticism are important social parts of music-making, true musical collaboration involves performing music together. These experiences of group creativity can lead to the emergence of qualities, ideas, and experiences that cannot be easily explained by the actions of the individual participants (Sawyer, 2006). Mobile devices have often been used in ensemble situations such as MoPho (Stanford Mobile Phone Orchestra) (Oh et al., 2010), Viscotheque (Swift, 2013), Pocket Gamelan (Schiemer and Havryliv, 2007), ChoirMob (d’Alessandro et al., 2012) and Ensemble Metatone (Martin et al., 2015); however, in these examples, the musicians played together in a standard concert situation.
Given that mobile devices are often carried by users at all times, it would be natural to ask whether mobile device ensemble experiences can be achieved even when performers are not in a rehearsal space or concert venue. Could users contribute to ensemble experiences at a time and place that is convenient to them? The use of computer interfaces to work collaboratively even when not in the same space and time has been extensively discussed. In HCI, groupware systems have been framed using a time-space matrix to address how they allow users to collaborate in the same and different times and places (Greenberg and Roseman, 1998). For many work tasks, it is now common to collaborate remotely and at different times using tools such as Google Docs or Git; however, distributed and asynchronous musical collaboration is not as widely accepted.
In Table 1, we have applied the time-space matrix to mobile musical performance. Conventional collaborative performances happen at the same time and location. Even with mobile devices, most collaboration has occurred in this configuration. Collaborations with performers distributed in different locations but performing at the same time are often called networked musical performances (Carôt et al., 2007). Early versions of Smule’s Magic Piano (Wang, 2016) iPad app included the possibility of randomly assigned, real-time duets with other users. Networked performances are also possible with conventional mobile DMIs and systems for real-time audio and video streaming.
Performances with participants at different times, the right side of Table 1, are less well-explored than those on the left. One stream of mobile music, locative performance (Behrendt, 2012), has led to works that emphasise geographical location as an important input to a musical process such as Location33 (Carter and Liu, 2005), Net_dérive (Tanaka and Gemeinboeck, 2008), or Sonic City (Gaye et al., 2003). In other works, such as AuRal (Allison and Dell, 2012), or Tactical Sound Garden (Shepard, 2007), users’ interactions were stored at their location, allowing collaborations in a certain space, but separate in time.
The final area of the matrix involves music-making with performers that are in different places and different times. Glee Karaoke (Hamilton et al., 2011) allows users to upload their sung renditions of popular songs, and add layers to other performers’ contributions. The focus in this app, however, is on singing along with a backing track, and the mobile device does not really function as a DMI but an audio recorder. These limitations rule out many musical possibilities such as assembling orchestras of many remote participants and improvisation. More conventional DAWs (digital audio workstations) are also available on mobile devices (Meikle, 2016). Some of these apps (e.g., KORG’s Gadget) offer social or collaboration features, such as uploading whole tracks or short audio clips that other users can incorporate into their compositions. Our app, MicroJam, fits into this lower-right quadrant, and is distinguished from these other examples due to its focus on constrained and ephemeral music making, as well as online collaboration. In the next section, we will describe how this new app enables distributed and asynchronous collaboration on original musical material.
MicroJam is an app for creating, sharing, and collaborating with tiny touch-screen musical performances. This app has been specifically created to interrogate the possibilities for collaborative mobile performance that spans space and time. While these goals are lofty, the design of MicroJam has been kept deliberately simple. The main screen (see Figure 2) recalls social-media apps for sharing images. Musical performances in MicroJam are limited to very short interactions, encouraging frequent and ephemeral creative contribution. MicroJam is an iOS app written in Swift and uses Apple’s CloudKit service for backend cloud storage. The source code is freely available for use and modification by other researchers and performers (Martin, 2018). In this section we will discuss the design of the app, the format of the tiny musical performances that can be created, and the synthesised instruments that are available.
MicroJam allows users to do three primary activities (shown in Figure 2): browse and listen to other users’ performances; create and share new performances using the touch screen; and record layers on top of previously shared performances. The interface for creating new performances is in the centre of Figure 2 and is called jam!. This screen features a square touch performance area which is initially blank. Tapping, swirling, or swiping anywhere in this area will create sounds and also start recording touch activity. All touch interaction in this area is visualised with a simple paint-style drawing that follows the user’s touches. Touch interactions are simultaneously sent to a synthesised instrument to be sonified. After five seconds of touch interaction, the recording is automatically stopped (although the performer can continue to interact with the touch area). The recording can be subsequently replayed by tapping a “play” icon, or looped with the circular arrow icon. Users of MicroJam can choose the instrument used to sonify their interactions in the jam interface from a button in the top right. These synthesised instruments map a stream of touchscreen events to sound; each event carries the location of a touch and whether it is the start (touch down) or continuation (touch moved) of a previous gesture. The timbres and synthesis mappings are different for each instrument and described in Section 3.3 below.
Previously recorded performances, and those recorded by other users and downloaded from the server, are listed in the world screen as shown on the left side of Figure 2. Each performance is represented by a visual trace of the touch-drawing captured during recording and the contributor’s online handle. Any one of these performances can be played back in place in the world screen. When playing back, both the sound and visualised touch-interactions are replayed in the touch-area.
When viewing a previously saved performance in the world screen, the user can tap the reply icon (a curved arrow), to open a new layer on top of the recording. As shown in the right side of Figure 2, the previous as well as current touch-visualisations are shown and each layer is sonified separately. Multiple replies in MicroJam are possible which can result in several layers of performances being played back at once, allowing complex compositions.
3.2. Tiny touchscreen performances
MicroJam is intended to provide a musical experience where constraints are applied to the user’s interaction to increase their creativity and lower the barriers of entry for musical performance. We argued in Section 2.1 that constraints in social creativity systems could actually enhance users’ creative power. In the context of a musical app, these constraints could lead to more frequent interactions and possibly higher creativity due to the lower stakes and effort. Musical interactions in MicroJam are similarly constrained to be tiny touch-screen performances as they are limited in the area and duration of interaction. We define a tiny performance as follows:
All touches take place in a square subset of the touch-screen.
Duration of the performance is five seconds.
Only one simultaneous touch is recorded at a time.
Touch gesture data is recorded for replaying the performance.
Such performances require very little effort on the part of users. While some users may find it difficult to conceive and perform several minutes of music on a touch-screen device, five seconds is long enough to express a short idea, but short enough to leave users wanting to create another recording. It has been argued that five seconds is enough time to express a sonic object and other salient musical phenomena (Godøy et al., 2010). While the limitation to a single touch may seem unnecessary on today’s multi-touch devices, this stipulation limits tiny performances to monophony. In order to create more complex texture or harmony, performers must collaborate, or record multiple layers themselves.
For playback, storage, and transmission to other users, tiny touch-screen performances are recorded as simple comma-separated value files of recorded touch gestures. This data format records the user’s performance movements in a compact manner (typical size is around 5kB), rather than the actual sound or abstract musical values such as notes and rests. The performance can later be recreated by sending these same touch event signals to MicroJam’s synthesised instruments. The data format records each touch interaction’s time (as an offset in seconds from the start of the performance), whether the touch was moving or not, $x$ and $y$ locations, as well as touch pressure ($z$); an example is shown in Table 2. The visual trace of performances is also stored as a PNG image for later use in the app, although this can also be reconstructed from the touch data.
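The touch-record format described above can be read with a few lines of Python. This is only a minimal sketch: the column names and order (`time`, `x`, `y`, `z`, `moving`), the presence of a header row, and the 0/1 encoding of `moving` are assumptions made for illustration, not MicroJam’s confirmed file layout, and the example rows are invented.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class TouchEvent:
    time: float   # seconds since the start of the performance
    x: float      # horizontal position (assumed normalised to 0..1)
    y: float      # vertical position (assumed normalised to 0..1)
    z: float      # touch pressure
    moving: bool  # False for a touch-down, True for a touch-moved event


def parse_performance(csv_text: str) -> List[TouchEvent]:
    """Parse one tiny performance from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        TouchEvent(
            time=float(row["time"]),
            x=float(row["x"]),
            y=float(row["y"]),
            z=float(row["z"]),
            moving=row["moving"].strip() == "1",  # assumed 0/1 encoding
        )
        for row in reader
    ]


# Invented example data, not taken from the MicroJam dataset.
EXAMPLE_CSV = """time,x,y,z,moving
0.000,0.50,0.50,1.0,0
0.017,0.52,0.51,1.0,1
0.034,0.55,0.53,1.0,1
1.200,0.20,0.80,1.0,0
"""

events = parse_performance(EXAMPLE_CSV)
```

Because each record carries only a timestamp, a position, a pressure value, and a flag, a whole five-second performance stays small, which is consistent with the typical file size of a few kilobytes reported above.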
Storing gestural control data, rather than audio, or high-level musical data such as MIDI, allows performances to benefit from updated synthesis routines in the app, and future enhancements to the visual interface. As noted in previous work on preserving touchscreen improvisation (Martin and Gardner, 2016), this representation encodes information that might not be available in audio or MIDI recordings. In Section 4 we take advantage of this information to study tiny performances from a touch-gesture perspective.
3.3. Instruments in MicroJam
MicroJam includes eight different instruments (see Table 3) that map touches in the performance area to different synthesised sounds. This selection of instruments provides basic coverage of typical musical roles such as percussion (drums), bass (fmlead, wub), accompaniment (pad, keys), and lead (chirp, strings). While far from an exhaustive collection, these instruments allow exploration of different musical ideas, both through their different timbres as well as the different touch to sound mappings used in each one. Descriptions and mapping details for each instrument are given in Table 3.
The instruments are implemented in Pure Data and make use of the rjlib library (Barknecht, 2011) for some synthesis routine implementations. They are loaded in the app via libpd (Brinkmann et al., 2011). The instruments are controlled by continuous inputs of $x$, $y$ and $z$ values from the single touch point in the performance area, and each instrument is responsible for its specific mappings of these values to higher level musical quantities such as pitch, timbre and control of audio effects. For example, the fmlead sound, a simple FM synthesiser, maps initial $x$-values from a tap or swipe to a bass register pitch. The $y$-value is mapped to volume. The initial touch of a swipe triggers a very short note, but continuing a swipe sustains the sound until the touch is released. For continuous swipes, the distance of the present touch-point to the initial one ($dx$ and $dy$) is calculated; $dx$ is used to control pitch bend, and $dy$ is mapped to reverb and delay effects as well as mixing in a copy of the sound in a lower octave. A similar mapping scheme is used for the other pitched sounds; see Table 3 for precise details. For the “drums” sound, different synthesised drumset sounds (bass drum, snare drum, hihat, and crash cymbal) are triggered from each quadrant of the screen and swipes trigger a roll.
|name||description||$x$||$y$||$dx$||$dy$|
|chirp||Basic sine oscillator sound.||pitch||mix lower octave||pitch bend||-|
|drums||Drum sounds (bass drum, snare drum, hihat, crash cymbal) triggered from each quadrant. Swipes trigger rolls.||instrument||instrument||pitch bend||reverb/delay fx|
|fmlead||Simple two operator FM synthesiser that plays in a bass register.||pitch||volume||pitch bend||mix lower octave, reverb/delay fx|
|keys||Phase modulation keyboard sound.||pitch||volume||pitch bend||modulation|
|pad||Sawtooth wave pad synth.||pitch||volume||pitch bend||tremolo, reverb/delay fx|
|quack||Wave packet synth with analogue-like sound.||pitch||timbre||pitch bend||-|
|strings||Karplus-Strong plucked guitar synthesiser.||pitch||volume||pitch bend||reverb/delay fx|
|wub||Tremolo “wub wub” bass sound.||pitch, timbre||tremolo rate, timbre||pitch bend||-|
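To make the touch-to-sound mapping idea concrete, the following Python sketch imitates the general shape of the fmlead and drums mappings described in this section. It is not the Pure Data implementation used in the app: the note range, pitch-bend range, normalised 0..1 coordinates, signed treatment of the offsets, and the assignment of drum sounds to particular quadrants are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SynthParams:
    pitch: float        # MIDI note number (assumed unit)
    volume: float       # 0.0 to 1.0
    pitch_bend: float   # semitones (assumed unit)
    effects_mix: float  # 0.0 to 1.0, stands in for reverb/delay and lower-octave mix


def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def fmlead_like_mapping(x0: float, y0: float, x: float, y: float,
                        low_note: float = 36.0, high_note: float = 60.0,
                        bend_range: float = 2.0) -> SynthParams:
    """Map an initial touch (x0, y0) and the current touch (x, y) to parameters.

    The initial x position sets a bass-register pitch, the current y position
    sets volume, the horizontal offset bends the pitch, and the vertical
    offset controls the amount of effects.
    """
    dx = x - x0  # horizontal offset from the start of the swipe
    dy = y - y0  # vertical offset from the start of the swipe
    return SynthParams(
        pitch=low_note + clamp(x0) * (high_note - low_note),
        volume=clamp(y),
        pitch_bend=clamp(dx, -1.0, 1.0) * bend_range,
        effects_mix=clamp(abs(dy)),
    )


def drum_for_touch(x: float, y: float) -> str:
    """Pick a drum sound from the quadrant of the touch.

    Which sound belongs to which quadrant is an assumption.
    """
    if y < 0.5:
        return "bass drum" if x < 0.5 else "snare drum"
    return "hihat" if x < 0.5 else "crash cymbal"


params = fmlead_like_mapping(x0=0.5, y0=0.5, x=0.75, y=0.25)
```

Keeping the mapping as a pure function of the touch stream is what allows a stored performance to be re-sonified later with a revised synthesiser.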
4. Studying Tiny Performances
In this section we describe an investigation of how users interact with MicroJam through the lens of the tiny performances that have been collected in the app. Since early prototypes of MicroJam were developed, the app has been distributed and demonstrated among researchers, students, and the music technology community, and these testers and early users uploaded performances to the app’s cloud database. To date, more than 1600 tiny performances are available in this database, allowing us to gain insight into the musical potential of MicroJam and the concept of tiny touchscreen performances.
Our analysis of the tiny performances is made at three levels of abstraction: whole performances, individual touch events, and gestures or groups of touch events. At the performance level, we consider descriptors of each complete 5-second performance and how these vary by the instrument that was used; we also analyse the visual trace of the resulting performance. At the touch event level, we consider individual touch events that may be part of a swipe across the screen, or single taps. At the gesture level, we consider groups of these touch events that constitute a single “note” or interaction: either individual taps, or the collection of events that form a swipe. This lets us examine the different types of movements that users perform. These levels reveal different aspects of the users’ performance behaviour; while the whole performance level demonstrates their broad artistic intentions, the gestural and touch levels expose small-scale interactions.
4.1. Participants and Data Sources
Tiny performances for this investigation came from two databases: a development database which is only accessible from instances of MicroJam installed on the authors’ test devices, and a public database accessible from beta and published versions of MicroJam. Performances in the development database were made at testing and demonstration sessions taking place in our lab, at conferences, and at other events. Most of the participants in these sessions were university students and researchers in music technology or computer science. These participants had a range of music experience from untrained to professional. The public database was accessed from beta versions of the app as well as the published app store version. Beta versions were distributed to interested members of the computer music and technology community who requested invitations through social media and at conferences. Since the public release of the app, performances in this database have been contributed by unknown members of the public.
Within our dataset 39 unique CloudKit users are represented who created a total of 1626 performances. We can identify performances created by devices under our control and so know that 431 were created on the first author’s devices and 723 performances are from devices used for testing and demonstration sessions. The remaining 903 performances are from unknown users. We have chosen to include performances by the authors as part of an aggregated dataset, although these could be treated separately as autobiographical research (Neustaedter and Sengers, 2012).
4.2. Performance Level Investigation
The visualisations of a subset of tiny performances are reproduced in Figure 3. This figure shows the variety of touch-interaction styles that have been generated by performers. Many of the interactions are abstract, resembling scribbles that show the user experimenting with the synthesis mapping of the jamming interface. Some performances contain patterns where performers have repeated rhythmic motions in different parts of the touch area. A number of the performances are recognisable images: figures, faces, and words. We can characterise the visual appearance of performances under the following broad styles: taps, swipes, long swipe, mixture, image, and text. Tap and swipe performances focus on these fundamental touch gestures, while long swipe performances seem to consist of only one swipe, and mixture performances include several of these styles. Image and text performances appear to focus on the visual meaning of the finished trace, likely with less emphasis given to the resulting sound.
Of the 1626 performances in the dataset, 479, or 29%, are replies. As replies of replies are possible in MicroJam, it is interesting to see how long potential chains of replies are in terms of number of performance layers. Of the 479, 361 have just two layers. While there are 83 performances with 3 layers, there are few with 6 and 7 layers (3 each). This data suggests that while users have made some use of the reply function, they have only rarely explored the potential for complex layered performances. Further work in developing the browsing screen could help encourage users to create longer performances; for instance, the number of layers in a performance could be shown, with complex performances highlighted while browsing.
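The layer counts discussed above can be derived by following each reply back to the performance it answers. The sketch below assumes a hypothetical mapping from each performance identifier to the identifier of its parent (or `None` for an original performance); the identifiers and the data structure are illustrative and do not reflect the actual CloudKit schema.

```python
from collections import Counter
from typing import Dict, Optional


def layer_count(perf_id: str, parent_of: Dict[str, Optional[str]]) -> int:
    """Return the number of layers heard when perf_id is played back.

    An original performance has one layer; each reply adds one layer on top
    of the performance it replies to.
    """
    layers = 1
    seen = {perf_id}
    current = parent_of.get(perf_id)
    while current is not None:
        if current in seen:
            raise ValueError("reply chain contains a cycle")
        seen.add(current)
        layers += 1
        current = parent_of.get(current)
    return layers


# Invented example: "b" and "d" reply to "a", and "c" replies to "b".
parent_of = {"a": None, "b": "a", "c": "b", "d": "a", "e": None}

layers = {pid: layer_count(pid, parent_of) for pid in parent_of}
# Histogram of layer counts over replies only (performances with a parent).
reply_histogram = Counter(n for n in layers.values() if n > 1)
```

A histogram of this kind is one way to obtain a breakdown of replies by number of layers, such as the counts of two-layer and three-layer performances reported above.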
While performances with all instruments are represented in the dataset, the most popular instrument is chirp (the default choice) with 342 performances. Figure 4 shows that the next five most popular instruments (drums, keys, quack, strings, wub) all have around 200 performances, but that fmlead and pad each have fewer than 100. It is not immediately clear why these two instruments are less popular, but it may be that their sound is less distinctive than that of the other instruments, particularly through a mobile device’s small speaker.
4.3. Analysis by Touch Event
The dataset includes 249,870 touch events. As set out in the tiny performance definition in Section 3.2, a touch event can either be a touch-down (when the user’s finger hits the touchscreen), or a touch-moved (when the user’s finger has moved without leaving the touchscreen). These are distinguished by a binary value, “moving”, in the dataset. Only 13,480 touch events in the dataset are non-moving compared to 236,390 moving touches; this is because whenever a user swipes during a performance, a large number of moving touches are generated, while only one touch-down occurs.
Figure 4(a) shows the distribution of the number of touch events recorded per performance divided by instrument used. This shows that, for all instruments except drums, performances had a median of between 100 and 200 touch events. The median number of touch events for drums was 48, much lower than the other instruments. This is explained by Figure 4(b) which shows the proportion of touch events that were moving in each performance. For drums, many more touch events were taps, rather than swipes, which resulted in fewer touch events for a given performance.
Two interesting statistics are the time differences between consecutive touch events ($dt$), and the onscreen distance between them measured as a proportion of the performance area. Distributions of these statistics are shown in Figure 6. As expected, values of $dt$ and distance for moving touches tend to be small, although there are many outliers. The median $dt$ between moving touches is only 0.017s compared with 0.221s for non-moving touches. The interquartile range of $dt$ for non-moving touches is 0.116s to 0.385s; this indicates that performers tend not to leave much time between interactions, so the resulting tiny performances would not have much temporal “space”. Similarly, we can observe that non-moving touches tend to have moved within a relatively small proportion of the performance area (median = , interquartile range = ). This can be observed in some of the example performances in Figure 3 where a user has tapped repeatedly in the same part of the performance area.
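The two statistics discussed here, the time difference and the on-screen distance between consecutive touch events, can be computed directly from the recorded event stream. The following sketch assumes events are `(time, x, y, moving)` tuples with coordinates normalised to the performance area, and assumes that each difference is attributed to the later event of the pair; both are illustrative choices rather than a description of the analysis code actually used.

```python
import math
import statistics
from typing import Dict, List, Tuple

Event = Tuple[float, float, float, bool]  # (time, x, y, moving)


def consecutive_differences(events: List[Event]) -> Dict[bool, List[Tuple[float, float]]]:
    """Group (time difference, distance) pairs by the moving flag of the later event."""
    grouped: Dict[bool, List[Tuple[float, float]]] = {True: [], False: []}
    for previous, current in zip(events, events[1:]):
        dt = current[0] - previous[0]
        distance = math.hypot(current[1] - previous[1], current[2] - previous[2])
        grouped[current[3]].append((dt, distance))
    return grouped


def median_dt(pairs: List[Tuple[float, float]]) -> float:
    """Median time difference of a non-empty list of (dt, distance) pairs."""
    return statistics.median(dt for dt, _ in pairs)


# Invented example: a short swipe followed by a tap elsewhere on the screen.
events = [
    (0.00, 0.1, 0.1, False),
    (0.02, 0.1, 0.2, True),
    (0.04, 0.1, 0.3, True),
    (0.50, 0.4, 0.7, False),
]
differences = consecutive_differences(events)
moving_median = median_dt(differences[True])
tap_median = median_dt(differences[False])
```

For larger samples, `statistics.quantiles(values, n=4)` from the Python standard library returns the three quartile cut points, from which an interquartile range can be read off.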
Figure 7 shows the distribution of touch events across the touchscreen performance area (obtained with a bivariate kernel density estimate). This allows us to investigate where users have most commonly interacted with the performance area to play sounds. We can observe that touches cover the whole performance area; however, more touches occur on the diagonals and in the upper right quadrant. Relatively few touches extend all the way to the edges.
These analyses suggest that users tend not to explore the potential of space in their performances, both in terms of time and the touchscreen area. Given the time limitation of five seconds, it is understandable that users would prefer to squeeze in as much activity as possible. However, a small number of interactions could also be effective by allowing pauses that add structure to the performance (Sutton, 2002). Taken together, these results could inform future synthesis mappings in the app. For instance, a mapping could produce unexpected or interesting sounds if the next touch event is far away in space or time. Given that users tend not to use the edge of the performance area, these areas could be mapped to more extreme sounds (e.g., with distortion or delay effects). Similar mappings have been explored in instruments such as “Crackle” (Reus, 2011).
4.4. Analysis by Gesture
In this section we analyse performances from the perspective of tap and swipe gestures, the two fundamental touchscreen interactions available in MicroJam’s interface. A swipe is a sequence of multiple touch events that can be defined as a touch-down followed by a non-zero number of touch-moved events. The dataset contains 7090 such swipes, compared with 6390 true taps where a touch down was not followed by a touch moved event. These swipes represent one of the more important phenomena in tiny-performances as they represent the actions formed by the majority of touch-points. We extracted swipes from each performance in the dataset by dividing them by touch down events and discarding all divisions with only a touch-down. 155 swipes that were longer than 5 seconds were excluded. We were then able to perform analyses that characterise swiping behaviour seen in our dataset.
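The extraction procedure described above (divide each performance at touch-down events, set aside divisions that contain only a touch-down as taps, and exclude swipes longer than five seconds) can be sketched as follows. The `(time, x, y, moving)` tuple layout is an assumption for illustration.

```python
from typing import List, Tuple

Event = Tuple[float, float, float, bool]  # (time, x, y, moving)


def split_gestures(events: List[Event]) -> List[List[Event]]:
    """Divide a performance into gestures, starting a new one at each touch-down."""
    gestures: List[List[Event]] = []
    for event in events:
        is_moving = event[3]
        if is_moving and gestures:
            gestures[-1].append(event)
        else:
            # A touch-down (or a stray first event) starts a new gesture.
            gestures.append([event])
    return gestures


def extract_swipes_and_taps(events: List[Event], max_duration: float = 5.0):
    """Return (swipes, taps); swipes lasting longer than max_duration are dropped."""
    swipes, taps = [], []
    for gesture in split_gestures(events):
        if len(gesture) == 1:
            taps.append(gesture)
        elif gesture[-1][0] - gesture[0][0] <= max_duration:
            swipes.append(gesture)
    return swipes, taps


# Invented example: a tap, a three-event swipe, and another tap.
events = [
    (0.00, 0.2, 0.2, False),
    (0.50, 0.4, 0.4, False),
    (0.52, 0.5, 0.4, True),
    (0.54, 0.6, 0.4, True),
    (1.00, 0.8, 0.8, False),
]
swipes, taps = extract_swipes_and_taps(events)
```

With this definition every touch-down belongs to exactly one gesture, so the number of gestures before the duration filter equals the number of non-moving touch events.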
|statistic||length (events)||time (s)||mean velocity||distance||max velocity|
Table 4 shows descriptive statistics on the 6935 valid swipes. The majority were short in time and on-screen distance (measured in proportions of the performance width covered). 75% of swipes were shorter than 0.54s in duration and the median swipe time was only 0.13s. The median distance was 0.16 area widths and 75% of swipes traced less than 0.6 of the area width. These results seem inconsistent with the appearance of the visual traces shown in Figure 3 where the images appear to be dominated by long continuous swipes. The dataset does contain long swipes, up to the whole 5s in time and almost 50 area widths. These long swipes are visually dominant, especially given that they can cover up some smaller interactions, but they are outnumbered by the shorter swipes that make up the vast majority of gestures.
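The per-swipe quantities summarised in Table 4 (length in events, time, distance, mean velocity, and maximum velocity) could be computed along the following lines. This is a sketch under stated assumptions: distance is taken to be the path length summed over consecutive touch points, mean velocity is path length divided by duration, and maximum velocity is the largest per-segment speed. The exact definitions used for the table may differ.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]  # (time, x, y), coordinates in area widths


@dataclass
class SwipeStats:
    length: int           # number of touch events in the swipe
    time: float           # duration in seconds
    distance: float       # path length in area widths
    mean_velocity: float  # distance / time
    max_velocity: float   # fastest segment, in area widths per second


def describe_swipe(points: List[Point]) -> SwipeStats:
    """Compute simple descriptors for one swipe with at least two points."""
    if len(points) < 2:
        raise ValueError("a swipe needs at least two touch events")
    distance = 0.0
    max_velocity = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        segment = math.hypot(x1 - x0, y1 - y0)
        distance += segment
        if t1 > t0:  # skip segments with no elapsed time
            max_velocity = max(max_velocity, segment / (t1 - t0))
    time = points[-1][0] - points[0][0]
    mean_velocity = distance / time if time > 0 else 0.0
    return SwipeStats(len(points), time, distance, mean_velocity, max_velocity)


# Invented example swipe with two segments.
stats = describe_swipe([(0.0, 0.0, 0.0), (0.1, 0.3, 0.4), (0.3, 0.3, 0.8)])
```

Measuring distance as path length rather than as the straight line between endpoints is what allows a swipe confined to the performance area to cover many area widths, as with the longest swipes mentioned above.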
In Figure 8, we show the distributions of time, distance, and mean velocity for each instrument in MicroJam. One-way ANOVA tests on each measurement confirm that there is a significant effect of instrument (). In particular, drum performances have much shorter swipes than any of the other instruments in terms of both distance traced and time. This could be due to many attempted “taps” that were actually short swipes with just a few touch points. Swipes using the pad sound seem to be long in terms of time, but with a lower mean velocity, indicating the use of this instrument to play long notes with slower variation in pitch and timbre, perhaps by subtle movement in a small area.
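For readers unfamiliar with the test, the F statistic of a one-way ANOVA compares the variance between groups (here, instruments) with the variance within them. The sketch below computes it from first principles using only the standard library; the function name and the example numbers are made up for illustration and are not values from the dataset. Obtaining a p-value additionally requires the F distribution, which a statistics package such as SciPy (`scipy.stats.f_oneway`) provides.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Variation of each observation around its own group mean.
    ss_within = sum(
        (value - m) ** 2 for g, m in zip(groups, means) for value in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up swipe durations for three hypothetical instruments.
f_stat, df_between, df_within = one_way_anova_f(
    [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
)
```

A large F value, relative to the F distribution with the given degrees of freedom, indicates that the group means differ by more than within-group variation alone would explain.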
To gain an intuitive idea of how these swipes looked, we have visualised selections of swipes of different durations; these are shown in Figure 9. First, the quartiles for the time dimension were calculated (see Table 4), then 200 swipes were sampled randomly from each quartile. The quickest 25% of swipes are almost always straight lines, with many spanning quite far across the screen. Some of these very long swipes may indicate bugs in the interface code, which may have failed to reject multiple touch points on the screen and instead registered them as a single swipe. The two quartiles around the median length show much variation in expression. Swipes shorter than the median rarely have more than a subtle curve, while those above the median show curves that could have an expressive effect on the pitch and timbre of the resulting sounds. In the upper quartile of length, swipes can cover the whole performance area or trace complex patterns. These swipes could represent longer notes, parts of drawings, or whole 5-second performances.
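The sampling step can be sketched as follows. This is an illustrative reconstruction under assumptions, not the script used to produce the figure: the function name, the fixed random seed, and the choice to place values equal to a cut point in the lower group are all choices made here.

```python
import bisect
import random
import statistics


def sample_by_quartile(items, key, per_quartile=200, seed=0):
    """Group items into quartiles of key(item) and sample from each group.

    Returns four lists, ordered from the lowest to the highest quartile.
    """
    values = [key(item) for item in items]
    cuts = statistics.quantiles(values, n=4, method="inclusive")
    bins = [[], [], [], []]
    for item, value in zip(items, values):
        # bisect_left counts the cut points strictly below the value,
        # so a value equal to a cut point falls in the lower group.
        bins[bisect.bisect_left(cuts, value)].append(item)
    rng = random.Random(seed)
    return [rng.sample(b, min(per_quartile, len(b))) for b in bins]


# Made-up durations standing in for swipes; key would normally
# extract the duration from a swipe record.
durations = [i / 100 for i in range(1, 1001)]
groups = sample_by_quartile(durations, key=lambda d: d)
```

Fixing the seed makes the selection reproducible, which is convenient when the samples feed a published figure.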
The 2D traces of performances in Figures 3 and 9 do not show the velocity of swipes, a quantity with much expressive potential. In Figure 10, we visualise the normalised velocity curves for selections of swipes of different lengths. The shortest swipes have only 2 or 3 touch points and the velocity tends only to increase, indicating a quick flicking movement. The next quartile shows a more expressive curve, with a quick rise and slower release as the touch point stops moving before the end of the swipe. The third quartile shows the possibility for multiple peaks and valleys in the velocity, perhaps indicating changes in direction of the moving touch point. Again, only the fourth quartile shows extensive expressive behaviour, such as repeating peaks in velocity that could indicate a rhythmic movement pattern over a longer swipe.
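One plausible way to produce comparable velocity curves for swipes of different lengths is sketched below. The normalisation used here is an assumption, since the text does not define it: speeds are divided by the swipe's peak speed so that every curve peaks at 1, and the curve is resampled by linear interpolation onto a fixed number of points so that swipes with different numbers of touch points share a common time axis.

```python
import math


def segment_speeds(swipe):
    """Speed of each segment of a swipe given as (time, x, y) tuples."""
    return [
        math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        for (t0, x0, y0), (t1, x1, y1) in zip(swipe, swipe[1:])
        if t1 > t0  # ignore segments with no elapsed time
    ]


def normalise_curve(speeds, n_points=20):
    """Scale speeds to a peak of 1 and resample to n_points (n_points >= 2)."""
    if not speeds:
        return [0.0] * n_points
    peak = max(speeds)
    scaled = [s / peak if peak > 0 else 0.0 for s in speeds]
    if len(scaled) == 1:
        return scaled * n_points
    curve = []
    for i in range(n_points):
        # Position of output point i on the index axis of the input curve.
        pos = i * (len(scaled) - 1) / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, len(scaled) - 1)
        frac = pos - lo
        curve.append(scaled[lo] * (1 - frac) + scaled[hi] * frac)
    return curve


# Made-up speeds for a swipe that accelerates throughout.
curve = normalise_curve([1.0, 2.0, 4.0], n_points=5)
```

With this scheme a two-segment swipe and a two-hundred-segment swipe both yield curves of the same length, so they can be overlaid in a single plot.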
The analysis of swipes in MicroJam has given us some important insights. Most importantly, the majority of swipes are short (three quarters are less than 0.54 seconds); however, the longest swipes have more scope for expressive behaviour. This has implications for future instrument design, which should allow more detailed expression with very short interactions, e.g., by slowing down the effect a swipe has on audio effects or timbral changes to make them more noticeable. Alternatively, long swipes could be encouraged. It could be that if the visual impact of long swipes was reduced by fading them out over time, users might feel more confident to explore them more frequently. Further refinements to instruments such as “drums” that reward longer swipes with interesting sounds (e.g., pitch bends on toms or cymbals) could also encourage such behaviour.
The results above allow us to characterise how users perform within the constraints of MicroJam’s tiny performance idiom. We know that they perform in different broad styles including abstract gestures as well as images and text. While visually meaningful performances draw the eye, the dataset is dominated by very short swipes and taps, which are typical of the more abstract performance styles. It is questionable whether image and text explorations lead to rewarding musical experiments, and it may be more appropriate to focus on improving the expressive potential of other performance styles. As for the social aspects of MicroJam, while the reply function has certainly been used in the dataset, few multi-layered performances are present, which limits the conclusions we can draw. Future revisions of MicroJam could emphasise replying and collaboration rather than just performance creation.
The results of our analysis lead us to make the following design recommendations that could be explored in future versions of MicroJam and other similar DMIs:
- Multi-layered performances should be encouraged and celebrated within the app. At present, few performances have more than two layers. To encourage users to create more complex performances, these performances could be highlighted in the world feed, and opportunities to reply presented more actively.
- The edge of the performance area should sound edgy. The edges are rarely used by performers; mappings could be altered to use these spaces to create more exotic or experimental sounds that reward the user’s exploration.
- Short swipes should be a focus for improvement to synthesis mappings. These interactions are the most common gesture, and should be emphasised more in the sound design for MicroJam.
- The visual impact of long swipes should be reduced. These interactions are rare, yet dominate the visual trace of performances. Perhaps fading out long swipes would allow several of them to be used in performances without overwhelming the touch area.
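Two of these recommendations lend themselves to simple mapping functions, sketched here as hypothetical illustrations. Neither function exists in MicroJam; the names, the margin width, and the fade time are assumptions chosen for the example. The first returns an effect amount that is zero across the centre of the performance area and rises to one at the edge; the second returns an opacity for a drawn touch point that decays linearly with its age.

```python
def edge_intensity(x, y, margin=0.15):
    """Effect amount in [0, 1]: 0 inside the margin, 1 at the very edge.

    x and y are positions in [0, 1]; margin is the width of the edge
    band as a proportion of the performance area (an assumed value).
    """
    distance_to_edge = min(x, 1.0 - x, y, 1.0 - y)
    return min(1.0, max(0.0, 1.0 - distance_to_edge / margin))


def trace_alpha(age, fade_time=1.0):
    """Opacity in [0, 1] of a touch point drawn `age` seconds ago."""
    return min(1.0, max(0.0, 1.0 - age / fade_time))


centre = edge_intensity(0.5, 0.5)      # no extra effect in the centre
border = edge_intensity(0.0, 0.5)      # full effect at the left edge
halfway = edge_intensity(0.075, 0.5)   # halfway into the edge band
```

The value from `edge_intensity` could, for instance, scale the wet level of a distortion or delay effect, while `trace_alpha` could be applied when redrawing the trace so that the start of a long swipe fades before its end is drawn.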
So far, MicroJam has mainly been used in test and demo environments, and few users have shared large numbers of performances. As a result, the performances analysed in this dataset are generally by inexperienced users. We would expect, however, that as for other instruments, MicroJam users would improve with practice, and develop new styles. Future work could seek to identify changes in tiny performance style over time.
One aspect of analysis that has not been mentioned is modelling and generation of tiny performances with machine-learning algorithms. Previous research has already discussed a mixture density recurrent neural network model for generating and responding to tiny performances (Martin and Torresen, 2018). This “RoboJam” system is available in the app to provide an automatic reply to a performance on demand and generates the same control data format as the tiny performances. In future work we could explore the potential for developing tailored models of individual users’ styles, which could even provide control over the kind of gestures (for instance, short swipes or taps) that are provided in an automatic response. These models could also be used to demonstrate effective use of the gestures highlighted in the above recommendations as part of a performance training feature.
5. Conclusions and Future Work
In this paper we have presented the design for MicroJam, a social mobile music app for creating touchscreen performances, and defined the tiny performance format. We have also investigated this app through a data-driven analysis of more than 1600 performances. This investigation has revealed how users perform in the tiny touchscreen idiom and allowed us to make recommendations for revisions that could better align the app’s capacity for musical expression with user behaviour. MicroJam is an example of a social app centred on musical creation rather than written and visual media. We have argued that such apps could take advantage of the ubiquity of mobile devices by allowing users to collaborate asynchronously and in different locations, and shown that these modes of interaction are relatively unexplored compared to more conventional ensemble performances.
MicroJam represents a new approach to asynchronous musical collaboration with the focus on time-limited tiny performances. Taking inspiration from the constrained contributions that typify social media apps, MicroJam limits users to five-second touchscreen performances, but affords them extensive opportunities to browse, play back, and collaborate through responses. MicroJam’s tiny performance format includes a complete listing of the touch interactions and so allows performances to be easily distributed, visualised, and studied.
Our novel data-driven investigation examined 1626 performances consisting of 249,870 touch events. These were analysed at the levels of individual touch events, grouped touch gestures, and whole performances. The investigation revealed a variety of performance styles, but also that fewer performances than desired were replies. Examining touch points showed that the edges of the performance area were not used as much as the centre and main diagonals, and that moving touches, rather than taps, dominated the dataset. Grouping touches into swipes showed that while long swipes are more visually apparent, the vast majority of swipe gestures are actually short.
We have distilled the findings of our investigation into design recommendations for enhancing the instrument mappings and visualisations in MicroJam. These could encourage users to be more expressive, and future work could explore how these enhancements affect tiny performances. Given the social goal of MicroJam, the most important measure could be to encourage users to interact through musical collaboration and to generate more complex sequences of replies. MicroJam could potentially host very large collaborations between users and performances with multiple threads of replies. Automatic traversal of such structures could constitute a kind of generative composition with users’ original musical material.
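To make the idea of traversing reply structures concrete, the sketch below enumerates every thread in a collection of performances, where a thread is a path from an original performance down through successive replies. The data layout is an assumption made for illustration: each performance is represented by an identifier mapped to the identifier of the performance it replies to, or `None` for an original.

```python
from collections import defaultdict


def reply_threads(parent_of):
    """List every root-to-leaf path in a forest of replies.

    parent_of: dict mapping a performance id to the id of the
    performance it replies to, or None for an original performance.
    """
    children = defaultdict(list)
    roots = []
    for performance, parent in parent_of.items():
        if parent is None:
            roots.append(performance)
        else:
            children[parent].append(performance)

    threads = []

    def walk(performance, path):
        path = path + [performance]
        replies = children.get(performance, [])
        if not replies:
            threads.append(path)  # reached the end of a thread
        for reply in replies:
            walk(reply, path)

    for root in roots:
        walk(root, [])
    return threads


# Made-up example: "b" and "c" reply to "a", "d" replies to "b".
threads = reply_threads(
    {"a": None, "b": "a", "c": "a", "d": "b", "e": None}
)
```

Playing the threads in sequence, with the performances in each thread layered together, would be one simple form of the automatic traversal suggested above; other orderings, such as breadth-first or randomised walks, would give different results from the same material.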
The analysis in this article has suggested that even a simple and constrained touchscreen interface can lead to a variety of styles and unexpected musical interactions. While constraining the length of performances may have increased the number recorded, and made it easier to collaborate using musical replies, it could have curbed users’ gestural exploration with the touch screen. Implementing the design recommendations may encourage more expressive performances in MicroJam users, without increasing the effort required to generate tiny touchscreen performances. For music making, as opposed to appreciation, to be widely adopted as part of everyday social media interactions, this balance between constraint and expression will need to be further examined and addressed. We posit that a data-driven approach to mobile music performance, examining musical data generated by users, can be used to further examine this balance in MicroJam and other systems for mobile musical creativity.
This work was partially supported by The Research Council of Norway through the Engineering Predictability with Embodied Cognition (EPEC) project, under grant agreement 240862, and the Centres of Excellence scheme, project number 262762.
We wish to thank participants in our study as well as beta testers and others who tried MicroJam. We also thank Henrik Brustad and Benedikte Wallace who worked on MicroJam as research assistants.
Conflict of Interest
The authors declare no conflict of interest.
- Allison and Dell (2012) Jesse Allison and Christian Dell. 2012. AuRal: A Mobile Interactive System for Geo-Locative Audio Synthesis. In Proceedings of the International Conference on New Interfaces for Musical Expression. University of Michigan, Ann Arbor, Michigan, 3. http://www.nime.org/proceedings/2012/nime2012_301.pdf
- Barknecht (2011) Frank Barknecht. 2011. rj - Abstractions for getting things done. In Proceedings of the Pure Data Convention. Faculty of Media, Bauhaus-Universität Weimar, Weimar, Germany, 9.
- Barraclough et al. (2015) Timothy J. Barraclough, Dale A. Carnegie, and Ajay Kapur. 2015. Musical Instrument Design Process for Mobile Technology. In Proceedings of the International Conference on New Interfaces for Musical Expression, Edgar Berdahl and Jesse Allison (Eds.). Louisiana State University, Baton Rouge, Louisiana, USA, 289–292. http://www.nime.org/proceedings/2015/nime2015_313.pdf
- Behrendt (2012) Frauke Behrendt. 2012. The sound of locative media. Convergence 18, 3 (2012), 283–295. https://doi.org/10.1177/1354856512441150
- Brinkmann et al. (2011) Peter Brinkmann, Peter Kirn, Richard Lawler, Chris McCormick, Martin Roth, and Hans-Christoph Steiner. 2011. Embedding Pure Data with libpd. In Proceedings of the Pure Data Convention. Bauhaus-Universität Weimar, Weimar, Germany, 8. http://www.uni-weimar.de/medien/wiki/PDCON:Conference/Embedding_Pure_Data_with_libpd:_Design_and_Workflow
- Carôt et al. (2007) Alexander Carôt, Pedro Rebelo, and Alain Renaud. 2007. Networked Music Performance: State of the Art. In Audio Engineering Society 30th International Conference. AES, New York, NY, 1–7. http://www.aes.org/e-lib/browse.cfm?elib=13914
- Carter and Liu (2005) William Carter and Leslie S. Liu. 2005. Location33: A Mobile Musical. In Proceedings of the International Conference on New Interfaces for Musical Expression. University of British Columbia, Vancouver, BC, Canada, 176–179. http://www.nime.org/proceedings/2005/nime2005_176.pdf
- Correa et al. (2009) A. G. D. Correa, I. K. Ficheman, M. d. Nascimento, and R. d. D. Lopes. 2009. Computer Assisted Music Therapy: A Case Study of an Augmented Reality Musical System for Children with Cerebral Palsy Rehabilitation. In Ninth IEEE International Conference on Advanced Learning Technologies. IEEE, New York, NY, 218–220. https://doi.org/10.1109/ICALT.2009.111
- d’Alessandro et al. (2012) Nicolas d’Alessandro, Aura Pon, Johnty Wang, David Eagle, Ehud Sharlin, and Sidney Fels. 2012. A Digital Mobile Choir: Joining Two Interfaces towards Composing and Performing Collaborative Mobile Music. In Proceedings of the International Conference on New Interfaces for Musical Expression. University of Michigan, Ann Arbor, Michigan, 4. http://www.nime.org/proceedings/2012/nime2012_310.pdf
- Dewan and Ramaprasad (2014) Sanjeev Dewan and Jui Ramaprasad. 2014. Social media, traditional media, and music sales. MIS Quarterly 38, 1 (2014), 101–121.
- Essl and Lee (2018) Georg Essl and Sang Won Lee. 2018. Mobile Devices as Musical Instruments - State of the Art and Future Prospects. In Music Technology with Swing. CMMR 2017 (Lecture Notes in Computer Science), M. Aramaki, M. Davies, R. Kronland-Martinet, and S. Ystad (Eds.), Vol. 11265. Springer, Cham, 525–539.
- Essl and Rohs (2009) G. Essl and M. Rohs. 2009. Interactivity for mobile music-making. Organised Sound 14, 2 (2009), 197–207. https://doi.org/10.1017/S1355771809000302
- Favilla and Pedell (2013) Stu Favilla and Sonja Pedell. 2013. Touch Screen Ensemble Music: Collaborative Interaction for Older People with Dementia. In Proceedings of the 25th Australian Computer-Human Interaction Conference (OzCHI ’13). ACM, New York, NY, USA, 481–484. https://doi.org/10.1145/2541016.2541088
- Fiesler and Proferes (2018) Casey Fiesler and Nicholas Proferes. 2018. “Participant” Perceptions of Twitter Research Ethics. Social Media + Society 4, 1 (2018), 1–14. https://doi.org/10.1177/2056305118763366
- Gaye et al. (2006) Lalya Gaye, Lars Erik Holmquist, Frauke Behrendt, and Atau Tanaka. 2006. Mobile Music Technology: Report on an Emerging Community. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME ’06). IRCAM Centre Pompidou, Paris, France, 22–25. http://www.nime.org/proceedings/2006/nime2006_022.pdf
- Gaye et al. (2003) Lalya Gaye, Ramia Mazé, and Lars Erik Holmquist. 2003. Sonic City: the urban environment as a musical interface. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME ’03). McGill University, Montreal, Canada, 109–115. http://www.nime.org/proceedings/2003/nime2003_109.pdf
- Gligorić et al. (2018) Kristina Gligorić, Ashton Anderson, and Robert West. 2018. How Constraints Affect Content: The Case of Twitter’s Switch from 140 to 280 Characters. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM). AAAI Press, Palo Alto, CA, 596–599. https://aaai.org/ocs/index.php/ICWSM/ICWSM18/paper/view/17895
- Godøy et al. (2010) Rolf Inge Godøy, Alexander Refsum Jensenius, and Kristian Nymoen. 2010. Chunking in Music by Coarticulation. Acta Acustica united with Acustica 96, 4 (2010), 690–700. https://doi.org/10.3813/AAA.918323
- Greenberg and Roseman (1998) Saul Greenberg and Mark Roseman. 1998. Using a Room Metaphor to Ease Transitions in Groupware. Technical Report 98/611/02. Department of Computer Science, University of Calgary.
- Gurevich et al. (2012) Michael Gurevich, Adnan Marquez-Borbon, and Paul Stapleton. 2012. Playing with Constraints: Stylistic Variation with a Simple Electronic Instrument. Computer Music Journal 36, 1 (2012), 23–41. https://doi.org/10.1162/COMJ_a_00103
- Hamilton et al. (2011) Robert Hamilton, Jeffrey Smith, and Ge Wang. 2011. Social Composition: Musical Data Systems for Expressive Mobile Music. Leonardo Music Journal 21 (December 2011), 57–64. https://doi.org/10.1162/LMJ_a_00062
- Hofmann et al. (2017) Heike Hofmann, Hadley Wickham, and Karen Kafadar. 2017. Letter-Value Plots: Boxplots for Large Data. Journal of Computational and Graphical Statistics 26, 3 (2017), 469–477. https://doi.org/10.1080/10618600.2017.1305277
- John (2013) David John. 2013. Updating the Classifications of Mobile Music Projects. In Proceedings of the International Conference on New Interfaces for Musical Expression. Graduate School of Culture Technology, KAIST, Daejeon, Republic of Korea, 301–306.
- Kelkar and Jensenius (2018) Tejaswinee Kelkar and Alexander Refsum Jensenius. 2018. Analyzing Free-Hand Sound-Tracings of Melodic Phrases. Applied Sciences 8, 1 (2018), 21. https://doi.org/10.3390/app8010135
- Martin and Gardner (2016) Charles Martin and Henry Gardner. 2016. A Percussion-Focussed Approach to Preserving Touch-Screen Improvisation. In Curating the Digital: Spaces for Art and Interaction, David England, Thecla Schiphorst, and Nick Bryan-Kinns (Eds.). Springer International Publishing, Switzerland, 51–72. https://doi.org/10.1007/978-3-319-28722-5_5
- Martin et al. (2015) Charles Martin, Henry Gardner, and Ben Swift. 2015. Tracking Ensemble Performance on Touch-Screens with Gesture Classification and Transition Matrices. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME ’15), Edgar Berdahl and Jesse Allison (Eds.). Louisiana State University, Baton Rouge, LA, USA, 359–364. http://www.nime.org/proceedings/2015/nime2015_242.pdf
- Martin et al. (2016) Charles Martin, Henry Gardner, Ben Swift, and Michael Martin. 2016. Intelligent Agents and Networked Buttons Improve Free-Improvised Ensemble Music-Making on Touch-Screens. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’16). ACM, New York, NY, USA, 2295–2306. https://doi.org/10.1145/2858036.2858269
- Martin (2018) Charles P. Martin. 2018. MicroJam Source Code. Git Repository. https://doi.org/10.5281/zenodo.1412274
- Martin and Torresen (2017) Charles P. Martin and Jim Torresen. 2017. Exploring Social Mobile Music with Tiny Touch-Screen Performances. In Proceedings of the 14th Sound and Music Computing Conference (SMC ’17), Tapio Lokki, Jukka Pätynen, and Vesa Välimäki (Eds.). Aalto University, Espoo, Finland, 175–180. http://smc2017.aalto.fi/media/materials/proceedings/SMC17_p175.pdf
- Martin and Torresen (2018) Charles P. Martin and Jim Torresen. 2018. RoboJam: A Musical Mixture Density Network for Collaborative Touchscreen Interaction. In Computational Intelligence in Music, Sound, Art and Design: International Conference, EvoMUSART (Lecture Notes in Computer Science), Antonios Liapis, Juan Jesús Romero Cardalda, and Anikó Ekárt (Eds.), Vol. 10783. Springer International Publishing, Switzerland, 161–176. https://doi.org/10.1007/978-3-319-77583-8_11
- Meikle (2016) George Meikle. 2016. Examining the Effects of Experimental/Academic Electroacoustic and Popular Electronic Musics on the Evolution and Development of Human-Computer Interaction in Music. Contemporary Music Review 35, 2 (2016), 224–241. https://doi.org/10.1080/07494467.2016.1221634
- Neustaedter and Sengers (2012) Carman Neustaedter and Phoebe Sengers. 2012. Autobiographical Design in HCI Research: Designing and Learning Through Use-it-yourself. In Proceedings of the Designing Interactive Systems Conference (DIS ’12). ACM, New York, NY, USA, 514–523. https://doi.org/10.1145/2317956.2318034
- Oh et al. (2010) Jieun Oh, Jorge Herrera, Nicholas J. Bryan, Luke Dahl, and Ge Wang. 2010. Evolving the Mobile Phone Orchestra. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME ’10), Kirsty Beilharz, Andrew Johnston, Sam Ferguson, and Amy Yi-Chun Chen (Eds.). University of Technology Sydney, Sydney, Australia, 82–87. http://www.nime.org/proceedings/2010/nime2010_082.pdf
- Redi et al. (2014) Miriam Redi, Neil O’Hare, Rossano Schifanella, Michele Trevisiol, and Alejandro Jaimes. 2014. 6 Seconds of Sound and Vision: Creativity in Micro-Videos. In https://doi.org/10.1109/CVPR.2014.544
- Reus (2011) Jonathan Reus. 2011. Crackle: A mobile multitouch topology for exploratory sound interaction. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME ’11), Alexander Refsum Jensenius, Anders Tveit, Rolf Inge Godøy, and Dan Overholt (Eds.). University of Oslo, Oslo, Norway, 377–380. http://www.nime.org/proceedings/2011/nime2011_377.pdf
- Sawyer (2006) R Keith Sawyer. 2006. Group creativity: Musical performance and collaboration. Psychology of Music 34, 2 (2006), 148–165. https://doi.org/10.1177/0305735606061850
- Scheffel and Matney (2014) Stephanie Scheffel and Bill Matney. 2014. Percussion Use and Training: A Survey of Music Therapy Clinicians. Journal of Music Therapy 51, 1 (2014), 39. https://doi.org/10.1093/jmt/thu006
- Schiemer and Havryliv (2007) Greg Schiemer and Mark Havryliv. 2007. Pocket Gamelan: Swinging phones and ad-hoc standards. In Proceedings of the 4th International Mobile Music Workshop. STEIM, Amsterdam, 2.
- Schlei (2012) Kevin Schlei. 2012. TC-11: A Programmable Multi-Touch Synthesizer for the iPad. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME ’12), G. Essl, B. Gillespie, M. Gurevich, and S. O’Modhrain (Eds.). University of Michigan, Ann Arbor, Michigan, 4. http://www.nime.org/proceedings/2012/nime2012_230.pdf
- Shepard (2007) Mark Shepard. 2007. Tactical Sound Garden Toolkit. In ACM SIGGRAPH 2007 Art Gallery (SIGGRAPH ’07). ACM, New York, NY, USA, 219–. https://doi.org/10.1145/1280120.1280176
- Stokes (2008) Patricia D Stokes. 2008. Creativity from Constraints: What can we learn from Motherwell? from Modrian? from Klee? The Journal of Creative Behavior 42, 4 (2008), 223–236. https://doi.org/10.1002/j.2162-6057.2008.tb01297.x
- Sutton (2002) Julie P. Sutton. 2002. “The Pause That Follows”. Nordic Journal of Music Therapy 11, 1 (2002), 27–38. https://doi.org/10.1080/08098130209478040
- Swift (2013) Ben Swift. 2013. Chasing a Feeling: Experience in Computer Supported Jamming. In Music and Human-Computer Interaction, Simon Holland, Katie Wilkie, Paul Mulholland, and Allan Seago (Eds.). Springer, London, UK, 85–99. https://doi.org/10.1007/978-1-4471-2990-5_5
- Tanaka (2010) Atau Tanaka. 2010. Mapping Out Instruments, Affordances, and Mobiles. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME ’10), Kirsty Beilharz, Andrew Johnston, Sam Ferguson, and Amy Yi-Chun Chen (Eds.). University of Technology Sydney, Sydney, Australia, 88–93. http://www.nime.org/proceedings/2010/nime2010_088.pdf
- Tanaka and Gemeinboeck (2008) Atau Tanaka and Petra Gemeinboeck. 2008. Net_Dérive: Conceiving and Producing a Locative Media Artwork. In Mobile Technologies: From Telecommunications to Media, Gerard Goggin and Larissa Hjorth (Eds.). Routledge, London, UK.
- Wang (2014) Ge Wang. 2014. Ocarina: Designing the iPhone’s Magic Flute. Computer Music Journal 38, 2 (2014), 8–21. https://doi.org/10.1162/COMJ_a_00236
- Wang (2016) Ge Wang. 2016. Game Design for Expressive Mobile Music. In Proceedings of the International Conference on New Interfaces for Musical Expression, Vol. 16. Queensland Conservatorium Griffith University, Brisbane, Australia, 182–187. http://www.nime.org/proceedings/2016/nime2016_paper0038.pdf
- Wang et al. (2011) Ge Wang, Jieun Oh, and Tom Lieber. 2011. Designing for the iPad: Magic Fiddle. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME ’11), Alexander Refsum Jensenius, Anders Tveit, Rolf Inge Godøy, and Dan Overholt (Eds.). University of Oslo, Oslo, Norway, 197–202. http://www.nime.org/proceedings/2011/nime2011_197.pdf
- Wang et al. (2015) Ge Wang, Spencer Salazar, Jieun Oh, and Robert Hamilton. 2015. World Stage: Crowdsourcing Paradigm for Expressive Social Mobile Music. Journal of New Music Research 44, 2 (2015), 112–128. https://doi.org/10.1080/09298215.2014.991739