Direct Segmented Sonification of Characteristic Features of the Data Domain

by Paul Vickers, et al.

Sonification and audification create auditory displays of datasets. Audification translates data points into digital audio samples and the auditory display's duration is determined by the playback rate. Like audification, auditory graphs maintain the temporal relationships of data while using parameter mappings (typically data-to-frequency) to represent the ordinate values. Such direct approaches have the advantage of presenting the data stream `as is' without the imposed interpretations or accentuation of particular features found in indirect approaches. However, datasets can often be subdivided into short non-overlapping variable length segments that each encapsulate a discrete unit of domain-specific significant information and current direct approaches cannot represent these. We present Direct Segmented Sonification (DSSon) for highlighting the segments' data distributions as individual sonic events. Using domain knowledge to segment data, DSSon presents segments as discrete auditory gestalts while retaining the overall temporal regime and relationships of the dataset. The method's structural decoupling from the sound stream's formation means playback speed is independent of the individual sonic event durations, thereby offering highly flexible time compression/stretching to allow zooming into or out of the data. Demonstrated by three models applied to biomechanical data, DSSon displays high directness, letting the data `speak' for themselves.





1 Introduction

Sonification and audification are representational techniques in which data sets, or selected features of data sets, are mapped to audio signals (see The Sonification Handbook [1] for an overview of the main techniques). As sound is a temporal phenomenon, such auditory displays are especially well suited to time series data.

The most common sonification method is parameter mapping, in which the data values drive the parameters of an audio signal. For example, Parseihian et al. [2] mapped target distance to pitch, timbre, and tempo in various combinations to assist with one-dimensional guidance tasks. Silva et al. [3] mapped features of graphical objects to acoustic parameters to communicate visual information to visually impaired people. Using physical modeling sound synthesis, Roodaki et al. [4] mapped stylus pressure to the timbral parameters of an acoustic model to assist users with visual object tracking tasks.

In contrast, audification only involves transposing the frequencies of the data to the human-audible range and occasional filtering to remove unwanted linear distortions (and in rare cases dynamic range compression to remove very large level variations). Therefore, the process maintains a tighter relationship with the data than other auditory display processes which generally rely on mappings to effect the auditory display. These mappings can be low level (e.g., [2]) or more metaphorical (e.g., the use of melodic phrase structures to represent elements of a computer program [5]).

The directness of a sonification is a measure of the arbitrariness (in relation to the underlying data) of the mapping [6]. A method exhibiting maximal directness will derive the sound directly from the data (e.g., through the use of direct data-to-sound translations). Low directness arises from more symbolic, metaphoric, or interpretative mappings. Thus, audification is a more direct form of auditory display, the audio being generated entirely by the data.

The sonification method proposed here pursues directness as a design goal so that, as far as possible, the data are allowed to ‘speak’ for themselves. In this way, any metaphors arise as contingent properties of the sonification rather than being imposed by the designer. For example, the characteristic sound caused by accentuating data range excursions in §5.3 below assumes its own sonic identity and metaphorical labels may be assigned by (and will vary depending on) the listener. In this way, sonification users may start identifying regions of interest in the data by describing the characteristic sounds they hear.

The proposed method follows a direct sonification strategy which conserves fundamental properties of (pure) audification, notably the compact temporal support and some aspects of the precise temporal structure of a data set.

1.1 Leveraging the Directness of Audification

The audification of a physical process strictly conserves the temporal regime of the source signal and so contains high-frequency components when rapid transients occur in the data. This is advantageous because such transients, which often correspond to points of interest in the data, are also significant features of the audio signal which the human auditory system relies on to identify real-world sounds. Hence, they can be a perceptually salient basis for auditory data exploration [7].

When the data is sampled from a band-limited physical process the audification signal has a one-to-one relationship with the data. In fact, the mapping is, in principle, bijective and fully reversible (at least while the data remains in the digital domain prior to any D/A conversion.) However, even such direct representations can contain misleading features because of the band-limited interpolation of the reconstruction filter of the D/A converter leading to extreme data values being elevated in the audification.

As Höldrich and Vogt [7] pointed out, the ideal audification signal has auditory gestalts within time and frequency ranges that are clearly perceptible to a listener. Take a data stream dominated by low frequencies with transients occurring within a range of 1 k data points and with an aperiodic interval of approximately 10 k data points. At a playback rate of 44.1 kHz roughly four of these events will occur each second which is comparable to the number of syllables per second in spoken English and so is suitable for listeners (see Wood [8] for a detailed view of the information aspects of tempo). However, each transient event’s duration will be approximately 22 ms appearing as a band-limited impulse with a cut-off frequency at around 50 Hz, which is below the most sensitive range of the human auditory system. If the playback rate were raised by, say, a factor of 10–20, the individual impulses would be shifted to a more perceptible frequency range, but at the cost of an indiscernible temporal structure of the impulse series. Thus, pure audification is a trade-off between the macroscopic time scale and the frequency range of the relevant information.
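The trade-off can be made concrete with a little arithmetic. The following sketch (a hypothetical helper, using the 1 k-sample transient and 10 k-sample interval from the example above) computes the event rate and the approximate spectral cut-off of each transient at a given playback rate:

```python
# Illustrative arithmetic for the audification trade-off: transients spanning
# ~1k samples, recurring aperiodically roughly every ~10k samples.

def audification_tradeoff(playback_rate_hz, transient_len, interval_len):
    """Return (events per second, transient duration in s, approx. cutoff in Hz)."""
    events_per_sec = playback_rate_hz / interval_len
    duration_s = transient_len / playback_rate_hz
    cutoff_hz = 1.0 / duration_s  # a ~22 ms impulse has energy mostly below ~45 Hz
    return events_per_sec, duration_s, cutoff_hz

base = audification_tradeoff(44_100, 1_000, 10_000)   # ~4.4 events/s, ~23 ms impulses
fast = audification_tradeoff(441_000, 1_000, 10_000)  # 10x faster: audible impulses,
                                                      # but ~44 events/s blur the rhythm
```

Raising the playback rate moves the impulses into a perceptible frequency range at exactly the cost of an indiscernible temporal structure, as described above.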

2 Direct Segmented Sonification (DSSon)

Following Rohrhuber’s approach [9] the DSSon process is regarded as a mapping operation between the data domain and the sound domain. Because the sonification time domain will often differ from the data time domain (e.g., choosing to listen to a 100 s data set over a period of only 10 s), Rohrhuber proposed superscribing sonification domain variables with a ring (t̊ rather than t) to distinguish them from data domain variables, thereby enabling the construction of unambiguous mixed-domain expressions. In this scheme the sonification operator Son maps from the given data space X to the sound signal space S̊:

    Son : X → S̊    (1)

The relation is more explicit at the level of the variables [10]:

    s̊(t̊) = Son[x(t), Θ](t̊)    (2)

The sonification signal s̊ depends on t̊ (sonification time), because sound is a temporal phenomenon; on the data x to be sonified, which is itself assumed to depend on a data domain time t; and on the parameters Θ of the sonification method, which determine how the sonification sounds.

2.1 Sonification Variables

Variable Description Value range
time compression factor sonification duration
dilation factor
reference frequency
pitch scaling factors
power law distortion factor
amplitude threshold
gain function e.g. mean, rms, …
decay parameter
operator for timbral control e.g., wave shaping,
additive synthesis
TABLE I: Direct Segmented Sonification Variables, Functions, and Operators

The proposed sonification method uses the variables shown in Table I. The sonification parameter set is then given as Θ = {k, α, f_r, γ1, γ2, p, a_thr, g, τ, T}, with any appropriate subset being used in the models described below. The meanings of these variables are given in the sections that follow. To distinguish sonification time from data domain time, sonification time variables are written with a ring (t̊) and data domain time variables without (t).

2.2 General Framework of DSSon

DSSon relies on the assumption that a one-dimensional time-varying data stream, x(t), can be subdivided into short non-overlapping segments of generally different length where each segment contains a consistent portion of application-dependent significant information. Thus, identification of the appropriate cutting points is crucial. For example, if one is interested in the short-term fluctuation of a stock price, the crossing points of the actual stock price with a moving average might be a good choice. We consider a data stream as a time-varying signal expressed as a sequence of sampled values x[n] at a sampling rate f_s. The duration of the data stream is T seconds, hence the sequence consists of N = T·f_s samples. Assuming that the DSSon of the data should last for approximately T̊ seconds (the reason for the duration being approximate is explained below), a time compression factor k is defined by k = T/T̊.

As a first step, the cutting points (the borders between segments ) have to be determined depending on the application and the specific properties of the data. As a simple example, consider a broadband AC signal. In this case the zero crossing points are a reasonable choice. If the signal contains DC or strong low-frequency components (as is the case with stock prices and the data used in §4) some preprocessing might be necessary. For instance, the trend signal calculated by a moving average filter can be subtracted from the original data yielding a signal which exhibits numerous zero crossings.
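This preprocessing can be sketched as follows. The helper below (a hypothetical illustration, not the authors' implementation) subtracts a moving-average trend and returns the indices where the trend-free signal changes sign, i.e., the cutting points:

```python
import numpy as np

def segment_cut_points(x, win):
    """Detrend x with a moving average of length `win`, then return the
    trend-free signal and the indices where it changes sign (segment borders)."""
    trend = np.convolve(x, np.ones(win) / win, mode="same")
    y = x - trend
    sign = np.signbit(y).astype(np.int8)
    cuts = np.nonzero(np.diff(sign))[0] + 1
    return y, cuts

# toy data: a slow 2 Hz oscillation around a DC level of 0.4
t = np.linspace(0, 4, 4000)
x = 0.4 + 0.1 * np.sin(2 * np.pi * 2 * t)
y, cuts = segment_cut_points(x, win=501)
segments = np.split(y, cuts)   # variable-length, non-overlapping segments
```

Each element of `segments` is then one candidate unit of domain-significant information.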

Assuming the first cutting point is at sample index θ_0 and the last one is at θ_I, a sequence of I segments x_i (or y_i if the low-frequency mean or DC component has been removed through preprocessing) is obtained where:

    x_i[n] = x[n]  for  θ_{i-1} ≤ n < θ_i,   i = 1, …, I    (3)
Thus, the actual duration of each segment is given by d_i = (θ_i − θ_{i-1})/f_s. Each data segment x_i is to be sonified as an individual sonic event s̊_i depending on the parameters of the sonification method at hand, and the events are superimposed to form the final sonification s̊(t̊). For the sake of simplicity, we skip the explicit dependence of the sonic event s̊_i on the data segment and the sonification parameters in the following:

    s̊(t̊) = Σ_{i=1}^{I} s̊_i(t̊ − t̊_i),  where  t̊_i = θ_{i-1}/(k·f_s)    (4)

Note that the individual sonic events might be longer or shorter than the duration of the respective data segment depending on the specific sonification method and parameters. Therefore, the actual length of s̊ is only approximately equal to the data duration divided by the compression factor: T̊ ≈ T/k.

The DSSon approach conserves the overall temporal structure of the data as long as the cutting points are chosen appropriately, that is, they are meaningful within the context of the data domain. Since the sonification length of the individual segments is not predetermined by this very general formulation, the resulting auditory display can be adjusted either to focus on the rhythmical structure of the segments’ temporal distribution (such as by choosing very short and transient sonic events for each segment and thereby presenting, essentially, a sequence of clicks) or to zoom into the specific data evolution of each segment (e.g., by choosing long sonic events with time-varying properties according to the segment’s data values). Note that the latter approach yields a temporal overlap of sonic events of adjacent segments and hence might confound the auditory gestalts originating from the individual segments. In any case, the appropriate choice of the sonification method for the individual segments is crucial for the quality of the DSSon. In the following section, a simple method for segment sonification which is derived from auditory graphing is presented.
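The superposition of individually synthesized events into the final signal can be sketched as follows (a minimal illustration; the `assemble` helper and its 44.1 kHz default are assumptions):

```python
import numpy as np

def assemble(events, onsets_s, out_rate=44_100):
    """Mix per-segment sonic events into one signal, each starting at its
    time-compressed onset; events may overlap or leave gaps between them."""
    end = max(int(t * out_rate) + len(e) for e, t in zip(events, onsets_s))
    out = np.zeros(end)
    for e, t in zip(events, onsets_s):
        i = int(t * out_rate)
        out[i:i + len(e)] += e      # superposition of sonic events
    return out

# toy example: three short Hann-windowed clicks at compressed onset times
clicks = [np.hanning(64)] * 3
s = assemble(clicks, onsets_s=[0.0, 0.1, 0.25])
```

Short events such as these yield the "sequence of clicks" rhythm display; longer events (relative to onset spacing) overlap and zoom into each segment's data evolution.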

3 Modified Auditory Graphs for Sonifying Individual Segments

Auditory graphs have been a part of the standard repertoire of auditory display research since its beginning. At its simplest, an auditory graph represents the ordinate value of a data series as the time-varying frequency of a sinusoid with (usually) constant amplitude [11]. An obvious benefit is the straightforward analogy to visual graphs, which makes them readily understandable, at least for sighted users. Flowers [11] recommended using distinct timbres in order to minimize stream confusions and unwanted perceptual grouping. Since auditory graphs usually encode data values as pitch or (fundamental) frequency, harmonic complexes with a small number (6–8) of partials and amplitudes in inverse proportion to partial order are recommended instead of pure sinusoids because of the improved pitch salience they are able to produce. Nevertheless, the resulting timbre should be time-invariant to guide the listener’s attention to the pitch contour and not obscure the data representation by arbitrary timbral fluctuations. More complex timbres run the risk of evoking categorical associations with real-world sounds that might change at more or less arbitrary data values and therefore confound the intended perceptual continuum of the frequency or pitch range representing the important aspects of the data. If several auditory graphs are to be presented simultaneously, spectral overlap between adjacent graphs should be avoided; in this instance pure sinusoids might be the better choice.

In order to achieve the intended directness of the final sonification, not only must the overall temporal relationship of the segmentation pattern be preserved (as is ensured by the general framework in §2.2), but the sonic events resulting from the individual segments must also display the segments’ data evolution as directly as possible. Therefore, a modified auditory graph is proposed as the specific method of segment sonification in DSSon with each segment being treated as an individual graph. We assume segments are derived from zero crossing points (either due to the inherent AC characteristics of the data or after removing the signal average) and exploit the property that each segment starts and ends with data values of negligible magnitude. To accentuate strong deviations from a chosen baseline (such as the average), amplitude modulation derived from the segment’s data complements the time-varying pitch progression of the basic auditory graph. Thus, the general form of the sonification signal is given by

    s̊_i(t̊) = a_i(t̊) · sin(2π f_r φ_i(t̊) t̊)    (5)

where a_i is the amplitude modulator, f_r is the base frequency for the pitch range of the sonification, and φ_i is a pitch modulator.¹

¹ In order to allow specific control of timbre, an additional timbre operator T which acts on the sine function has to be considered in the model:

    s̊_i(t̊) = a_i(t̊) · T[sin(2π f_r φ_i(t̊) t̊)]

T might be implemented as, for instance, waveshaping utilizing Chebyshev polynomials or any kind of additive synthesis. The operator T itself will depend on the data to be sonified, i.e., T = T[x_i]. However, in the case of the modified auditory graph the resulting sonic events consist only of amplitude- and pitch-modulated sinusoids, hence T can be regarded as the identity function and will be omitted in the following for the sake of simplicity. To include the (previously removed) short-term average value as an overall pitch trend, we explicitly take into account both the mean-free segment y_i and the trend signal m(θ_{i-1}) at the segment’s starting point for pitch modulation:

    s̊_i(t̊) = |y_i(k t̊/α)| · sin(2π f_r · 2^(y_i(k t̊/α) + m(θ_{i-1})) · t̊),   0 ≤ t̊ ≤ α d_i/k    (6)

In (6), the magnitude of the segment’s data values is used as amplitude modulation, and the dilation parameter α determines the length of the sonic event in relation to the duration d_i of the data segment. If α ≤ 1, adjacent sonic events do not overlap since each event ends at or before the onset of its successor, whereas α > 1 results in overlapping events.
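A single sonic event of this kind might be rendered as in the sketch below. The helper name, the parameter defaults, and the exponential (power-of-two) pitch law are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def sonic_event(y_seg, trend_val, k=5.0, alpha=1.0, f_ref=350.0,
                g1=2.0, g2=2.0, data_rate=100.0, out_rate=44_100):
    """One segment as an amplitude/pitch-modulated sinusoid: the data magnitude
    drives amplitude, and an assumed exponential pitch law maps segment data
    plus trend onto a frequency around f_ref."""
    dur = alpha * len(y_seg) / (k * data_rate)         # dilated event duration
    t = np.arange(int(dur * out_rate)) / out_rate
    # resample the segment onto the event's (compressed, dilated) time axis
    y = np.interp(t / dur, np.linspace(0, 1, len(y_seg)), y_seg)
    amp = np.abs(y)
    freq = f_ref * 2.0 ** (g1 * y + g2 * trend_val)
    phase = 2 * np.pi * np.cumsum(freq) / out_rate     # integrate inst. frequency
    return amp * np.sin(phase)

# a toy half-sine positive excursion of 50 data samples
event = sonic_event(0.1 * np.sin(np.linspace(0, np.pi, 50)), trend_val=0.4)
```

Because the amplitude envelope is the data magnitude itself, each event starts and ends near silence, matching the zero-crossing segmentation assumption.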

Of course, both pitch and amplitude modulation can be parameterized in various ways. For example, if mainly peaks or strong deviations from the mean are to be displayed, a power law distortion with factor p can be applied to the amplitude modulator a_i:

    a_i(t̊) = |y_i(k t̊/α)|^p    (7)
If only deviations exceeding a threshold a_thr around the mean are to be sonified, then a magnitude offset followed by half-wave rectification might be included in the amplitude modulator:

    a_i(t̊) = max(|y_i(k t̊/α)| − a_thr, 0)    (8)
On the other hand, the relative importance of the trend signal and the actual data progression of the segment can be adjusted via the non-negative pitch scaling parameters γ1 and γ2:

    s̊_i(t̊) = a_i(t̊) · sin(2π f_r · 2^(γ1 y_i(k t̊/α) + γ2 m(θ_{i-1})) · t̊)    (9)
If the stream of segments with positive deviation from the trend is to be discriminated from the stream of negative segments, two different reference frequencies, f_r,+ and f_r,−, could be used. From the above, the general parameterized form of DSSon is the superposition of the individual sonic events defined in this section, governed by the parameter set Θ of Table I.
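The two amplitude-modulator variants above can be sketched directly (hypothetical helper names, operating on an already-resampled segment):

```python
import numpy as np

def power_law(a, p):
    """Emphasize peaks: p > 1 suppresses small deviations relative to large ones."""
    return np.abs(a) ** p

def thresholded(a, a_thr):
    """Magnitude offset plus half-wave rectification: silence below the threshold."""
    return np.maximum(np.abs(a) - a_thr, 0.0)

a = np.array([0.05, -0.5, 0.8])    # toy segment values
quiet = power_law(a, 2.0)          # small excursions fade relative to peaks
gated = thresholded(a, 0.3)        # only values beyond 0.3 remain audible
```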

3.1 Modulation of Segment Duration

To relate the duration of the sonification segments to some property of the data we can use α not as a constant, but as a function of the segment’s data, α_i = α(x_i). For instance, if highly peaked segments should be displayed as longer sonic events to show the data distribution in more detail, a monotonically increasing, concave function of the segment’s mean (or other property such as rms or power) or area (or energy) is a suitable choice for α_i.

3.2 Decaying Envelope as Amplitude Modulator

In order to emphasize the rhythmical patterns induced by the temporal distribution of the cutting points, a sharp attack of the individual sonic events is needed. This can be achieved by replacing the amplitude modulator in (6), or the variants in (7) and (8), by an appropriate envelope, for example, a_i(t̊) = G_i · e^(−t̊/τ), where τ is the envelope’s decay parameter and the gain factor G_i is determined by a specific function of the segment’s data values, G_i = g(y_i), e.g., the mean, rms, area, power, or energy of the segment.
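A percussive variant along these lines might look as follows (a sketch; the parameter values, fixed carrier frequency, and RMS gain function are illustrative assumptions):

```python
import numpy as np

def percussive_event(y_seg, tau=0.05, out_rate=44_100, dur=0.2, f=440.0):
    """Replace the data-driven amplitude with a sharp-attack exponential decay;
    the segment's data only set the gain (here via its RMS)."""
    gain = np.sqrt(np.mean(np.square(y_seg)))       # g(y_i): rms of the segment
    t = np.arange(int(dur * out_rate)) / out_rate
    return gain * np.exp(-t / tau) * np.sin(2 * np.pi * f * t)
```

Each event then reads as a click or tap whose loudness reflects the segment's magnitude, foregrounding the rhythm of the cutting points rather than the within-segment evolution.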

4 Applying DSSon to Biomechanical Data

We applied the above DSSon method to biomechanical signal data taken from the Functional Readaptive Exercise Device (FRED), an exercise machine designed for use in physiotherapy to help patients with low back pain [12]. The current version of FRED is a modified cross-trainer that offers minimal resistance (Fig. 1). This creates a situation in which the user has an unstable base of support: when the front foot reaches the forward-most position in its elliptical path, gravity pulls the foot downward, requiring the user to apply a compensatory balancing force with the rear foot to control the descent. The goal is to operate the machine with an upright posture in a smooth, controlled manner with as little variability in movement speed as possible [12].

A rotary encoder in the drive wheel generates a pulse stream which represents the instantaneous angular velocity of the wheel at each sampling point. This pulse stream is sampled at 4 kHz into LabChart [13]. The data is converted to frequency values (i.e., revolutions per second) for ease of display for the user (a patient). The resultant data stream is then smoothed using a triangular Bartlett filter to remove the steps in the data. The smoothed stream is presented to the user via LabChart (with a zoom level of 50:1) as a means of feedback to help them control their performance (Fig. 2).
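The smoothing step could be reproduced as follows (a sketch; the window length is an assumed value, not taken from the paper):

```python
import numpy as np

def bartlett_smooth(x, win=201):
    """Smooth the step-like revolution-rate signal with a normalized
    triangular (Bartlett) window to remove the steps in the data."""
    w = np.bartlett(win)
    return np.convolve(x, w / w.sum(), mode="same")
```

A triangular window trades sharper step removal against some smearing of genuine rapid speed changes, which suits a display meant for live visual feedback.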

Fig. 1:

The height (difficulty) of FRED’s walking path is increased by moving the rear end of the stride rail (A) through the slot (B) towards the edge of the wheel. FRED has five such settings and is shown here in its default (lowest) configuration.


Fig. 2: Screen capture of the live scrolling window view in LabChart as seen by a FRED user during an exercise session (axis labels have been superimposed here to aid the reader).

It has been determined that with the machine in its default configuration (Fig. 1), operating it within a frequency range of 0.2 Hz–0.6 Hz results in therapeutic benefit leading to recruitment of the key spinal and abdominal muscles lumbar multifidus (LM) and transversus abdominis (TrA), with the biomechanical optimum for maximum benefit being achieved at 0.4 Hz [12]. At this optimal frequency a complete rotation of the footplates takes 2.5 s, thus requiring a slow and steady pace.

The white area in Fig. 2 shows when the user is performing inside the required range, with the shaded areas denoting frequencies above and below it. Fig. 2 shows the user maintaining a good pace until 27.2 s, at which point they slow down dramatically, coming to a brief halt (27.65 s), followed by a sharp corrective acceleration which takes the frequency up to 0.8 Hz, then a compensatory attempt to slow down, then another sharp acceleration, with normal performance being re-attained at around 29.2 s.

Fig. 3: A full FRED exercise session comprising one three-minute warm-up block and four three-minute exercise blocks with rest periods.

A typical session comprises several three-minute blocks of exercise separated by rest periods of between 30–60 s. A full session’s worth of data is shown in Fig. 3. A quick glance at the data plotted at a zoom level of 2000:1 (Fig. 3) is helpful to the physiotherapist for getting an overall impression of the user’s performance. During rest periods the therapist uses this zoomed-out view to look for signs of fatigue (such as a rising trend line in the frequency) which may require extension of the rest period. The plots are useful to the therapist during an exercise session but post-hoc review of many session data files quickly becomes tiring. Repeated zooming in and out is needed to locate regions of possible interest and to spot specific instances of particular performance behaviours.

4.1 Features of Interest

During a post-hoc review of performance, the physiotherapist is interested in identifying a number of discrete features in the data sets. The main performance goal is to maintain a walking pace of 0.2 Hz–0.6 Hz. While the patient needs to be aware of excursions outside this range during exercise, for the therapist all excursions above 0.6 Hz and long excursions below 0.2 Hz are of interest. If the frequency exceeds 0.6 Hz (Fig. 2) this indicates a loss of control — the machine is running away with the user. However, because it takes a great deal of muscle control to operate the machine slowly, if the frequency momentarily drops below 0.2 Hz and then goes back in range this is of less interest to the therapist as it is still evidence of control — it is a controlled recovery (Fig. 4(b)). But if it drops below 0.2 Hz for an extended period of time (typically half a second or more) then this also indicates a lack of control as motion is coming to a stop.

The target range of 0.2 Hz–0.6 Hz means that users can demonstrate variability in their average speed while still maintaining acceptable performance. Therefore, for each user, the physiotherapist will additionally determine a maximum deviation from the individual mean as a target range based upon their assessment of the user’s current ability and any physical characteristics that might impact upon how well they are able to use FRED. For example, a beginner with reasonable control might be expected to achieve a standard target deviation of 0.15 Hz while someone who is able to keep within the range 0.35 Hz–0.45 Hz would have a target deviation of 0.05 Hz. Once the therapist has determined a user’s target deviation it is interesting to know at what points they are failing to maintain it.

If someone were able to operate the machine perfectly there would be no variation in their speed and the plot would show a flat line. Therefore, the smoother the plot the less the user’s pace is varying. When a user starts to master the required walking technique they begin to exhibit what are known as “flat tops”. A flat top is a region of activity lasting approximately 0.5 s or more in which the variation in speed is so small that the curve starts to flatten out. Flat tops typically occur during the portion of a walking cycle after the rear foot has come up from the bottom of the elliptical path and before the front foot descends again. Fig. 4(a) shows a double flat top. At around the 53 s mark the small peak indicates where the user’s rear foot has ascended from the bottom of the elliptical path. This is followed by a period of relatively flat speed variation lasting just under 1 s. At around 54.2 s the front foot descends and then another flat top of 0.7 s occurs.

Because these features require zooming in to see clearly it becomes time consuming to zoom-and-scroll through many data files, so DSSon was applied to FRED data sets to see how well these features could be heard. After discussions with physiotherapists from Northumbria University’s Aerospace Medicine and Rehabilitation lab in which FRED is being further developed, the features to be represented were:

  1. Any excursions above 0.6 Hz.

  2. Long excursions below 0.2 Hz.

  3. Periods outside the user’s target deviation range.

  4. ‘Flat tops’ lasting 0.5 s or longer.

(a) Double “flat top”
(b) Controlled recovery
Fig. 4: Strong and weak performance. In (a) the user demonstrates two periods of very small deviation in velocity. In (b) the user’s velocity drops below target but is very quickly recovered back into the target range.

The preprocessing stage involved audifying FRED data streams by simply converting each data point to a signed 16-bit integer and storing the result in a PCM-encoded digital audio file. Because the revolution rate does not exceed 2 Hz (which would be very fast walking), the signal spectrum caused by the speed fluctuations occurring during a full revolution is band limited below 15–20 Hz. Therefore, to keep the file sizes small, the data extracted from LabChart were first downsampled prior to audification. Thus, the time series signal x[n] in the DSSon method was provided by these audio files. The DSSon method was implemented in a series of MATLAB (for sonification) and Python (preprocessing) scripts (see the project repository [14]).
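This audification step (scaled floats to a signed 16-bit mono PCM file) can be sketched with the Python standard library; the helper name, file name, and clipping policy are assumptions for illustration:

```python
import wave
import struct

def write_pcm16(path, samples, rate):
    """Store a float sequence in [-1, 1] as a signed 16-bit mono PCM WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)                      # 2 bytes = signed 16-bit
        w.setframerate(rate)
        ints = (max(-32768, min(32767, int(s * 32767))) for s in samples)
        w.writeframes(b"".join(struct.pack("<h", i) for i in ints))

# toy data stream standing in for a downsampled FRED recording
write_pcm16("fred_audified.wav", [0.0, 0.5, -0.5], rate=100)
```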

5 DSSon Models for FRED Signals

In this section we describe three DSSon models that were applied to FRED data that emphasize the features of interest identified above to varying degrees resulting in differing auditory saliency. DSSon for FRED data is mainly intended to provide an auditory display of users’ performance that enables the physiotherapist to conduct a quick analysis during post-hoc review. The DSSon parameters might also be individually adjusted by the therapist during the review session in order to concentrate on specific data features. Consequently, it is impractical to evaluate the DSSon display through extensive listening tests based on specific task completion performance and statistical analysis. This kind of evaluation procedure is planned for future work on other application fields. Here, DSSon’s properties (benefits and limitations) are demonstrated by comparing data excerpts containing specific features of interest and the resulting DSSon display. Audio files, demonstrating the system output, together with the corresponding data sets used to generate them, can be found in the project repository [14] and are listed in Table II.

# Audio file Description
1 DSSon_Basic_A_n.wav M1, user A — novice
2 DSSon_Basic_A_e.wav M1, user A — experienced
3 DSSon_Basic_B.wav M1, user B — novice
4 DSSon_ITR_A_e.wav M2, user A — exp.
5 DSSon_ITR_B.wav M2, user B — novice
6 DSSon_ADV_A_n.wav M3, user A — novice
7 DSSon_Adv_A_e.wav M3, user A — exp.
8 DSSon_Adv_B.wav M3, user B — novice
Models: M1 = basic model; M2 = individual target range model; M3 = advanced model
Data files used: user A, novice = DA1; user A, experienced = DA2; user B, novice = DB1
TABLE II: Example Sound Files

The first step in DSSon is signal segmentation. For FRED data, the main feature of interest is the deviation of the instantaneous revolution rate from the fixed target value of 0.4 Hz (the biomechanical optimum from above). Hence, an obvious choice for segmentation is to cut the data stream at its crossing points with this target value, that is, extract segments with positive and negative deviation from 0.4 Hz. However, as long as a user is able to maintain a steady revolution rate, even one slightly deviating from 0.4 Hz, or shows a slowly varying average revolution rate exhibiting only small excursions, he/she shows sufficient muscle control and therefore gains therapeutic benefit. To account for this fact, we did not use the fixed target value of 0.4 Hz to determine the segments’ start and end points, but calculated a weighted mean of the target and the moving average x̄[n] of the data stream to obtain the trend signal:

    m[n] = w · 0.4 + (1 − w) · x̄[n]

The data stream and the trend signal (weighting factor w) of two exercise sessions of the same user are shown in Figs. 5 and 6. The first data stream was recorded in the second week of a six-week training period, and the second was recorded four months after the end of the training period. The data segments are determined utilizing the zero-crossing points of the trend-free signal y[n] = x[n] − m[n]: a new segment starts wherever y changes sign, yielding alternating segments of positive (y[n] > 0) and negative deviation.
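The weighted trend might be computed as below (a sketch; the weighting factor w = 0.5 and the moving-average window length are assumed values):

```python
import numpy as np

def fred_trend(x, target=0.4, w=0.5, win=401):
    """Trend = weighted mean of the fixed 0.4 Hz target and the data stream's
    moving average; w balances the fixed target against the user's own pace."""
    ma = np.convolve(x, np.ones(win) / win, mode="same")
    return w * target + (1 - w) * ma

x = np.full(1000, 0.5)     # toy stream: steady pace slightly above target
m = fred_trend(x)
y = x - m                  # trend-free signal; its sign changes give the cut points
```

With w = 1 segmentation reverts to crossings of the fixed 0.4 Hz target; with w = 0 it follows the user's own average pace entirely.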

Fig. 5: Data stream and trend signal (weighting factor w) of a FRED exercise session of user A at the beginning of training (audio file 1).
Fig. 6: Data stream and trend signal (weighting factor w) of a FRED exercise session of user A, four months after training (audio file 2).

5.1 DSSon Basic Model

The DSSon basic model uses a time compression factor k = 5 and a dilation parameter α ≤ 1. This moderate compression factor allows for a rather fast post-hoc review of the data. The sonic events resulting from adjacent positive and negative excursions are displayed at a rate of approximately 8 events per second, that is, a mean revolution rate of 0.4 Hz times (typically) 4 segments per revolution (2 positive and 2 negative excursions) times the compression factor k = 5. This rhythmical pattern can be easily perceived in detail because it lies well within the typical range of musical gestures, and the individual events do not overlap due to the dilation parameter chosen (α ≤ 1). In order to better facilitate the discrimination between positive and negative excursions, different reference frequencies for the pitch modulator are employed: f_r,+ for positive and f_r,− for negative segments. To monitor both the individual excursions and the overall trend, both pitch scaling factors γ1 and γ2 are applied. Amplitude modulation derived from the instantaneous magnitude of the segment’s data values is used, that is, the power law distortion factor p equals 1. The final model including the parameter values reads:

    s̊_i(t̊) = |y_i(k t̊/α)| · sin(2π f_r,± · 2^(γ1 y_i(k t̊/α) + γ2 m(θ_{i-1})) · t̊)
The model was applied to three FRED data signals, two from user A (audio files 1, 2) and one from user B (audio file 3). Figs. 7 and 8 show the data and trend as well as the spectrogram of the basic DSSon model for a rather poor performance (user B, audio file 3). The user is obviously unable to maintain a stable mean speed at the beginning of the exercise session or to stay within the range of 0.2 Hz–0.6 Hz. Large positive excursions are clearly visible at 6 and 15 s in Fig. 7 and result in strong high-frequency events at about 1 and 3 s (Fig. 8). Sudden moments of slowing at 11, 45, and 55 s yield prominent low-frequency sounds at 2, 9, and 11 s respectively (Fig. 8).
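These timings are consistent with simple arithmetic, assuming the compression factor k = 5 implied by the stated 8 events per second:

```python
# Sanity arithmetic for the basic model: event rate, and the mapping of
# data-domain times to sonification times, t̊ = t / k.
k = 5
rev_rate_hz = 0.4
segments_per_rev = 4          # two positive + two negative excursions
events_per_sec = rev_rate_hz * segments_per_rev * k

data_times_s = [15, 45, 55]   # excursions visible in the data plots
sonif_times_s = [t / k for t in data_times_s]
```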

Fig. 7: Data stream and trend signal of FRED exercise session for a poor performer (user B).

Note that highlighting the trend (of approximately 0.4 Hz) in the sonification via a non-zero trend weighting results in an upward shift of the pitch register compared to the range of the reference frequencies. (If the trend weighting is set to zero, the trend data are completely suppressed, resulting in a lower pitch register.) With the chosen weighting and a trend of 0.4 Hz, the instantaneous frequencies are scaled upward, resulting in a center frequency of 610 Hz instead of 350 Hz. The trend variation results in an overall glissando, gliding upward and downward, displayed in the spectrogram as a sliding white frequency band framed by the sonic events of positive and negative segments respectively.

Fig. 8: DSSon basic model spectrogram of User B (Fig. 7, audio file 3).

In comparison, the DSSon of the experienced user (Fig. 6, audio file 2) is shown as a spectrogram in Fig. 9. This example is dominated by a constant mean rate and regular small deviations, resulting in a soft and steady rhythmical pattern.

Fig. 9: Spectrogram of DSSon basic model for an experienced user (FRED data and trend are shown in Fig. 6) (audio file 2).

5.2 DSSon Individual Target Range Model

The time compression factor of 5 used in the previous examples allows a quick review of an individual performance. Nevertheless, exploring a collection of FRED sessions, each consisting of up to five exercise blocks of 3 minutes duration, would be a rather time-consuming endeavour, and a sonification with an even larger time compression factor of 15 is preferable. However, the increased playback speed means that the rhythmical patterns of the sonic events and their pitch contours would become indiscernible if the DSSon basic model with its previous parameter values were employed.

Therefore, the DSSon individual target range model (ITR) suppresses segments whose maximum excursions stay below the target range set individually for each user by the physiotherapist. This is accomplished by a threshold-based amplitude modulator similar to the one proposed in (8), with the threshold parameter set appropriately. Unlike the amplitude modulator in (8), which displays only the segment’s data values exceeding the threshold, one might wish to hear the entire segment if its value exceeds the target range at any point. Hence, a threshold-based indicator function combined with the segment’s instantaneous magnitude is used as the amplitude modulator:
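With illustrative notation ($d_i(t)$ for segment $i$'s data and $\theta$ for the threshold, both symbols our own), the described combination of indicator function and instantaneous magnitude could be written as:

```latex
% ITR amplitude modulator (sketch): the whole segment sounds at its
% instantaneous magnitude if its maximum excursion exceeds the threshold,
% and is silent otherwise.
a_i(t) = \mathbb{1}\!\left[\, \max_{\tau} \lvert d_i(\tau)\rvert > \theta \,\right]
         \cdot \lvert d_i(t)\rvert
```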


To display the remaining segments in sufficient detail, the dilation parameter is set to a value yielding potentially overlapping sonic events. Figs. 10 and 11 show the spectrograms of the new model for the two users. The compression factor of 15 results in a sonification duration of 4 seconds for a 1-minute session, yielding a threefold overlap of adjacent sonic events. The threshold parameter is set to 0.1 Hz for both examples, though in practice the therapist would choose individual values for the two users according to their level of motor control. All other sonification parameters are set as in the basic model. Note that for the experienced user (Fig. 10), the new model yields a sparse auditory display (audio file 4), whereas the poor performance of user B produces a dense sonification with almost constantly overlapping sonic events (Fig. 11, audio file 5).

Fig. 10: DSSon ITR model experienced user spectrogram (audio file 4).
Fig. 11: DSSon ITR model spectrogram for user B (audio file 5).

5.3 DSSon Advanced Model

Both DSSon models presented so far are based on a modified auditory graph of adjacent data segments. They are characterized by a smooth functional relationship between data values and the auditory display which can be easily perceived by the listener. As every segment is sonified by an amplitude- and pitch-modulated sinusoid, a coherent auditory gestalt of homogeneous timbre emerges. However, the special features of interest mentioned in subsection 4.1 are not displayed saliently, except by the ITR model, which delivers sonic events only for segments exceeding the individual target range, thereby explicitly displaying feature #3. To indicate excursions above 0.6 Hz (feature #1) and below 0.2 Hz (feature #2) prominently, timbre modifications are utilized as an additional sonification parameter. Segments whose maximum excursions cross these limits are sonified by a fixed harmonic complex (for overshoots above 0.6 Hz) or a subharmonic complex (for undershoots below 0.2 Hz) respectively. This is achieved by including the timbre operator in a DSSon advanced model (ADV) as:
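With hypothetical symbols of our own ($N$ partials, amplitude attenuation base $b$, fundamental $f$), the fixed harmonic and subharmonic complexes described here could be sketched as:

```latex
% Timbre operator (sketch): overshoots get harmonic partials k*f,
% undershoots get subharmonic partials f/k; partial amplitudes are
% attenuated with increasing order via the base b.
T_{\mathrm{over}}[f](t)  = \sum_{k=1}^{N} b^{\,k-1}\, \sin(2\pi\, k f\, t),
\qquad
T_{\mathrm{under}}[f](t) = \sum_{k=1}^{N} b^{\,k-1}\, \sin\!\left(2\pi\, \frac{f}{k}\, t\right)
```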

The auxiliary sonification parameters specify the number of partials (and hence the bandwidth of the sonic event) and the amplitude attenuation associated with increasing partial order. They are set so as to align the loudness levels of the overshoot and undershoot segments with the basic one. Note that by introducing a non-trivial timbre operator, the additional distinct categories of sonic events will result in a sonification in which three auditory streams are likely to be perceived, and the coherent gestalts of the previous models become dispersed.

To further accentuate segments of long excursions, which predominantly occur for undershoots, a data-dependent transformation of the dilation parameter is incorporated into the ADV model. For data segments whose maximum excursions stay within the specified limits, the dilation parameter is fixed, whereas for overshoot and undershoot segments the dilation parameter becomes a monotonically decreasing function of the segment’s data values, causing stretched sonic events. As the transformation, we specifically propose a hyperbolic function of the segment’s area, that is, the time integral of the segment’s magnitude:
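One transformation consistent with this description (all symbols illustrative and our own: $A_i$ the area of segment $i$, $A_\theta$ the area threshold, $\alpha_0$ the fixed dilation, $\gamma$ the strength of the transformation) could be sketched as:

```latex
% Dilation transformation (sketch): fixed dilation for in-range segments;
% for overshoot/undershoot segments the dilation decreases hyperbolically
% with the segment's area, and the sonic event is stretched in inverse
% proportion to it.
\alpha_i =
\begin{cases}
  \alpha_0, & A_i \le A_\theta,\\[4pt]
  \alpha_0\, \dfrac{A_\theta}{\gamma\, A_i}, & A_i > A_\theta
\end{cases}
```

Under this sketch, stretching an event in inverse proportion to $\alpha_i$ scales its duration by $\gamma A_i / A_\theta$ for $A_i > A_\theta$, i.e. linearly in the segment’s area, matching the behaviour described in the text.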


The hyperbolic transformation translates into a linear dependence of the sonic event’s duration on the segment’s area: combining it with the definition of the event duration yields a duration proportional to the segment’s area whenever the area exceeds the threshold.

The additional sonification parameters determine the area threshold and the strength of the dilation transformation respectively. The area threshold should be set to the area of a sine-shaped segment whose duration equals the expected duration of an excursion at the target revolution rate of 0.4 Hz and whose amplitude is 0.2 Hz (the magnitude difference between either limit, i.e., 0.2 Hz and 0.6 Hz, and the target rate). Utilizing this dilation transformation yields dominant, stretched sonic events for long overshoot and undershoot segments. However, because the amplitude modulator used up to this point ((7) and (14)) delays the loudness peaks of the stretched events, the temporal structure of the data segmentation is likely to become obscured. Therefore, for overshoots and undershoots in the ADV model, an envelope-based amplitude modulation with a rather sharp attack followed by a decay, weighted by the segment’s maximum magnitude, is considered:
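With illustrative symbols of our own ($M_i = \max_\tau \lvert d_i(\tau)\rvert$ the segment’s peak magnitude, $\lambda$ the decay parameter, $D_i$ the event duration, $t_a$ a short attack time), such an envelope could be sketched as:

```latex
% Envelope AM (sketch): sharp linear attack over t_a, then exponential
% decay over the remaining event duration, scaled by the peak magnitude.
a^{\mathrm{env}}_i(t) = M_i \cdot
\begin{cases}
  t / t_a, & 0 \le t < t_a,\\[4pt]
  e^{-\lambda\,(t - t_a)/D_i}, & t_a \le t \le D_i
\end{cases}
```

At $t = D_i$ this envelope has decayed to roughly $20\log_{10}(e^{-\lambda}) \approx -8.7\lambda$ dB relative to its maximum, which is how a target end level in decibels would fix the decay parameter.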


The decay parameter is set so that the sonic event ends at a specified amplitude level, in dB, relative to its maximum. To prevent audible clicks, a short fade-out is further applied at the very end of the envelope. The complete amplitude modulator for the ADV model reads as:
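Combining the two regimes (again in illustrative notation: $d_i(t)$ the segment data, $\theta$ the ITR threshold, $M_i$ the peak magnitude, and $\mathrm{env}(t)$ the attack–decay envelope described above), the complete modulator could be sketched as a piecewise definition:

```latex
% Complete ADV amplitude modulator (sketch): in-range segments keep the
% threshold-gated instantaneous magnitude; overshoot/undershoot segments
% use the attack-decay envelope weighted by the peak magnitude.
a^{\mathrm{ADV}}_i(t) =
\begin{cases}
  \mathbb{1}\!\left[\max_{\tau}\lvert d_i(\tau)\rvert > \theta\right]
    \cdot \lvert d_i(t)\rvert, & \text{segment within } [0.2, 0.6]\,\mathrm{Hz},\\[6pt]
  M_i\,\mathrm{env}(t), & \text{overshoot or undershoot}
\end{cases}
```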


We applied the ADV model to FRED data, setting the sonification parameters as described above and the remaining parameters as in the ITR model. Fig. 12 shows the spectrogram of the ADV model for user B. Note the additional harmonic and subharmonic partials for the overshoot and undershoot segments at 0.4, 1.0, 3.2, and 3.7 s, and at 0.0, 0.7, and 3.6 s, respectively (audio file 8). As the experienced user A did not produce any excursions beyond the limits, the ADV model yields the same results as the ITR model (see Fig. 10, audio file 4).

Fig. 12: Spectrogram of DSSon ADV model for user B (audio file 8).

6 Conclusion

The proposed DSSon method aims to construct a direct sonification strategy for one-dimensional streams of numerical data. To achieve the intended directness, DSSon inherits an important property of other highly direct sonification approaches like audification and auditory graphs, in that it preserves the overall temporal structure of the data stream. DSSon is especially well suited to data whose size (number of data points) is too small for (pure) audification, because the audified sound would either be too short to perceptually decipher data details when using a high playback rate or would be displayed at very low frequencies, where the human auditory system lacks good sensitivity.

Höldrich and Vogt’s Augmented Audification [7] addressed the same problem domain. To ameliorate the drawback of the output being in too low a frequency range, they applied a data-dependent single side-band modulation to shift the audio up by a desired frequency. The problem with this is that the frequencies in the data are shifted linearly, compressing the frequency relationships and thereby destroying the periodicity of harmonic signals. A solution might be to use pitch-shifting, which retains the frequency ratios, but this introduces artefacts into the signal and only works well for small shifts.

In DSSon’s general form, the data stream is cut into non-overlapping segments where the selection of the slicing points depends on the nature of the data and the envisioned application. (In the presented test case of biomechanical data, the zero-crossing points of the trend-free speed signal are utilized as segment boundaries.) Each segment is sonified as a single sonic event using a sonification method not predefined within the general DSSon framework. For instance, a method (such as the proposed modified auditory graphs) based on mapping data properties of the segment to sound parameters could be used; even a highly metaphorical sonification which displays an alert whenever a segment’s duration exceeds a certain threshold is possible (though at the cost of reduced directness). To form the entire DSSon signal, the sonic events are superimposed in such a way that the temporal pattern of the segments’ starting points corresponds precisely to the temporal structure of the cutting points, thereby preserving the overall relative time structure of the data.
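As an illustration of this general form, the following Python sketch (all function names, parameter names, and the pitch mapping are our own, not from the paper’s implementation) segments a trend-free signal at its zero crossings, renders each segment as an amplitude- and pitch-modulated sinusoid, and superimposes the events at the time-compressed onset times:

```python
import numpy as np

def dsson_sketch(data, sr, kappa=5.0, alpha=1.0, f_ref=350.0, out_sr=44100):
    """Illustrative DSSon pipeline: segment a trend-free signal at zero
    crossings, sonify each segment as an amplitude/pitch-modulated
    sinusoid, and superimpose the events at compressed onset times."""
    data = np.asarray(data, dtype=float)
    # Segment boundaries at the zero crossings of the data stream.
    crossings = np.where(np.diff(np.sign(data)) != 0)[0] + 1
    bounds = np.concatenate(([0], crossings, [len(data)]))

    # Output buffer: total duration compressed by the factor kappa.
    out_len = int(np.ceil(len(data) / sr / kappa * out_sr)) + 1
    out = np.zeros(out_len)

    for a, b in zip(bounds[:-1], bounds[1:]):
        seg = data[a:b]
        if len(seg) < 2:
            continue
        # Event duration: segment duration, dilated by alpha, compressed by kappa.
        n = max(int((b - a) / sr * alpha / kappa * out_sr), 2)
        seg_i = np.interp(np.linspace(0, len(seg) - 1, n),
                          np.arange(len(seg)), seg)
        # Pitch modulation: instantaneous frequency follows the data values
        # around the reference frequency (illustrative linear mapping).
        inst_f = f_ref * (1.0 + 0.5 * seg_i)
        phase = 2.0 * np.pi * np.cumsum(inst_f) / out_sr
        # Amplitude modulation from the segment's instantaneous magnitude.
        event = np.abs(seg_i) * np.sin(phase)
        # Place the event at the compressed onset; adjacent events may overlap
        # when alpha > 1, preserving the relative time structure of the data.
        start = int(a / sr / kappa * out_sr)
        stop = min(start + n, out_len)
        out[start:stop] += event[:stop - start]
    return out
```

Because the event synthesis is decoupled from the placement of the onsets, changing `kappa` rescales the display duration without altering the character of each sonic event, which is the flexibility discussed below.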

As the sonification method for the segments is structurally decoupled from the formation of the final sound stream, the playback speed of the entire DSSon signal can be set independently of the length of the individual sonic events, offering a wide range of possible time compression/stretching factors and thereby high flexibility for zooming into or out of the data. Even pure audification can be regarded as a special case of DSSon if every single data point is treated as a segment and sonified by a Dirac impulse weighted by the signed data value.

To ensure maximum directness of the resulting sonification, a modified auditory graph has been proposed as the specific method for sonifying the individual segments. In contrast to common auditory graphs, additional amplitude modulation derived from the segment’s data evolution in an application-dependent way is accommodated to accentuate large data values. Furthermore, the reference frequency (and thereby the pitch register) is set individually for each sonic event depending on specific segment properties, for example, positive and negative-valued segments in an AC signal, or an overall trend.

As a demonstration, three DSSon models using variants of modified auditory graphs (with/without AM thresholding and timbre design) were applied to data gathered from FRED exercise sessions. The determination of the cutting points, as well as the specific choice of the amplitude modulation (thresholding in the Individual Target Range model), are based on domain expertise and intended to display the main features of physiotherapeutic interest in a perceptually salient way. For the third advanced model, the modified auditory graph was extended by incorporating a different timbre for segments whose magnitude exceeds a predefined range.

DSSon offers some, albeit limited, potential for real-time applications, since a segment’s sonic event can generally only be synthesized once its end point is reached and the entire segment is available for deriving the parameters of the specific sonification method.

The DSSon framework provides a wide range of application-dependent flexibility (as demonstrated by the different models for post hoc analysis of physiotherapeutic data) while maintaining a high degree of directness of the auditory display in that it succeeds in letting the data ‘speak’ for themselves. For future work, it is intended to apply DSSon to data from other domains which allow for the precise determination of specific detection or discrimination tasks, so that the DSSon method can be compared with audification and auditory graphs in formal listening tests.


The authors would like to thank Kirsty Lindsay and Nick Caplan of Northumbria University’s Aerospace Medicine and Rehabilitation Laboratory for their advice on the salient information sought by physiologists in the post hoc analysis of FRED exercise data.


  • [1] T. Hermann, A. D. Hunt, and J. Neuhoff, Eds., The Sonification Handbook.   Berlin: Logos Verlag, 2011.
  • [2] G. Parseihian, C. Gondre, M. Aramaki, S. Ystad, and R. Kronland-Martinet, “Comparison and evaluation of sonification strategies for guidance tasks,” IEEE Trans. Multimedia, vol. 18, no. 4, pp. 674–686, Apr. 2016.
  • [3] P. M. Silva, T. N. Pappas, J. Atkins, and J. E. West, “Perceiving graphical and pictorial information via hearing and touch,” IEEE Trans. Multimedia, vol. 18, no. 12, pp. 2432–2445, Dec. 2016.
  • [4] H. Roodaki, N. Navab, A. Eslami, C. Stapleton, and N. Navab, “Sonifeye: Sonification of visual information using physical modeling sound synthesis,” IEEE Trans. Vis. Comput. Graphics, vol. 23, no. 11, pp. 2366–2371, Nov 2017.
  • [5] P. Vickers and J. L. Alty, “Musical program auralization: Empirical studies,” ACM Trans. Appl. Percept., vol. 2, no. 4, pp. 477–489, 2005.
  • [6] P. Vickers and B. Hogg, “Sonification abstraite/sonification concrète: An ‘aesthetic perspective space’ for classifying auditory displays in the ars musica domain,” in ICAD 2006 - The 12th Meeting of the International Conference on Auditory Display, T. Stockman, L. V. Nickerson, C. Frauenberger, A. D. N. Edwards, and D. Brock, Eds., London, UK, 20–23 Jun. 2006, pp. 210–216.
  • [7] R. Höldrich and K. Vogt, “Augmented audification,” in ICAD 15: Proceedings of the 21st International Conference on Auditory Display, K. Vogt, A. Andreopoulou, and V. Goudarzi, Eds.   Graz, Austria: Institute of Electronic Music and Acoustics (IEM), University of Music and Performing Arts Graz (KUG), 2015, pp. 102–108.
  • [8] S. A. J. Wood, “Speech tempo,” in Working Papers.   Department of General Linguistics, Lund University, 1973.
  • [9] J. Rohrhuber, “ — introducing sonification variables,” in SuperCollider Symposium 2010, Berlin, 23–16 Sep. 2010, pp. 1–8.
  • [10] K. Vogt and R. Höldrich, “Translating sonifications,” Journal of the Audio Engineering Society, vol. 60, no. 11, pp. 926–935, 2012.
  • [11] J. H. Flowers, “Thirteen years of reflection on auditory graphing: Promises, pitfalls, and potential new directions,” in Proceedings of 11th International Conference on Auditory Display (ICAD2005), E. Brazil, Ed., Limerick, Ireland, 6–9 Jul. 2005, pp. 406–409.
  • [12] A. Winnard, D. Debuse, M. Wilkinson, L. Samson, T. Weber, and N. Caplan, “Movement amplitude on the functional re-adaptive exercise device: deep spinal muscle activity and movement control,” European Journal of Applied Physiology, pp. 1–10, 2017.
  • [13] AD Instruments. (2017).
  • [14] R. Höldrich and P. Vickers, “nuson-DSSon: Direct segmented sonification,” Jul. 2017, DOI: 10.5281/zenodo.1007784.