Sonification and audification are representational techniques in which data sets, or selected features of data sets, are mapped to audio signals (see The Sonification Handbook for an overview of the main techniques). As sound is a temporal phenomenon, such auditory displays are especially well suited to time series data.
The most common sonification method is parameter mapping, in which the data values drive the parameters of an audio signal. For example, Parseihian et al. mapped target distance to pitch, timbre, and tempo in various combinations to assist with one-dimensional guidance tasks. Silva et al. mapped features of graphical objects to acoustic parameters to communicate visual information to visually impaired people. Using physical modeling sound synthesis, Roodaki et al. mapped stylus pressure to the timbral parameters of an acoustic model to assist users with visual object tracking tasks.
In contrast, audification involves only transposing the frequencies of the data to the human-audible range, occasionally with filtering to remove unwanted linear distortions (and, in rare cases, dynamic range compression to remove very large level variations). The process therefore maintains a tighter relationship with the data than other auditory display processes, which generally rely on mappings to effect the auditory display. These mappings can be low level or more metaphorical (e.g., the use of melodic phrase structures to represent elements of a computer program).
The directness of a sonification is a measure of the arbitrariness (in relation to the underlying data) of the mapping . A method exhibiting maximal directness will derive the sound directly from the data (e.g., through the use of direct data-to-sound translations). Low directness arises from more symbolic, metaphoric, or interpretative mappings. Thus, audification is a more direct form of auditory display, the audio being generated entirely by the data.
The sonification method proposed here pursues directness as a design goal so that, as far as possible, the data are allowed to ‘speak’ for themselves. In this way, any metaphors arise as contingent properties of the sonification rather than being imposed by the designer. For example, the characteristic sound caused by accentuating data range excursions in §5.3 below assumes its own sonic identity, and metaphorical labels may be assigned by (and will vary depending on) the listener. Thus, sonification users may start identifying regions of interest in the data by describing the characteristic sounds they hear.
The proposed method follows a direct sonification strategy which conserves fundamental properties of (pure) audification, notably the compact temporal support and some aspects of the precise temporal structure of a data set.
1.1 Leveraging the Directness of Audification
The audification of a physical process strictly conserves the temporal regime of the source signal and so contains high-frequency components when rapid transients occur in the data. This is advantageous because such transients, which often correspond to points of interest in the data, are also significant features of the audio signal which the human auditory system relies on to identify real-world sounds. Hence, they can be a perceptually salient basis for auditory data exploration .
When the data are sampled from a band-limited physical process, the audification signal has a one-to-one relationship with the data. In fact, the mapping is, in principle, bijective and fully reversible (at least while the data remain in the digital domain prior to any D/A conversion). However, even such direct representations can contain misleading features, because the band-limited interpolation performed by the reconstruction filter of the D/A converter can cause extreme data values to be elevated in the audification.
As Höldrich and Vogt pointed out, the ideal audification signal has auditory gestalts within time and frequency ranges that are clearly perceptible to a listener. Take a data stream dominated by low frequencies, with transients occurring within a range of 1 k data points and with an aperiodic interval of approximately 10 k data points. At a playback rate of 44.1 kHz, roughly four of these events will occur each second, which is comparable to the number of syllables per second in spoken English and so is suitable for listeners (see Wood for a detailed view of the information aspects of tempo). However, each transient event’s duration will be approximately 22 ms, appearing as a band-limited impulse with a cut-off frequency at around 50 Hz, which is below the most sensitive range of the human auditory system. If the playback rate were raised by, say, a factor of 10–20, the individual impulses would be shifted to a more perceptible frequency range, but at the cost of an indiscernible temporal structure of the impulse series. Thus, pure audification is a trade-off between the macroscopic time scale and the frequency range of the relevant information.
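The arithmetic behind this trade-off can be sketched directly. The event spacing, transient span, and playback rates below are the illustrative figures from the example above, not measured values:

```python
# Audification trade-off sketch: event rate vs. transient duration.
SAMPLE_RATE = 44_100      # playback rate in Hz (illustrative)
EVENT_SPACING = 10_000    # samples between transient events
TRANSIENT_SPAN = 1_000    # samples occupied by one transient

def audification_tradeoff(playback_rate):
    """Return (events per second, duration of one transient in ms)."""
    events_per_sec = playback_rate / EVENT_SPACING
    transient_ms = 1000.0 * TRANSIENT_SPAN / playback_rate
    return events_per_sec, transient_ms

rate, dur = audification_tradeoff(SAMPLE_RATE)
# roughly 4 events/s, each transient lasting around 22–23 ms
fast_rate, fast_dur = audification_tradeoff(SAMPLE_RATE * 10)
# a tenfold faster playback shortens each transient tenfold as well
```

Raising the playback rate moves each impulse into a more audible band but compresses the inter-event rhythm by the same factor, which is exactly the trade-off described above.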
2 Direct Segmented Sonification (DSSon)
Following Rohrhuber’s approach, the DSSon process is regarded as a mapping operation between data domain and sound domain. Because the sonification time domain will often be different from the data time domain (e.g., choosing to listen to a 100 s data set over a period of only 10 s), Rohrhuber proposed superscribing sonification domain variables with a ring to distinguish them from data domain variables, thereby enabling the construction of unambiguous mixed domain expressions. In this scheme the sonification operator maps from the given data space to the sound signal space :
The relation is more explicit at the level of the variables :
The sonification signal depends on (sonification time), because sound is a temporal phenomenon; on the data to be sonified, which itself is assumed to depend on a data domain time ; and on the parameters of the sonification method, which determine how the sonification sounds.
2.1 Sonification Variables
Table I. Sonification variables.
  time compression factor; sonification duration
  pitch scaling factors
  power law distortion factor
  gain function (e.g., mean, rms, …)
  operator for timbral control (e.g., wave shaping)
The proposed sonification method uses the variables shown in Table I. The sonification parameter set is then given as with any appropriate subset being used in the models described below. The meanings of these variables are given in the sections that follow. To distinguish sonification time from data domain time, sonification time variables are given as and data domain time variables as .
2.2 General Framework of DSSon
DSSon relies on the assumption that a one-dimensional time-varying data stream, , can be subdivided into short non-overlapping segments of generally different length, where each segment contains a consistent portion of application-dependent significant information. Thus, identification of the appropriate cutting points is crucial. For example, if one is interested in the short-term fluctuation of a stock price, the crossing points of the actual stock price with a moving average might be a good choice. We consider a data stream as a time-varying signal expressed as a sequence of sampled values at a sampling rate . The duration of the data stream is seconds, hence the sequence consists of samples. Assuming that the DSSon of the data should last for approximately seconds (the reason for the duration being approximate is explained below), a time compression factor is defined by .
As a first step, the cutting points (the borders between segments ) have to be determined depending on the application and the specific properties of the data. As a simple example, consider a broadband AC signal. In this case the zero crossing points are a reasonable choice. If the signal contains DC or strong low-frequency components (as is the case with stock prices and the data used in §4) some preprocessing might be necessary. For instance, the trend signal calculated by a moving average filter can be subtracted from the original data yielding a signal which exhibits numerous zero crossings.
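This detrending step can be sketched in a few lines. The window length and the test signal below are our own choices for illustration, not values from the paper:

```python
# Preprocessing sketch: subtract a moving-average trend so that a signal
# with a DC component exhibits zero crossings usable as cutting points.
def moving_average(x, w):
    """Centred moving average with edge padding (window length w, odd)."""
    half = w // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [sum(padded[i:i + w]) / w for i in range(len(x))]

def cutting_points(x, w=5):
    """Indices where the detrended signal changes sign (zero crossings)."""
    trend = moving_average(x, w)
    d = [xi - ti for xi, ti in zip(x, trend)]
    return [i for i in range(1, len(d)) if d[i - 1] * d[i] < 0]

# A slow ramp (the DC/trend part) plus a fast square-wave oscillation:
data = [0.01 * i + (1 if i % 4 < 2 else -1) for i in range(40)]
cuts = cutting_points(data)
```

Even though the raw signal never crosses zero symmetrically because of the ramp, the detrended signal yields a regular sequence of crossing points.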
Assuming the first cutting point is at and the last one is at , a sequence of segments (or if the low frequency mean or DC component has been removed through preprocessing) is obtained where:
Thus, the actual duration of each segment is given by . Each data segment is sonified as an individual sonic event, depending on the data and the parameters of the sonification method at hand, and the events are superimposed to form the final sonification . For the sake of simplicity, we omit the explicit dependence of the sonic event on the data segment and the sonification parameters in the following:
Note that the individual sonic events might be longer or shorter than the duration of the respective data segment depending on the specific sonification method and parameters. Therefore, the actual length of is only approximately equal to the data duration divided by the compression factor: .
The DSSon approach conserves the overall temporal structure of the data as long as the cutting points are chosen appropriately, that is, they are meaningful within the context of the data domain. Since the sonification length of the individual segments is not predetermined by this very general formulation, the resulting auditory display can be adjusted either to focus on the rhythmical structure of the segments’ temporal distribution (such as by choosing very short and transient sonic events for each segment and thereby presenting, essentially, a sequence of clicks) or to zoom into the specific data evolution of each segment (e.g., by choosing long sonic events with time-varying properties according to the segment’s data values). Note that the latter approach yields a temporal overlap of sonic events of adjacent segments and hence might confound the auditory gestalts originating from the individual segments. In any case, the appropriate choice of the sonification method for the individual segments is crucial for the quality of the DSSon. In the following section, a simple method for segment sonification which is derived from auditory graphing is presented.
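The framework above can be sketched as a small driver that places each segment's sonic event at its time-compressed onset and mixes the events by superposition. All names, rates, and the trivial "click" renderer are illustrative, not the paper's implementation:

```python
# DSSon superposition sketch: an event for the segment starting at data
# sample a begins at sonification time (a / fs_data) / k.
def dsson(data, cuts, k, render_event, fs_data=100, sr_out=8000):
    """Mix individually rendered segment events into one output signal."""
    n_out = int(len(data) / fs_data / k * sr_out) + 1
    out = [0.0] * n_out
    for a, b in zip(cuts[:-1], cuts[1:]):     # adjacent cutting points
        event = render_event(data[a:b])        # per-segment sonic event
        start = int(a / fs_data / k * sr_out)  # compressed onset sample
        for j, v in enumerate(event):
            if start + j < n_out:
                out[start + j] += v            # superimpose events
    return out

def click(seg):
    """Trivial renderer: one sample at the segment's peak magnitude."""
    return [max(abs(v) for v in seg)]

pts = [0.0, 1.0, -1.0, 0.5, -0.5, 0.0]
sig = dsson(pts, [0, 2, 4], k=2, render_event=click)
```

With the `click` renderer this reduces to the "sequence of clicks" display mentioned above; substituting a longer renderer gives the overlapping, zoomed-in variant.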
3 Modified Auditory Graphs for Sonifying Individual Segments
Auditory graphs have been a part of the standard repertoire of auditory display research since its beginning. At its simplest, an auditory graph represents the ordinate value of a data series as the time-varying frequency of a sinusoid with (usually) constant amplitude . An obvious benefit is the straightforward analogy to visual graphs, which makes them readily understandable, at least for sighted users. Flowers  recommended using distinct timbres in order to minimize stream confusions and unwanted perceptual grouping. Since auditory graphs usually encode data values as pitch or (fundamental) frequency, harmonic complexes with a small number (6–8) of partials and amplitudes in inverse proportion to partial order are recommended instead of pure sinusoids because of the improved pitch salience they are able to produce. Nevertheless, the resulting timbre should be time-invariant to guide the listener’s attention to the pitch contour and not obscure the data representation by arbitrary timbral fluctuations. More complex timbres run the risk of evoking categorical associations with real-world sound that might change at more or less arbitrary data values and therefore confound the intended perceptual continuum of the frequency or pitch range representing the important aspects of the data. If several auditory graphs are to be presented simultaneously spectral overlap between adjacent graphs should be avoided, therefore pure sinusoids might be the better choice in this instance.
In order to achieve the intended directness of the final sonification, not only must the overall temporal relationship of the segmentation pattern be preserved (as is ensured by the general framework in §2.2), but the sonic events resulting from the individual segments must also display the segments’ data evolution as directly as possible. Therefore, a modified auditory graph is proposed as the specific method of segment sonification in DSSon with each segment being treated as an individual graph. We assume segments are derived from zero crossing points (either due to the inherent AC characteristics of the data or after removing the signal average) and exploit the property that each segment starts and ends with data values of negligible magnitude. To accentuate strong deviations from a chosen baseline (such as the average), amplitude modulation derived from the segment’s data complements the time-varying pitch progression of the basic auditory graph. Thus, the general form of the sonification signal is given by
where is the amplitude modulator, is the base frequency for the pitch range of the sonification, and is a pitch modulator.[1]

[1] In order to allow specific control of timbre, an additional timbre operator which acts on the sine function has to be considered in the model:
In (6), the magnitude of the segment’s data values is used as amplitude modulation and the dilation parameter determines the length of the sonic event in relation to the duration of the data segment . If , adjacent sonic events do not overlap since , whereas results in overlapping events.
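A minimal sketch of such a segment renderer follows, assuming an exponential pitch mapping (one octave per unit data value) and amplitude modulation by the data magnitude; the function and parameter names are our own shorthand, not the paper's notation:

```python
import math

# Modified-auditory-graph sketch for one segment: the segment's data value
# drives both the amplitude (|x|) and the instantaneous frequency.
def segment_event(seg, seg_dur, dilation, f_base=350.0, scale=1.0, sr=8000):
    """Render one segment as an AM/FM sinusoid of length dilation*seg_dur."""
    n = max(1, int(dilation * seg_dur * sr))
    phase, out = 0.0, []
    for i in range(n):
        u = i / n * (len(seg) - 1)            # position in the data segment
        j = int(u)
        frac = u - j
        x = seg[j] + frac * (seg[min(j + 1, len(seg) - 1)] - seg[j])
        freq = f_base * (2.0 ** (scale * x))  # pitch modulation
        phase += 2 * math.pi * freq / sr      # accumulate phase
        out.append(abs(x) * math.sin(phase))  # amplitude modulation = |x|
    return out

event = segment_event([0.0, 0.5, 1.0, 0.5, 0.0], seg_dur=0.05, dilation=1.0)
```

Because segments start and end near zero, the amplitude modulation gives each event a natural fade-in and fade-out without any extra envelope.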
Of course, both pitch and amplitude modulation can be parameterized in various ways. For example, if mainly peak or strong deviations from the mean are to be displayed, a power law distortion can be applied to the amplitude modulator :
If only deviations exceeding a threshold around the mean are to be sonified, then a magnitude offset followed by half-wave rectification might be included in the amplitude modulator:
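The two modulator variants just described can be sketched as follows; the parameter values are illustrative:

```python
# Two variants of the amplitude modulator.
def power_law(a, p):
    """Power-law distortion: for p > 1, peaks are emphasised relative to
    small deviations."""
    return [abs(v) ** p for v in a]

def thresholded(a, theta):
    """Magnitude offset followed by half-wave rectification: values with
    |x| <= theta are silenced."""
    return [max(abs(v) - theta, 0.0) for v in a]

seg = [0.05, 0.2, 0.6, 0.2, 0.05]
emphasised = power_law(seg, 2.0)   # small values shrink faster than peaks
gated = thresholded(seg, 0.1)      # only |x| > 0.1 survives
```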
On the other hand, the relative importance of the trend signal and the actual data progression of the segment can be adjusted via non-negative parameters and .
If the stream of segments with positive deviation from the trend should be discriminated from the stream of negative segments, two different reference frequencies and could be used. From the above, the general parameterized form of DSSon is
3.1 Modulation of Segment Duration
To relate the duration of sonification segments to some property of the data we can use not as a constant, but as a function of the segment’s data, . For instance, if highly peaked segments should be displayed as longer sonic events to display the data distribution in more detail, a monotonically decreasing, concave function of the segment’s mean (or other property such as rms, power) or area (or energy) is more suitable for .
3.2 Decaying Envelope as Amplitude Modulator
In order to emphasize the rhythmical patterns induced by the temporal distribution of the cutting points, a sharp attack of the individual sonic events is needed. This can be achieved by replacing the amplitude modulator or the variants in (7) and (8) by an appropriate envelope, for example, or , where is the envelope’s decay parameter and the gain factor is determined by a specific function of the segment’s data values, , e.g., the mean, rms, area, power, or energy of the segment.
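As a sketch, assuming an exponential decay and the segment mean as the gain function (both choices are ours for illustration):

```python
import math

# Envelope-based modulator: sharp attack, exponential decay, scaled by a
# gain derived from the segment's data (here its mean magnitude).
def decay_envelope(seg, n, decay):
    """n-sample envelope g * exp(-decay * t) with t running over [0, 1)."""
    gain = sum(abs(v) for v in seg) / len(seg)   # e.g. the segment's mean
    return [gain * math.exp(-decay * i / n) for i in range(n)]

env = decay_envelope([0.2, 0.4, 0.6], n=100, decay=5.0)
```

The instantaneous attack makes each segment onset a transient, foregrounding the rhythm of the cutting points rather than the data evolution within segments.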
4 Applying DSSon to Biomechanical Data
We applied the above DSSon method to biomechanical signal data taken from the Functional Readaptive Exercise Device (FRED), an exercise machine designed for use in physiotherapy to help patients with low back pain. The current version of FRED is a modified cross-trainer which offers minimal resistance (Fig. 1). This creates a situation in which the user has an unstable base of support: when the front foot comes to the forward-most position in its elliptical path, gravity pulls the foot downward, requiring the user to apply compensatory balancing force with the rear foot to control the descent. The goal is to operate the machine with an upright posture in a smooth, controlled manner with as little variability in movement speed as possible.
A rotary encoder in the drive wheel generates a pulse stream which represents the instantaneous angular velocity of the wheel at each sampling point. This pulse stream is sampled at 4 kHz into LabChart. The data are converted to frequency values (i.e., revolutions per second) for ease of display to the user (a patient). The resultant data stream is then smoothed using a triangular Bartlett filter to remove the steps in the data. The smoothed stream is presented to the user via LabChart (with a zoom level of 50:1) as a means of feedback to help them control their performance (Fig. 2).
It has been determined that, with the machine in its default configuration (Fig. 1), operating it within a frequency range of 0.2 Hz–0.6 Hz results in therapeutic benefit leading to recruitment of the key spinal and abdominal muscles lumbar multifidus (LM) and transversus abdominis (TrA), with the biomechanical optimum for maximum benefit being achieved at 0.4 Hz. At this optimal frequency a complete rotation of the footplates takes 2.5 s, thus requiring a slow and steady pace.
The white area in Fig. 2 shows when the user is performing inside the required range, with the shaded areas denoting frequencies above and below the required range. Fig. 2 shows the user maintaining a good pace until 27.2 s, at which point they slow down dramatically, coming to a brief halt (27.65 s). This is followed by a sharp corrective acceleration which takes the frequency up to 0.8 Hz, then a compensatory attempt to slow down and another sharp acceleration, with normal performance being re-attained at around 29.2 s.
A typical session comprises several three-minute blocks of exercise separated by rest periods of 30–60 s. A full session’s worth of data is shown in Fig. 3. A quick glance at the data plotted at a zoom level of 2000:1 (Fig. 3) helps the physiotherapist to get an overall impression of the user’s performance. During rest periods the therapist uses this zoomed-out view to look for signs of fatigue (such as a rising trend line in the frequency) which may require extension of the rest period. The plots are useful to the therapist during an exercise session, but post-hoc review of many session data files quickly becomes tiring: repeated zooming in and out is needed to locate regions of possible interest and to spot specific instances of particular performance behaviours.
4.1 Features of Interest
During a post-hoc review of performance, the physiotherapist is interested in identifying a number of discrete features in the data sets. The main performance goal is to maintain a walking pace of 0.2 Hz–0.6 Hz. While the patient needs to be aware of excursions outside this range during exercise, for the therapist all excursions above 0.6 Hz and long excursions below 0.2 Hz are of interest. If the frequency exceeds 0.6 Hz (Fig. 2) this indicates a loss of control: the machine is running away with the user. However, because it takes a great deal of muscle control to operate the machine slowly, if the frequency momentarily drops below 0.2 Hz and then goes back in range this is of less interest to the therapist, as it is still evidence of control: it is a controlled recovery (Fig. 4(b)). But if it drops below 0.2 Hz for an extended period of time (typically half a second or more) then this also indicates a lack of control, as motion is coming to a stop.
The target range of 0.2 Hz–0.6 Hz means that users can demonstrate variability in their average speed while still maintaining acceptable performance. Therefore, for each user, the physiotherapist will additionally determine a maximum deviation from the individual mean as a target range, based upon their assessment of the user’s current ability and any physical characteristics that might affect how well they are able to use FRED. For example, a beginner with reasonable control might be expected to achieve a standard target deviation of 0.15 Hz, while someone who is able to keep within the range 0.35 Hz–0.45 Hz would have a target deviation of 0.05 Hz. Once the therapist has determined a user’s target deviation it is useful to know at what points they are failing to maintain it.
If someone were able to operate the machine perfectly there would be no variation in their speed and the plot would show a flat line. Therefore, the smoother the plot, the less the user’s pace is varying. When a user starts to master the required walking technique they begin to exhibit what are known as “flat tops”. A flat top is a region of activity lasting approximately 0.5 s or more in which the variation in speed is so small that the curve starts to flatten out. Flat tops typically occur during the portion of a walking cycle after the rear foot has come up from the bottom of the elliptical path and before the front foot descends again. Fig. 4(a) shows a double flat top. At around the 53 s mark the small peak indicates where the user’s rear foot has ascended from the bottom of the elliptical path. This is followed by a period of relatively flat speed variation lasting just under 1 s. At around 54.2 s the front foot descends and then another flat top of 0.7 s occurs.
Because these features require zooming in to see clearly, it becomes time-consuming to zoom-and-scroll through many data files, so DSSon was applied to FRED data sets to see how well these features could be heard. After discussions with physiotherapists from Northumbria University’s Aerospace Medicine and Rehabilitation lab, where FRED is being further developed, the features to be represented were:
Any excursions above 0.6 Hz.
Long excursions below 0.2 Hz.
Periods outside the user’s target deviation range.
‘Flat tops’ lasting 0.5 s or longer.
The preprocessing stage involved audifying FRED data streams by simply converting each data point to a signed 16-bit integer and storing the result in a PCM-encoded digital audio file. Because the revolution rate does not exceed 2 Hz (which would be very fast walking), the signal spectrum caused by the speed fluctuations occurring during a full revolution is band limited below 15–20 Hz. Therefore, to keep the file sizes small, the data extracted from LabChart were first downsampled to Hz prior to audification. Thus, the time series signal in the DSSon method was provided by these audio files. The DSSon method was implemented in a series of MATLAB (for sonification) and Python (preprocessing) scripts (see the project repository).
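A minimal sketch of this audification step, using only the Python standard library; the file name and the sample rate are illustrative, not the values used in the FRED scripts:

```python
import struct
import wave

# Audification sketch: scale data to signed 16-bit integers and write a
# mono PCM WAV file.
def audify(data, path, rate=100):
    """Normalise to full scale and store as 16-bit PCM at the given rate."""
    peak = max(abs(v) for v in data) or 1.0
    pcm = [int(32767 * v / peak) for v in data]     # map to int16 range
    with wave.open(path, "wb") as f:
        f.setnchannels(1)                            # mono
        f.setsampwidth(2)                            # 16-bit samples
        f.setframerate(rate)
        f.writeframes(struct.pack("<%dh" % len(pcm), *pcm))

audify([0.0, 0.5, -0.5, 1.0, -1.0], "demo_audification.wav")
```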
5 DSSon Models for FRED Signals
In this section we describe three DSSon models that were applied to FRED data, each emphasizing the features of interest identified above to varying degrees and so producing differing auditory saliency. DSSon for FRED data is mainly intended to provide an auditory display of users’ performance that enables the physiotherapist to conduct a quick analysis during post-hoc review. The DSSon parameters might also be individually adjusted by the therapist during the review session in order to concentrate on specific data features. Consequently, it is impractical to evaluate the DSSon display through extensive listening tests based on specific task completion performance and statistical analysis. This kind of evaluation procedure is planned for future work in other application fields. Here, DSSon’s properties (benefits and limitations) are demonstrated by comparing data excerpts containing specific features of interest with the resulting DSSon display. Audio files demonstrating the system output, together with the corresponding data sets used to generate them, can be found in the project repository and are listed in Table II.
Table II. Audio example files.
  1  DSSon_Basic_A_n.wav  M1, user A (novice)
  2  DSSon_Basic_A_e.wav  M1, user A (experienced)
  3  DSSon_Basic_B.wav    M1, user B (novice)
  4  DSSon_ITR_A_e.wav    M2, user A (experienced)
  5  DSSon_ITR_B.wav      M2, user B (novice)
  6  DSSon_ADV_A_n.wav    M3, user A (novice)
  7  DSSon_Adv_A_e.wav    M3, user A (experienced)
  8  DSSon_Adv_B.wav      M3, user B (novice)
Models: M1 = basic model; M2 = individual target range model; M3 = advanced model.
Data files used: user A, novice = DA1; user A, experienced = DA2; user B, novice = DB1.
The first step in DSSon is signal segmentation. For FRED data, the main feature of interest is the deviation of the instantaneous revolution rate from the fixed target value, 0.4 Hz (the biomechanical optimum from above). Hence, an obvious choice for segmentation is to cut the data stream at its crossing points with this target value, that is, to extract segments with positive and negative deviation from it. However, as long as a user is able to maintain a steady revolution rate, even one slightly deviating from 0.4 Hz, or shows a slowly varying average revolution rate exhibiting only small excursions, they are demonstrating sufficient muscle control and therefore gaining therapeutic benefit. To account for this, we did not use the fixed target value of 0.4 Hz to determine the segments’ start and end points, but calculated a weighted mean of the target and the moving average of the data stream, , to obtain the trend signal:
The data stream and the trend signal (weighting factor ) of two exercise sessions of the same user are shown in Figs. 5 and 6. The first data stream was recorded in the second week of a six-week training period, and the second was recorded four months after the end of the training period. The data segments are determined utilizing the zero-crossing points of the trend-free signal: for , and otherwise.
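This trend calculation can be sketched as follows; the weighting factor, window length, and sample values are illustrative, not those used in the paper's figures:

```python
# Trend sketch: a weighted mean of the fixed 0.4 Hz target and the moving
# average of the data stream; segmentation then uses the zero crossings of
# the detrended signal.
def trend_signal(x, target=0.4, w=0.5, window=5):
    """Return w*target + (1-w)*moving_average(x) per sample."""
    half = window // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    avg = [sum(padded[i:i + window]) / window for i in range(len(x))]
    return [w * target + (1 - w) * a for a in avg]

x = [0.42, 0.45, 0.38, 0.35, 0.41, 0.46, 0.39]   # revolution rates in Hz
trend = trend_signal(x)
detrended = [xi - ti for xi, ti in zip(x, trend)]  # sign gives the segments
```

Weighting toward the target penalises sustained drift away from 0.4 Hz, while the moving-average component tolerates a steady, slightly offset pace.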
5.1 DSSon Basic Model
The DSSon basic model uses a time compression factor and a dilation parameter . This moderate compression factor allows for a rather fast post-hoc review of the data. The sonic events resulting from adjacent positive and negative excursions are displayed at a rate of approximately 8 events per second, that is, a mean revolution rate of 0.4 Hz times (typically) 4 segments per revolution (2 positive and 2 negative excursions) times the compression factor. This rhythmical pattern can easily be perceived in detail because it lies well within the typical range of musical gestures, and the individual events do not overlap due to the dilation parameter chosen (). To facilitate discrimination between positive and negative excursions, different reference frequencies for the pitch modulator are employed, specifically and . To monitor both the individual excursions and the overall trend, both pitch scaling factors are applied . Amplitude modulation derived from the instantaneous magnitude of the segment’s data values is used, that is, the power law distortion factor equals . The final model, including the parameter values, reads:
The model was applied to three FRED data signals, two from user A (audio files 1, 2) and one from user B (audio file 3). Figs. 7 and 8 show the data and trend as well as the spectrogram of the basic DSSon model for a rather poor performance (user B, audio file 3). The user is obviously not able to maintain a stable mean speed at the beginning of the exercise session, nor to stay within the range of 0.2 Hz–0.6 Hz. Large positive excursions are clearly visible at 6 and 15 s in Fig. 7 and result in strong high-frequency events at 1 and 3 s (Fig. 8). Sudden slow instants at 11, 45, and 55 s yield prominent low-frequency sounds at 2, 9, and 11 s respectively (Fig. 8).
Note that highlighting the trend (of approximately 0.4 Hz) in the sonification (due to ) results in an upward shift of the pitch register compared to the range of the reference frequencies.[2] If and the trend equals 0.4 Hz, then the instantaneous frequencies are multiplied by , resulting in a center frequency of 610 Hz instead of 350 Hz. The trend variation results in an overall glissando, gliding upward and downward, displayed in the spectrogram as the sliding white frequency band framed by the sonic events of positive and negative segments respectively.

[2] If , the trend data are completely suppressed, resulting in a lower pitch register.
5.2 DSSon Individual Target Range Model
The time compression factor used in the previous examples allows for a quick review of an individual performance. Nevertheless, exploring a collection of FRED sessions, each consisting of up to five exercise blocks of 3 minutes’ duration, would be a rather time-consuming endeavour, and a sonification with an even larger time compression of is preferable. However, the increased playback speed means that the rhythmical patterns of the sonic events and their pitch contours would become indiscernible if the DSSon basic model with its previous parameter values were employed.
Therefore, the DSSon individual target range (ITR) model suppresses segments whose maximum excursions stay below the target range set for each user individually by the physiotherapist. This is accomplished by a threshold-based amplitude modulator similar to the one proposed in (8), with the threshold parameter set appropriately. Unlike the amplitude modulator in (8), which displays only the segment’s data values exceeding the threshold, one might wish to hear the entire segment if its value exceeds the target range at any point. Hence, a threshold-based indicator function combined with the segment’s instantaneous magnitude is used as the amplitude modulator :
To display the remaining segments in sufficient detail, the dilation parameter is set as , yielding potentially overlapping sonic events. Figs. 10 and 11 show the spectrograms of the new model for the two users. The chosen compression factor results in a sonification duration of 4 seconds for a 1-minute session, yielding a threefold overlap of adjacent sonic events. The threshold parameter is set to 0.1 Hz for both examples, though in practice the therapist would have chosen individual values for the two users according to their level of motor control. All other sonification parameters are set as in the basic model. Note that for the experienced user (Fig. 10), a sparse auditory display is obtained by the new model (audio file 4), whereas a dense sonification with almost constantly overlapping sonic events is caused by the poor performance of user B (Fig. 11, audio file 5).
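The ITR gating just described can be sketched as an indicator function applied per segment; the names and threshold value are illustrative:

```python
# ITR amplitude-modulator sketch: an indicator function keeps the whole
# segment if its peak magnitude exceeds the threshold theta, else silence.
def itr_modulator(seg, theta):
    """|x(t)| if max|x| > theta anywhere in the segment, else 0."""
    exceeds = max(abs(v) for v in seg) > theta
    return [abs(v) if exceeds else 0.0 for v in seg]

quiet = itr_modulator([0.02, 0.05, -0.04], theta=0.1)  # fully suppressed
loud = itr_modulator([0.02, 0.15, -0.04], theta=0.1)   # whole segment kept
```

Unlike the half-wave-rectified modulator, the indicator preserves the full shape of any segment that crosses the threshold, so the listener hears the whole excursion, not just its tip.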
5.3 DSSon Advanced Model
Both DSSon models presented so far are based on a modified auditory graph of adjacent data segments. They are characterized by a smooth functional relationship between data values and the auditory display which can be easily perceived by the listener. As every segment is sonified by an amplitude- and pitch-modulated sinusoid, a coherent auditory gestalt of homogeneous timbre emerges. However, the special features of interest mentioned in §4.1 are not displayed saliently, except by the ITR model, which delivers sonic events only for segments exceeding the individual target range, thereby explicitly displaying feature #3. In order to indicate excursions above 0.6 Hz (feature #1) and below 0.2 Hz (feature #2) prominently, timbre modifications are utilized as an additional sonification parameter. Segments whose maximum excursions cross these limits are sonified by a fixed harmonic complex (for overshoots above 0.6 Hz) or subharmonic complex (for undershoots below 0.2 Hz) respectively. This is achieved by including the timbre operator in a DSSon advanced model (ADV) as:
The auxiliary sonification parameters and specify the number of partials (hence the bandwidth of the sonic event) and the amplitude attenuation associated with increasing partial order. and are set so as to align the loudness levels of the overshoot and undershoot segments with the basic ones (in this case, and ). Note that by introducing a non-trivial timbre operator, the additional distinct categories of sonic events will result in a sonification where three auditory streams are likely to be perceived and the coherent gestalts of the previous models become dispersed.
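A sketch of such harmonic and subharmonic complexes follows; the number of partials, attenuation, frequencies, and rates are illustrative values, not the paper's settings:

```python
import math

# Timbre-operator sketch: replace the plain sinusoid with a fixed harmonic
# complex (overshoot) or subharmonic complex (undershoot) whose partial
# amplitudes fall off with order.
def complex_tone(f0, n_samples, n_partials=6, atten=0.5, sub=False, sr=8000):
    """Sum n_partials at f0*p (harmonic) or f0/p (subharmonic)."""
    out = []
    for i in range(n_samples):
        t = i / sr
        v = 0.0
        for p in range(1, n_partials + 1):
            f = f0 / p if sub else f0 * p      # subharmonic vs harmonic
            v += (atten ** (p - 1)) * math.sin(2 * math.pi * f * t)
        out.append(v)
    return out

over = complex_tone(440.0, 800)                # harmonic complex
under = complex_tone(220.0, 800, sub=True)     # subharmonic complex
```

The harmonic complex spreads energy upward from the reference frequency and the subharmonic complex downward, which is what makes overshoot and undershoot events stand out as separate auditory streams.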
To further accentuate segments of long excursions which predominantly occur for undershoots, a data-dependent transformation of the dilation parameter is incorporated in the ADV model. For data segments whose maximum excursions stay within specified limits (e.g. ), the dilation parameter is fixed to , whereas for overshoot and undershoot segments, the dilation parameter becomes a monotonically decreasing function of the segment’s data values, , and causes stretched sonic events. As a transformation, we specifically propose the hyperbolic function of the segment’s area, that is, the time integral of segment’s magnitude :
The hyperbolic transformation translates into a linear dependence of the sonic event’s duration on the segment’s area: since the event duration is inversely proportional to the dilation parameter, substituting (16) yields a duration that grows linearly with the area.
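One plausible reading of the hyperbolic transformation (the exact form of the paper's (16) is not reproduced here) fixes the dilation parameter up to the area threshold and lets it fall off as the reciprocal of the area beyond it; since the event duration is inversely proportional to the dilation, the duration then grows linearly with the area. A sketch under that assumption:

```python
import numpy as np

def dilation(area, area_thr, k0, strength=1.0):
    """Hyperbolic dilation transform (a sketch, not the paper's exact (16)).
    Within the area threshold the dilation is the fixed k0; beyond it,
    the dilation falls off as 1/area, so the event duration
    (proportional to 1/dilation) grows linearly with the segment's area."""
    if area <= area_thr:
        return k0
    return k0 * area_thr / (strength * area)

def segment_area(x, dt):
    """Time integral of the segment's magnitude."""
    return np.sum(np.abs(x)) * dt
```

Doubling the area of an over-threshold segment then exactly doubles the resulting event duration, which is the linearity the text refers to.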
Two additional sonification parameters determine the area threshold and the strength of the dilation transformation respectively. The area threshold is set to the area of a sine-shaped segment whose duration equals the expected duration of an excursion at the target revolution rate of 0.4 Hz and whose amplitude is 0.2 (the magnitude difference between either limit, i.e., 0.2 Hz or 0.6 Hz, and the target rate). Utilizing this dilation transformation yields dominant, stretched sonic events for long overshoot and undershoot segments. However, because the amplitude modulator used up to this point ((7) and (14)) delays the loudness peaks of the stretched events, the temporal structure of the data segmentation is likely to be obscured. Therefore, an envelope-based amplitude modulation with a rather sharp attack followed by a decay, weighted by the segment’s maximum magnitude, is used for overshoots and undershoots in the ADV model:
The decay parameter is set such that the sonic event ends at a fixed amplitude level (in dB) relative to its maximum. To prevent audible clicks, a short fade-out is further applied at the very end of the envelope. The complete amplitude modulator for the ADV model reads:
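A minimal sketch of such an attack–decay envelope with a terminal fade-out; all parameter values (attack and fade lengths, the −60 dB end level) are illustrative choices, not the paper's:

```python
import numpy as np

def ad_envelope(dur, decay_db=60.0, attack=0.005, fade=0.002, fs=44100):
    """Attack-decay envelope: a short linear attack, an exponential
    decay reaching -decay_db dB at the event's end, and a short linear
    fade-out to prevent clicks. Values are illustrative."""
    n = int(dur * fs)
    t = np.arange(n) / fs
    env = 10 ** (-decay_db * t / (20 * dur))   # exponential decay to -decay_db dB
    na = int(attack * fs)
    env[:na] *= np.linspace(0.0, 1.0, na)      # sharp linear attack
    nf = int(fade * fs)
    env[-nf:] *= np.linspace(1.0, 0.0, nf)     # click-free fade-out
    return env
```

In the ADV model this envelope would additionally be weighted by the segment's maximum magnitude before being applied to the sonic event.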
We applied the ADV model to the FRED data, setting the sonification parameters as described above and the remaining parameters as in the ITR model. Fig. 12 shows the spectrogram of the ADV model for user B. Note the additional harmonic partials for the overshoot segments at 0.4, 1.0, 3.2, and 3.7 s, and the subharmonic partials for the undershoot segments at 0.0, 0.7, and 3.6 s (audio file 8). As the experienced user A did not produce any excursions beyond the limits, the ADV model yields the same result as the ITR model (see Fig. 10, audio file 4).
The proposed DSSon method aims to construct a direct sonification strategy for one-dimensional streams of numerical data. To achieve the intended directness, DSSon inherits an important property of other highly direct sonification approaches like audification and auditory graphs, in that it preserves the overall temporal structure of the data stream. DSSon is especially well suited to data whose size (number of data points) is too small for (pure) audification, because the audified sound would either be too short to perceptually decipher data details when using a high playback rate or, otherwise, would be displayed at very low frequencies where the human auditory system lacks good sensitivity.
Höldrich and Vogt’s Augmented Audification  addressed the same problem domain. To ameliorate the drawback of the output lying in too low a frequency range, they applied a data-dependent single-sideband modulation to shift the audio up by a desired frequency. The problem with this is that the frequencies in the data are shifted additively, compressing the frequency relationships and thereby destroying the periodicity of harmonic signals. A solution might be pitch-shifting, which retains the frequency ratios, but this introduces artefacts into the signal and only works well for small shifts.
In DSSon’s general form, the data stream is cut into non-overlapping segments where the selection of the slicing points depends on the nature of the data and the envisioned application. (In the presented test case of biomechanical data, the zero-crossing points of the trend-free speed signal are utilized as segment boundaries.) Each segment is sonified as a single sonic event using a sonification method not predefined within the general DSSon framework. For instance, a method (such as the proposed modified auditory graphs) which is based on mapping data properties of the segment to sound parameters could be used; even a highly metaphorical sonification which displays an alert whenever a segment’s duration exceeds a certain threshold is possible (though at the cost of reduced directness). To form the entire DSSon signal, the sonic events are superimposed in such a way that the temporal pattern of the segments’ starting points corresponds precisely to the temporal structure of the cutting points, thereby preserving the overall relative time structure of the data.
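The segmentation-and-superposition scheme can be sketched as follows, with `sonify` standing in for an arbitrary segment-sonification method and `speed` for the independently chosen playback-speed factor (names are illustrative):

```python
import numpy as np

def zero_cross_segments(x):
    """Split a (trend-free) signal into segments at its zero crossings."""
    signs = np.signbit(x)
    cuts = np.where(np.diff(signs))[0] + 1
    bounds = np.concatenate(([0], cuts, [len(x)]))
    return [(int(bounds[i]), int(bounds[i + 1])) for i in range(len(bounds) - 1)]

def dsson(x, sonify, fs_data, fs_audio, speed=1.0):
    """Skeleton of the DSSon superposition: each segment is rendered by
    an arbitrary `sonify(segment) -> audio` callable, and the events are
    mixed so that their onsets reproduce the segments' start times
    (scaled by the playback speed)."""
    events = [(a, sonify(x[a:b])) for a, b in zero_cross_segments(x)]
    n_out = max(int(a / fs_data / speed * fs_audio) + len(ev) for a, ev in events)
    out = np.zeros(n_out)
    for a, ev in events:
        onset = int(a / fs_data / speed * fs_audio)
        out[onset:onset + len(ev)] += ev   # superimpose sonic events
    return out
```

Because the events are mixed additively at data-determined onsets, events may overlap in time without disturbing the relative temporal structure of the cutting points.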
As the sonification method for the segments is structurally decoupled from the formation of the final sound stream, the playback speed of the entire DSSon signal can be set independently of the length of the individual sonic events, offering a wide range of possible time compression/stretching factors and thereby high flexibility for zooming into or out of the data. Even pure audification can be regarded as a special case of DSSon, if every single data point is treated as a segment and sonified by a Dirac impulse weighted by the signed data value.
To ensure maximum directness of the resulting sonification, a modified auditory graph has been proposed as the specific method for sonifying the individual segments. In contrast to common auditory graphs, an additional amplitude modulation, derived from the segment’s data evolution in an application-dependent way, is incorporated to accentuate large data values. Furthermore, the reference frequency (and thereby the pitch register) is set individually for each sonic event depending on specific segment properties, for example, positive- and negative-valued segments in an AC signal, or an overall trend.
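A sketch of one such modified auditory graph for a single segment; the reference frequencies, dilation factor, and pitch range below are illustrative choices, not the paper's settings:

```python
import numpy as np

def segment_event(seg, fs_data, fs_audio=44100, dilation=4.0, semitones=12.0):
    """One sonic event as a modified auditory graph (a sketch): the
    segment is time-compressed by `dilation`, its values drive pitch
    deviation (in semitones) around a sign-dependent reference
    frequency, and its magnitude drives an amplitude modulation that
    accentuates large values. All parameter values are illustrative."""
    dur = len(seg) / fs_data / dilation
    n = int(dur * fs_audio)
    # resample the segment to the event's duration
    d = np.interp(np.linspace(0, len(seg) - 1, n), np.arange(len(seg)), seg)
    f_ref = 440.0 if np.mean(seg) >= 0 else 220.0   # register from segment sign
    freq = f_ref * 2 ** (semitones * d / 12)        # pitch follows data values
    phase = 2 * np.pi * np.cumsum(freq) / fs_audio
    peak = np.max(np.abs(d))
    amp = np.abs(d) / peak if peak > 0 else np.zeros_like(d)  # AM accentuates large values
    return amp * np.sin(phase)
```

Such events would then be superimposed at the data-determined onsets to form the complete DSSon signal.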
As a demonstration, three DSSon models using variants of modified auditory graphs (with/without AM thresholding and timbre design) were applied to data gathered from FRED exercise sessions. The determination of the cutting points, as well as the specific choice of the amplitude modulation (thresholding in the Individual Target Range model), are based on domain expertise and intended to display the main features of physiotherapeutic interest in a perceptually salient way. For the third advanced model, the modified auditory graph was extended by incorporating a different timbre for segments whose magnitude exceeds a predefined range.
DSSon offers some, albeit limited, potential for real-time applications since a segment’s sonic event can generally only be synthesized when its end point is reached and the entire segment is available for deriving parameters of the specific sonification method.
The DSSon framework provides a wide range of application-dependent flexibility (as demonstrated by the different models for post hoc analysis of physiotherapeutic data) while maintaining a high degree of directness of the auditory display in that it succeeds in letting the data ‘speak’ for themselves. For future work, it is intended to apply DSSon to data from other domains which allow for the precise determination of specific detection or discrimination tasks, so that the DSSon method can be compared with audification and auditory graphs in formal listening tests.
The authors would like to thank Kirsty Lindsay and Nick Caplan of Northumbria University’s Aerospace Medicine and Rehabilitation Laboratory for their advice on the salient information sought by physiologists in the post hoc analysis of FRED exercise data.
-  T. Hermann, A. D. Hunt, and J. Neuhoff, Eds., The Sonification Handbook. Berlin: Logos Verlag, 2011.
-  G. Parseihian, C. Gondre, M. Aramaki, S. Ystad, and R. Kronland-Martinet, “Comparison and evaluation of sonification strategies for guidance tasks,” IEEE Trans. Multimedia, vol. 18, no. 4, pp. 674–686, Apr. 2016.
-  P. M. Silva, T. N. Pappas, J. Atkins, and J. E. West, “Perceiving graphical and pictorial information via hearing and touch,” IEEE Trans. Multimedia, vol. 18, no. 12, pp. 2432–2445, Dec. 2016.
-  H. Roodaki, N. Navab, A. Eslami, C. Stapleton, and N. Navab, “Sonifeye: Sonification of visual information using physical modeling sound synthesis,” IEEE Trans. Vis. Comput. Graphics, vol. 23, no. 11, pp. 2366–2371, Nov 2017.
-  P. Vickers and J. L. Alty, “Musical program auralization: Empirical studies,” ACM Trans. Appl. Percept., vol. 2, no. 4, pp. 477–489, 2005. [Online]. Available: https://doi.org/10.1145/1101530.1101546
-  P. Vickers and B. Hogg, “Sonification abstraite/sonification concrète: An ‘aesthetic perspective space’ for classifying auditory displays in the ars musica domain,” in ICAD 2006 - The 12th Meeting of the International Conference on Auditory Display, T. Stockman, L. V. Nickerson, C. Frauenberger, A. D. N. Edwards, and D. Brock, Eds., London, UK, 20–23 Jun. 2006, pp. 210–216.
-  R. Höldrich and K. Vogt, “Augmented audification,” in ICAD 15: Proceedings of the 21st International Conference on Auditory Display, K. Vogt, A. Andreopoulou, and V. Goudarzi, Eds. Graz, Austria: Institute of Electronic Music and Acoustics (IEM), University of Music and Performing Arts Graz (KUG), 2015, pp. 102–108.
-  S. A. J. Wood, “Speech tempo,” in Working Papers. Department of General Linguistics, Lund University, 1973.
-  J. Rohrhuber, “ — introducing sonification variables,” in SuperCollider Symposium 2010, Berlin, 23–16 Sep. 2010, pp. 1–8.
-  K. Vogt and R. Höldrich, “Translating sonifications,” JAES Journal of the Audio Engineering Society, vol. 60, no. 11, pp. 926–935, 2012.
-  J. H. Flowers, “Thirteen years of reflection on auditory graphing: Promises, pitfalls, and potential new directions,” in Proceedings of 11th International Conference on Auditory Display (ICAD2005), E. Brazil, Ed., Limerick, Ireland, 6–9 Jul. 2005, pp. 406–409.
-  A. Winnard, D. Debuse, M. Wilkinson, L. Samson, T. Weber, and N. Caplan, “Movement amplitude on the functional re-adaptive exercise device: deep spinal muscle activity and movement control,” European Journal of Applied Physiology, pp. 1–10, 2017.
-  AD Instruments. (2017). [Online]. Available: https://www.adinstruments.com/products/labchart
-  R. Höldrich and P. Vickers, “nuson-DSSon: Direct segmented sonification,” Jul. 2017, DOI: 10.5281/zenodo.1007784. [Online]. Available: https://github.com/nuson/DSSon