Classification of ALS patients based on acoustic analysis of sustained vowel phonations

12/14/2020 ∙ by Maxim Vashkevich, et al.

Amyotrophic lateral sclerosis (ALS) is an incurable neurological disorder with a rapidly progressive course. Common early symptoms of ALS are difficulty in swallowing and speech. However, the early acoustic manifestation of speech and voice symptoms is highly variable, which makes their detection very challenging, both for human specialists and for automatic systems. This study presents an approach to voice assessment for an automatic system that separates healthy people from patients with ALS. In particular, this work focuses on the analysis of sustained phonation of the vowels /a/ and /i/ to perform automatic classification of ALS patients. A wide range of acoustic features, such as MFCCs, formants, jitter, shimmer, vibrato, PPE, GNE and HNR, were analysed. We also propose a new set of acoustic features for characterizing the harmonic structure of the vowels, whose calculation is based on pitch-synchronized voice analysis. Linear discriminant analysis (LDA) was used to classify the phonations produced by patients with ALS and those by healthy individuals. Several feature selection algorithms were tested to find the optimal feature subset for the LDA model. The study's experiments show that the most successful LDA model, based on 32 features picked out by the LASSO feature selection algorithm, attains 99.7% accuracy with 99.3% sensitivity. Among the models with a small number of features, we can highlight the LDA model with 5 features, which has 89.0% accuracy.


1 Introduction

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease involving the upper and lower motor neurons. There are two main forms of ALS, which differ by onset: the spinal form (first symptoms manifest in the arms and legs) and the bulbar form (voice and/or swallowing difficulties are often the first symptoms). Progressive bulbar motor impairment due to ALS leads to deterioration of speech and swallowing function Green et al. (2013). Abnormalities in speech production, phonation and articulation due to neurological disorders are referred to as dysarthria. Dysarthria develops in more than 80% of individuals affected by ALS at some point during the disease's course Duffy (2013). Currently, the diagnosis of ALS is based on clinical observations of upper and lower motor neuron damage in the absence of other causes. Due to the lack of clinical diagnostic markers of ALS, the pathway to a correct diagnosis takes 12 months on average Iwasaki et al. (2001).

In recent years, objective evaluation of voice and speech has gained popularity as a means of detecting early signs of neurological diseases Benba et al. (2016); Rusz,J. et al. (2011); R. et al. (2016). This can be explained by the fact that speech is accomplished through complex articulatory movements, requires precise coordination and timing, and is therefore very sensitive to disruptions of the peripheral or central nervous system Gomez-Vilda et al. (2015); Castillo Guerra and Lovey (2003). Recent studies suggest that acoustic voice and speech analysis might provide useful biomarkers for diagnosis and remote monitoring of ALS patients Norel et al. (2018); Spangler et al. (2017). The advantage of using voice/speech signals is the possibility of using a smartphone or tablet to record patients at home, without the logistical difficulties of a clinical environment Benba et al. (2016); An et al. (2018).

The main goal of this work is the automatic detection of ALS patients with or without bulbar disorders (i.e. classification of healthy controls vs. patients with ALS) based on the sustained vowel phonation (SVP) test. Our long-term aim is to build an automated system for the classification of neuromotor degenerative disorders based on the analysis of the SVP test. Therefore, as a first step toward this aim, we consider the problem of binary classification of a voice recording as belonging to an ALS patient or a healthy person. We chose the sustained vowel phonation test among the different diagnostic speech tasks due to its simplicity and widespread use in medical practice. Moreover, recent research Tsanas et al. (2012) shows that the SVP test makes it possible to detect persons with Parkinson's disease. This gives us hope that the test can also be effective for ALS detection.

Sustained phonation is a common speech task used to evaluate the health of the phonatory speech subsystem Rusz,J. et al. (2011). Using the SVP test, the following characteristics of voice can be assessed: pitch, loudness, resonance, strain, breathiness, hoarseness, roughness, tremor, etc. Benba et al. (2016); Castillo Guerra and Lovey (2003); Gómez-García et al. (2019). It can be argued that some of the vocal abnormalities present in continuous speech might not be captured by sustained vowels, but the analysis of continuous speech is much more complex due to articulatory and other linguistic confounds Gómez-García et al. (2019). Another argument is that the use of sustained vowels is commonplace in clinical practice Baken and Orlikoff (2000). Furthermore, an early study Silbergleit et al. (1997) reported abnormal acoustic parameters of the voice in ALS subjects whose vocal quality on sustained phonation was perceptually normal. Also, van der Graaff et al. (2009) reported that glottic narrowing due to vocal cord dysfunction (which can be assessed using the SVP test) is one of the symptoms of ALS.

The SVP test is widely used for detecting and diagnosing different neurological diseases such as Parkinson's disease, Alzheimer's disease, dystonia and others Benba et al. (2016); Rusz,J. et al. (2011). For example, it has been shown in Tsanas et al. (2012) that a classifier based on features extracted from the SVP test allows one to discriminate Parkinson's disease subjects from healthy controls (HC) with almost 99% overall classification accuracy. However, there are few studies dedicated to the detection of ALS based on the SVP test.

In Castillo Guerra and Lovey (2003), SVP was used along with other speech tests for dysarthria classification. Sustained phonation was also used for assessing the laryngeal subsystem within a comprehensive speech assessment battery in Yunusova et al. (2013). In the majority of prior works, however, a running speech test consisting of reading a specially designed passage was used for ALS detection Norel et al. (2018); An et al. (2018); Mujumdar and Kubichek (2010); Illa et al. (2018). In Spangler et al. (2017), rapid repetition of syllables (pa/ta/ka), often referred to as a diadochokinetic (DDK) task, was used for automatic ALS detection. Some studies use kinematic sensors to model articulation for ALS detection Bandini et al. (2017); however, this approach is invasive in nature and less attractive compared to a non-invasive speech test.

The purpose of this work is to investigate the possibility of designing a classifier for the detection of patients with ALS based on the sustained phonation test. Traditionally, the vowel /a/ is used in the SVP test; however, in our study we have used the vowel /i/ along with /a/. This decision is based on the preliminary results of Lee et al. (2017); Vashkevich et al. (2018, 2019), which provide evidence that the information contained in these vowels might allow obtaining a classifier with high performance. This work is based on the analysis of the sustained phonation of the vowels /a/ and /i/, in contrast to other studies that extract vowels from running speech tests (see e.g. Vashkevich et al. (2018, 2019)).

The remainder of the paper is organized as follows. Section 2 provides information about the methods of acoustic analysis used for feature extraction. The voice data used in this study, along with the methods of feature selection, classification and validation, are presented in Section 3. In Section 4 we present the results of our findings and discuss their interpretation. Section 5 concludes the work.

2 Acoustic analysis

The bulbar system affected by ALS is considered part of the larger speech production network and comprises four distinct subsystems Green et al. (2013): respiratory, phonatory, articulatory, and resonatory. In this short review of acoustic features, we indicate which subsystem is described by each feature.

2.1 Perturbation measurements

2.1.1 Jitter

Jitter (i.e. frequency/period perturbation) is a measure of the variability of the fundamental period from one cycle to the next. Since jitter estimates short-term variations, it cannot be attributed to voluntary changes in F0. Jitter is therefore intended to provide an index of the stability of the phonatory subsystem. A high level of jitter results from diminished neuromotor and aerodynamic control Baken and Orlikoff (2000). Jitter has been used as an indicator of voice quality that characterizes the severity of dysphonia Miller and Moerman (2013). In this study we have used the following popular jitter measures Kasuya et al. (1982):

1) local jitter ($J_{loc}$), defined as the average difference between consecutive periods, divided by the average period:

$J_{loc} = \dfrac{\frac{1}{N-1}\sum_{i=1}^{N-1} |T_i - T_{i+1}|}{\frac{1}{N}\sum_{i=1}^{N} T_i}$   (1)

where $T_i$ is the duration of the $i$-th fundamental period in seconds and $N$ is the number of extracted periods;

2) the period perturbation quotient ($J_{PPQ_K}$), which quantifies the variability of the pitch period evaluated over $K$ consecutive cycles:

$J_{PPQ_K} = \dfrac{\frac{1}{N-K+1}\sum_{i=1}^{N-K+1}\left|\frac{1}{K}\sum_{k=0}^{K-1} T_{i+k} - T_{i+\lfloor K/2 \rfloor}\right|}{\frac{1}{N}\sum_{i=1}^{N} T_i}$   (2)

In this work, we used the parameter $K$ equal to 3, 5 and 55.
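As a concrete illustration, both measures can be computed directly from a sequence of extracted period durations. This is a minimal numpy sketch; the function names and the moving-average form of the PPQ are our own notational choices, not the authors' code:

```python
import numpy as np

def jitter_local(T):
    """Local jitter: mean absolute difference between consecutive
    fundamental periods, divided by the mean period."""
    T = np.asarray(T, dtype=float)
    return np.mean(np.abs(np.diff(T))) / np.mean(T)

def jitter_ppq(T, K=5):
    """Period perturbation quotient of order K: deviation of each period
    from the K-point moving average, divided by the mean period."""
    T = np.asarray(T, dtype=float)
    half = K // 2
    devs = [abs(np.mean(T[i:i + K]) - T[i + half])
            for i in range(len(T) - K + 1)]
    return np.mean(devs) / np.mean(T)

# a perfectly regular phonation has zero jitter
periods = np.full(100, 0.008)            # 8 ms periods (125 Hz pitch)
print(jitter_local(periods), jitter_ppq(periods, K=3))   # 0.0 0.0
```

For real recordings the period sequence would come from a pitch-marking step; any irregularity in the periods makes both measures strictly positive.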

2.1.2 Shimmer

Shimmer is an amplitude perturbation measure that characterizes the extent of variation of the expiratory flow during phonation. This feature can be considered a characteristic of the respiratory subsystem. The basic shimmer measure ($S_{loc}$) is defined as the average absolute difference between the amplitudes of consecutive periods, divided by the average amplitude:

$S_{loc} = \dfrac{\frac{1}{N-1}\sum_{i=1}^{N-1} |A_i - A_{i+1}|}{\frac{1}{N}\sum_{i=1}^{N} A_i}$   (3)

where $A_i$ is the amplitude of the $i$-th pitch period.

$S_{loc}$ falls under the influence of long-term changes in vocal intensity Baken and Orlikoff (2000). To eliminate the effects of amplitude "drift" and obtain a truer index of the underlying shimmer, it has been suggested to measure the amplitude perturbation quotient (APQ) Kasuya et al. (1982). The APQ quantifies how smoothly the amplitude of the pitch period varies over $K$ consecutive cycles:

$S_{APQ_K} = \dfrac{\frac{1}{N-K+1}\sum_{i=1}^{N-K+1}\left|\frac{1}{K}\sum_{k=0}^{K-1} A_{i+k} - A_{i+\lfloor K/2 \rfloor}\right|}{\frac{1}{N}\sum_{i=1}^{N} A_i}$   (4)

Typically the parameter $K$ takes the value 3, 5, 11 or 55 Rusz,J. et al. (2011); R. et al. (2016); Moran et al. (2006). We used all of these options in our study.
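The same pattern gives the shimmer measures; the sketch below also demonstrates why the APQ is less sensitive to slow amplitude drift than the local measure (the helper names are ours):

```python
import numpy as np

def shimmer_local(A):
    """Basic shimmer: mean absolute difference between amplitudes of
    consecutive pitch periods, divided by the mean amplitude."""
    A = np.asarray(A, dtype=float)
    return np.mean(np.abs(np.diff(A))) / np.mean(A)

def shimmer_apq(A, K=11):
    """Amplitude perturbation quotient of order K: deviation of each
    amplitude from the K-point moving average, divided by the mean
    amplitude; slow drift cancels inside the moving average."""
    A = np.asarray(A, dtype=float)
    half = K // 2
    devs = [abs(np.mean(A[i:i + K]) - A[i + half])
            for i in range(len(A) - K + 1)]
    return np.mean(devs) / np.mean(A)

# a slow linear drift registers as local shimmer but is cancelled by APQ
drifting = np.linspace(1.0, 1.1, 200)
print(shimmer_local(drifting) > shimmer_apq(drifting))   # True
```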

2.1.3 Directional perturbation factor

Directional perturbation factor (DFP) is a measure of perturbation that ignores the magnitude of the period perturbation: it depends on the number of times that frequency changes shift direction Baken and Orlikoff (2000). The DFP calculation consists of two steps. At the first step, the difference between adjacent fundamental periods is calculated:

$\Delta T_i = T_{i+1} - T_i$   (5)

At the second step, the number $N_{\Delta}$ of sign changes in the sequence of $\Delta T_i$ is counted. Finally, the DFP parameter is obtained as follows:

$\mathrm{DFP} = \dfrac{N_{\Delta}}{N} \cdot 100\%$   (6)

where $N$ is the total number of fundamental periods.
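A sketch of the two-step DFP computation; counting sign reversals only over the nonzero adjacent differences is an implementation choice made here:

```python
import numpy as np

def directional_perturbation_factor(T):
    """DFP: percentage of period-to-period changes at which the sign of
    the change reverses, ignoring the perturbation magnitude."""
    dT = np.diff(np.asarray(T, dtype=float))   # step 1: adjacent differences
    signs = np.sign(dT)
    signs = signs[signs != 0]                  # ignore zero differences
    n_changes = np.count_nonzero(signs[1:] != signs[:-1])  # step 2: reversals
    return 100.0 * n_changes / len(T)          # relative to total period count

# alternating periods reverse direction at every step
T = [8.0, 8.1] * 50
print(directional_perturbation_factor(T))      # 98.0
```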

2.2 Noise measurements

The existence of noise energy, broadly understood as energy outside of the harmonic components during sustained phonation, is the result of incomplete closure of the vocal folds during phonation, indicative of a disruption of the morphology of the larynx Moran et al. (2006). We used two different noise measurements: the harmonics-to-noise ratio (HNR) Boersma (1993) and the glottal-to-noise excitation ratio (GNE).

2.2.1 HNR

The HNR measures the ratio between the periodic (or harmonic) component and the non-periodic (or noise) component of the voice signal. Sonorant and harmonic voices are characterized by high HNR values; a low HNR denotes that the voice comprises an increased amount of noise. For the calculation of HNR we used the mathematical background presented in Boersma (1993). First, the normalized autocorrelation function $r_x(\tau)$ of the voice signal is calculated. Then, the first local maximum outside 0 (at the corresponding lag $\tau_{\max}$) is found. The normalized autocorrelation $r_x(\tau_{\max})$ represents the relative power of the periodic component of the signal (while the full power is $r_x(0) = 1$). Finally, HNR is calculated as

$\mathrm{HNR} = 10\log_{10}\dfrac{r_x(\tau_{\max})}{1 - r_x(\tau_{\max})}$   (7)
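A rough sketch of this autocorrelation-based HNR in numpy; the pitch-lag search range and the clipping that keeps the logarithm finite are assumptions of this sketch, not part of Boersma's method:

```python
import numpy as np

def hnr_db(x, fs, f0_min=75.0, f0_max=500.0):
    """HNR (dB) from the normalized autocorrelation: the peak at the
    pitch lag gives the relative power of the periodic component."""
    x = np.asarray(x, dtype=float)
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]
    ac = ac / ac[0]                           # normalize so r(0) = 1
    lo, hi = int(fs / f0_max), int(fs / f0_min)   # plausible pitch lags
    lag = lo + np.argmax(ac[lo:hi])
    r = ac[lag]                               # periodic fraction of power
    r = min(max(r, 1e-6), 1 - 1e-6)           # keep the log finite
    return 10.0 * np.log10(r / (1.0 - r))

fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 125 * t)           # perfectly periodic "voice"
noisy = clean + 0.5 * np.random.default_rng(0).standard_normal(fs)
print(hnr_db(clean, fs) > hnr_db(noisy, fs))   # True
```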

2.2.2 GNE

GNE measures the amount of excitation in the voice due to the vibration of the vocal folds relative to the excitation noise due to turbulence in the vocal tract Orozco-Arroyave et al. (2015). The GNE is often associated with breathiness Castillo Guerra and Lovey (2003); Awan (2009) and therefore can be considered a characteristic of the phonatory subsystem.

Calculation of the GNE is based on the correlation between the Hilbert envelopes of three different frequency channels Michaelis et al. (1997). Since the full-band signal is excited simultaneously by a single glottal closure, the envelopes in all channels have the same shape, which leads to a high correlation between the envelopes. In the case of turbulent signals, however, a narrowband noise is excited in each frequency channel, and these narrowband noise signals are uncorrelated. Thus, the inter-band correlation can be used to measure the amount of turbulence in a signal.

Calculation of the GNE factor consists of the following steps:

  1. Downsample the signal to 8 kHz.

  2. Divide the signal into 30 ms overlapping frames with a 10 ms hop size. For each frame, execute steps 3–7.

  3. Inverse-filter the signal by calculating the linear prediction error signal, using a 10th-order predictor estimated by the autocorrelation method Huang et al. (2001) with a Hamming window.

  4. Calculate the Hilbert envelopes of three different frequency bands with 1000 Hz bandwidth and central frequencies at 500, 1500 and 2500 Hz.

  5. Calculate the cross-correlation function between every pair of envelopes whose center frequencies differ by at least half the bandwidth.

  6. Pick the maximum of each correlation function.

  7. The GNE for the current frame is the maximum of the maxima obtained in step 6.

  8. Compute the mean value of GNE and its standard deviation over all frames.
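The steps above can be sketched as follows. Band-limiting by FFT masking, the normalized full-lag cross-correlation, and the test signals are implementation choices of this sketch rather than details of the original method, and the input is assumed to be at 8 kHz already (step 1):

```python
import numpy as np
from scipy.signal import hilbert, lfilter
from scipy.linalg import solve_toeplitz

def lpc_coeffs(x, order=10):
    """Inverse-filter coefficients [1, -a1, ..., -ap] via the
    autocorrelation method with a Hamming window (step 3)."""
    xw = x * np.hamming(len(x))
    r = np.correlate(xw, xw, 'full')[len(xw) - 1:len(xw) + order]
    a = solve_toeplitz(r[:order], r[1:order + 1])
    return np.concatenate(([1.0], -a))

def gne(x, fs=8000, bw=1000.0, centers=(500.0, 1500.0, 2500.0)):
    """Mean and SD of the per-frame GNE (steps 2-8)."""
    frame, hop = int(0.03 * fs), int(0.01 * fs)      # 30 ms frames, 10 ms hop
    vals = []
    for s in range(0, len(x) - frame + 1, hop):
        seg = x[s:s + frame]
        resid = lfilter(lpc_coeffs(seg), [1.0], seg)  # inverse filtering
        spec = np.fft.rfft(resid)
        f = np.fft.rfftfreq(len(resid), 1.0 / fs)
        envs = []
        for fc in centers:                            # Hilbert envelope per band
            band_spec = spec.copy()
            band_spec[(f < fc - bw / 2) | (f > fc + bw / 2)] = 0.0
            band = np.fft.irfft(band_spec, len(resid))
            envs.append(np.abs(hilbert(band)))
        best = 0.0
        for i in range(len(envs)):
            for j in range(i + 1, len(envs)):
                if abs(centers[i] - centers[j]) >= bw / 2:
                    e1 = envs[i] - envs[i].mean()
                    e2 = envs[j] - envs[j].mean()
                    cc = np.correlate(e1, e2, 'full') / (
                        np.linalg.norm(e1) * np.linalg.norm(e2))
                    best = max(best, cc.max())        # step 6
        vals.append(best)                             # step 7
    return float(np.mean(vals)), float(np.std(vals))  # step 8

# glottal-like impulse train: envelopes line up -> high GNE; noise -> low
pulses = np.zeros(4000)
pulses[::64] = 1.0                                    # 125 Hz excitation at 8 kHz
noise = np.random.default_rng(1).standard_normal(4000)
mean_pulse, _ = gne(pulses)
mean_noise, _ = gne(noise)
print(mean_pulse > mean_noise)   # True
```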

2.3 Spectral parameters

2.3.1 MFCC

Mel-frequency cepstral coefficients (MFCCs) are among the most widely used features in speech-related applications such as speaker identification and recognition. Moreover, recent studies have shown promising results on the identification of voice pathology with MFCCs Benba et al. (2016); Spangler et al. (2017); Tsanas et al. (2012); Godino-Llorente et al. (2006). MFCCs can detect subtle changes in the motion of the articulators (tongue, lips), which are known to be affected in many neurological diseases Tsanas et al. (2012). They have been used for detecting hypernasality due to velopharyngeal insufficiency in Dubey et al. (2018). In Orozco-Arroyave et al. (2015) the usage of MFCCs is justified by their ability to model changes in the speech spectrum, especially around the first two formants (F1 and F2), where most of the energy of the signal is concentrated. The work Godino-Llorente et al. (2006) showed that MFCCs have an inherent ability to model an irregular movement of the vocal folds, or a lack of closure due to a change in the properties of the tissue covering the vocal folds. Therefore, MFCCs can be considered parameters describing both the resonatory and articulatory subsystems.

MFCC parameters Godino-Llorente et al. (2006); Huang et al. (2001) are obtained by applying the discrete cosine transform to the logarithm of the energy calculated in several mel-frequency bands:

$c_n = \sum_{m=1}^{M} \log(E_m)\cos\!\left(\dfrac{\pi n}{M}\left(m - \dfrac{1}{2}\right)\right), \quad n = 1, \dots, L$   (8)

where $M$ is the number of uniform frequency bands on the mel scale and $L$ is the order of the MFCC coefficients. The energies of the frequency bands are calculated using the magnitude spectrum $|X(k)|$ of the frame of the voice signal:

$E_m = \sum_{k} V_m(k)\,|X(k)|$   (9)

where $V_m(k)$ is the triangular weighting function Huang et al. (2001) associated with the $m$-th band.

In our study the MFCC parameters were computed within windows of 25 ms length with a 10 ms time shift. The magnitude spectrum is averaged within uniform mel-frequency bands (see (9)). The first (Δ) derivatives of the MFCCs have also been calculated, since they provide information about the dynamics of the time variation of the MFCC parameters. A priori, these features can be considered significant because the disorder lowers the stability of the voice signal Godino-Llorente et al. (2006); therefore, larger time variations of the parameters may be expected in ALS voices relative to normal voices.

Because the MFCCs form a time series, we averaged the MFCCs across the time domain to consolidate them into a single set of coefficients. Finally, 12 MFCCs and 12 ΔMFCCs are evaluated for each voice recording.
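A compact sketch of the per-frame computation of (8) and (9) with time averaging; the filterbank construction, band count and log flooring are assumptions of this sketch:

```python
import numpy as np

def mel_filterbank(n_bands, n_fft, fs, fmin=0.0, fmax=None):
    """Triangular weighting functions spaced uniformly on the mel scale."""
    fmax = fmax or fs / 2
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = imel(np.linspace(mel(fmin), mel(fmax), n_bands + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fb = np.zeros((n_bands, n_fft // 2 + 1))
    for m in range(1, n_bands + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfcc_mean(x, fs, n_mfcc=12, n_bands=20, frame_ms=25, hop_ms=10):
    """Frame-wise MFCCs (DCT of log band energies), averaged over time."""
    frame, hop = int(frame_ms * fs / 1000), int(hop_ms * fs / 1000)
    fb = mel_filterbank(n_bands, frame, fs)
    coeffs = []
    for s in range(0, len(x) - frame + 1, hop):
        seg = x[s:s + frame] * np.hamming(frame)
        mag = np.abs(np.fft.rfft(seg))
        e = np.log(fb @ mag + 1e-12)                 # log band energies
        n = np.arange(1, n_mfcc + 1)[:, None]
        m = np.arange(1, n_bands + 1)[None, :]
        dct = np.cos(np.pi * n * (m - 0.5) / n_bands)  # DCT basis of (8)
        coeffs.append(dct @ e)
    return np.mean(coeffs, axis=0)

fs = 16000
t = np.arange(fs) / fs
c = mfcc_mean(np.sin(2 * np.pi * 220 * t), fs)
print(c.shape)        # (12,)
```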

2.3.2 Formants

Changes of formant frequencies during vowel phonation due to dysarthria have been reported in many studies Lee et al. (2017); Kent et al. (1999); Tomik and Guiloff (2010); Turner et al. (1995); Weismer,Gary et al. (1992). The most frequently reported abnormalities of vowel production include centralization of formant frequencies Lansford and Liss (2014), reduction of the vowel space area Turner et al. (1995), and abnormal formant frequencies for high and front vowels Vashkevich et al. (2018); Kent et al. (1999). In Weismer,Gary et al. (1992) it was shown that in patients with ALS the measurement of the F2 slope (or F2 transition) correlates with the overall speech intelligibility score. Features derived from statistics of the first (F1) and second (F2) formant frequencies (and their trajectories) have also shown good performance for predicting speaking rate decline in ALS Horwitz-Martin et al. (2016). Though the SVP test cannot reflect the dynamics of formant frequency trajectories, we can still use the values of the formant frequencies as a source of information. In Vashkevich et al. (2019); Lee et al. (2019) it was shown that the value of F2 for the vowel /i/ appears to be a good feature for discriminating between patients with ALS and a healthy control group. In this study we use the second formant of the vowel /i/ ($F2_{/i/}$) and the Euclidean distance (convergence) between the vowels /i/ and /a/ in the formant plane:

$D_F = \sqrt{\left(F1_{/a/} - F1_{/i/}\right)^2 + \left(F2_{/a/} - F2_{/i/}\right)^2}$   (10)

Study Lee et al. (2017) has shown that the convergence of F2 of the vowels /i/ and /a/ is much stronger in speakers with dysarthria due to ALS than in healthy speakers. Both features ($F2_{/i/}$ and $D_F$) prove to be highly informative for ALS detection using a running speech test Vashkevich et al. (2019).

2.3.3 Distance between the spectral envelopes of the vowels

In Vashkevich et al. (2018) it was suggested to use the distance between the spectral envelopes of the vowels /a/ and /i/ to quantify the amount of articulatory undershoot. The joint analysis of the envelopes of vowels /a/ and /i/ of persons with ALS revealed an increased similarity between the shapes of these envelopes. The similarity between the envelopes is assessed using the $L_2$-norm distance metric

$D_{env} = \left(\sum_{k=1}^{N_f} \left(E_{/i/}(k) - E_{/a/}(k)\right)^2\right)^{1/2}$   (11)

where $E_{/i/}(k)$ is the envelope of the vowel /i/, $E_{/a/}(k)$ is the envelope of the vowel /a/, and $N_f$ is the number of points in the frequency domain. The spectral envelopes of the vowels were estimated using all-pole modelling Huang et al. (2001). An example of vowel envelopes from a healthy individual is shown in figure 1,a. A typical example of envelopes with a high degree of similarity is given in figure 1,b.

Figure 1: Envelopes of vowels /a/ and /i/: (a) healthy speaker; (b) ALS patient
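The distance (11) can be sketched with all-pole (LPC) envelopes evaluated on a uniform frequency grid; the LPC order, the dB scale and the mean-gain removal are assumptions of this sketch:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_envelope(x, order=12, n_points=256):
    """All-pole (LPC) spectral envelope of a signal, in dB."""
    x = np.asarray(x, dtype=float) * np.hamming(len(x))
    r = np.correlate(x, x, 'full')[len(x) - 1:len(x) + order]
    a = np.concatenate(([1.0], -solve_toeplitz(r[:order], r[1:order + 1])))
    w = np.linspace(0, np.pi, n_points, endpoint=False)
    # evaluate 1/|A(e^{jw})| on a uniform frequency grid
    A = np.polyval(a[::-1], np.exp(1j * w))
    env = 20 * np.log10(1.0 / np.abs(A) + 1e-12)
    return env - env.mean()                     # remove the overall gain

def envelope_distance(x_i, x_a, order=12, n_points=256):
    """L2 distance between the spectral envelopes of two vowels."""
    d = lpc_envelope(x_i, order, n_points) - lpc_envelope(x_a, order, n_points)
    return float(np.sqrt(np.sum(d ** 2) / n_points))

# two synthetic "vowels" with different resonances (plus slight noise so
# the LPC normal equations stay well conditioned)
fs = 8000
t = np.arange(fs // 2) / fs
rng = np.random.default_rng(0)
v1 = (np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1100 * t)
      + 0.01 * rng.standard_normal(len(t)))
v2 = (np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2200 * t)
      + 0.01 * rng.standard_normal(len(t)))
print(envelope_distance(v1, v2) > envelope_distance(v1, v1))   # True
```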

2.4 F0 contour based parameters

2.4.1 Phonatory frequency range

Phonatory frequency range (PFR) is defined as the semitone difference between the lowest ($F0_{\min}$) and highest ($F0_{\max}$) fundamental frequencies Moran et al. (2006):

$\mathrm{PFR} = 12\log_2\dfrac{F0_{\max}}{F0_{\min}}$   (12)

This parameter measures the degree of variability of the fundamental frequency contour and characterizes the functioning of the phonatory subsystem.
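PFR reduces to a one-line computation on the extracted F0 contour (here the input is assumed to be a clean contour with no octave errors):

```python
import numpy as np

def phonatory_frequency_range(f0):
    """PFR: semitone span between the lowest and highest F0 values."""
    f0 = np.asarray(f0, dtype=float)
    return 12.0 * np.log2(f0.max() / f0.min())

# a contour spanning one octave covers exactly 12 semitones
print(phonatory_frequency_range([100.0, 150.0, 200.0]))   # 12.0
```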

2.4.2 Pitch period entropy

Pitch period entropy (PPE) is a highly informative feature proposed in Little et al. (2008) to assess the degree of loss of control over the stationary voice pitch during sustained phonation (due to Parkinson's disease). We have used this measure in our study since ALS also affects the ability to control the stability of the voice pitch.

The calculation of PPE is based on the following observations: 1) the healthy voice has natural pitch variation characterized by smooth vibrato or microtremor Baken and Orlikoff (2000); Little et al. (2008); and 2) speakers with naturally high-pitched voices have much larger vibrato and microtremor than speakers with low-pitched voices. The PPE measurement takes both factors into account: the natural smooth variations are removed prior to measuring the extent of the remaining variations (first factor), and the pitch is transformed to the perceptually relevant, logarithmic semitone scale (second factor). The algorithm for PPE calculation used in this study is given below.

  1. Estimate the F0 contour with a 5 ms time step using the IRAPT algorithm Azarov et al. (2012);

  2. Transform the F0 contour to the semitone scale:

    $p(t) = 12\log_2\dfrac{F0(t)}{F_{base}}$   (13)

    where $F_{base}$ is the lower octave band limit, calculated so that the mean value of the pitch corresponds to the center of this octave;

  3. Apply a whitening filter to the signal $p(t)$ to remove the healthy, smooth variation:

    $r(t) = p(t) - \sum_{i=1}^{P} a_i\,p(t-i)$   (14)

    where $a_i$ are the linear prediction coefficients (LPC) estimated using the covariance method Huang et al. (2001) and $P$ is the predictor order;

  4. Calculate the discrete probability distribution of occurrence of the relative semitone variations $r(t)$ by computing a normalized histogram with equal-sized bins;

  5. Calculate the entropy of the distribution obtained at the previous step:

    $H = -\sum_{j} P_j \log_2 P_j$   (15)

The larger the measure of entropy, the more the observed variations exceed the natural level of variation of the fundamental frequency in a healthy voice. Figure 2 gives an example that illustrates the process of calculating the PPE measure.

Figure 2: Details of PPE calculation, left column: healthy subject, right column: ALS patient. Rows from the top: extracted F0, pitch in semitone scale, residual signal after spectral whitening filter, probability densities of residual pitch

Figure 2 shows that the semitone pitch sequence of the healthy voice is quite stable and shows signs of a small regular vibrato. After eliminating this healthy vibrato with the whitening filter, the distribution of residuals shows a strong peak at zero, which leads to a small value of entropy. On the contrary, for the ALS voice the semitone pitch sequence has significant irregular variation and the distribution of residuals is spread over a wider range; as a result, a larger value of entropy is obtained.
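The PPE pipeline can be sketched end to end on a synthetic F0 contour. The predictor order, histogram range and bin count below are assumptions (the source does not preserve the exact values), and this sketch estimates the LPC via the autocorrelation method rather than the covariance method:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def ppe(f0, lpc_order=2, n_bins=30, rng_semitones=1.5):
    """Pitch period entropy sketch: semitone transform, LPC whitening,
    entropy of the residual distribution."""
    f0 = np.asarray(f0, dtype=float)
    # lower limit of the octave centred (geometrically) on the mean pitch
    f_base = np.mean(f0) / np.sqrt(2.0)
    p = 12.0 * np.log2(f0 / f_base)
    # whitening filter: linear prediction residual
    pc = p - p.mean()
    r = np.correlate(pc, pc, 'full')[len(p) - 1:len(p) + lpc_order]
    a = solve_toeplitz(r[:lpc_order], r[1:lpc_order + 1])
    resid = p[lpc_order:] - np.array(
        [a @ p[i - lpc_order:i][::-1] for i in range(lpc_order, len(p))])
    resid = resid - resid.mean()
    # normalized histogram of the residual variations, then entropy
    hist, _ = np.histogram(resid, bins=n_bins,
                           range=(-rng_semitones, rng_semitones))
    prob = hist / max(hist.sum(), 1)
    prob = prob[prob > 0]
    return float(-np.sum(prob * np.log2(prob)))

rng = np.random.default_rng(0)
t = np.arange(500) * 0.005                             # 2.5 s contour, 5 ms step
steady = 120 * (1 + 0.005 * np.sin(2 * np.pi * 5 * t))  # smooth healthy vibrato
erratic = 120 * (1 + 0.05 * rng.standard_normal(500)).clip(0.5, 1.5)
print(ppe(steady) < ppe(erratic))    # True
```

The smooth vibrato is predictable by the whitening filter, so its residual collapses into a narrow peak and the entropy stays low, exactly the behaviour described above.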

2.4.3 Tremor (vibrato) analysis

Vocal tremor is an involuntary quasi-sinusoidal modulation of the energy and F0 contour that appears during sustained phonation Peplinski et al. (2019). In our study we consider only the modulation of the F0 contour. Some authors distinguish wow (oscillation of 1-2 Hz), tremor (oscillation of 2-10 Hz) and flutter (oscillation of 10-20 Hz) Kent et al. (1999). An example of vowel phonation for a patient with a rapid tremor (or flutter) is given in figure 3,b (the voice is taken from the database used in the experiments).

Figure 3: Time-frequency representation of vowel phonation /a/: (a) speaker from the HC group (man, 60 years old); (b) ALS patient (man, 67 years old, subject code 039 in the voice database)

An essential distortion can be seen when comparing the spectrogram of an ALS patient (figure 3,b) with the spectrogram of a normal subject (figure 3,a). In particular, figure 3 shows narrowband spectrograms (a long 84 ms analysis window was used for their calculation), in which substantial changes in harmonic behaviour can be seen. The normal voice shows stable harmonics with low variation, while the harmonics of the pathological voice exhibit high-frequency quasi-sinusoidal modulations.

In Peplinski et al. (2019), in order to characterize the tremor, the average spectrum of the F0 contour is analysed in the frequency band from 3 to 25 Hz. However, as reported in Aronson et al. (1992), the most essential frequency peaks of persons with ALS lie within the range 6 to 12 Hz. It seems that the sum of the amplitudes of the spectral components in this frequency band could be a good feature for the detection of ALS voices. However, normal voices also have inherent modulations (sometimes called vibrato) in the range 5 to 8 Hz Nakano et al. (2006); thus, the vibrato frequency bands of healthy and ALS voices overlap. Therefore, to obtain a new feature that characterizes the extent of pathological modulations in the F0 contour, we decided to analyse the amplitudes of the spectral components in the range from 9 to 14 Hz. The obtained feature is referred to as the pathological vibrato index (PVI) and was presented in Vashkevich et al. (2019). The algorithm for PVI calculation is given below.

  1. Estimate the F0 contour with a 5 ms time step using the IRAPT algorithm Azarov et al. (2012);

  2. Normalize the F0 contour:

    $\hat{F}_0(t) = \dfrac{F0(t) - \overline{F0}}{\overline{F0}}$   (16)

  3. Band-pass filter $\hat{F}_0(t)$ using a 3rd-order Butterworth IIR filter with a 9–14 Hz pass band;

  4. Estimate the amplitude spectrum $S(f)$ using Welch's method with windows of 1 s length and 95% overlap;

  5. Calculate the pathological vibrato index:

    $\mathrm{PVI} = \sum_{f=9}^{14} S(f)$   (17)

Figure 4 shows the steps of the PVI calculation for a typical normal and pathological case. It can be seen that frequency components of amplitude spectrum in the range from 9 to 14 Hz are significantly higher for the ALS voice than for a healthy voice.

Figure 4: Left column: normal case, right column: pathological case. Rows from the top: extracted F0, normalized F0 contour, IIR-filtered F0 contour, amplitude spectrum; amplitudes used for PVI calculation are indicated by red x-marks
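The five PVI steps map directly onto scipy primitives; here the F0 contour is assumed to be already extracted (step 1), so IRAPT is not reimplemented:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, welch

def pvi(f0, step=0.005, band=(9.0, 14.0)):
    """Pathological vibrato index: amplitude of F0-contour modulations
    in the 9-14 Hz band, via band-pass filtering and Welch's method."""
    f0 = np.asarray(f0, dtype=float)
    fs = 1.0 / step                                   # contour rate (200 Hz)
    norm = (f0 - f0.mean()) / f0.mean()               # normalized F0 contour
    sos = butter(3, band, btype='bandpass', fs=fs, output='sos')
    filt = sosfiltfilt(sos, norm)                     # 3rd-order Butterworth
    f, pxx = welch(filt, fs=fs, nperseg=int(fs), noverlap=int(0.95 * fs))
    amp = np.sqrt(pxx)                                # amplitude spectrum
    sel = (f >= band[0]) & (f <= band[1])
    return float(np.sum(amp[sel]))

t = np.arange(1000) * 0.005                           # 5 s contour at 200 Hz
healthy = 120 * (1 + 0.01 * np.sin(2 * np.pi * 5 * t))   # 5 Hz vibrato
flutter = 120 * (1 + 0.01 * np.sin(2 * np.pi * 11 * t))  # 11 Hz pathological
print(pvi(flutter) > pvi(healthy))    # True
```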

2.4.4 Analysis of the harmonic structure of the vowels

The harmonic structure of a sustained vowel has been recognized as an important and informative feature for voice pathology identification Castillo Guerra and Lovey (2003); Cordeiro and Meneses (2018). Incomplete glottal closure during phonation, which allows air to escape, is one of the factors that makes the voice more breathy. In particular, for the vowel /a/ this produces a disturbance of the harmonic structure: the amplitude of the first harmonic (H1) becomes higher than that of the second (H2) Cordeiro and Meneses (2018).

One of the important aspects of voice quality is the stability of the harmonic structure during the phonation process. Evaluation of the harmonic structure can be considered a feature describing the excitation source (the driving force of voice production). The difficulty in estimating harmonic parameters is that they depend on the fundamental frequency F0. In this study we have used voice analysis based on a fixed number of fundamental periods (alternatively, it can be considered pitch-synchronized voice analysis). We focused on extracting the mean and standard deviation (SD) of the first eight harmonics of the vowels. Given a voice signal, the analysis process can be summarized in the following steps.

  1. Split the signal into fundamental periods using a waveform matching method with a phase constraint Vashkevich et al. (2019).

  2. Divide the signal into overlapping frames containing $R$ fundamental periods each, with one-period overlap. For each frame, execute steps 3–5.

  3. Interpolate the frame into $N_s$ equidistant time points.

  4. Apply a Hamming window to the interpolated frame and compute its discrete Fourier transform (DFT).

  5. Extract the harmonic amplitudes $H_1, \dots, H_8$ from the DFT spectrum.

  6. Scale the harmonic amplitudes.

  7. Compute the mean $\bar{H}_k$ and SD $\sigma_{H_k}$ of the scaled harmonic amplitudes.

  8. Compute an additional feature, the inverse of the sum of the absolute value of $\bar{H}_k$ and of $\sigma_{H_k}$:

    $Q_k = \dfrac{1}{|\bar{H}_k| + \sigma_{H_k}}$   (18)

The intuition behind feature (18) is that a strong and stable harmonic should have a low scaled amplitude and a low deviation, and therefore a high value of $Q_k$.

In this study, fixed values of the procedure parameters $R$ and $N_s$ were used.
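A simplified sketch of the pitch-synchronous analysis, assuming a steady, known F0 instead of the waveform-matching period marking of step 1; the dB-relative-to-strongest-harmonic scaling in step 6 is also an assumption of this sketch:

```python
import numpy as np

def harmonic_stats(x, fs, f0, n_harm=8, n_periods=4, n_interp=512):
    """Per-harmonic mean and SD of scaled amplitudes over frames of
    n_periods fundamental periods with one-period overlap."""
    period = int(round(fs / f0))                    # steady pitch assumed
    frame_len = n_periods * period
    amps = []
    for s in range(0, len(x) - frame_len + 1, (n_periods - 1) * period):
        frame = x[s:s + frame_len]
        # interpolate the frame to n_interp equidistant points (step 3)
        grid = np.linspace(0, len(frame) - 1, n_interp)
        fr = np.interp(grid, np.arange(len(frame)), frame) * np.hamming(n_interp)
        spec = np.abs(np.fft.rfft(fr))              # step 4
        # after resampling, harmonic k sits near bin k * n_periods (step 5)
        h = np.array([spec[k * n_periods] for k in range(1, n_harm + 1)])
        # scale in dB relative to the strongest harmonic (step 6, assumed)
        amps.append(20 * np.log10(h / h.max() + 1e-12))
    amps = np.array(amps)
    return amps.mean(axis=0), amps.std(axis=0)      # step 7

fs, f0 = 8000, 125.0
t = np.arange(fs) / fs
# synthetic vowel with a strong H1 and a weaker H2
x = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 2 * f0 * t)
mean_h, sd_h = harmonic_stats(x, fs, f0)
print(mean_h[0] > mean_h[1])    # True: H1 dominates H2
```

Feature (18) then follows as `1.0 / (np.abs(mean_h) + sd_h)` per harmonic.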

3 Experiments

3.1 Database

The voice database used in this study was collected in the Republican Research and Clinical Center of Neurology and Neurosurgery (Minsk, Belarus). It consists of 128 sustained vowel phonations (64 of the vowel /a/ and 64 of the vowel /i/) from 64 speakers, 31 of whom were diagnosed with ALS. Each speaker was asked to produce a sustained phonation of the vowels /a/ and /i/ at a comfortable pitch and loudness, as constant and long as possible. The voice database is thus almost balanced: it contains 48% pathological and 52% healthy voices.

The age of the 17 male patients ranges from 40 to 69 (mean 61.1±7.7) and the age of the 14 female patients ranges from 39 to 70 (mean 57.3±7.8). For the healthy controls (HC), the age of the 13 men ranges from 34 to 80 (mean 50.2±13.8) and the age of the 20 women ranges from 37 to 68 (mean 56.1±9.7). The samples were recorded at 44.1 kHz using different smartphones with regular headsets and stored as 16-bit uncompressed PCM files. The average duration of the recordings was 3.7±1.5 s in the HC group and 4.1±2.0 s in the ALS group. Detailed information about the ALS patients is presented in table 1. All the patients were judged by the neurologist (the second author) for the presence of bulbar motor changes in speech (last column of table 1).

Subject code Sex Age Time from ALS onset (months) Bulbar/ spinal onset Presence of the bulbar signs
008 M 67 28 bulbar yes
020 F 57 35 spinal no
021 F 55 15 spinal yes
022 F 70 11 bulbar yes
024 M 66 16 spinal no
025 M 51 7 spinal no
027 M 57 18 bulbar yes
028 M 58 5 spinal yes
031 M 67 6 spinal yes
032 M 61 19 spinal yes
039 M 67 12 bulbar yes
042 M 67 22 spinal yes
046 F 50 12 spinal yes
048 F 63 22 bulbar yes
052 F 62 36 spinal no
055 M 61 11 spinal yes
058 M 58 9 bulbar yes
062 M 57 23 bulbar yes
064 M 57 58 spinal yes
068 M 40 11 bulbar yes
072 F 64 10 spinal yes
076 M 68 12 bulbar yes
078 F 64 12 bulbar yes
080 F 63 20 bulbar yes
084 F 55 33 bulbar yes
092 F 39 57 spinal no
094 F 55 14 spinal no
096 F 52 14 spinal yes
098 M 68 37 spinal yes
100 M 68 16 bulbar yes
102 F 53 25 spinal no
Table 1: ALS participants clinical records

3.2 Aggregation of feature set and its statistical survey

For each vowel used in the SVP test, 64 features are extracted (see figure 5). These features include the following groups (the number of parameters in each group is indicated in parentheses): jitter (4), shimmer (5), DFP (1), HNR (1), GNE (mean and SD), PFR (1), PPE (1), PVI (1), harmonic amplitude means (8), harmonic amplitude SDs (8), inverse harmonic features (8), MFCC (12), ΔMFCC (12).

Figure 5: Features extracted from SVP test of one vowel

We also used three additional parameters: the formant distance (10), the spectral envelope distance (11), and the second formant of the vowel /i/ (an extra feature for /i/). Thus, the total number of features used in this study was 131 (64 for the vowel /a/, 64+1 for /i/, and 2 joint parameters). In most cases we use a lower subscript to indicate the vowel for which a feature was calculated; for example, the SD of the 2nd harmonic computed from the /i/ phonation carries the subscript /i/.

In order to get an initial understanding of the statistical properties of the features, we computed the Pearson correlation coefficient $r = \mathrm{corr}(f, y)$, where the vector $f$ contains the values of a single feature for all phonations and $y$ contains the associated class labels (one label for healthy subjects and another for ALS patients).
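As a quick univariate screen, the per-feature correlation can be computed in vectorized form (the ±1 label coding below is an assumption of this sketch):

```python
import numpy as np

def feature_label_correlation(X, y):
    """Pearson correlation between each feature column of X and the
    class labels y, as a univariate relevance screen."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return num / den

rng = np.random.default_rng(0)
y = np.repeat([1.0, -1.0], 50)                       # +1 healthy, -1 ALS (assumed)
informative = y + 0.1 * rng.standard_normal(100)     # tracks the label
noise = rng.standard_normal(100)                     # unrelated feature
r = feature_label_correlation(np.column_stack([informative, noise]), y)
print(abs(r[0]) > abs(r[1]))    # True
```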

3.3 Feature selection

It is known that reducing the number of features often improves a model's predictive power. Moreover, a reduced feature subset gives better insight into the problem via analysis of the most predictive features Flach (2012).

In this study we used four efficient feature selection (FS) approaches: 1) maximization of quality of variation (QoV) Liu and Gillie (2011), 2) Relief Kira and Rendell (1992), 3) least absolute shrinkage and selection operator (LASSO) Tibshirani (1994), and 4) RelieFF Kononenko et al. (1997). Maximization of QoV is a noise-resistant feature selection method based on order statistics. The basic notion of this method is class impurity, a characteristic that is calculated for each feature from its order statistics. The quality of variation of a feature is the inverse of the average impurity of all the classes along that feature. This method allows one to rank all features according to the QoV criterion. It has been shown that the QoV method performs well when the available training data is small or not much larger than the dimensionality of the feature vector Liu and Gillie (2011).

LASSO is a linear regression based technique that minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant. This leads to some coefficients being shrunk to zero, which in essence means that the features associated with those coefficients are eliminated. In order to rank the features using LASSO, we repeat its computation with different values of the regularization parameter and track the order in which features are eliminated. The first eliminated feature is considered the least informative, while the last is the most relevant.

The key idea of Relief is to estimate features according to how well their values distinguish among instances that are near to each other. The original Relief algorithm estimates the relevance of a feature for a given instance by analysing its closest neighbors: one from the same class (nearest hit) and one from the opposite class (nearest miss). The advanced version RelieFF extends this idea to k nearest neighbors. Overall, all four feature selection algorithms have shown promising results in machine learning applications.
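The nearest-hit/nearest-miss idea behind Relief can be sketched in a few lines of numpy. This is a minimal single-neighbor variant using L1 distances, with features assumed scaled to [0, 1]; the toy data are illustrative only:

```python
import numpy as np

def relief_weights(X, y):
    """Minimal Relief: reward features on which the nearest miss differs
    more than the nearest hit (features assumed scaled to [0, 1])."""
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dists = np.abs(X - X[i]).sum(axis=1)   # L1 distance to all samples
        dists[i] = np.inf                      # exclude the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dists, np.inf))    # nearest hit
        miss = np.argmin(np.where(~same, dists, np.inf))  # nearest miss
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n

# Toy data: feature 0 separates the classes, feature 1 is noise
X = np.array([[0.0, 0.5], [0.1, 0.9], [0.2, 0.1],
              [0.8, 0.8], [0.9, 0.2], [1.0, 0.6]])
y = np.array([0, 0, 0, 1, 1, 1])
w = relief_weights(X, y)
```
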

3.4 Classification

For binary classification between the normal and pathological classes, linear discriminant analysis (LDA) with the Fisher criterion was used Hastie et al. (2001). The basic idea of LDA consists in searching for a direction $\mathbf{w}$ in the feature space such that the projection of all training vectors onto it minimizes the within-class variation and maximizes the between-class variation:

$$J(\mathbf{w}) = \frac{\mathbf{w}^T \mathbf{S}_B \mathbf{w}}{\mathbf{w}^T \mathbf{S}_W \mathbf{w}} \to \max_{\mathbf{w}}, \qquad (19)$$

where $\mathbf{S}_B$ is the between-class scatter matrix and $\mathbf{S}_W$ is the within-class scatter matrix. In turn, these matrices are calculated as follows:

$$\mathbf{S}_B = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^T, \qquad (20)$$

$$\mathbf{S}_W = \sum_{i \in \mathrm{HC}} (\mathbf{x}_i - \boldsymbol{\mu}_1)(\mathbf{x}_i - \boldsymbol{\mu}_1)^T + \sum_{i \in \mathrm{ALS}} (\mathbf{x}_i - \boldsymbol{\mu}_2)(\mathbf{x}_i - \boldsymbol{\mu}_2)^T, \qquad (21)$$

where $\mathbf{x}_i$ are the feature vectors from the training set, $\boldsymbol{\mu}_1$ is the mean feature vector for healthy people, and $\boldsymbol{\mu}_2$ is the mean feature vector for people with ALS. The solution of (19) can be found via the generalized eigenvalue problem

$$\mathbf{S}_B \mathbf{w} = \lambda \mathbf{S}_W \mathbf{w}, \qquad (22)$$

where the eigenvector $\mathbf{w}$ associated with the maximum eigenvalue $\lambda$ gives the projection basis. The classification function of LDA is formulated as follows:

$$f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b, \qquad (23)$$

where $b$ is a bias. In the experiments, the value of $b$ was chosen such that the numbers of correctly detected positive and negative instances in the training set were equal. A more detailed description of LDA can be found in Hastie et al. (2001).
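A minimal numpy sketch of two-class Fisher LDA: in the two-class case the solution of the generalized eigenvalue problem is proportional to the closed form $\mathbf{S}_W^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$. The midpoint bias below is a simplification of the paper's equal-error tuning of the bias, and the clusters are toy data:

```python
import numpy as np

def fisher_lda(X_hc, X_als):
    """Two-class Fisher LDA: w is proportional to S_W^{-1} (mu1 - mu2)."""
    mu1, mu2 = X_hc.mean(axis=0), X_als.mean(axis=0)
    Sw = sum(np.outer(x - mu1, x - mu1) for x in X_hc) \
       + sum(np.outer(x - mu2, x - mu2) for x in X_als)
    w = np.linalg.solve(Sw, mu1 - mu2)
    # Simple midpoint bias (the paper instead tunes b so that the numbers
    # of correctly detected positives and negatives are equal on training data)
    b = -w @ (mu1 + mu2) / 2
    return w, b

# Toy clusters standing in for HC and ALS feature vectors
X_hc = np.array([[0.0, 0.0], [1.0, 0.2], [0.2, 1.0], [0.5, 0.5]])
X_als = np.array([[3.0, 3.0], [4.0, 3.2], [3.2, 4.0], [3.5, 3.5]])
w, b = fisher_lda(X_hc, X_als)
# With this sign convention f(x) = w^T x + b is > 0 for HC and < 0 for ALS
```
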

3.5 Classifier Validation

The goal of validation is to estimate the generalization performance of the classifier, based on the selected set of features, when presented with new (previously unseen) data. Most studies use cross-validation for this purpose Tsanas et al. (2012); Orozco-Arroyave et al. (2015); Cordeiro and Meneses (2018).

In this work we used k-fold stratified cross-validation (CV) Kohavi (1995), with k equal to 8. According to this method, at the beginning of the CV process the dataset is randomly permuted and then split into eight equal subsets (folds); the folds are stratified so that they contain approximately the same proportions of labels as the original dataset. At the first iteration, the classifier is trained using seven of the subsets, while testing is conducted on the remaining one. Training is then repeated with a different subset held out for testing, and so on. After 8 iterations the whole dataset has been labelled by the eight classifiers. This process was repeated a total of 40 times. The classification performance is evaluated in terms of the mean and standard deviation of the accuracy on the test set across all folds.
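The stratified splitting step can be sketched as follows. This is a minimal round-robin implementation, not the exact procedure of the study; the labels mimic the 33 HC / 31 ALS composition only for illustration:

```python
import numpy as np

def stratified_kfold_indices(y, k=8, seed=0):
    """Split sample indices into k folds, approximately preserving class
    proportions (each fold then serves once as the test set)."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for label in np.unique(y):
        # Shuffle this class's indices, then deal them out round-robin
        idx = rng.permutation(np.flatnonzero(y == label))
        for j, i in enumerate(idx):
            folds[j % k].append(int(i))
    return [np.array(f) for f in folds]

# Labels mimicking the dataset in the paper: 33 HC (0) and 31 ALS (1)
y = np.array([0] * 33 + [1] * 31)
folds = stratified_kfold_indices(y, k=8)
# Each CV iteration trains on 7 folds and tests on the held-out one;
# the whole split is redrawn (here: a new seed) and repeated 40 times.
```
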

Accuracy, sensitivity, and specificity were used in this study to measure the classification performance. Accuracy is the overall proportion of correctly classified instances over the total number of instances. Sensitivity is the probability of correctly classifying ALS patients given all ALS samples, and specificity is the probability of correctly classifying HC given all HC samples. Accuracy, sensitivity, and specificity are calculated as follows:

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Sens} = \frac{TP}{TP + FN}, \qquad \mathrm{Spec} = \frac{TN}{TN + FP},$$

where $TP$, $TN$, $FP$, $FN$ are the numbers of true positive, true negative, false positive and false negative classification results, respectively. In this case, positive means a prediction that the voice sample was produced by a speaker with ALS.
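These three measures follow directly from the four counts; a small self-contained helper (the counts in the example are illustrative, not results from the study):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (ALS recall) and specificity (HC recall)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

# Illustrative counts for a 64-sample test set (31 ALS, 33 HC)
acc, sens, spec = classification_metrics(tp=30, tn=31, fp=2, fn=1)
```
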

4 Results

4.1 Preliminary statistical survey

Table 2 presents the features most strongly associated with the labels in the dataset, sorted by the absolute value of the correlation coefficient Hastie et al. (2001). We used the label “” for healthy controls and “” for people with ALS. Thus, a positive correlation coefficient suggests that the feature takes, in general, larger values for ALS voices. All of the listed features exhibit statistically significant correlation ().

Feature Correlation coefficient Difference between groups
0.456
0.446
0.422
0.418
0.390
0.381
0.371
0.361
0.351
0.347
0.346
0.335
0.324
0.321
0.311
0.302
0.285
0.282
0.282
0.273
0.250
Table 2: Statistical analysis of the acoustic features

According to table 2, the most relevant features are and . The distance between the spectral envelopes of the vowels /a/ and /i/ () has the strongest correlation with the labels in the dataset. The negative sign of its correlation coefficient means that the smaller the distance , the more likely the voice belongs to the ALS category. has almost as strong a correlation as the spectral distance . It is well known that low-order MFCCs describe the spectral envelope of a sound; therefore it can be concluded that patients with ALS usually have significant changes in the spectral envelope of the vowel /i/.

Parameters and (third and fourth rows of table 2) also correlate strongly with the labels in the dataset. This indicates that, as a result of neuromotor disorders in patients with ALS, oscillations uncharacteristic of healthy people appear in the F0 contour, which leads to increases in and . It is interesting that, along with , parameter (ninth row) is also present in table 2 with a high correlation coefficient, while does not show a statistically significant correlation (). This may indicate that depends less on the type of the analyzed vowel than and better reflects the changes associated with decreased control over the fundamental frequency in patients with ALS.

Five of the twenty-one features listed in Table 2 (, , , and ) relate to parameters that describe the harmonic structure of the vowel. This suggests that these parameters could be useful for accurate voice classification.

We also estimated the distributions of several features listed in table 2 using the Gaussian kernel density method to characterize their statistical properties. Figure 6,a shows the distribution of the distance between the spectral envelopes of the vowels /a/ and /i/ (first row in table 2). As expected, on average this feature has a lower value for ALS patients than for healthy subjects.
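The Gaussian kernel density estimate used for figure 6 can be sketched in numpy as follows. The bandwidth and the two groups of feature values are illustrative, not values from the study:

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at the grid points."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)  # one kernel per sample
    return k.sum(axis=1) / (len(samples) * bandwidth)

# Hypothetical feature values for the two groups (illustrative only):
hc = np.array([4.1, 4.5, 5.0, 5.2, 4.8])
als = np.array([2.0, 2.6, 3.1, 2.4, 2.9])
grid = np.linspace(-2.0, 9.0, 1101)
d_hc = gaussian_kde(hc, grid, bandwidth=0.4)
d_als = gaussian_kde(als, grid, bandwidth=0.4)
```
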

Figure 6: Box plot and probability densities of (a) ; (b)

The distribution of the 2nd MFCC of the vowel /i/, which has a strong correlation with the labels in the dataset, is given in figure 6,b. As stated above, the differences in indicate changes in the spectral envelope of the vowel /i/ in patients with ALS. Among other things, this can be seen from the changes of the second formant frequency of the vowel /i/. From table 2 we see that a lower value of is typical for patients with ALS. This observation is consistent with previous findings in this area Lee et al. (2017, 2019). Consider the scatter plots of the pairs and for healthy and pathological voices (see figure 7). It can be seen that for healthy voices and are weakly correlated (i.e., they do not lie along a slanted line). In contrast, for the voices of patients with ALS, and are strongly correlated (the points are grouped along a slanted line). Thus, the high relevance of is likely caused by the fact that it reflects the changes in the second formant of the vowel /i/ in patients with ALS.

Figure 7: Scatter plots of pairs of and showing low correlation for healthy voices (a) and high correlation for ALS voices (b)
Figure 8: Box plot and probability densities of : (a) ; (b) ; (c) ; (d) ; (e) ; (f)

Figure 8,a-b illustrates the distributions of the and features. Both of them characterize the excess of variability in the pitch contour and have high correlation coefficients (3rd and 4th rows of table 2). Comparing the boxplots of the two parameters, we can see that the first quartile of for pathological voices is located at the level of the median of for healthy voices. In turn, the first quartile of for pathological voices exceeds the third quartile of for healthy voices. This indicates that has stronger discriminatory power than .

Examples of features that characterize the harmonic structure of the voice are given in figure 8,c-d. Both features are strongly associated with the labels in the dataset. As expected, the distributions in figure 8,c indicate that tends to have a higher value for healthy voices. Figure 8,d shows that the 8th harmonic of the vowel /a/ has a lower mean value in the ALS group.

Figure 8,e-f illustrates the distributions of the MFCC and delta MFCC features that have high correlations with the labels in the dataset (rows 7 and 14 in table 2). The boxplot in figure 8,f shows that has an almost symmetrical distribution with a median greater than zero for ALS voices, while for healthy voices this parameter has an asymmetrical distribution with a near-zero median.

These findings give us tentative confidence that good results can be expected for the classification problem of this study.

4.2 Classification results and discussion

In our experiments we computed the accuracy (see section 3.5) of the LDA classifier using different numbers of features selected by the four FS algorithms described in section 3.3. Figure 9 shows the obtained results.

Figure 9: Classification accuracy with confidence interval (one standard deviation around the quoted mean accuracy). The results were obtained using different feature selection algorithms. For the RelieFF algorithm, the adjustable parameter was used.

The analysis of figure 9 shows that the performance of all FS algorithms is quite similar while the number of features is less than 6. For larger feature subsets, however, LASSO demonstrates significantly better performance in comparison with the other approaches. A possible explanation of this fact is that the mathematical principles of LASSO regression are in accordance with the discriminant function (23) of the LDA classifier.

The optimal size of the feature vector is 43, achieved using the LASSO approach. The accuracy obtained in this case is 97%. This result considerably outperforms the others. For example, the best accuracy of the LDA classifier with the QoV FS algorithm is 79%, achieved using 4 features. The best results obtained with the RelieFF and Relief algorithms are even lower: 76% and 72%, respectively.

It is always desirable to have a classifier with a low number of features; therefore we applied a backward-stepwise selection procedure Flach (2012) to reduce the number of features picked out by the FS algorithms. Backward-stepwise selection starts with the LDA model that uses the best feature subset found by an FS algorithm, and sequentially deletes the feature that has a low (or negative) impact on the fit. The results of the described feature selection process are summarized in table 3.
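Backward-stepwise selection can be sketched as a greedy loop over candidate removals. Here `score` is a placeholder for the cross-validated accuracy of a model fitted on the given subset; the toy score in the example is purely illustrative:

```python
def backward_stepwise(features, score):
    """Greedy backward elimination: repeatedly drop a feature whose removal
    does not decrease the score, until no such feature remains."""
    current = list(features)
    best = score(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for f in list(current):
            trial = [g for g in current if g != f]
            s = score(trial)
            if s >= best:                 # removal helps (or is harmless)
                best, current, improved = s, trial, True
                break
    return current, best

# Toy score: reward "useful" features, penalize subset size
useful = {"a", "b"}
def score(subset):
    return sum(f in useful for f in subset) - 0.1 * len(subset)

selected, final_score = backward_stepwise(["a", "b", "c", "d"], score)
```
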

FS algorithm | Accuracy with initial subset | Accuracy after backward-stepwise selection
QoV | % | %
Relief | % | %
LASSO | 97.0±2.4% | 99.7±0.6%
RelieFF | % | %
Table 3: Classifier accuracy obtained using different feature selection (FS) algorithms. The resulting number of features is given in parentheses.

The results shown in table 3 demonstrate that the best accuracy for the LDA classifier is obtained using the features selected by the LASSO algorithm. It can also be noted that backward-stepwise selection (BSS) is effective in reducing the number of features and increasing the accuracy of the classifier. The most noticeable result in this regard is the increase in accuracy of the LDA model with the feature subset selected by the RelieFF algorithm by 7%, while reducing the number of features by 9. Nevertheless, the feature subsets found by the QoV, Relief and RelieFF algorithms with the BSS procedure give resulting accuracies considerably lower than that of the feature subset selected by the LASSO algorithm.

Table 4 lists the final subsets of features selected by the FS algorithms (with application of BSS). The obtained accuracy, sensitivity and specificity for each case are also given in table 4.

FS algorithm Features Accuracy results (%)
QoV () , , ,
Relief () , , , ,
LASSO () , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
RelieFF () , , , , , , , , , ,
Low order model
10 best LASSO features +BSS () , , , , ,
Table 4: Selected feature subsets and classification accuracy

The analysis of tables 2 and 4 leads to a logical question: why was the statistically significant feature not selected by the LASSO FS algorithm? Detailed analysis revealed that and have a strong correlation ( with ); thus, the MFCCs already contain the information possessed by the feature. Another question: why were such significant features as and not selected by any algorithm? First of all, these features are highly correlated ( with ); thus, the location of is more relevant than its proximity to . Furthermore, is strongly correlated with and ( and , respectively), therefore the information about can be passed to the classifier with any of these parameters. The LASSO and QoV algorithms selected , which contains the information about the location, while the RelieFF algorithm selected for this purpose. A visual example of the interplay between and is given in figure 10.

Figure 10: Interplay between features and

Figure 10 shows the estimates of the spectral envelopes computed during MFCC calculation and the partial reconstruction of the envelopes using the DC component and the 6th MFCC coefficient. It can be seen that the voice of the ALS patient is characterized by a reduced frequency of the second formant. As a result, the projection onto the 6th basis function of the discrete cosine transform used in MFCC calculation changes sign (if we compare the HC and ALS voices).

The results of our study confirm the findings of Tsanas et al. (2012), where MFCCs were also found to be highly informative features for Parkinson’s disease detection. However, unlike Tsanas et al. (2012), we give the interpretation that the MFCCs reflect changes in the second formant of the vowel /i/ for ALS patients. It should also be noted that the proposed features extracted using harmonic analysis of the vowels are essential for obtaining a good classifier. For example, among the 32 features selected by LASSO, ten describe the harmonic structure. Among the remaining features: 9 MFCCs describe the envelopes of the vowels (6 for /a/ and 3 for /i/), 7 delta MFCCs reflect the variability of the vowel envelopes, the GNE parameters give information about the noise content of the voice, and PVI describes the changes in vibrato. Another interesting observation is that, in the subset of features selected by LASSO, 19 are related to the vowel /a/ and 13 to the vowel /i/. This means that the information contained in the phonation of /a/ is relevant and necessary for attaining high classification accuracy. It is interesting that traditional measures such as jitter, shimmer and HNR are absent from table 4. This suggests that the PVI, MFCC and harmonic structure parameters have greater predictive power for distinguishing between HC and patients with ALS.

Surely, the main goal of classification is the most accurate detection of ALS patients’ voices. In this regard, the LDA model with 32 features and 99% accuracy is a significant result. However, there is reason to believe that this feature set is quite specific to our voice database. More relevant information about the parameters that are most important for ALS detection can be derived by analyzing high-performance LDA models with a small number of features. Table 4 shows that the LDA models obtained using the QoV and RelieFF feature selection algorithms have small numbers of features, but rather low performance. To find a model with higher performance, we took the LDA classifier with the 10 best features picked out by the LASSO algorithm, which has an accuracy of 87.6±2.6% (sensitivity = 90.5±4.1%, specificity = 84.8±3.0%), and applied the backward-stepwise selection procedure. As a result, an LDA model with five features (, , , , ) was obtained, which has an accuracy of 89.0% (see the last row of table 4). Thus, it can be concluded that the most important information for detecting ALS patients’ voices is contained in the spectral envelopes of the sounds /a/ and /i/ (MFCC parameters), as well as in the vibrato changes (PVI).

Study | Norel et al. (2018) | Spangler et al. (2017) | An et al. (2018) | Present work
Features | extracted with the openSMILE toolkit | fractal jitter, MFCC, RPDE + articulatory data | filterbank energies + their deltas | MFCC, harmonic parameters, PVI
Total number of features | for male: 1, for female: 15 | 17 | 120 | 32
Classifier | linear SVM | Extreme Gradient Boosting | CNN | LDA
Verification | leave-five-subjects-out CV | leave-one-subject-out CV | leave-one-subject-pair-out CV | 8-fold CV
Database | 133 speakers (67 ALS, 66 HC), running speech | 83 speakers (49 ALS, 34 HC), DDK test | 26 speakers (13 ALS, 13 HC), running speech | 64 speakers (31 ALS, 33 HC), SVP test
Reported performance | male: Acc=79%, Sens=76%, Spec=70%; female: Acc=83%, Sens=78%, Spec=88% | Acc=90.2%, Sens=94.2%, Spec=85.1% | Acc=76.2%, Sens=71.5%, Spec=80.9% | Acc=99.7%, Sens=99.3%, Spec=99.9%
Table 5: Comparison with other studies

Table 5 presents a comparison of the present work with recent similar studies. The purpose of those works was to discriminate between healthy people and ALS patients. The main differences between these studies concern the speech tasks, classification approaches, features and verification methods. The closest result was obtained in Spangler et al. (2017); however, in that work articulatory data was used along with the voice recordings. In table 5 two different performance results are given for the study Norel et al. (2018) because it uses sex-specific features to take into account differences in the vocal tracts of males and females. The study An et al. (2018) presents results of two types: sample-level and person-level classification; the second type is obtained by sample voting. In table 5 we compare only the sample-level classifiers. However, even the person-level classifier based on 5 samples An et al. (2018) has an accuracy of 90.8%, sensitivity of 85.6% and specificity of 94.9%. Therefore, the obtained result with nearly 99% accuracy, sensitivity and specificity based on the LDA classifier can be considered an essential improvement over the previous results.

4.3 Additional experiment: early ALS detection

The following additional experiment was performed in order to determine the validity of LDA models with features extracted from the SVP test for the early ALS detection problem. From the ALS patients, we chose the 12 who had been diagnosed less than one year before the recordings (see table 1). Thus, the reduced dataset included 45 speakers (33 HC + 12 ALS).

Using the reduced dataset, we performed the feature selection procedures and optimization of the feature set as described above. However, in contrast to the experiments presented in the previous sections, we used the leave-one-subject-out (LOSO) cross-validation procedure to evaluate the performance of the classifiers Flach (2012); Hastie et al. (2001). In fact, the LOSO method is a k-fold CV procedure with k equal to the size of the dataset. We used LOSO in order to bring closer the sizes of the samples on which the classifiers are trained in sections 4.2 and 4.3. In section 4.2, where 8-fold CV was used, the LDA classifier was trained on 54 samples; in this section, using the LOSO CV method, the classifier is trained on 44 samples.

FS algorithm Features Accuracy results (%)
QoV + BSS () , , , ,
RelieFF () , , , ,
LASSO () , , , , , , , , , , , ,
Table 6: Early ALS detection: selected feature subsets and classification accuracy

An LDA model with 5 features and above 80% accuracy was obtained using the QoV feature selection algorithm with the BSS procedure (see table 6). The best LDA model obtained using the Relief algorithm with the BSS procedure has 39 features and 100% accuracy. The same accuracy is achieved by the LDA model using 28 features selected by the LASSO algorithm. However, these feature sets (whatever their legitimacy) are too specifically fitted to our database. We believe that more relevant conclusions can be derived by analyzing models with feature sets of limited size. For example, the LDA model trained on the first 5 features selected by the RelieFF algorithm has 93.3% accuracy (see table 6). Furthermore, among the LDA models with a small number of features we can highlight one that has 95.6% accuracy and is trained on the first 12 features selected by the LASSO algorithm.

Analyzing the features contained in table 6, we can draw the following conclusions. is present in all feature sets; its relevance was discussed in the previous sections (for example, see figure 10). Four of the five features selected by the RelieFF algorithm are also included in the feature set picked out by the LASSO algorithm. This indicates their high significance for early ALS detection. The feature set obtained using the RelieFF algorithm shows that valuable information for early ALS detection is contained in the spectral envelopes of the vowels /a/ and /i/ (this information is concentrated in the parameters , and , ). The parameter , which indicates the degree of fundamental frequency variation, is also important for early ALS detection. It should be noted that none of the feature sets contains the PVI and PPE parameters, the significance of which was revealed in the previous experiment. This suggests that changes in the vibrato are not related to early diagnosis of ALS, but are rather characteristic of later stages of the disease.

5 Conclusion

In this study we investigated the possibility of designing a linear classifier for discriminating ALS patients from healthy controls using acoustic sustained phonation tests of the vowels /a/ and /i/. A large set of features was analysed. An LDA classifier with 99.7% accuracy (99.3% sensitivity, 99.9% specificity) was obtained based on 32 features determined by the LASSO feature selection algorithm. We also obtained an LDA model with only 5 features that has 89.0% accuracy (87.5% sensitivity, 90.4% specificity). We found that the most important information for detecting ALS patients’ voices is contained in the spectral envelopes of the sounds /a/ and /i/ (MFCC parameters), as well as in the vibrato changes (PVI). As in Spangler et al. (2017), traditional jitter measures were found not to have high importance. We also carried out an experiment to determine the validity of LDA models with features extracted from the SVP test for the early ALS detection problem. Our results show that it is possible to obtain an LDA model with 93.3% accuracy (83.3% sensitivity, 97.0% specificity) based on only 5 features determined by the RelieFF algorithm. We can also conclude that valuable information for early ALS detection is contained in the spectral envelopes of the vowels /a/ and /i/ (MFCC parameters). We also found that the selected feature sets did not contain the PVI and PPE parameters. This suggests that changes in the vibrato are not related to early diagnosis of ALS, but are rather characteristic of the later stages of the disease. It should be noted that the data for this study were collected using a smartphone with a regular headset. Therefore, we can assert that the proposed approach is tolerant to non-professional recording conditions.

Acknowledgements

The authors thank the anonymous reviewers for their useful comments.

References

  • K. An, M. Kim, K. Teplansky, J. Green, T. Campbell, Y. Yunusova, D. Heitzman, and J. Wang (2018) Automatic early detection of amyotrophic lateral sclerosis from intelligible speech using convolutional neural networks. In Proc. of Interspeech 2018, pp. 1913–1917. External Links: Document Cited by: §1, §1, §4.2, Table 5.
  • A. E. Aronson, W. S. Winholtz, L. O. Ramig, and S. R. Silber (1992) Rapid voice tremor, or “flutter,” in amyotrophic lateral sclerosis. Annals of Otology, Rhinology & Laryngology 101 (6), pp. 511–518. External Links: Document Cited by: §2.4.3.
  • S. N. Awan (2009) Instrumental analysis of phonation. In The Handbook of Clinical Linguistics, pp. 344–359. External Links: Document, ISBN 9781444301007 Cited by: §2.2.2.
  • E. Azarov, M. Vashkevich, and A. A. Petrovsky (2012) Instantaneous pitch estimation based on RAPT framework. In Proc. of the 20th European Signal Processing Conference (EUSIPCO), pp. 2787–2791. Cited by: item 1, item 1.
  • R. J. Baken and R. F. Orlikoff (2000) Clinical measurement of speech and voice, 2nd edition. Singular Thomson Learning. Cited by: §1, §2.1.1, §2.1.2, §2.1.3, §2.4.2.
  • A. Bandini, J. Green, L. Zinman, and Y. Yunusova (2017) Classification of bulbar ALS from kinematic features of the jaw and lips: towards computer-mediated assessment. In Proc. of Interspeech, pp. 1819–1823. External Links: Document Cited by: §1.
  • A. Benba, A. Jilbab, and A. Hammouch (2016) Discriminating between patients with Parkinson’s and neurological diseases using cepstral analysis. IEEE Transactions on Neural Systems and Rehabilitation Engineering 24 (10), pp. 1100–1108. External Links: Document Cited by: §1, §1, §1, §2.3.1.
  • P. Boersma (1993) Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. In Proceedings of the institute of phonetic sciences, Vol. 17, pp. 97–110. Cited by: §2.2.1, §2.2.
  • E. Castillo Guerra and D. F. Lovey (2003) A modern approach to dysarthria classification. In Proc. of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEMBS), Vol. 3, pp. 2257–2260. External Links: Document Cited by: §1, §1, §1, §2.2.2, §2.4.4.
  • H. Cordeiro and C. Meneses (2018) Low band continuous speech system for voice pathologies identification. In Proc. of the Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), pp. 315–320. External Links: Document Cited by: §2.4.4, §3.5.
  • A. K. Dubey, S. R. M. Prasanna, and S. Dandapat (2018) Pitch-adaptive front-end feature for hypernasality detection. In Proc. Interspeech, pp. 372–376. External Links: Document Cited by: §2.3.1.
  • J. R. Duffy (2013) Motor speech disorders: substrates, differential diagnosis, and management. Elsevier Health Sciences. Cited by: §1.
  • P. Flach (2012) Machine learning: the art and science of algorithms that make sense of data. Cambridge University Press, United Kingdom. Cited by: §3.3, §4.2, §4.3.
  • J. I. Godino-Llorente, P. Gomez-Vilda, and M. Blanco-Velasco (2006) Dimensionality reduction of a pathological voice quality assessment system based on gaussian mixture models and short-term cepstral parameters. IEEE transactions on biomedical engineering 53 (10), pp. 1943–1953. External Links: Document Cited by: §2.3.1, §2.3.1, §2.3.1.
  • J.A. Gómez-García, L. Moro-Velázquez, and J.I. Godino-Llorente (2019) On the design of automatic voice condition analysis systems. part i: review of concepts and an insight to the state of the art. Biomedical Signal Processing and Control 51, pp. 181–199. External Links: Document Cited by: §1.
  • P. Gomez-Vilda, A. R. M. Londral, V. Rodellar-Biarge, J. M. Ferrandez-Vicente, and M. de Carvalho (2015) Monitoring amyotrophic lateral sclerosis by biomechanical modeling of speech production. Neurocomputing 151, pp. 130–138. External Links: Document Cited by: §1.
  • J. R. Green, Y. Yunusova, M. S. Kuruvilla, J. Wang, G. L. Pattee, L. Synhorst, L. Zinman, and J. D. Berry (2013) Bulbar and speech motor assessment in ALS: challenges and future directions. Amyotrophic lateral sclerosis and frontotemporal degeneration 14 (7-8), pp. 494–500. External Links: Document Cited by: §1, §2.
  • T. Hastie, R. Tibshirani, and J. Friedman (2001) The elements of statistical learning. Springer Series in Statistics, Springer New York Inc., New York, NY, USA. Cited by: §3.4, §4.1, §4.3.
  • R. L. Horwitz-Martin, T. F. Quatieri, A. C. Lammert, J. R. Williamson, Y. Yunusova, E. Godoy, D. D. Mehta, and J. R. Green (2016) Relation of automatically extracted formant trajectories with intelligibility loss and speaking rate decline in amyotrophic lateral sclerosis. In Proc. of Interspeech 2016, pp. 1205–1209. External Links: Document Cited by: §2.3.2.
  • X. Huang, A. Acero, H. Hon, and R. Foreword By-Reddy (2001) Spoken language processing: a guide to theory, algorithm, and system development. Prentice hall PTR. Cited by: item 3, item 3, §2.3.1, §2.3.3.
  • A. Illa, D. Patel, B. Yaminiy, S. Meera, N. Shivashankar, P.-K. Veeramaniz, S. Vengalilz, S. N. K. Polavarapuz, A. Naliniz, and P. K. Ghosh (2018) Comparison of speech tasks for automatic classification of patients with amyotrophic lateral sclerosis and healthy subjects. In Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6014–6018. External Links: Document Cited by: §1.
  • Y. Iwasaki, K. Ikeda, and M. Kinoshita (2001) The diagnostic pathway in amyotrophic lateral sclerosis. Amyotrophic Lateral Sclerosis and Other Motor Neuron Disorders 2 (3), pp. 123–126. External Links: Document Cited by: §1.
  • H. Kasuya, S. Ebihara, T. Chiba, and T. Konno (1982) Characteristics of pitch period and amplitude perturbations in speech of patients with laryngeal cancer. Electronics and Communications in Japan 65 (5), pp. 11–19. External Links: Document Cited by: §2.1.1, §2.1.2.
  • R. D. Kent, G. Weismer, J. F. Kent, H. K. Vorperian, and J. R. Duffy (1999) Acoustic studies of dysarthric speech: methods, progress, and potential. Journal of Communication Disorders 32 (3), pp. 141–186. Cited by: §2.3.2, §2.4.3.
  • K. Kira and L.A. Rendell (1992) A practical approach to feature selection,. In Proc. 9th Int. Conf. Mach. Learn., pp. 249–256. Cited by: §3.3.
  • R. Kohavi (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proc. of International Joint Conference on Artificial Intelligence, pp. 1137–1143. Cited by: §3.5.
  • I. Kononenko, E. Simec, and M. Robnik-Sikonja (1997) Overcoming the myopia of inductive learning algorithms with relieff. Applied Intelligence 7 (1), pp. 39–55. External Links: Document Cited by: §3.3.
  • K. L. Lansford and J. M. Liss (2014) Vowel acoustics in dysarthria: speech disorder diagnosis and classification. Journal of Speech, Language, and Hearing Research 57, pp. 57–67. Cited by: §2.3.2.
  • J. Lee, E. Dickey, and Z. Simmons (2019) Vowel-specific intelligibility and acoustic patterns in individuals with dysarthria secondary to amyotrophic lateral sclerosis. Journal of Speech, Language, and Hearing Research 62 (1), pp. 34–59. External Links: Document Cited by: §2.3.2, §4.1.
  • J. Lee, M. A. Littlejohn, and Z. Simmons (2017) Acoustic and tongue kinematic vowel space in speakers with and without dysarthria. International Journal of Speech-Language Pathology 19 (2), pp. 195–204. External Links: Document Cited by: §1, §2.3.2, §4.1.
  • M. Little, P. McSharry, E. Hunter, J. Spielman, and L. Ramig (2008) Suitability of dysphonia measurements for telemonitoring of parkinson’s disease. Nature Precedings, pp. 1–1. External Links: Document Cited by: §2.4.2, §2.4.2.
  • R. Liu and D. F. Gillie (2011) Feature selection using order statistics. In Proc. of International Conference on Pattern Recognition and Information Processing (PRIP), pp. 195–199. External Links: ISBN 978-985-488-722-7 Cited by: §3.3.
  • D. Michaelis, T. Gramss, and H. W. Strube (1997) Glottal-to-noise excitation ratio–a new measure for describing pathological voices. Acta Acustica united with Acustica 83 (4), pp. 700–706. Cited by: §2.2.2.
  • I. C. Miller and M. Moerman (2013) Voice therapy assistant: a useful tool to facilitate therapy in dysphonic patients. In Proc. of the 8th International Workshop: Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA), pp. 171–175. Cited by: §2.1.1.
  • R. J. Moran, R. B. Reilly, P. de Chazal, and P. D. Lacy (2006) Telephony-based voice pathology assessment using automated speech analysis. IEEE Transactions on Biomedical Engineering 53 (3), pp. 468–477. External Links: Document Cited by: §2.1.2, §2.2, §2.4.1.
  • M. V. Mujumdar and R. F. Kubichek (2010) Design of a dysarthria classifier using global statistics of speech features. In Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 582–585. External Links: Document Cited by: §1.
  • T. Nakano, M. Goto, and Y. Hiraga (2006) An automatic singing skill evaluation method for unknown melodies using pitch interval accuracy and vibrato features. In Proc. of Interspeech, pp. 1706–1709. Cited by: §2.4.3.
  • R. Norel, M. Pietrowicz, C. Agurto, S. Rishoni, and G. Cecchi (2018) Detection of amyotrophic lateral sclerosis (ALS) via acoustic analysis. In Proc. Interspeech, pp. 377–381. External Links: Document Cited by: §1, §1, §4.2, Table 5.
  • J. R. Orozco-Arroyave, E. A. Belalcazar-Bolaños, J. D. Arias-Londono, J. F. Vargas-Bonilla, S. Skodda, J. Rusz, K. Daqrouq, F. Honig, and E. Noth (2015) Characterization methods for the detection of multiple voice disorders: neurological, functional, and laryngeal diseases. IEEE Journal of Biomedical and Health Informatics 19 (6), pp. 1820–1828. External Links: Document Cited by: §2.2.2, §2.3.1, §3.5.
  • J. Peplinski, V. Berisha, J. Liss, S. Hahn, J. Shefner, S. Rutkove, K. Qi, and K. Shelton (2019) Objective assessment of vocal tremor. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6386–6390. External Links: Document Cited by: §2.4.3, §2.4.3.
  • J. R. Orozco-Arroyave, F. Honig, J. D. Arias-Londoño, J. F. Vargas-Bonilla, K. Daqrouq, S. Skodda, J. Rusz, and E. Noth (2016) Automatic detection of Parkinson’s disease in running speech spoken in three different languages. The Journal of the Acoustical Society of America 139 (1), pp. 481–500. Cited by: §1, §2.1.2.
  • J. Rusz, R. Cmejla, H. Ruzickova, and E. Ruzicka (2011) Quantitative acoustic measurements for characterization of speech and voice disorders in early untreated Parkinson’s disease. The Journal of the Acoustical Society of America 129 (1), pp. 350–367. External Links: Document Cited by: §1, §1, §1, §2.1.2.
  • A. K. Silbergleit, A. F. Johnson, and B. H. Jacobson (1997) Acoustic analysis of voice in individuals with amyotrophic lateral sclerosis and perceptually normal vocal quality. Journal of Voice 11 (2), pp. 222–231. External Links: Document Cited by: §1.
  • T. Spangler, N. V. Vinodchandran, A. Samal, and J. R. Green (2017) Fractal features for automatic detection of dysarthria. In Proc. of IEEE EMBS International Conference on Biomedical Health Informatics (BHI), pp. 437–440. External Links: Document Cited by: §1, §1, §2.3.1, §4.2, Table 5, §5.
  • R. Tibshirani (1994) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58, pp. 267–288. Cited by: §3.3.
  • B. Tomik and R. J. Guiloff (2010) Dysarthria in amyotrophic lateral sclerosis: a review. Amyotrophic Lateral Sclerosis 11 (1-2), pp. 4–15. External Links: Document Cited by: §2.3.2.
  • A. Tsanas, M. A. Little, P. E. McSharry, J. Spielman, and L. O. Ramig (2012) Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease. IEEE Transactions on Biomedical Engineering 59 (5), pp. 1264–1271. External Links: Document, ISSN 0018-9294 Cited by: §1, §1, §2.3.1, §3.5, §4.2.
  • G. S. Turner, K. Tjaden, and G. Weismer (1995) The influence of speaking rate on vowel space and speech intelligibility for individuals with amyotrophic lateral sclerosis. Journal of Speech and Hearing Research 38 (5), pp. 1001–1013. External Links: Document Cited by: §2.3.2.
  • M. M. van der Graaff, W. Grolman, E. J. Westermann, H. C. Boogaardt, H. Koelman, A. J. van der Kooi, M. A. Tijssen, and M. de Visser (2009) Vocal cord dysfunction in amyotrophic lateral sclerosis. JAMA Neurology 66 (11), pp. 1329–1333. External Links: Document Cited by: §1.
  • M. Vashkevich, E. Azarov, A. Petrovsky, and Y. Rushkevich (2018) Features extraction for the automatic detection of ALS disease from acoustic speech signals. In Proc. of Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), pp. 321–326. External Links: Document Cited by: §1, §2.3.2, §2.3.3.
  • M. Vashkevich, A. Petrovsky, and Y. Rushkevich (2019) Bulbar ALS detection based on analysis of voice perturbation and vibrato. In Proc. of Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), pp. 267–272. External Links: Document, ISSN 2326-0262 Cited by: item 1, §2.4.3.
  • M. Vashkevich, A. Gvozdovich, and Y. Rushkevich (2019) Detection of bulbar dysfunction in ALS patients based on running speech test. In International Conference on Pattern Recognition and Information Processing, pp. 192–204. External Links: Document Cited by: §1, §2.3.2.
  • G. Weismer, R. Martin, J. F. Kent, and R. D. Kent (1992) Formant trajectory characteristics of males with amyotrophic lateral sclerosis. The Journal of the Acoustical Society of America 91 (2), pp. 1085–1098. External Links: Document Cited by: §2.3.2.
  • Y. Yunusova, J. S. Rosenthal, J. R. Green, S. Shellikeri, P. Rong, J. Wang, and L. H. Zinman (2013) Detection of bulbar ALS using a comprehensive speech assessment battery. In Proc. of the International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, pp. 217–220. Cited by: §1.