We acquired large sets of both written and spoken data during evaluation campaigns aimed at assessing, at school, the proficiency of Italian pupils learning both German and English. Part of the acquired data has been included in a corpus, named “Trentino Language Testing” in schools (TLT-school), which is described in the following.
All the collected sentences have been annotated by human experts in terms of some predefined “indicators” which, in turn, were used to assign a proficiency level to each student undertaking the assigned test. This level is expressed according to the well-known Common European Framework of Reference for Languages (Council of Europe, 2001). The CEFR defines six levels of proficiency: A1 (beginner), A2, B1, B2, C1 and C2. The levels considered in the evaluation campaigns in which the data were collected are A1, A2 and B1.
The indicators measure the linguistic competence of test takers both in relation to the content (e.g. grammatical correctness, lexical richness, semantic coherence, etc.) and to the speaking capabilities (e.g. pronunciation, fluency, etc.). Refer to Section 2 for a description of the adopted indicators.
The learners are Italian students between 9 and 16 years old. They took proficiency tests by answering question prompts provided in written form. The “TLT-school” corpus, which we are going to make publicly available, contains part of the spoken answers (together with the respective manual transcriptions) recorded during some of the above-mentioned evaluation campaigns. The written answers will be released in the future. Details and critical issues encountered during the acquisition of the test takers’ answers are discussed in Section 2.
The tasks that can be addressed by using the corpus are very challenging and pose many problems, which have only partially been solved by the interested scientific community.
From the ASR perspective, major difficulties are represented by: a) recognition of both child and non-native speech, i.e. Italian pupils speaking both English and German, b) presence of a large number of spontaneous speech phenomena (hesitations, false starts, fragments of words, etc.), c) presence of multiple languages (English, Italian and German words are frequently uttered in response to a single question), d) presence of a significant level of background noise due to the fact that the microphone remains open for a fixed time interval (e.g. 20 seconds - depending on the question), and e) presence of non-collaborative speakers (students often joke, laugh, speak softly, etc.). Refer to Section 2.3 for a detailed description of the collected spoken data set.
Furthermore, since the data from which “TLT-school” was derived were primarily acquired for measuring the proficiency of second-language (L2) learners, it is natural to exploit the corpus for automatic speech rating. To this end, one can develop automatic approaches to reliably estimate the above-mentioned indicators used by the human experts who scored the pupils’ answers (such an approach is described in ). However, note that the scientific literature proposes several features and indicators for automatic speech scoring, partly different from those adopted in the “TLT-school” corpus (see below for a brief review of the literature). Hence, we believe that adding new annotations to the corpus, related to particular aspects of language proficiency, can stimulate research and experimentation in this area.
Finally, it is worth mentioning that the written responses of the “TLT-school” corpus are also characterised by a high level of noise, due to spelling errors, word fragments, words belonging to multiple languages, and off-topic answers (e.g. containing jokes or comments not related to the questions). This set of text data will allow scientists to investigate both the language and the behaviour of pupils learning second languages at school. Written data are described in detail in Section 2.2.
Relation to prior work. The scientific literature is rich in approaches for the automated assessment of spoken language proficiency. Performance depends directly on ASR accuracy which, in turn, depends on the type of input, read or spontaneous, and on the speakers’ age, adults or children (see  for an overview of spoken language technology for education). A recent overview of the state-of-the-art automated speech scoring technology currently used at Educational Testing Service (ETS) can be found in .
In order to address the automatic assessment of complex spoken tasks requiring more general communication capabilities from L2 learners, the AZELLA data set, developed by Pearson, was collected and has been used as a benchmark in several studies [2, 8]. The corpus contains spoken tests from a variety of tasks, each double-graded by human professionals.
A public set of spoken data has recently been distributed in a spoken CALL (Computer Assisted Language Learning) shared task (see https://regulus.unige.ch/spokencallsharedtask_3rdedition/), where Swiss students learning English had to answer both written and spoken prompts. The goal of this challenge was to label students’ spoken responses as “accept” or “reject”. Refer to  for details of the challenge and of the associated data sets.
Many non-native speech corpora (mostly with English as the target language) have been collected over the years. A list, though not recent, as well as a brief description of most of them, can be found in . The same paper also gives information on how the data sets are distributed and can be accessed (many of them are available through the LDC (https://www.ldc.upenn.edu/) and ELDA (http://www.elra.info/en/about/elda/) agencies). Some of the corpora also provide proficiency ratings to be used in CALL applications. Among them, we mention the ISLE corpus , which also contains transcriptions at the phonetic level and was used in the experiments reported in .
Note that all the corpora mentioned in  contain adult speech, while, to our knowledge, publicly available non-native children’s speech corpora, as well as children’s speech corpora in general, are still scarce. Concerning non-native children’s speech specifically, we believe the following corpora are worth mentioning. The PF-STAR corpus (see ) contains English utterances read by both Italian and German children between 6 and 13 years old, as well as utterances read by English children. The ChildIt corpus  contains English utterances (both read and imitated) by Italian children.
By distributing “TLT-school” corpus, we hope to help researchers to investigate novel approaches and models in the areas of both non-native and children’s speech and to build related benchmarks.
2 Data Acquisition
In Trentino, an autonomous region in northern Italy, a series of three evaluation campaigns is underway for testing the L2 linguistic competence of Italian students taking proficiency tests in both English and German: two campaigns were completed in 2016 and 2018, and a final one is scheduled for 2020. Note that the “TLT-school” corpus refers only to the 2018 campaign, which was split into two parts: a 2017 try-out data set (involving about 500 pupils) and the actual 2018 data (about 2500 pupils). Each of the three campaigns (i.e. 2016, 2018 and 2020) involves about 3000 students, ranging from 9 to 16 years old, belonging to four different school grade levels and three proficiency levels (A1, A2, B1). The schools involved in the evaluations are located in most parts of the Trentino region, not only in its main towns; Table 1 reports some information about the pupils that took part in the campaigns. Several tests, aimed at assessing the language learning skills of the students, were carried out by means of multiple-choice questions, which can be evaluated automatically. However, a detailed linguistic evaluation cannot be performed without allowing the students to express themselves in both written sentences and spoken utterances, which typically require the intervention of human experts to be scored.
|CEFR|Grade, School|Age|Number of pupils|
|B1|10, high school|14-15|378|124|1112|
|B1|11, high school|15-16|141|0|467|
|lexical richness: lexical properties, lexical appropriateness|
|pronunciation and fluency: pronunciation, fluency, discourse pronunciation and fluency|
|syntactical correctness: correctness and formal competences, morpho-syntactical correctness, orthography and punctuation|
|fulfillment on delivery: fulfillment of the task, relevancy of the answer|
|coherence and cohesion: coherence and cohesion, general impression|
|communicative, descriptive, narrative skills: communicative efficacy, argumentative abilities, descriptive abilities, abilities to describe one’s own feelings, etc.|
|A1||You are on a trip to Trentino with your family. Add a message to a picture you took and want to send it to a friend. Tell us: 1. where you are; 2.what you do; 3. what you like or dislike.||dear tiago . i’m swimming in the lake . there are some beautiful mountains . i swim in the levico lake . later i go home on the bikeand i eats ice creamwith my brother and my father . i like water is beautiful but i don’t like the sun is very very hot ! is very impressive levico lake . see you soon . byee ! kacper —————————————————— hello , i’m in the lake with my family . i play football with my dad and i eat a ice cream . the water is beautiful . it’s very sunny . goodbye see you soon .|
|A2||Reply to Susan Hi! How are you? I’ve just received a new tennis racket. Would you like to meet at the sports centre and play a little? We can play for an hour and then we can get an ice cream together. Can you come at 5 o’clock? Don’t forget to bring your tennis shoes, ok? I’m really looking forward to playing with you! Bye, Susan.||hello susan ! i’m fine thanks and you ? i’m very happy for you and for your message and i would like see your new racket but unfortunately today i can’t come . tomorrow i have a very important football match and i must wake up at six o’clock so i need to sleep more than usually . we can meet —————————————————— hello susan i’m fine . i’m sorry but i can’t come with you , because i go to land between london for one concert at five o’clock . bye .|
|B1||Write an English post for your blog where you talk about what you need to do to learn a language well.||if you want to learn a new language you can make it . the first thing is studying many words and make a course . the second thing is go out of your state and arrive at the state where speak the language that you want learn . the first days are more difficult , but then its more easy . you must study the grammar too .|
|B1||Write a short email to a friend of yours to tell him / her that you intend to start studying another foreign language and what the reasons are.||hi sophie ! how are you ? i am writing to you because i desire to say you that i will start to study a new langueges . why i decide it ? because i wont to live in spain in the future . what do you thing ? i wait your answer . with love ! bye !|
Tables 2 and 3 report some statistics extracted from both the written and spoken data collected so far in all the campaigns. Each written or spoken item received a total score from the human experts, computed by summing up the scores of the individual indicators used in 2017/2018 (the set of indicators in the 2016 campaign differed, according to the proficiency levels and the type of test). Each indicator can assume a value of 0, 1 or 2, corresponding to bad, medium and good, respectively.
The list of the indicators used by the experts to score written sentences and spoken utterances in the evaluations, grouped by similarity, is reported in Table 4. Since every utterance was scored by only one expert, it was not possible to evaluate inter-expert agreement. For future evaluations, more experts are expected to provide independent scores on the same data sets, so that this kind of evaluation will become possible.
The speaking part of the proficiency tests in 2017/2018 consists of 47 question prompts provided in written form: 24 in English and 23 in German, divided according to CEFR levels. Apart from the A1 level, which differs in the number of questions (11 for English, 10 for German), the English and German A2 and B1 levels have 6 and 7 questions each, respectively. As for the A1 level, the first four introductory questions are the same (How old are you?, Where do you live?, What are your hobbies?; Wie alt bist du?, Wo wohnst du?, Was sind deine Hobbys?) or slightly different (What’s your favourite pet?, Welche Tiere magst du?) in the two languages, whereas the second part of the test puts the test takers in the role of a customer in a pizzeria (English) or in a bar (German).
The A2 level test is composed of small-talk questions relating to everyday life situations. In this case, questions are more open-ended than the aforementioned ones and allow test takers to interact by means of a broader range of answers. Finally, the B1 level questions are similar to the A2 ones, but they include a role-play activity in the final part, which allows a good amount of freedom and creativity in answering.
2.2 Written Data
Table 2 reports some statistics extracted from the written data collected so far. In this table, the number of pupils taking part in the English and German evaluation is reported, along with the number of sentences and tokens, identified as character sequences bounded by spaces.
It is worth mentioning that the collected texts contain a large quantity of errors of several types: orthographic, syntactic, code-switched words (i.e. words not in the required language), jokes, etc. Hence, the original written sentences have been processed to produce “cleaner” versions, making the data usable for research purposes (e.g. training language models, extracting features for proficiency assessment, …).
To this end, we applied a text processing pipeline that, in sequence:
removes strange characters;
performs text normalisation (lowercase, umlauts, numbers, …) and tokenisation (punctuation, etc.);
removes or corrects non-words (e.g. hallooooooooooo becomes hallo; aaaaaaaaeeeeeeeeiiiiiiii is removed);
identifies the language of each word, choosing among Italian, English and German;
corrects common typing errors (e.g. ai em becomes i am);
replaces words unknown to a large lexicon with the label unk.
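The steps above can be sketched as follows. This is a minimal illustration in which the function name, the toy lexicon and the non-word heuristics are our own assumptions, not the actual tools used to build the corpus (language identification and typo correction are omitted):

```python
import re

# Hypothetical stand-in for the large reference lexicon mentioned above.
LEXICON = {"hallo", "i", "am", "my", "name", "is", "ten", "years", "old"}

def clean(text):
    # normalisation: lowercase, drop strange characters
    text = re.sub(r"[^a-zäöüß\s']", " ", text.lower())
    tokens = []
    for tok in text.split():
        # collapse exaggerated letter runs: "hallooooooooooo" -> "hallo"
        collapsed = re.sub(r"(.)\1{2,}", r"\1", tok)
        # drop tokens that are still non-words after heavy collapsing,
        # e.g. "aaaaaaaaeeeeeeeeiiiiiiii"
        if collapsed not in LEXICON and len(collapsed) < len(tok) / 2:
            continue
        # replace out-of-lexicon tokens with the label "unk"
        tokens.append(collapsed if collapsed in LEXICON else "unk")
    return " ".join(tokens)
```

For instance, `clean("Hallooooooooooo!!! my name is kacper")` yields `"hallo my name is unk"` under the toy lexicon above.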
Table 5 reports some samples of written answers.
2.3 Spoken Data
Table 3 reports some statistics extracted from the acquired spoken data. Speech was recorded in classrooms, whose equipment depended on each school. In general, around 20 students took the test together, at the same time and in the same classroom, so it is quite common that the speech of classmates or teachers overlaps with the speech of the student speaking into her/his microphone. The type of microphone also depends on the equipment of the school. On average, the audio signal quality is fairly good, while the main problem is the high percentage of extraneous speech. This is due to the fact that the organisers decided to use a fixed recording duration, which depends on the question, so that all the recordings for a given question have the same length. While it is rare that a speaker does not have enough time to answer, it is quite common that, especially after the end of the utterance, some other speech (e.g. comments, jokes with classmates, indications from the teachers) is captured. In addition, background noise is often present due to several sources (doors, steps, keyboard typing, background voices, street noise if the windows are open, etc.). Finally, it has to be pointed out that many answers are whispered and difficult to understand.
3 Manual Transcriptions
In order to create both an adaptation and an evaluation set for ASR, we manually transcribed part of the 2017 data sets. We defined an initial set of guidelines for the annotation, which were used by 5 researchers to manually transcribe about 20 minutes of audio data. This experience led to a discussion, from which a second set of guidelines originated, aiming at reaching a reasonable trade-off between transcription accuracy and speed. As a consequence, we decided to apply the following transcription rules:
only the main speaker has to be transcribed; presence of other voices (schoolmates, teacher) should be reported only with the label “@voices”,
presence of whispered speech was found to be significant, so it should be explicitly marked with the label “()”,
badly pronounced words have to be marked by a “#” sign, without trying to phonetically transcribe the pronounced sounds; “#*” marks incomprehensible speech;
speech in a language different from the target language has to be reported by means of an explicit marker, e.g. “I am 10 years old @it(io ho già risposto)” (Italian for “I have already answered”).
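To use such transcriptions for ASR training, the annotation labels must be stripped back out. The following sketch shows one way to do this; the regular expressions and the label inventory are our own assumptions derived from the rules listed above:

```python
import re

def strip_markup(transcription):
    """Remove the annotation labels defined in the transcription guidelines."""
    t = transcription.replace("@voices", " ")  # background voices label
    t = re.sub(r"@\w+\(.*?\)", " ", t)         # other-language spans, e.g. @it(...)
    t = t.replace("#*", " ")                   # incomprehensible speech
    t = t.replace("#", "")                     # keep badly pronounced words, drop marker
    t = re.sub(r"\(\s*([^)]*)\)", r"\1", t)    # unwrap whispered speech "(...)"
    return re.sub(r"\s+", " ", t).strip()
```

For example, `strip_markup("I am 10 years old @it(io ho già risposto) @voices")` returns `"I am 10 years old"`.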
|Ger Train All|1448|04:47:45|11.92|9878|6.82|
|Ger Train Clean|589|01:37:59|9.98|2317|3.93|
|Eng Train All|2301|09:03:30|14.17|26090|11.34|
|Eng Train Clean|916|02:45:42|10.85|6249|6.82|
|Ger Test All|671|02:19:10|12.44|5334|7.95|
|Ger Test Clean|260|00:43:25|10.02|1163|4.47|
|Eng Test All|1142|04:29:43|14.17|13244|11.60|
|Eng Test Clean|423|01:17:02|10.93|3404|8.05|
Next, we concatenated utterances to be transcribed into blocks of about 5 minutes each. We noticed that knowing the question and hearing several answers could be of great help for transcribing some poorly pronounced words or phrases. Therefore, each block contains only answers to the same question, explicitly reported at the beginning of the block.
We engaged about 30 students from two Italian linguistic high schools (namely “C” and “S”) to perform manual transcriptions.
After a joint training session, we paired students together. Each pair first transcribed, individually, the same 5-minute block. Then they went through a comparison phase, in which the two students discussed their choices and agreed on a single transcription for the assigned data. The transcriptions made before the comparison phase were retained to evaluate inter-annotator agreement. Apart from this first 5-minute block, each utterance was transcribed by only one transcriber. Inter-annotator agreement for the 5-minute blocks is shown in Table 6 in terms of words (after removing hesitations and other labels related to background voices, noises, etc.). The low level of agreement reflects the difficulty of the task.
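Word-level agreement of this kind is typically computed from the edit distance between the two independent transcriptions of the same block. The following is a generic sketch, not necessarily the exact metric behind Table 6:

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance over word sequences, single-row dynamic programming."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution or match
    return d[len(hyp)]

def agreement(t1, t2):
    """Fraction of words on which two transcribers agree (1 - word error rate)."""
    w1, w2 = t1.split(), t2.split()
    return 1.0 - word_edit_distance(w1, w2) / max(len(w1), len(w2), 1)
```

Labels such as “@voices” would be removed (as described above) before applying the metric.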
In order to assure the quality of the manual transcriptions, every sentence transcribed by the high school students was automatically checked for possible formal errors and manually validated by researchers in our lab.
Speakers were assigned either to the training or to the evaluation set, with proportions of  and , respectively; training and evaluation lists were then built accordingly. Table 7 reports statistics from the spoken data set. The id All identifies the whole data set, while Clean identifies the subset excluding sentences containing background voices, incomprehensible speech and word fragments.
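A speaker-disjoint split of this kind can be sketched as follows; the 20% evaluation fraction and the data layout are illustrative assumptions, not the actual proportions used for the corpus:

```python
import random

def split_speakers(utterances_by_speaker, eval_fraction=0.2, seed=0):
    """Assign whole speakers (never single utterances) to train or evaluation."""
    speakers = sorted(utterances_by_speaker)
    random.Random(seed).shuffle(speakers)
    n_eval = max(1, int(len(speakers) * eval_fraction))
    eval_speakers = set(speakers[:n_eval])
    train = [u for s in speakers if s not in eval_speakers
             for u in utterances_by_speaker[s]]
    evaluation = [u for s in eval_speakers for u in utterances_by_speaker[s]]
    return train, evaluation
```

Splitting at the speaker level, rather than at the utterance level, prevents the evaluation set from sharing voices with the training set.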
4 Usage of the Data
From the above description it appears that the corpus can be effectively used in many research directions.
4.1 ASR-related Challenges
The spoken corpus features non-native speech recordings in real classrooms and, consequently, peculiar phenomena appear and can be investigated. Phonological and cross-language interference requires specific approaches for accurate acoustic modelling. Moreover, for coping with cross-language interference it is important to consider alternative ways to represent specific words (e.g. words of two languages with the same graphemic representation).
Table 8, extracted from , reports the WERs obtained on the evaluation data sets with a strongly adapted ASR system, demonstrating the difficulty of the related speech recognition task for both languages. Refer to  for comparisons with a different non-native children’s speech data set, and to the scientific literature [34, 11, 19, 16, 1, 14, 15, 21, 29] for detailed descriptions of children’s speech recognition and related issues. Important, although not exhaustive, references on non-native speech recognition can be found in [33, 32, 25, 31, 30, 6, 12, 20, 18, 10].
As for language modelling, accurate transcription of the spoken responses demands models able to cope with ill-formed expressions (due to students’ grammatical errors). Also, the presence of code-switched words, word fragments and spontaneous speech phenomena requires specific investigation to reduce their impact on the final performance.
We believe that the particular domain and set of data pave the way for investigating various ASR topics, such as non-native speech, children’s speech, spontaneous speech, code-switching, multiple pronunciations, etc.
4.2 Data Annotation
The corpus has been (partly) annotated using the guidelines presented in Section 3 on the basis of a preliminary analysis of the most common acoustic phenomena appearing in the data sets.
Additional annotations could be included to address other spurious segments, for example: understandable words pronounced in other languages or by other students, phonological interference, spontaneous speech phenomena, and overlapped speech. In order to measure specific proficiency indicators, e.g. those related to pronunciation and fluency, suprasegmental annotations can also be inserted in the corpus.
4.3 Proficiency Assessment of L2 Learners
The corpus is a valuable resource for training and evaluating scoring classifiers based on different approaches. Preliminary results show that suitable linguistic features, mainly based on statistical language models, allow the scores assigned by the human experts to be predicted.
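One simple instance of such a language-model feature is the average per-token log-probability of a response under a model estimated from reference answers; responses closer to the reference language score higher. The following is an illustrative sketch (a smoothed unigram model on toy data), not the actual feature set used in the reported experiments:

```python
import math
from collections import Counter

def train_unigram(texts, alpha=1.0):
    """Additively smoothed unigram language model estimated from reference answers."""
    counts = Counter(tok for t in texts for tok in t.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # reserve one slot for unseen tokens
    return lambda tok: math.log((counts[tok] + alpha) / (total + alpha * vocab))

def lm_feature(model, response):
    """Average per-token log-probability of a response."""
    toks = response.split()
    return sum(model(t) for t in toks) / max(len(toks), 1)
```

Such features can then feed a classifier or regressor that predicts the 0/1/2 indicator scores assigned by the experts.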
The evaluation campaign has been conceived to verify the expected proficiency level according to class grade; as a result, although the proposed tests cannot be used to assign a precise score to a given student, they allow studying typical error patterns according to the age and level of the students.
Furthermore, the fine-grained annotation, at sentence level, of the indicators described above is particularly suitable for creating a test bed for approaches based on “word embeddings” [7, 24, 26] to automatically estimate language learner proficiency. Indeed, the experiments reported in  demonstrate superior performance of word embeddings for speech scoring with respect to the well-known (feature-based) SpeechRater system [36, 35]. In this regard, we believe that additional, specific annotations can be developed and included in the “TLT-school” corpus.
|word|tot occ|good vs. bad|good vs. bad|
|German|Test GER|Train GER|
|English|Test ENG|Train ENG|
4.4 Modelling Pronunciation
By looking at the manual transcriptions, it is straightforward to detect the most problematic words, i.e. frequently occurring words that were often marked as mispronounced (preceded by the label “#”). This makes it possible to prepare a data set of well-pronounced vs. badly pronounced word occurrences.
A list of such words is shown in Table 9, from which one can try to model typical pronunciation errors (note that further occurrences of the selected words can easily be extracted from the non-annotated data). Finally, as mentioned above, further manual checking and annotation could be introduced to improve the modelling of pronunciation errors.
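The extraction of well- vs. badly pronounced occurrences follows directly from the transcription conventions of Section 3, where a “#” prefix marks a mispronounced token. A sketch (the example data are invented):

```python
from collections import Counter

def pronunciation_counts(transcriptions):
    """Count good and bad occurrences of each word in annotated transcriptions."""
    good, bad = Counter(), Counter()
    for t in transcriptions:
        for tok in t.split():
            if tok.startswith("#") and tok != "#*":  # "#*" marks incomprehensible speech
                bad[tok.lstrip("#")] += 1
            elif tok.isalpha():                      # plain, well-pronounced word
                good[tok] += 1
    return good, bad
```

Words with a high bad-to-good ratio are natural candidates for pronunciation-error modelling.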
5 Distribution of the Corpus
The corpus to be released is still under preparation, given the huge amount of spoken and written data; in particular, some checks are in progress in order to:
remove from the data responses with personal or inadequate content (e.g. bad language);
normalise the written responses (e.g. upper/lower case, punctuation, evident typos);
normalise and verify the consistency of the transcription of spoken responses;
check the available human scores and - if possible - merge or map the scores according to more general performance categories (e.g. delivery, language use, topic development) and an acknowledged scale (e.g. from 0 to 4, see https://www.ets.org/s/toefl/pdf/toefl_speaking_rubrics.pdf).
In particular, a proposal for an international challenge focused on non-native children’s speech recognition is being submitted, in which an English subset will be released and prospective participants will be invited to propose and evaluate state-of-the-art techniques for dealing with the multiple issues of this challenging ASR scenario (acoustic and language models, non-native lexicon, noisy recordings, etc.).
6 Conclusions and Future Work
We have described “TLT-school”, a corpus of both spoken and written answers collected during language evaluation campaigns carried out in schools of northern Italy. The procedures used for data acquisition and for annotation in terms of proficiency indicators have also been reported. Part of the data has been manually transcribed according to some guidelines: this set of data is going to be made publicly available.

With regard to data acquisition, some limitations of the corpus have been observed that might easily be overcome in the next campaigns. Special attention should be paid to enhancing the elicitation techniques, starting from adjusting the questions presented to test takers. Some of the question prompts show shortcomings that can be remedied without major difficulty: on the one hand, in the spoken part, questions do not require test takers to shift tense and some are too suggestive and close-ended; on the other hand, in the written part, some question prompts are presented in both source and target language, thus causing or encouraging code-mixing and negative transfer phenomena. The elicitation techniques in a broader sense will be revised (see  and, specifically on children’s speech, ) in order to maximise the quality of the corpus.

As for proficiency indicators, a first step that could be taken to increase the accuracy of the evaluation phase, for both human and automatic scoring, would be to split the second indicator (pronunciation and fluency) into two separate indicators, since fluent students do not necessarily have good pronunciation skills and vice versa, drawing for example on the IELTS Speaking band descriptors (https://www.ielts.org). Also, future campaigns might consider an additional indicator specifically addressed to scoring prosody (in particular intonation and rhythm), especially for A2 and B1 level test takers.
Considering the scope of the evaluation campaign, it is important to be aware of the limitations of the associated data sets: proficiency levels limited to A1, A2 and B1 (CEFR); custom indicators conceived for expert evaluation (not particularly suitable for automated evaluation); a limited number of responses per speaker. Nevertheless, as already discussed, the fact that the TLT campaign was carried out in 2016 and 2018 in the whole Trentino region makes the corpus a valuable linguistic resource for a number of studies associated with second language acquisition and evaluation. In particular, besides the already introduced proposal for an ASR challenge in 2020, other initiatives for the international community can be envisaged: the study of a fully automated evaluation procedure that does not need experts’ supervision, and the investigation of end-to-end classifiers that directly take the spoken response as input and produce proficiency scores according to suitable rubrics.
8 Bibliographical References
- (2003) Robust Recognition of Children’s Speech. IEEE Transactions on SAP 11 (6), pp. 603–615.
- Using Deep Neural Networks to improve proficiency assessment for children English language learners. In Proc. of Interspeech, pp. 1468–1472.
- (2005) The PF-STAR children’s speech corpus. In Proc. of Eurospeech, pp. 2761–2764.
- (2018) Overview of the 2018 Spoken CALL Shared Task. In Proc. of Interspeech, Hyderabad, India, pp. 2354–2358.
- (2017) Methods for eliciting, annotating, and analyzing databases for child speech development. Computer Speech and Language (45), pp. 278–299.
- (2006) Multilingual non-native speech recognition using phonetic confusion-based acoustic model modification and graphemic constraints. In Proc. of ICSLP, pp. 109–112.
- (2018) End-to-end neural network based automated speech scoring. In Proc. of ICASSP, Calgary, Canada, pp. 6234–6238.
- (2014) Automatic spoken assessment of young English language learners. In Proc. of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications.
- (1994) Varieties of knowledge elicitation techniques. International Journal on Human-Computer Studies (41), pp. 801–849.
- Cross-lingual transfer learning during supervised training in low resource scenarios. In Proc. of Interspeech, pp. 3531–3535.
- (1998) Improvements in Children’s Speech Recognition Performance. In Proc. of ICASSP, Seattle, WA, pp. 433–436.
- (2017) Articulatory modeling for pronunciation error detection without non-native training data based on DNN transfer learning. IEICE Transactions on Information and Systems E100.D (9), pp. 2174–2182.
- (2009) An overview of spoken language technology for education. Speech Communication 51 (10), pp. 2862–2873.
- (2007) Acoustic variability and automatic recognition of children’s speech. Speech Communication 49 (10-11), pp. 847–860.
- (2009) Towards age-independent acoustic modeling. Speech Communication 51 (6), pp. 499–509.
- (2003) Investigating Recognition of Children’s Speech. In Proc. of ICASSP, Vol. 2, Hong Kong, pp. 137–140.
- (2019) Automatic assessment of spoken language proficiency of non-native children. In Proc. of ICASSP.
- (2015) Mispronunciation detection without nonnative training data. In Proc. of Interspeech, pp. 643–647.
- (2001) Why is Automatic Recognition of Children’s Speech Difficult? In Proc. of Eurospeech, Aalborg, Denmark.
- (2016) Improving non-native mispronunciation detection and enriching diagnostic feedback with DNN-based speech attribute modeling. In Proc. of ICASSP, pp. 6135–6139.
- (2015) Large vocabulary automatic speech recognition for children. In Proc. of Interspeech.
- (2018) Non-native children speech recognition through transfer learning. In Proc. of ICASSP.
- (2000) The ISLE corpus of non-native spoken English. In Proc. of LREC, pp. 957–964.
- (2017) Deep-learning based Automatic Spontaneous Speech Assessment in a Data-Driven Approach for the 2017 SLaTE CALL Shared Challenge. In Proc. of SLaTE, Stockholm, Sweden, pp. 103–108.
- (2006) Adaptation based on pronunciation variability analysis for non-native speech recognition. In Proc. of ICASSP, pp. 137–140.
- (2019) The University of Birmingham 2019 Spoken CALL Shared Task systems: exploring the importance of word order in text processing. In Proc. of SLaTE, Graz, Austria, pp. 11–15.
- (2007) Non-native speech databases. In Proc. of ASRU, Kyoto, Japan, pp. 413–418.
- (2007) Analysis of Italian children’s English pronunciation. http://archive.is/http://www.eee.bham.ac.uk/russellm.
- (2016) Deep-neural network approaches for speech recognition with heterogeneous groups of speakers including children. Natural Language Engineering FirstView, pp. 1–26.
- (2004) Adaptation in the pronunciation space for non-native speech recognition. In Proc. of ICSLP, pp. 2901–2904.
- (2009) Comparing different approaches for automatic pronunciation error detection. Speech Communication 51 (10), pp. 845–852.
- (2003) Comparison of acoustic model adaptation techniques on non-native speech. In Proc. of ICASSP, pp. 540–543.
- Non-native spontaneous speech recognition through polyphone decision tree specialization. In Proc. of Eurospeech, pp. 1449–1452.
- (1996) A Study of Speech Recognition for Children and the Elderly. In Proc. of ICASSP, Atlanta, GA, pp. I-349–352.
- (2019) Automated speaking assessment: using language technologies to score spontaneous speech. Educational Testing Service, Princeton, NJ.
- (2009) Automatic scoring of non-native spontaneous speech in tests of spoken English. Speech Communication 51 (10), pp. 883–895.