The majority of people with language disabilities suffer in their daily lives as they cannot understand or speak the language, which is a means of communication. Therefore, they may be deprived of the opportunity to participate in social activities and they may experience financial difficulties. In general, people with speech disorders have lower employment rates than people with other types of disabilities. Moreover, the proportion of people with autism disorder has been increasing every year zablotsky2019prevalence, and accordingly, a solution is required.
Augmentative and alternative communication (AAC) has been suggested and applied to solve communication problems for people with language disabilities beukelman1998augmentative. This approach enables nonverbal communication instead of language. Although several AAC software resources are available, existing software packages are expensive, difficult to use, and only provide simple functions. To address these problems, we present a novel AAC system for children with language developmental disabilities. We refer to our AAC software as PicTalky. Neural-based grammar error correction (GEC) and a symbol-based text-to-pictogram (TP) module are utilized in our model. Thus, PicTalky offers neural- and symbol-based AAC for the improvement of communication and language learning, which have not been used in existing software.
From the perspective of NLP, the speech errors from people with language disabilities can be interpreted as grammatical errors at the morphological and syntactic levels. To handle these errors, neural GEC is loaded into PicTalky. Moreover, we consider both text and image processing for AAC education and communication. After a sentence is entered as input through the speech-to-text (STT) module, it is passed through the neural GEC and natural language understanding (NLU) modules. Finally, the corresponding pictograms are displayed.
Our proposed service is aimed at children aged 0 to 14 years who have language developmental disabilities caused by intellectual or autism disabilities. The first reason that we focus on children is that early treatment during childhood is critical. According to lenneberg1967biological, language must be acquired during a critical period that ends at approximately the age of puberty with the establishment of the cerebral lateralization of function. Unless language is learned during this period, it is difficult for language to be used freely. This may result in social deterioration, contraction, aggression, and other problematic behaviors, which eventually affect the overall quality of life and satisfaction of the person schwarz2001diagnosis.
The second reason is that there is currently insufficient social support for language therapy. Not all children with developmental disabilities can benefit from public systems owing to the limited support. Moreover, in addition to the children, their families and caregivers also experience difficulties.
Therefore, we propose PicTalky, which addresses the limitations of existing products and increases the accessibility of appropriate education and treatment for children with language disabilities. We expect that not only people with language disabilities but also their caregivers will find education and communication easier by using this service. Furthermore, in addition to the implementation as a Web application, we apply PicTalky to the NAO robot, thereby providing the first robotics AAC. We expect that robotics AAC can attract the interest of children, so that they can use AAC more comfortably and easily.
Our contributions are as follows:
We propose PicTalky for people with language disabilities, which is the first AAC software with GEC and a synonym-replacement system for accurate language processing.
We analyze each detailed function of PicTalky quantitatively and qualitatively. Also, we measure the satisfaction score during the actual services.
We present a novel metric known as text-to-pictogram accuracy (TPA) to measure the performance of converting texts into pictograms.
We open PicTalky in the form of a platform, so that it can help people with language disabilities and contribute to the research in this area.
We implement robotics AAC for the first time by applying PicTalky to the NAO robot.
2 Related Work
2.1 AAC Software for Language Developmental Disabilities
Several AAC software platforms have been developed for language education. TouchChat (https://touchchatapp.com/) is a symbol- and text-based AAC tool with a text-to-speech (TTS) service. AVAZ (https://www.avazapp.com/) is a language education service that uses pictograms. TalkingBoogie shin2020talkingboogie is software that supports the caregivers of children.
Systems that use AAC have also been developed for communication in daily life. Proloquo2Go (https://www.assistiveware.com/products/proloquo2go/) and QuickTalkAAC (https://digitalscribbler.com/quick-talk-aac/) enable people to communicate by using symbols or text with a TTS service. iCommunicate (http://www.grembe.com/) is a visual and text AAC application that allows for the creation of pictures and storyboards. Although several AAC software platforms have been developed, certain problems remain, such as difficulty of use and high costs.
PicTalky is the first symbol-based AAC system with neural GEC to provide more accurate and sophisticated language education and communication. PicTalky automatically outputs the sequence of the pictograms according to the spoken sentences. It can be used for communication between people with disabilities as well as between people with disabilities and non-disabled people. Moreover, it offers the potential to be extended to multilingual versions by using neural machine translation.
2.2 Symbolic AAC
AAC enables nonverbal communication instead of a language, and it can provide practical help for people with cognitive and linguistic disorders.
In the majority of studies on AAC, researchers have employed graphic symbols (i.e., pictograms and picture communication symbols) kang2019cultural as alternative means of language items to improve the communication skills of children with language developmental disabilities. In this manner, children can be taught how to express their needs and interact with others using symbols huang2019effects.
Most authors have claimed that graphic symbols can enhance the literacy skills and communication of children or support children with disabilities in functional competence (e.g., writing, improving their communication partner knowledge, and learning) karal2016standardization; nam2018overview; light2019challenges. Finally, AAC software is a form of symbolic knowledge representation beuke2013. That is, symbols are verbal or visual representations of ideas and concepts. Therefore, we adopt both text and image processing mechanisms (i.e., TP) to consider symbolic knowledge with NLP in AAC. Furthermore, we use a deep learning architecture approach for our GEC module. To the best of our knowledge, no such method for a neural and symbol mechanism in AAC has yet been presented.
3.1 Communication Module
Our proposed service uses deep learning-based STT, which takes the voice of the user as input and converts it into text. We adopt Naver CLOVA Speech chung2019naver for the STT system. Furthermore, the text input can be entered with the keyboard as well as in the form of voice. Users and caregivers can enter the text input easily with the keyboards of their personal computer, tablet, or mobile phone.
3.2 Neural GEC Module
People with language disabilities tend to make grammar and pronunciation errors when speaking. The GEC system revises various linguistic errors of users, so it is useful for children to practice correct sentences.
PicTalky is equipped with a neural GEC module that accurately corrects the STT outputs. We denote the sequence-to-sequence model that is applied to the GEC task as neural GEC. From the perspective of machine translation, the neural GEC task is a system whereby a sentence with noise and a correct sentence are entered as the source and target sentences, respectively. Subsequently, translation from the input to the output is trained with the sequence-to-sequence model. In this method, training is conducted without specifying a particular error type; thus, various errors can be detected and processed simultaneously.
PicTalky enhances the software quality with the latest GEC technique. As a result, the speech errors of people with developmental disorders can be corrected on the text level.
3.3 TP Module
Pictograms are complementary and alternative means of communication that can help people with language difficulties. Unlike languages, which require an understanding of rules and symbolic systems, pictograms deliver the meaning more intuitively and rapidly. Thus, pictograms are utilized in the language rehabilitation field. For example, by using pictograms on communication boards, children can learn how to communicate with others calculator1983evaluating. Pictograms provide children who have not learned the language system with practical help in language comprehension and speaking.
This study presents a system that maps the input text to pictogram images by using text and image processing. The TP module is an N-gram-based mapping system that returns the images in the pictogram dataset whose descriptions are morphologically similar to the input text. The pictogram dataset includes texts, such as words, phrases, or sentences, that describe the corresponding images. For more accurate mapping, the TP module scans the entire sentence from N-grams down to 1-grams and provides the most similar image.
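The N-gram to 1-gram scan can be illustrated with a minimal sketch. The code below is not the PicTalky implementation: the dataset `PICTOGRAMS`, the function name `text_to_pictograms`, and the parameter `max_n` are hypothetical, and exact string matching is assumed in place of the similarity-based matching described above.

```python
from typing import Dict, List, Tuple

# Hypothetical miniature pictogram dataset: description text -> image file.
PICTOGRAMS: Dict[str, str] = {
    "play baseball": "play_baseball.png",
    "i": "i.png",
    "love": "love.png",
    "pizza": "pizza.png",
}


def text_to_pictograms(sentence: str, max_n: int = 3) -> List[Tuple[str, str]]:
    """Greedy longest-match scan: try N-grams first, then shrink to 1-grams."""
    tokens = sentence.lower().split()
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Longest span that still fits in the sentence, down to a single token.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in PICTOGRAMS:
                matches.append((phrase, PICTOGRAMS[phrase]))
                i += n
                matched = True
                break
        if not matched:
            i += 1  # assumption: tokens without any pictogram are skipped
    return matches


print(text_to_pictograms("I love to play baseball"))
```

Scanning from the longest span first lets a phrase-level pictogram such as "play baseball" take priority over separate pictograms for its individual words.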
3.4 NLU module
The output of the TP module is processed by the NLU module to handle out-of-vocabulary (OOV) text that is not in the pictogram dataset. To this end, we propose a method that maps the input vocabulary to a semantically similar image.
In the NLU module, unknown words are replaced with substitute words by measuring semantic similarities, and a co-reference resolution system is applied to the substitute words. The semantic similarities are measured with Word2Vec mikolov2013efficient and WordNet miller1995wordnet. Within the input text, substitute words can be resolved through the co-reference resolution function of the spaCy (https://spacy.io/) library. The remaining grammatical elements, such as unknown vocabulary, conjunctions, and articles that cannot be handled by semantic similarity and word substitution, are designed not to appear in the output images.
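The substitution step can be sketched as a nearest-neighbor search over word vectors. The example below is an illustration under stated assumptions rather than the system's code: the three-dimensional vectors in `EMBEDDINGS` are made-up stand-ins for real Word2Vec embeddings, `PICTOGRAM_VOCAB` is a hypothetical dataset vocabulary, and the similarity `threshold` is an arbitrary choice. WordNet lookup and co-reference resolution are omitted.

```python
import math
from typing import Dict, List, Optional

# Made-up toy vectors standing in for real Word2Vec embeddings.
EMBEDDINGS: Dict[str, List[float]] = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.85, 0.2, 0.05],
    "pizza": [0.0, 0.9, 0.3],
    "car": [0.1, 0.0, 0.95],
}

# Hypothetical set of words that have a pictogram in the dataset.
PICTOGRAM_VOCAB = {"dog", "pizza", "car"}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def substitute(word: str, threshold: float = 0.8) -> Optional[str]:
    """Return an in-vocabulary word for `word`, or None if nothing is close."""
    if word in PICTOGRAM_VOCAB:
        return word  # already covered by the pictogram dataset
    vec = EMBEDDINGS.get(word)
    if vec is None:
        return None  # no embedding available: the word is dropped
    best, best_score = None, threshold
    for candidate in sorted(PICTOGRAM_VOCAB):
        score = cosine(vec, EMBEDDINGS[candidate])
        if score >= best_score:
            best, best_score = candidate, score
    return best


print(substitute("puppy"))  # dog
```

Words that receive no substitute correspond to the elements that are left out of the output images.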
3.5 Overall Architecture of PicTalky
When voice input is entered, it is converted into text by the communication module. Subsequently, the text is corrected by the neural GEC system and the corrected texts are changed into pictograms using the TP module. If OOV text exists in the input, the NLU module addresses this problem. Finally, a corresponding pictogram sequence is output.
The overall structure of our proposed service is depicted in Figure 1. If an erroneous sentence such as "I lovedd BTS" is entered as input, the neural GEC corrects the input to "I love BTS." Finally, the pictograms are generated from the corrected text, and the result is provided in the form of a Web service or robotics.
PicTalky is aimed at helping children with developmental disabilities to communicate and improve their language understanding. The simultaneous encoding and transmission of speech text, both audibly and visually, allows users to understand the speaker's intentions intuitively, even if they have difficulties in using language. Furthermore, as the text and images are delivered together, users can learn the language implicitly through reasoning, without each element of the language being taught directly. Thus, the proposed service is intended for children with developmental disabilities, but it can also be applied to rehabilitation for educationally disadvantaged groups.
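The data flow described in this section can be summarized in a short sketch. Every component here is a hypothetical stub: `correct` replaces the neural GEC with a one-entry lookup table, `lookup` replaces the TP module, and `substitute` replaces the NLU module with a made-up mapping from "BTS" to "band". The sketch starts from text, assuming that the communication module has already performed STT.

```python
from typing import Dict, List, Optional

# Hypothetical stand-ins for the real modules (not the PicTalky models).
GEC_TABLE: Dict[str, str] = {"I lovedd BTS": "I love BTS"}
PICTOGRAMS: Dict[str, str] = {"i": "i.png", "love": "love.png", "band": "band.png"}
SUBSTITUTES: Dict[str, str] = {"bts": "band"}  # made-up OOV mapping


def correct(text: str) -> str:
    """Stub for the neural GEC module: fix known errors, else pass through."""
    return GEC_TABLE.get(text, text)


def lookup(word: str) -> Optional[str]:
    """Stub for the TP module: map a single word to a pictogram file."""
    return PICTOGRAMS.get(word.lower())


def substitute(word: str) -> Optional[str]:
    """Stub for the NLU module: propose an in-vocabulary replacement."""
    return SUBSTITUTES.get(word.lower())


def run_pipeline(text: str) -> List[str]:
    """Text -> GEC -> TP, with NLU as a fallback for OOV words."""
    images: List[str] = []
    for token in correct(text).split():
        image = lookup(token)
        if image is None:  # OOV: ask the NLU stub for a substitute word
            replacement = substitute(token)
            image = lookup(replacement) if replacement else None
        if image is not None:  # words without any pictogram are not shown
            images.append(image)
    return images


print(run_pipeline("I lovedd BTS"))  # ['i.png', 'love.png', 'band.png']
```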
4 Experiment and Results
To validate the performance of PicTalky qualitatively, we adopted a test set that was provided by a GEC service company (https://www.llsollu.com/). The test set was constructed while performing the actual GEC service, inspired by cases in which people with language developmental disabilities utter grammatically incorrect sentences. Thus, it can be stated that it provides high objectivity and reliability. We refer to this test set as the in-house test set. The test set consisted of 100 sentences.
4.2 Verification of Neural GEC Module
Although the majority of recent NLP studies have been conducted based on the pretrain-finetuning approach (PFA), it is difficult to deploy a PFA-based NLP application as a service owing to its slow speed and high computational cost, among other factors park2021should. Although state-of-the-art neural models such as mBART liu2020multilingual have been developed, their parameter counts and model sizes are too large to serve in industry. To overcome this problem, we produced a model based on the vanilla transformer, which is easy to serve. The hyperparameters were set to the same values as in vaswani2017attention. The vocabulary size was 32,000, and SentencePiece kudo2018sentencepiece was adopted for subword tokenization.
Performance of Neural GEC
We used GLEU napoles2015ground and BiLingual Evaluation Understudy (BLEU) papineni2002bleu as evaluation metrics to verify the performance of the neural GEC module. GLEU is similar to BLEU, but it is a more specialized metric for the error correction system, as it considers the source sentences. The overall comparison results are presented in Table 1.
|Metric||Case||POS||Stopwords||Score|
|TPA with penalty||(1)||✓||✓||91.62|
The experimental results demonstrated that BLEU and GLEU scored 63.77 and 53.99, respectively. These results are sufficiently competitive with the results of other neural GEC studies (im2017denoising; choe2019neural; park2020neural; park2020comparison). This means that our neural GEC module can correct the errors from the STT module, as well as the speech errors of users.
4.3 Verification of TP Module
The results of the performance evaluation of the TP module, which is a core function of PicTalky, are presented in this section.
We propose TPA, which is a novel metric for measuring the performance of the TP module.
TPA is an objective indicator of how effectively the input text in PicTalky is converted into pictograms. The measurements are performed as follows. First, the input sentences are separated into words and POS tagged. Thereafter, the words that are POS tagged as determiners, prepositions, or conjunctions (POS), as well as stopwords (Stopwords), are removed, as we believe that converting these words into pictograms is not meaningful. Thus, the words that do not carry important content are removed during this process. The remaining words are used for the measurements, and the ratio of the words that are effectively converted into pictograms is used as the TPA value. A named entity recognition (NER) penalty is also applied when calculating the TPA value. The NER penalty is assigned when the named entities in the input sentences are misclassified by the NER process. As named entities are important information that should be converted without errors, the NER penalty is assigned in those cases. The pseudo-code for the TPA is presented in Algorithm 1.
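A minimal sketch of the TPA computation is given below. It is not Algorithm 1 itself, and several details are assumptions: the input is already POS tagged with Penn Treebank tags (`DT`, `IN`, `CC` for determiners, prepositions, and conjunctions), `STOPWORDS` is a small hand-written list rather than the NLTK list, and the NER penalty is modeled by subtracting one converted word per misclassified entity, since the exact form of the penalty is not spelled out in the text.

```python
from typing import List, Set, Tuple

FUNCTION_TAGS = {"DT", "IN", "CC"}  # determiners, prepositions, conjunctions
STOPWORDS = {"i", "a", "the", "is", "to", "with", "my"}  # toy list (assumption)


def tpa(
    tagged: List[Tuple[str, str]],
    converted: Set[str],
    ner_errors: int = 0,
    drop_pos: bool = True,
    drop_stopwords: bool = True,
) -> float:
    """Percentage of retained words that were converted into pictograms."""
    kept = [
        word.lower()
        for word, tag in tagged
        if not (drop_pos and tag in FUNCTION_TAGS)
        and not (drop_stopwords and word.lower() in STOPWORDS)
    ]
    if not kept:
        return 0.0
    hits = sum(1 for word in kept if word in converted)
    # Assumed penalty: each misclassified named entity cancels one hit.
    hits = max(hits - ner_errors, 0)
    return 100.0 * hits / len(kept)


sentence = [("I", "PRP"), ("love", "VBP"), ("dancing", "VBG"),
            ("with", "IN"), ("friends", "NNS")]
shown = {"love", "dancing", "friends"}

print(tpa(sentence, shown))                                        # case (1)
print(tpa(sentence, shown, drop_pos=False, drop_stopwords=False))  # case (4)
```

In this toy example, the score for case (1) is higher than that for case (4) because function words that have no pictogram are excluded from the denominator.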
According to the deletion setting, we conducted comparative experiments on the TPA with various cases, as indicated in Table 2. There were four cases in total for the deletion setting: (1) both POS (words tagged as determiners, prepositions and conjunctions) and Stopwords are deleted, (2) only Stopwords are deleted, (3) only POS are deleted, and (4) neither POS nor Stopwords are deleted. We also measured how the penalty affected the overall performance. We used NLTK bird2006nltk to remove the determiners, prepositions, conjunctions, and stopwords and used the BERT-based devlin2018bert NER model provided by Huggingface wolf2019huggingface for the penalty.
The experimental results demonstrated that case (1) of the TPA, which was our proposed method, achieved the highest score of 94.16. In case (2) of the TPA, the score decreased by 30.20 points. When words that were POS tagged as determiners, prepositions, and conjunctions were deleted in case (3), lower performance was exhibited than in case (2). Finally, case (4) achieved the lowest performance. These results demonstrate that excluding both the POS and Stopwords from the subjects of the measurements is the most reasonable evaluation for TP conversion. Moreover, when the NER penalty was applied, the performances decreased in all cases, which means that the NER penalty contributes to more valid measurement. We also conducted a qualitative analysis on the results of PicTalky (see Appendix A).
4.4 PicTalky Satisfaction Survey
We conducted a satisfaction survey to investigate the user satisfaction. As outlined in Appendix B, we established a total of five questions and specified the answers using a Likert scale likert1932technique of “Satisfied,” “Neither agree nor disagree,” and “Dissatisfied.” The survey results are depicted in Figure 2.
The survey results revealed that most people were satisfied with the performance of PicTalky. For each question, 80% to 90% of the responses were satisfied and approximately 90% of the responses stated that it will be helpful to people with developmental disabilities. However, the UI of PicTalky still requires improvement and the performance of the GEC system should be enhanced. In particular, according to the results of the Spearman correlation de2016comparing of the sentences, as illustrated in Figure 3, the correlation between Q1 and Q2 was high, which indicates that the purpose of this study was well reflected. Although the correlation between Q1 and Q5, and that between Q2 and Q5 were lower than the others, their p-values were lower than 0.05; thus, the results were statistically significant.
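For reference, the Spearman correlation between the answers to two questions can be computed as the Pearson correlation of their ranks. The sketch below is a generic illustration and not the analysis script used in this study: the numeric coding of the answers (3 for "Satisfied", 2 for the neutral option, 1 for "Dissatisfied") and the sample responses are made up, and the p-value is not computed.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when all answers are identical
    return cov / (sx * sy)


# Made-up responses of four participants to two questions.
q1 = [3, 3, 2, 1]
q2 = [3, 2, 2, 1]
print(round(spearman(q1, q2), 4))  # 0.8333
```

Averaging tied ranks matters for survey data of this kind, because a three-point scale produces many identical answers.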
5 PicTalky with Robotics and Web Applications
We have distributed PicTalky as a Web application. The Web page is freely available and it is designed with a clear UI for easy access. In addition to the Web service, we have applied robotics technology to PicTalky. The NAO robot shamsuddin2011humanoid; jokinen2014multimodal is mounted in the communication module of PicTalky. We have created a human–robot interaction system whereby the NAO robot has a conversation with the end users and the pictograms are printed onto the connected screen. As children show substantial interest in robots, this will make education feel more familiar than with Web or other applications (sennott2019aac). A video of our demo is also attached with our paper (https://bit.ly/2SunbaW).
To the best of our knowledge, this study is the first to develop robotics AAC, which we achieve by applying PicTalky to the NAO robot.
6 Conclusion and Future Work
We have proposed PicTalky, which is an AI-based AAC service. The aim of PicTalky is to provide communication and connection among all people, without anyone being excluded. In the future, we plan to expand the PicTalky data to multilingual data and to make it fully open. In addition, we will conduct various AI for accessibility studies to improve the quality of life of people with disabilities.
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-0-01405) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation) and supported by the MSIT (Ministry of Science and ICT), Korea, under the ICT Creative Consilience program (IITP-2021-2020-0-01819) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation). This work is an extended version of our paper park2020ai. Finally, the authors thank Yanghee Kim for providing initial ideas and proofreading.
Appendix A Qualitative Analysis
In Table 3, the input sentences contain grammatical problems, including fronting, infinitive, article, spelling, plural -s, and irregular past form errors. The PicTalky Web demo shows users the most appropriate pictograms and the output sentences with the errors corrected. Note that if the Neural GEC module cannot correct grammatical errors, the NLU module can compensate for it. However, these aspects need to be supplemented through future research.
|Input sentence||Output sentence||Pictogram|
|* Is the dog is tired?||Is the dog tired?|
|* Do I can eat a pizza?||Can I eat a pizza?|
|* I love play the baseball||I love to play baseball|
|* I love danceing with a friends||I love dancing with friends|
|* He taked my toy!||He took my toy!|
Appendix B PicTalky Satisfaction Questionnaires
Recruiting nonverbal children as participants is difficult, which makes a large-scale survey extremely hard to perform. Therefore, we conducted a system satisfaction survey with 53 people: 43 experts in language disabilities and ten nonverbal children. The experts consisted of thirty teachers of nonverbal children and thirteen professionals who majored in language disabilities from Korea University Anam Hospital.
|Q1. Are you satisfied with the overall performance of PicTalky?|
|Q2. Do you think this system will be helpful to people with language developmental disabilities?|
|Q3. Are you satisfied with the usability and UI of PicTalky?|
|Q4. Are you satisfied with the performance of the grammar error correction system?|
|Q5. Are you satisfied with the results of the text-to-pictogram function?|