Using Meta-Knowledge Mined from Identifiers to Improve Intent Recognition in Neuro-Symbolic Algorithms

12/16/2020 ∙ by Claudio Pinhanez, et al. ∙ 0

In this paper we explore the use of meta-knowledge embedded in intent identifiers to improve intent recognition in conversational systems. As evidenced by the analysis of thousands of real-world chatbots and by interviews with professional chatbot curators, developers and domain experts tend to organize the set of chatbot intents by identifying them using proto-taxonomies, i.e., meta-knowledge connecting high-level, symbolic concepts shared across different intents. By using neuro-symbolic algorithms able to incorporate such proto-taxonomies to expand intent representation, we show that such mined meta-knowledge can improve accuracy in intent recognition. In a dataset with intents and example utterances from hundreds of professional chatbots, we saw improvements of more than 10% in the equal error rate (EER) in almost a third of the chatbots when we applied those algorithms, in comparison to a baseline of the same algorithms without the meta-knowledge. The meta-knowledge proved to be even more relevant in detecting out-of-scope utterances, decreasing the false acceptance rate (FAR) by more than 20% in about half of the chatbots. The experiments demonstrate that such symbolic meta-knowledge structures can be effectively mined and used by neuro-symbolic algorithms, apparently by incorporating into the learning process higher-level structures of the problem being solved. Based on these results, we also discuss how the use of mined meta-knowledge can be an answer to the challenge of knowledge acquisition in neuro-symbolic algorithms.





After almost a decade of notable advances in AI using data-driven machine learning approaches, there is a growing sense in the field that symbolic knowledge needs to be included in AI systems to get to the next level of machine intelligence. This thought is materialized in the so-called neuro-symbolic approaches, which have already produced some intriguing results [27, 4, 35, 3, 25, 17, 13].

However, even if successful, such approaches will require symbolic data or knowledge to be captured and represented for the machine. Eliciting knowledge directly from human beings has proved to be a difficult task, both in the case of specialized, professional knowledge [5] and in the case of common sense [12, 31]. Similarly, reliably mining symbolic knowledge from text sources remains a difficult task, in spite of many advances in the mining of knowledge graphs [15, 19, 1]. Just as collecting large amounts of reliable data has become a bottleneck for the use of data-driven machine learning, getting knowledge from textual data or human beings into reasonably complete and correct symbolic representations is likely to be a major issue for neuro-symbolic methods.

This paper explores an alternative path: exploiting the meta-knowledge that developers of AI systems sometimes embed in their source code. In particular, we examine the case of professional conversational systems and the symbolic knowledge their developers often embed in identifiers. Analyzing a dataset comprising thousands of different conversational systems developed on a very popular platform, we observe a very common pattern of using a symbolic structure, called here a proto-taxonomy, to name the intents to be recognized by the system. This practice was also verified qualitatively in workshops we conducted with developers.

In spite of the many inconsistency and incompleteness issues with those proto-taxonomies, we show that they can be employed to improve the accuracy of recognition, using adaptations of recent neuro-symbolic methods. This seems to point towards neuro-symbolic techniques designed to handle imperfect knowledge representations. We see this as a needed compromise to bring symbolic reasoning back to AI in a sustainable form, avoiding the old pitfalls of knowledge engineering [16, 32, 33, 8].

This paper starts by looking into the recent advances in neuro-symbolic systems, reviewing the difficulties in knowledge mining, and exploring previous use of informal knowledge. We then present the evidence found that developers of conversational systems embed meta-knowledge within the source code of their systems. We follow by describing algorithms integrating such meta-knowledge into intent recognition algorithms and by evaluating them first with two typical intent recognition datasets, and then with hundreds of workspaces created in a professional tool called here ChatWorks. The results show most of those workspaces can benefit from the techniques described in this paper.

Related Work

Neuro-symbolic approaches combine statistical methods with logic symbolism: “neural-symbolic systems aim to transfer principles and mechanisms between (often nonclassical) logic-based computation and neural computation” [4]. Such systems are viewed as a way to embed high-level knowledge and even some form of “consciousness” into machine learning systems, making the language to develop them closer to “what passes in a man’s own mind” [3], which would likely make those systems more explainable than current deep learning algorithms.

Although neuro-symbolic systems are not new, we observe increasing interest in this approach in recent years, resulting in a myriad of novel techniques applied to different problems, contexts, and scenarios [27, 23, 11, 18, 13]. For instance, [25] suggests an approach for image understanding which takes object-based scene representations and translates sentences into executable, symbolic programs. In [26], embeddings computed from knowledge graphs are used as attention layers for tasks such as autonomous driving (AV) and question-answering. And in [21], embeddings from a knowledge graph are mapped to sentence embeddings for tasks such as the inverse dictionary problem.

One important requirement for many neuro-symbolic systems is to represent knowledge in a structured format such as knowledge graphs, ontologies, or taxonomies [19]. In some cases, such as the scene ontology for AV in [26], considerable manual annotation effort was needed. Nevertheless, as presented in [15], an unsupervised approach can sometimes be used to mine the meta-knowledge introduced by the experts, such as the categories in Wikipedia pages.

The high-level representation in intent identifiers can be viewed as similar to the comments included in program source code. Code comments are one of the means employed by developers to help them organize their thought process while producing code. Research on code comments has shown that they can be useful for automatic generation of code, consistency checking, classification, and quality evaluation [39]. Similar user behavior for organizing content can also be observed in e-mails [38], computer files [2, 20, 10], and Jupyter notebooks [30].

Considering our context of intent recognition, intent identifiers might contain a high-level representation of the main content of the intent. As shown in [9], intent identifiers can be formatted as natural language sentences to learn a model which maps training examples into those sentences, so that the meta-knowledge can be used in zero-shot learning  [37]. Unfortunately, the dataset explored in this work is very limited. Recent work has also demonstrated that intent recognition can be improved with enhanced class representations such as word-graphs [6] by mining symbolic knowledge from the example utterances.

This work aims to fill in some of those gaps by providing a better understanding of the usefulness of the meta-knowledge embedded in intent identifiers by exploring a large set of intent recognition datasets; and by going deeper into the symbolic representations of the identifiers viewing them as quasi taxonomies.

Embedded Meta-Knowledge in Intents of Conversational Systems

Most real-world, deployed conversational systems in use today have been built based on the rule-based intent-action paradigm, using platforms such as Watson Assistant or Alexa Skills. Each intent corresponds to a piece of information or an answer desired by the user and is defined by the chatbot developers through a set of exemplar utterances. During runtime, each utterance from the user is recognized as one of the defined intents or as out-of-scope (OOS), and the associated action is generated, often a pre-written sentence created by developers or subject-matter experts (SMEs).

Figure 1: Pre-defined intents for utilities-related chatbots of the Watson Assistant platform.
Figure 2: The intent proto-taxonomy associated to the utilities-related intents of fig. 1.

Many of those platforms also come with a pre-defined, domain-specific list of intents which can be added to any chatbot to speed-up development. For example, fig. 1 shows the list of pre-defined intents from Watson Assistant for utilities-related conversational systems. Notice that the names of intents aim to describe the meaning of each pre-defined intent by representing it through a sequence of keywords separated by underscore characters. Some of those keywords appear many times (marked in colors), with a structure which has semblance to a taxonomy.

This pattern of naming intents following a categorical path can also be found in the pre-defined intents of other platforms and, as we show in this paper, in the names of the intents defined by developers themselves. The goal seems to be to provide the intent classes with a summarized description of each intent in a way that highlights the similarity of different intents. Such patterns are also common in the way people organize files and e-mails in computers [10, 38] and in how software developers name functions [39].

At the same time, by regarding the keywords in the name intents as basic concepts and the underscore characters as a connection between them, we can structure the list of intent identifiers as a sort of very basic knowledge graph [14], here referred as intent proto-taxonomies. Figure 2 depicts the intent proto-taxonomy associated to the list of the intents in fig. 1.
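As a minimal sketch of this construction, the snippet below splits identifiers on the underscore and links consecutive concepts. The identifiers are hypothetical (they are not the actual Watson Assistant list), and the function name `build_proto_taxonomy` and the choice to lower-case concepts are illustrative assumptions.

```python
from collections import defaultdict


def build_proto_taxonomy(intent_ids, separator="_"):
    """Mine a very basic concept graph from a list of intent identifiers."""
    edges = defaultdict(set)               # concept -> concepts that follow it
    concept_to_intents = defaultdict(set)  # concept -> intents that mention it
    for intent in intent_ids:
        concepts = [c for c in intent.lower().split(separator) if c]
        for concept in concepts:
            concept_to_intents[concept].add(intent)
        # Each separator is read as a connection between two consecutive concepts.
        for left, right in zip(concepts, concepts[1:]):
            edges[left].add(right)
    return dict(edges), dict(concept_to_intents)


# Hypothetical utilities-style identifiers, for illustration only.
intents = ["bill_pay_online", "bill_pay_phone", "bill_dispute",
           "service_start", "service_stop"]
edges, concept_to_intents = build_proto_taxonomy(intents)
```

Concepts that recur across identifiers, such as `bill` here, become shared nodes, which is what gives the structure its resemblance to a taxonomy.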

A basic inspection of the intent proto-taxonomy shown in fig. 2 reveals that, as a knowledge graph, it has several shortcomings, such as the lack of action verbs in some intents. In addition, even in this case of a professional list of intents provided by a highly developed tool, the meta-knowledge mined from the list of the intent identifiers has inconsistencies and seems to be incomplete. However, it has two great qualities: (1) it is embedded in the conversational system, so there is no need of knowledge acquisition from experts; (2) it is easily mined.

The key question is whether, given its limitations as discussed above, the embedded knowledge is good enough to be used by neuro-symbolic algorithms. We will show later that, indeed, this meta-knowledge can enhance machine learning algorithms. But first let us examine the evidence we found that naming intents in conversational systems using an intent proto-taxonomy is a fairly common practice, and thus able to provide structured domain meta-knowledge almost “for free” for a large number of professional systems in use today.

How Developers Use Intent Proto-Taxonomies

We conducted a 4-day design workshop with four expert developers to understand the challenges SMEs and developers of conversational systems have and what could facilitate their work [29]. Those SMEs have developed chatbots for the auto industry, banking, and telco using the ChatWorks platform (anonymized for review).

The structuring of the intent identifiers using proto-taxonomies was discussed and explored with them. The SMEs reported a very pro-active practice of naming intents following a formally defined structure, typed in a sort of taxonomy, and shared among their peers and domain experts in the clients, who were also responsible for maintaining those systems. Some of them brought mindmaps to explain the concept relations in the workspace (the set of all intents) and showed how they make available those concepts to their team in the system or using spreadsheets. They told us that often underscore characters are used to separate concepts and that the order of the concepts usually represents how the workspace is organized.

The taxonomy, as they often refer to it, is also a kind of self-indexing information for future use, a name that, by representing the semantics of an intent, can be used to simplify their work and collaboration. This study provided evidence that the use of structured meta-knowledge in the intent identifiers was an intentional and well-established practice among some developers. The remaining question was how widespread this practice was.

The Use of Intent Proto-Taxonomies in ChatWorks

The ChatWorks platform has an opt-in feature in which developers of chatbots can share their code and content (called workspaces) with the company which owns the platform for research and development purposes. We were given access to about 18K workspaces active in six months between 2019 and 2020, all of them in English. Those workspaces were filtered to remove duplicates and workspaces with fewer than 8 intents. The resulting dataset is composed of 3,840 workspaces. About 81% of them had from 8 to 100 intents and the largest had 1,974 intents.

We used two criteria for taxonomy identification and size in each workspace: (1) the intent identifier must have a concept structure (words) separated by a symbol (separator); and (2) the concepts in the same position should be able to group themselves at least in two different classes. Given a workspace with a set of intent identifiers, we first ranked the best separator (period, underline, camelcase, or dash) to split the name into concepts.

To this end, we calculated the perplexity [24] of a bag of concepts using each separator and selected the one with minimum perplexity. Next, each intent identifier was split using the selected separator and the resulting list of intent identifiers was compared to each other by the concepts at the same level. In a level, if the concepts were either all equal or all different, then that level was not evaluated. When the grouping of concepts was possible, the intents with those concepts were selected as intents with taxonomy. The taxonomy rate was calculated by the ratio between the number of intents with taxonomy and the number of intents created by the user (excluding all the pre-defined domain-specific intents provided by ChatWorks).
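The procedure can be sketched roughly as follows. Several details are assumptions made for illustration: perplexity is taken as the exponential of the entropy of the unigram distribution of concepts, an intent counts as "with taxonomy" when its concept at some evaluated level is shared with at least one other intent, and the identifiers are hypothetical.

```python
import math
import re
from collections import Counter

SPLITTERS = {
    "period": lambda name: name.split("."),
    "underline": lambda name: name.split("_"),
    "dash": lambda name: name.split("-"),
    # CamelCase: cut where an upper-case letter follows a lower-case letter or digit.
    "camelcase": lambda name: re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name).split(" "),
}


def perplexity(concepts):
    """Perplexity of the bag of concepts: exp of the unigram entropy (assumption)."""
    counts = Counter(concepts)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(entropy)


def best_separator(intent_ids):
    """Pick the separator whose bag of concepts has minimum perplexity."""
    scores = {}
    for name, split in SPLITTERS.items():
        bag = [c for ident in intent_ids for c in split(ident) if c]
        scores[name] = perplexity(bag)
    return min(scores, key=scores.get)


def taxonomy_rate(intent_ids, separator_name):
    """Fraction of intents that share a concept with another intent at some level."""
    split = SPLITTERS[separator_name]
    paths = {ident: [c for c in split(ident) if c] for ident in intent_ids}
    with_taxonomy = set()
    for level in range(max(len(p) for p in paths.values())):
        at_level = {i: p[level] for i, p in paths.items() if len(p) > level}
        counts = Counter(at_level.values())
        # Levels where concepts are all equal or all different are not evaluated.
        if len(counts) < 2 or len(counts) == len(at_level):
            continue
        with_taxonomy.update(i for i, c in at_level.items() if counts[c] > 1)
    return len(with_taxonomy) / len(intent_ids)


# Hypothetical user-defined intents of one workspace.
intents = [f"{a}_{b}_{c}" for a in ("bill", "service")
           for b in ("start", "stop") for c in ("online", "phone")] + ["greeting"]
separator = best_separator(intents)
rate = taxonomy_rate(intents, separator)
```

In this toy workspace eight of the nine intents follow the pattern, so the rate is 8/9; pre-defined platform intents would have to be excluded before calling `taxonomy_rate`.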

Figure 3: Distribution of the number of workspaces according to the taxonomy rate of the 3,840 English workspaces.

Using those metrics, 76% of all 3,840 workspaces had a taxonomy rate above 10%, almost 52% had a taxonomy rate above 50%, and 16% had a very high taxonomy rate, from 90% to 100%. Figure 3 shows how the 3,840 workspaces are distributed considering both the total number of intents and the taxonomy rate (the two axes of the plot). Notice that the distribution follows a sort of “step” function where, as the threshold of 64 intents in the workspace is crossed, the majority of the workspaces have a taxonomy rate of more than 50%. It seems that, as the complexity of the workspace increases with the number of intents, developers and SMEs more often resort to structuring the intents as a proto-taxonomy.

Notice that the same inconsistencies which were seen in the pre-defined intents from Watson Assistant of fig. 1 seem to be also present in the developers’ workspaces. Nevertheless, the results of this analysis seem to overwhelmingly confirm that using intent proto-taxonomies is a fairly common practice in ChatWorks, reaching around 80% of all workspaces and being even more common in the workspaces with a high number of intents.

Using Mined Meta-Knowledge to Improve Intent Recognition

This section presents a formal description of the methodology employed in this work to take advantage of the proto-taxonomies in a neuro-symbolic approach.

Embedding the Set of Classes

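An intent classification method is a function $D$ which maps a (potentially infinite) set of sentences $S$ into a finite set of classes $\Omega = \{\omega_1, \ldots, \omega_c\}$:

$$D: S \rightarrow \Omega \tag{1}$$

To enable a numeric, easier handling of the input text, an embedding $\psi: S \rightarrow \mathbb{R}^n$ is often used, mapping the space of sentences $S$ into a vector space $\mathbb{R}^n$, and defining a classification function $E: \mathbb{R}^n \rightarrow \Omega$ such that $D(s) = E(\psi(s))$. In typical intent classifiers, $E$ is usually composed of a function $C$ which computes the probability of $s$ being in a given class, followed by the arg max function. In many intent classifiers, $C$ is the softmax function.

$$S \xrightarrow{\ \psi\ } \mathbb{R}^n \xrightarrow{\ C\ } \mathbb{R}^c \xrightarrow{\ \arg\max\ } \Omega \tag{2}$$

This paper explores how to use embeddings on the other side of the classification function, that is, by embedding the set of classes $\Omega$ into another vector space $\mathbb{R}^m$. The idea is to use class embedding functions which somehow capture the intent proto-taxonomies, as we will show later. Formally, we use a class embedding function $\phi: \Omega \rightarrow \mathbb{R}^m$, its inverse $\phi^{-1}$, and a function $M: \mathbb{R}^n \rightarrow \mathbb{R}^m$ to map the two vector spaces, so $D(s) = \phi^{-1}(M(\psi(s)))$.

$$S \xrightarrow{\ \psi\ } \mathbb{R}^n \xrightarrow{\ M\ } \mathbb{R}^m \xrightarrow{\ \phi^{-1}\ } \Omega \tag{3}$$

In our work we use typical sentence embedding methods to implement $\psi$. To approximately construct the function $M$ we employ a basic Mean Square Error (MSE) method using the training set composed of sentence examples for each class $\omega_i \in \Omega$.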
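To make the MSE step concrete, here is a minimal sketch that fits the mapping between the sentence vector space and the class vector space by gradient descent on the mean squared error. It assumes a purely linear map and toy 2-dimensional vectors; the paper does not state that its mapping is linear, and `fit_linear_map` and `apply_map` are hypothetical helper names.

```python
def fit_linear_map(inputs, targets, epochs=500, lr=0.1):
    """Fit a matrix W (m x n) minimizing the mean squared error of W.x against y."""
    n, m = len(inputs[0]), len(targets[0])
    W = [[0.0] * n for _ in range(m)]
    for _ in range(epochs):
        grad = [[0.0] * n for _ in range(m)]
        for x, y in zip(inputs, targets):
            pred = [sum(W[r][c] * x[c] for c in range(n)) for r in range(m)]
            for r in range(m):
                err = pred[r] - y[r]
                for c in range(n):
                    # Gradient of the squared error, averaged over the samples.
                    grad[r][c] += 2.0 * err * x[c] / len(inputs)
        for r in range(m):
            for c in range(n):
                W[r][c] -= lr * grad[r][c]
    return W


def apply_map(W, x):
    """Project a sentence vector into the class embedding space."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


# Toy training set: sentence embeddings paired with the embedding of their class.
sentence_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
class_vecs = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
W = fit_linear_map(sentence_vecs, class_vecs)
projected = apply_map(W, [1.0, 0.0])
```

After training, a sentence vector is projected with `apply_map` and then decoded to the class whose embedding lies closest to the projection.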

Adapting Kartsaklis Method (LSTM)

Our algorithms are inspired by a text classification method proposed in [21] for the inverse dictionary problem, where text definitions of terms are mapped to the term they define. The embedding of the class set into the continuous vector space (equivalent to the class embedding function $\phi$ in equation 3) is done by expanding the knowledge graph of the dictionary words with nodes corresponding to words related to those terms and performing random walks on the graph to compute graph embeddings related to each dictionary node, using the DeepWalk algorithm [28]. DeepWalk, written here as $DW$, is a two-way function mapping nodes into vectors and back.

A Long Short-Term Memory (LSTM) neural network, composed of two layers and an attention mechanism, is used in [21] for mapping the input texts to the output vector space. To map the two continuous vector spaces representing the definition texts and the dictionary terms, an MSE function, learned from the training dataset, is used.

For this work, the knowledge graph is replaced by a proto-taxonomy graph $G$ which associates each class to a node and connects to each of them nodes that correspond to meta-knowledge concepts related to each class. To better capture the sequential aspect of the proto-taxonomies, we also connect each class node to bigrams of concepts, i.e., the concatenation of two subsequent concepts. We represent this by the function $\gamma: \Omega \rightarrow G$, such that $\gamma(\omega_i)$ is the node of $G$ associated with class $\omega_i$, which is also invertible. Substituting this in equation 3 with $\phi = DW \circ \gamma$, where $\psi$ is the sentence embedding and $M$ the mapping between the two vector spaces,

$$S \xrightarrow{\ \psi\ } \mathbb{R}^n \xrightarrow{\ M\ } \mathbb{R}^m \xrightarrow{\ DW^{-1}\ } G \xrightarrow{\ \gamma^{-1}\ } \Omega \tag{4}$$

In practice, we compute the mapping from the class embedding space $\mathbb{R}^m$ into the class set $\Omega$, called here $\tilde{\phi}^{-1}$, simply by computing the distance between a point in $\mathbb{R}^m$ and the inverted projection of each class from $\Omega$ and then considering the closest class. That is, for each $\omega_i \in \Omega$, we consider the associated node $\gamma(\omega_i)$ in $G$ and compute the mapping $DW(\gamma(\omega_i))$ in $\mathbb{R}^m$ of that node, as shown here:

$$\tilde{\phi}^{-1}(y) = \arg\min_{\omega_i \in \Omega} \lVert y - DW(\gamma(\omega_i)) \rVert \tag{5}$$

By substituting this function into equation 4, we obtain the algorithm we call here LSTM+T:

$$S \xrightarrow{\ \mathrm{LSTM}\ } \mathbb{R}^n \xrightarrow{\ M\ } \mathbb{R}^m \xrightarrow{\ \tilde{\phi}^{-1}\ } \Omega \tag{6}$$

For comparison, the traditional corresponding classification method is tested, where the graph embedding and associated functions are replaced by discrete softmax outputs. We call this simply LSTM:

$$S \xrightarrow{\ \mathrm{LSTM}\ } \mathbb{R}^n \xrightarrow{\ \mathrm{softmax}\ } \mathbb{R}^c \xrightarrow{\ \arg\max\ } \Omega \tag{7}$$
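Two pieces of this subsection lend themselves to a short sketch: the concept and bigram nodes attached to each class node, and the decoding of a projected point to the closest class. The identifiers and the 2-dimensional class vectors below are hypothetical; in the method described above the class vectors would come from DeepWalk, which is not reproduced here.

```python
import math


def class_neighbours(intent_id, separator="_"):
    """Concept and bigram nodes to connect to the class node of `intent_id`."""
    concepts = [c for c in intent_id.split(separator) if c]
    # A bigram is the concatenation of two subsequent concepts.
    bigrams = [separator.join(pair) for pair in zip(concepts, concepts[1:])]
    return concepts + bigrams


def nearest_class(point, class_vectors):
    """Return the class whose embedding is closest (Euclidean distance) to `point`."""
    return min(class_vectors, key=lambda name: math.dist(point, class_vectors[name]))


neighbours = class_neighbours("bill_pay_online")

# Hypothetical class embeddings standing in for the graph embeddings of class nodes.
class_vectors = {"bill_pay_online": [0.0, 1.0], "service_stop": [1.0, 0.0]}
predicted = nearest_class([0.2, 0.7], class_vectors)
```

Euclidean distance is used here as one reasonable reading of "the closest class"; another distance or a similarity score could be substituted without changing the structure.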


Replacing the LSTM with USE

Recently, several general-purpose language models that can be used for computing sentence embeddings have been proposed, and the Universal Sentence Encoder (USE) is one of them [7]. This approach consists of a Transformer neural network [36], trained on varied sources of data, such as Wikipedia, web news, web question-answer pages, and discussion forums. USE has achieved state-of-the-art results in various tasks, so we decided to try it in our experiments as an alternative to the LSTM for the embedding of input sentences.

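In this work we employed the multilingual USE version 3. By replacing LSTM with USE in eq. 6 we obtain algorithm USE+T, where $M$ is the learned mapping between the two vector spaces and $\tilde{\phi}^{-1}$ selects the closest class:

$$S \xrightarrow{\ \mathrm{USE}\ } \mathbb{R}^n \xrightarrow{\ M\ } \mathbb{R}^m \xrightarrow{\ \tilde{\phi}^{-1}\ } \Omega \tag{8}$$

Like in the previous case, we also use the USE algorithm with traditional discrete softmax outputs for comparison, called here USE:

$$S \xrightarrow{\ \mathrm{USE}\ } \mathbb{R}^n \xrightarrow{\ \mathrm{softmax}\ } \mathbb{R}^c \xrightarrow{\ \arg\max\ } \Omega \tag{9}$$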


Replacing DeepWalk with USE and CDSSM

To explore variants of algorithms for embedding the classes and also approaches which do not need to be trained from scratch and allow on-the-fly handling of meta-knowledge, we tried replacing DeepWalk with two different methods.

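The first one consists of applying USE sentence embeddings also for class embeddings, as in eq. 10. To simplify notation, EMB represents either LSTM or USE embeddings for the input text, $M$ is the learned mapping between the two vector spaces, and $\tilde{\phi}_{\mathrm{USE}}^{-1}$ selects the closest class under the USE-based class embeddings.

$$S \xrightarrow{\ \mathrm{EMB}\ } \mathbb{R}^n \xrightarrow{\ M\ } \mathbb{R}^m \xrightarrow{\ \tilde{\phi}_{\mathrm{USE}}^{-1}\ } \Omega \tag{10}$$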


This approach is similar to the way DeepWalk works but instead of training the graph embeddings from scratch, the class embeddings are represented by the mean sentence embedding computed from different random walks starting in the class node. We name these methods LSTM+S and USE+S, for EMB set with LSTM and USE, respectively.
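The idea of averaging sentence embeddings over random walks can be sketched as below. The graph, the walk parameters, and `toy_embed` are illustrative assumptions: `toy_embed` is a simple word-count vector standing in for a real sentence encoder such as USE, which is not called here.

```python
import random


def random_walk(graph, start, length, rng):
    """Walk `length` nodes through `graph`, starting at `start`."""
    walk = [start]
    for _ in range(length - 1):
        neighbours = sorted(graph[walk[-1]])
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk


def class_embedding(graph, class_node, embed, walks=10, length=5, seed=0):
    """Mean sentence embedding over several random walks from the class node."""
    rng = random.Random(seed)
    sentences = [" ".join(random_walk(graph, class_node, length, rng))
                 for _ in range(walks)]
    vectors = [embed(sentence) for sentence in sentences]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical proto-taxonomy graph: class nodes linked to their concept nodes.
graph = {
    "bill_pay": {"bill", "pay"},
    "bill_view": {"bill", "view"},
    "bill": {"bill_pay", "bill_view"},
    "pay": {"bill_pay"},
    "view": {"bill_view"},
}
VOCAB = sorted(graph)


def toy_embed(sentence):
    """Stand-in for a sentence encoder: counts of each vocabulary word."""
    words = sentence.split()
    return [float(words.count(word)) for word in VOCAB]


vector = class_embedding(graph, "bill_pay", toy_embed)
```

Because no graph embedding is trained, a class added or renamed at run time only needs new walks and a new average, which is the on-the-fly property this variant is after.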

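Additionally, we also evaluate the replacement of DeepWalk by the Convolutional Deep Structured Semantic Model (CDSSM) proposed in [9], yielding the following algorithm, where EMB can be either LSTM or USE embeddings, $M$ is the learned mapping between the two vector spaces, and $\tilde{\phi}_{\mathrm{CDSSM}}^{-1}$ selects the closest class under the CDSSM-based class embeddings.

$$S \xrightarrow{\ \mathrm{EMB}\ } \mathbb{R}^n \xrightarrow{\ M\ } \mathbb{R}^m \xrightarrow{\ \tilde{\phi}_{\mathrm{CDSSM}}^{-1}\ } \Omega \tag{11}$$

The CDSSM model consists of a three-layer convolutional neural network trained for creating embeddings of intent identifiers represented as sentences. In this work, we input to CDSSM the sequence of proto-taxonomy concepts for each intent class. We refer to these algorithms as LSTM+C and USE+C, for EMB set with LSTM and USE, respectively.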

Out-of-Scope Sample Detection

In this paper we are particularly interested in determining whether the proto-taxonomies improve the detection of out-of-scope (OOS) samples. A rejection mechanism based on a pre-defined threshold is used for OOS sample detection. This method can be easily applied to all of the methods described previously without the need for either a specific training procedure or OOS training data.

In greater detail, suppose that for each class $\omega_i$ there is a score denoted $z_i$, where $z_i \in Z$. Given that $z_{max}$ represents the highest score associated to a class and that a rejection threshold $\theta$ has been defined on a validation set, samples can be classified as OOS whenever $z_{max} < \theta$. If so, they are simply rejected, i.e., no classification output is produced for them. Otherwise, the sample is considered as an in-scope (IS) sample and the classification is conducted normally.

The scores in $Z$ are represented either by the softmax probability for the traditional softmax-based methods or by the similarity of sentence and graph embeddings for the proposed approaches. For the latter, the similarity is computed by means of the dot product between those two embeddings.
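A minimal sketch of this rejection rule, using dot-product similarity as the score, could look as follows. The vectors, class names, and threshold value are hypothetical; in practice the threshold would be tuned on a validation set.

```python
def dot(u, v):
    """Dot product between two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))


def classify_with_rejection(sentence_vec, class_vectors, threshold):
    """Return the best-scoring class, or None when the sample is rejected as OOS."""
    scores = {name: dot(sentence_vec, vec) for name, vec in class_vectors.items()}
    best = max(scores, key=scores.get)
    # Reject when the highest score falls below the threshold.
    if scores[best] < threshold:
        return None
    return best


# Hypothetical class embeddings and an illustrative threshold.
class_vectors = {"bill_pay": [1.0, 0.0], "service_stop": [0.0, 1.0]}
in_scope = classify_with_rejection([0.9, 0.1], class_vectors, threshold=0.5)
out_of_scope = classify_with_rejection([0.3, 0.2], class_vectors, threshold=0.5)
```

For the softmax-based baselines the same function applies if `scores` is replaced by the class probabilities.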

Metrics, Datasets, and Experiments

In this section we present the experiments to evaluate the algorithms which use the proto-taxonomies with the neuro-symbolic algorithms described in the previous section. We explore their impact on the accuracy of intent recognition both in terms of classifying correctly utterances (in-scope accuracy) and determining which utterances are not covered by a set of intents (out-of-scope accuracy).

Evaluation metrics

We take into account a commonly-used metric for OOS detection, i.e., the equal error rate (EER) [34], which corresponds to the classification error rate when the threshold $\theta$ is set to a value where the false acceptance rate (FAR) and the false rejection rate (FRR) are the closest. These two metrics are defined as:

$$\mathrm{FAR} = \frac{\text{number of accepted OOS samples}}{\text{total number of OOS samples}} \tag{12}$$

$$\mathrm{FRR} = \frac{\text{number of rejected IS samples}}{\text{total number of IS samples}} \tag{13}$$

In addition, the in-scope error rate (ISER) is considered to report IS performance, i.e., the error rate considering only IS samples with $\theta$ set to zero, similar to the class error rate in [34]. This metric is important to evaluate whether the alternative classification methods are able to keep up with the performance of their counterparts in the main classification task.
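These metrics can be computed from the highest score of each test sample, as in the sketch below. Two choices are assumptions made for illustration: candidate thresholds are taken from the observed scores, and the EER is reported as the mean of FAR and FRR at the threshold where the two are closest. The scores themselves are made up.

```python
def far_frr(is_scores, oos_scores, threshold):
    """FAR: share of OOS samples accepted; FRR: share of IS samples rejected."""
    far = sum(score >= threshold for score in oos_scores) / len(oos_scores)
    frr = sum(score < threshold for score in is_scores) / len(is_scores)
    return far, frr


def equal_error_rate(is_scores, oos_scores):
    """Return (EER, threshold) at the point where FAR and FRR are closest."""
    candidates = sorted(set(is_scores) | set(oos_scores))

    def gap(threshold):
        far, frr = far_frr(is_scores, oos_scores, threshold)
        return abs(far - frr)

    best = min(candidates, key=gap)
    far, frr = far_frr(is_scores, oos_scores, best)
    return (far + frr) / 2, best


# Hypothetical highest scores for in-scope and out-of-scope test samples.
is_scores = [0.9, 0.8, 0.7, 0.4]
oos_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(is_scores, oos_scores)
```

With these scores one OOS sample is accepted and one IS sample is rejected at the selected threshold, so FAR and FRR coincide.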

The Larson and Telco Datasets

During the development and initial testing of the algorithms, we used two English datasets for in-depth experimentation. The first is the publicly available Larson dataset [22]; the second is a private real-world chatbot dataset used by a telecommunications provider for customer care, called here the Telco dataset. In the former, we added a proto-taxonomy by hand based on the identifiers of intents; in the latter, we restructured the original proto-taxonomy by hand. The goal of the adjustments was to avoid spurious interference from taxonomy errors in the initial results.

In Larson there is a total of 22,500 in-scope samples, evenly distributed across 150 classes, where 18,000 examples are used for training and 4,500 for test. We conducted a simulation of OOS detection with the in-scope examples by doing five random samplings where we took out 30 intents and 3,600 training examples. We trained only with the remaining 120 intents and 14,400 examples. The test was then conducted on the 4,500 samples where 3,600 remained in-scope and 900 became OOS examples.
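The hold-out simulation can be sketched as follows. The data is synthetic and only reproduces the counts quoted above (150 intents with 120 training and 30 test examples each); the function name `simulate_oos` and the use of a seeded random generator are illustrative assumptions.

```python
import random


def simulate_oos(train, test, n_held_out, seed):
    """Hold out some intents so that their test examples play the role of OOS."""
    rng = random.Random(seed)
    intents = sorted({intent for _, intent in train})
    held_out = set(rng.sample(intents, n_held_out))
    # Training only sees the remaining intents.
    new_train = [(text, intent) for text, intent in train if intent not in held_out]
    # Test examples of held-out intents are relabelled as out-of-scope.
    new_test = [(text, "OOS" if intent in held_out else intent)
                for text, intent in test]
    return new_train, new_test, held_out


# Synthetic stand-in with the same shape as the dataset described above.
train = [(f"train utterance {i}-{k}", f"intent_{i}")
         for i in range(150) for k in range(120)]
test = [(f"test utterance {i}-{k}", f"intent_{i}")
        for i in range(150) for k in range(30)]

new_train, new_test, held_out = simulate_oos(train, test, n_held_out=30, seed=0)
n_oos = sum(label == "OOS" for _, label in new_test)
```

Repeating the call with different seeds gives the independent random samplings over which results are averaged.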

The Telco dataset contains 4,093 examples and 87 intents. From those, 3,069 examples were used for training and 1,024 for test. The OOS scenario was simulated by extracting different random samplings where 5 intents were removed. Given the smaller size of this dataset compared to Larson, we conducted 20 samplings instead of 5.

Figure 4: Different methods to incorporate the proto-taxonomy in Larson dataset, compared to the LSTM and USE baselines.
Figure 5: Different methods to incorporate the proto-taxonomy in Telco dataset, compared to the LSTM and USE baselines.

For both sets we considered the following setup defined after preliminary evaluations. For the LSTM-based methods, the input sentence embedding size was set to 150 and output embeddings to 200. DeepWalk walk sizes were set to 20 for LSTM+T and USE+T. For both USE-based methods and the softmax-ones we trained a two-layer neural network with 800 hidden neurons. They were trained for 50 epochs.

Results in the Larson and Telco Datasets

The results on the Larson dataset are presented in fig. 4. We observe that there can be a slight improvement in EER, especially with the USE-based and the LSTM+C methods. Moreover, there is a significant improvement in terms of FAR for all USE-based methods and for LSTM+S and LSTM+C. Notice that even though the proposed approaches generally do not outperform LSTM and USE in ISER (except LSTM+C), the methods whose ISER comes closest to that of their softmax counterparts tend to result in better EER and FAR rates.

In fig. 5, the results on the Telco dataset show a different scenario. The proposed methods generally perform worse than or, at best, comparably to LSTM and USE in EER. In terms of FAR, some methods such as USE+T and USE+C seem to outperform the baselines but, considering the high standard deviation, the results are not significant. On the other hand, we also observe that the methods failed to get close in ISER to the softmax-based methods. That seems to indicate that, in cases where making use of meta-knowledge harms ISER too much, the symbolic knowledge creates noise and does not help improve either EER or FAR.

There were two key findings from our experiments with the Larson and the Telco datasets. First, the improvements using LSTM or USE as a base seem to be similar, possibly slightly better for the USE algorithm. Second, and most importantly, we saw much more improvement from the use of the proto-taxonomy in the Larson dataset than in the Telco dataset, in spite of the similar nature of the datasets and the proto-taxonomies. This motivated us to try out the ideas on a larger and more diverse set of workspaces, focusing solely on USE to simplify the experiments.

The ChatWorks Dataset

We used the large set of real, professional workspaces from ChatWorks to create a dataset where our neuro-symbolic algorithms could be tested in a context of high diversity and realism. We started with the 3,840 workspaces available in English. To eliminate possible problems due to workspaces with poor quality, only workspaces with taxonomy rates over 30% were considered. Next, workspaces with outliers in the number of intents and examples were removed following the 3σ rule, where values more than 3 standard deviations away from the mean are not considered. Finally, the ratio between the number of examples and intents must be greater than 10. From the filtered set we randomly selected 200 workspaces.
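The three filters can be sketched as below. The workspace records are synthetic, the field names are hypothetical, and two details are assumptions: the population standard deviation is used, and the mean and deviation are computed over the workspaces that passed the taxonomy-rate filter.

```python
import statistics


def within_three_sigma(values):
    """Flag values that lie within 3 standard deviations of the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [abs(value - mean) <= 3 * std for value in values]


def filter_workspaces(workspaces):
    """Apply the taxonomy-rate, outlier, and examples-per-intent filters in order."""
    kept = [w for w in workspaces if w["taxonomy_rate"] > 0.30]
    ok_intents = within_three_sigma([w["intents"] for w in kept])
    ok_examples = within_three_sigma([w["examples"] for w in kept])
    kept = [w for w, a, b in zip(kept, ok_intents, ok_examples) if a and b]
    return [w for w in kept if w["examples"] / w["intents"] > 10]


# Synthetic workspaces: 20 typical ones plus three that should be filtered out.
workspaces = [{"taxonomy_rate": 0.8, "intents": 20, "examples": 300}
              for _ in range(20)]
workspaces += [
    {"taxonomy_rate": 0.9, "intents": 2000, "examples": 30000},  # size outlier
    {"taxonomy_rate": 0.1, "intents": 20, "examples": 300},      # low taxonomy rate
    {"taxonomy_rate": 0.8, "intents": 20, "examples": 100},      # too few examples
]
selected = filter_workspaces(workspaces)
```

A random sample of the surviving workspaces (200 in the experiments above) would then be drawn from `selected`.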

The testing procedure involves the execution of 20 iterations for each workspace. The tests are performed for all USE-based methods (USE, USE+T, USE+S, and USE+C). Initially, the workspaces are split into training and test datasets (75% and 25%, respectively). Next, the four methods are trained and tested on such datasets. The evaluation metrics (EER, FAR and ISER) are measured on the results for the test datasets, being analyzed in terms of improvement of the proto-taxonomy models (USE+T, USE+S, and USE+C) in comparison to the base model (USE).
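One way to express the comparison against the base model is a relative improvement per workspace, grouped into ranges. This is a sketch under assumptions: the improvement is taken as the relative reduction of an error metric with respect to the baseline, and the range boundaries (0%, 5%, 10%, 20%) are illustrative. The error values are made up.

```python
from collections import Counter


def relative_improvement(baseline_error, method_error):
    """Percentage reduction of an error metric relative to the baseline."""
    return 100.0 * (baseline_error - method_error) / baseline_error


def improvement_range(improvement):
    """Assign an improvement percentage to one of five illustrative ranges."""
    if improvement < 0:
        return "below 0%"
    if improvement < 5:
        return "0% to 5%"
    if improvement < 10:
        return "5% to 10%"
    if improvement <= 20:
        return "10% to 20%"
    return "above 20%"


# Hypothetical (baseline, method) error rates for four workspaces.
error_pairs = [(0.5, 0.25), (0.4, 0.5), (0.25, 0.24), (0.5, 0.425)]
labels = [improvement_range(relative_improvement(base, method))
          for base, method in error_pairs]
share = {label: 100.0 * count / len(labels)
         for label, count in Counter(labels).items()}
```

The `share` dictionary gives the percentage of workspaces falling in each range, which is the kind of distribution a per-method summary would report.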

Results in the ChatWorks Dataset

Table 1 summarizes the results of the experiments with the ChatWorks dataset, showing the distribution of the workspaces according to ranges of the percentage of improvement of each neuro-symbolic method compared to the USE baseline (negative values signal results worse than the baseline). We highlight in boldface the best results for each range. Notice that when the neuro-symbolic method is worse than the baseline (improvement below 0%), smaller is better, and conversely when it is better than the baseline (improvement above 0%). The last column, best, corresponds to the results obtained by using, for each workspace, the best of the three algorithms.

EER: % of workspaces
improvement    USE+T   USE+S   USE+C   best
below 0%       68%     60%     55%     39%
0% to 5%       16%     23%     18%     22%
5% to 10%      8%      8%      7%      11%
10% to 20%     7%      7%      12%     17%
above 20%      2%      4%      9%      12%

FAR: % of workspaces
improvement    USE+T   USE+S   USE+C   best
below 0%       37%     47%     28%     8%
0% to 5%       17%     17%     17%     15%
5% to 10%      5%      7%      8%      8%
10% to 20%     13%     11%     16%     19%
above 20%      29%     19%     32%     52%

ISER: % of workspaces
improvement    USE+T   USE+S   USE+C   best
below 0%       96%     95%     74%     71%
0% to 5%       4%      4%      20%     22%
5% to 10%      1%      1%      3%      4%
10% to 20%     0%      1%      1%      1%
above 20%      0%      1%      2%      3%

Table 1: Percentage of workspaces on the ChatWorks dataset which saw different levels of improvement over the USE baseline, in terms of equal error rate (EER), false acceptance rate (FAR), and in-scope error rate (ISER).

The results clearly indicate that the USE+C algorithm achieves the best results in all three metrics, although there is a significant portion of workspaces where the other methods are also competitive. This is particularly true for the out-of-scope detection (FAR).

More importantly, the results support our claim that the meta-knowledge embedded by the developers can be used as input to neuro-symbolic algorithms to increase intent recognition performance. Notably, in OOS detection (FAR), when the best of the three algorithms is applied, 71% of the workspaces saw an improvement of 10% or more and 52% saw an improvement of more than 20%. Even using only the best single algorithm, USE+C, 48% of the workspaces saw at least a 10% improvement.

The overall results for accuracy, considering the EER metric, are also impressive. The top of table 1 shows that 28% of the workspaces had improvements of 5% or more with the USE+C algorithm, and 29% of them had an improvement of 10% or more when the best algorithm was applied. However, the improvements in in-scope accuracy (ISER) were much smaller, with only about 6% of the workspaces improving by 5% or more with USE+C. We discuss these results and their implications next.

Discussion and Future Work

We started this paper by showing evidence that there is a systematic practice of embedding symbolic knowledge into the intent identifiers among developers of professional conversational systems. We explored in detail the case of the ChatWorks platform and showed that a significant number of the workspaces have some sort of intent proto-taxonomy. This result was further validated in the workshops we had with professional ChatWorks developers.

The results of the experiments indicate that the intent proto-taxonomies embedded by those developers can indeed be used in many workspaces to improve accuracy in intent recognition. More than half of the workspaces drawn from ChatWorks saw improvements of more than 20% in out-of-scope detection and in a little less than a third of them the overall error rate improved by 10% or more. But why were there so many workspaces where we did not see impact?

First of all, we must take into account that the ChatWorks repository from which we drew our dataset has workspaces in different stages of development and deployment. We can expect significant differences in overall quality, both in the intent definitions and in the utterance examples. We briefly explored basic characterizations of proto-taxonomy quality, such as taxonomy rate, depth of the taxonomy, and number of concepts, but we saw no clear correlation with improvements in accuracy rates. We believe more complex metrics of knowledge structure need to be employed to better characterize which proto-taxonomies are good candidates. We plan to do so in our future work.
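As a rough illustration of the kind of basic characterization mentioned above, the sketch below computes a few simple statistics from a list of intent identifiers. It assumes identifiers are concept names joined by an underscore, and its `shared_rate` (the fraction of intents sharing at least one concept with another intent) is a hypothetical stand-in, not the taxonomy rate used in our analysis. All names and sample identifiers are invented for the example.

```python
from collections import Counter
from typing import Dict, List


def split_identifier(identifier: str, sep: str = "_") -> List[str]:
    """Split an intent identifier into lower-cased concept tokens (assumed separator)."""
    return [token.lower() for token in identifier.split(sep) if token]


def proto_taxonomy_stats(identifiers: List[str], sep: str = "_") -> Dict[str, float]:
    """Simple descriptive statistics of the concepts embedded in intent identifiers."""
    concept_lists = [split_identifier(i, sep) for i in identifiers]
    # Number of intents in which each concept appears.
    usage = Counter(c for concepts in concept_lists for c in set(concepts))
    # Intents that share at least one concept with some other intent.
    shared = sum(1 for concepts in concept_lists if any(usage[c] > 1 for c in concepts))
    return {
        "num_intents": len(identifiers),
        "num_concepts": len(usage),
        "max_depth": max((len(c) for c in concept_lists), default=0),
        "shared_rate": shared / len(identifiers) if identifiers else 0.0,
    }


# Hypothetical identifiers, for illustration only.
example_ids = ["billing_invoice_request", "billing_refund", "account_password_reset", "greeting"]
stats = proto_taxonomy_stats(example_ids)
print(stats)
```

Statistics of this kind only count tokens; they say nothing about whether the concepts are consistent or meaningful, which is one reason richer structural metrics may be needed.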

It is important to notice that, in the workspaces where we did see impact, the symbolic knowledge we mined was in an absolutely “raw” format. In spite of that, using the basic graph mining method described in the paper, it was possible to obtain a “meaningful” knowledge structure, similar to a knowledge graph, which could be used by our neuro-symbolic algorithms. To improve the quality of the taxonomies, we are working on designing an interface which allows developers to manipulate the intent proto-taxonomy directly and, possibly, make it more correct, more complete, and able to improve intent recognition rates even further.
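The idea of turning raw identifiers into a graph-like structure can be sketched as follows. This is not the mining method used in the paper; it is a deliberately simplified bipartite graph that links each intent to the concept tokens found in its identifier, assuming an underscore separator. The node prefixes, function names, and identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_intent_concept_graph(identifiers: Iterable[str], sep: str = "_") -> Dict[str, Set[str]]:
    """Undirected bipartite graph linking intent nodes to their concept nodes."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for identifier in identifiers:
        intent_node = f"intent:{identifier}"
        for token in identifier.split(sep):
            if not token:
                continue  # ignore empty tokens from doubled separators
            concept_node = f"concept:{token.lower()}"
            graph[intent_node].add(concept_node)
            graph[concept_node].add(intent_node)
    return dict(graph)


def related_intents(graph: Dict[str, Set[str]], identifier: str) -> Set[str]:
    """Intents reachable in two hops, i.e. sharing at least one concept."""
    node = f"intent:{identifier}"
    related: Set[str] = set()
    for concept in graph.get(node, set()):
        related |= graph[concept]
    related.discard(node)
    return related


# Hypothetical identifiers, for illustration only.
graph = build_intent_concept_graph(
    ["billing_invoice_request", "billing_refund", "account_password_reset"]
)
print(related_intents(graph, "billing_refund"))
```

Even this crude structure exposes which intents a developer implicitly grouped together, which is the kind of signal a graph embedding method could then consume.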

Moreover, we explored in this paper one particular case of symbolic knowledge embedding by developers of machine learning systems. It is unlikely that we will find similar patterns of knowledge embedding in all machine learning development platforms. Still, we know, as discussed in the related work section, that people use similar proto-taxonomies when they name file and e-mail folders, when they give names to functions and variables in programs and data, and when they write comments in Jupyter notebooks. Also, platforms can further foster the use of metadata and comments by developers, aiming to organically elicit usable knowledge, even if that knowledge turns out to be inconsistent or incomplete.

This can be a realistic path to knowledge acquisition for neuro-symbolic systems, since we demonstrated here that such casual, organic, unsolicited knowledge can be mined and used effectively. It is likely that the fusion of neural and symbolic processing was key to handling the many mistakes and problems we found in that buried knowledge, and we plan to explore this further in our experiments. But we hope we have made the case that there is a hidden treasure of symbolic knowledge in many real-world systems and that robust neuro-symbolic methods such as the ones we described in the paper may be able to extract value from it.


References

  • [1] M. N. Asim, M. Wasim, M. U. G. Khan, W. Mahmood, and H. M. Abbasi (2018) A survey of ontology learning techniques and applications. Database 2018. Cited by: Introduction.
  • [2] D. Barreau and B. A. Nardi (1995-07) Finding and reminding: file organization from the desktop. SIGCHI Bull. 27 (3), pp. 39–43. External Links: ISSN 0736-6906, Link, Document Cited by: Related Work.
  • [3] Y. Bengio (2017) The consciousness prior. External Links: 1709.08568 Cited by: Introduction, Related Work.
  • [4] T. R. Besold, A. d’Avila Garcez, S. Bader, H. Bowman, P. Domingos, P. Hitzler, K. Kuehnberger, L. C. Lamb, D. Lowd, P. M. V. Lima, L. de Penning, G. Pinkas, H. Poon, and G. Zaverucha (2017) Neural-symbolic learning and reasoning: a survey and interpretation. External Links: 1711.03902 Cited by: Introduction, Related Work.
  • [5] J. H. Boose (1989) A survey of knowledge acquisition techniques and tools. Knowledge acquisition 1 (1), pp. 3–37. Cited by: Introduction.
  • [6] P. Cavalin, V. H. A. Ribeiro, A. Appel, and C. Pinhanez (2020) Improving out-of-scope detection in intent classification by using embeddings of the word graph space of the classes. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 3952–3961. Cited by: Related Work.
  • [7] D. Cer, Y. Yang, S. Kong, N. Hua, N. Limtiaco, R. St. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, B. Strope, and R. Kurzweil (2018-11) Universal sentence encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Brussels, Belgium, pp. 169–174. External Links: Link, Document Cited by: Replacing the LSTM with USE.
  • [8] S. K. Chang (2001) Handbook of software engineering and knowledge engineering. Vol. 1, World Scientific. Cited by: Introduction.
  • [9] Y. Chen, D. Hakkani-Tür, and X. He (2016) Zero-shot learning of intent embeddings for expansion by convolutional deep structured semantic models. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6045–6049. Cited by: Related Work, Replacing DeepWalk with USE and CDSSM.
  • [10] A. Civan, W. Jones, P. Klasnja, and H. Bruce (2008) Better to organize personal information by folders or by tags?: the devil is in the details. Proceedings of the American Society for Information Science and Technology 45 (1), pp. 1–13. External Links: Document, Link Cited by: Related Work, Embedded Meta-Knowledge in Intents of Conversational Systems.
  • [11] A. d’Avila Garcez, M. Gori, L. C. Lamb, L. Serafini, M. Spranger, and S. N. Tran (2019) Neural-symbolic computing: an effective methodology for principled integration of machine learning and reasoning. External Links: 1905.06088 Cited by: Related Work.
  • [12] E. Davis (2014) Representations of commonsense knowledge. Morgan Kaufmann. Cited by: Introduction.
  • [13] L. De Raedt, R. Manhaeve, S. Dumancic, T. Demeester, and A. Kimmig (2019) Neuro-symbolic = neural + logical + probabilistic. In Proceedings of the 2019 International Workshop on Neural-Symbolic Learning and Reasoning, pp. 4. External Links: Link Cited by: Introduction, Related Work.
  • [14] L. Ehrlinger and W. Wöß (2016) Towards a definition of knowledge graphs. SEMANTiCS (Posters, Demos, SuCCESS) 48, pp. 1–4. Cited by: Embedded Meta-Knowledge in Intents of Conversational Systems.
  • [15] M. Fossati, D. Kontokostas, and J. Lehmann (2015-09) Unsupervised learning of an extensive and usable taxonomy for dbpedia. In Proceedings of the 11th International Conference on Semantic Systems, SEM ’15. Cited by: Introduction, Related Work.
  • [16] F. Hayes-Roth (1984) The industrialization of knowledge engineering. Artificial Intelligence Applications for Business, pp. 159–177. Cited by: Introduction.
  • [17] D. Hudson and C. D. Manning (2019) Learning by abstraction: the neural state machine. In Advances in Neural Information Processing Systems, pp. 5903–5916. Cited by: Introduction.
  • [18] D. Hudson and C. D. Manning (2019) Learning by abstraction: the neural state machine. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. Alché-Buc, E. Fox, and R. Garnett (Eds.), pp. 5903–5916. External Links: Link Cited by: Related Work.
  • [19] S. Ji, S. Pan, E. Cambria, P. Marttinen, and P. S. Yu (2020) A survey on knowledge graphs: representation, acquisition and applications. External Links: 2002.00388 Cited by: Introduction, Related Work.
  • [20] W. Jones, A. J. Phuwanartnurak, R. Gill, and H. Bruce (2005) Don’t take my folders away! organizing personal information to get things done. In CHI ’05 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’05, New York, NY, USA, pp. 1505–1508. External Links: ISBN 1595930027, Link, Document Cited by: Related Work.
  • [21] D. Kartsaklis, M. T. Pilehvar, and N. Collier (2018-October-November) Mapping text to knowledge graph entities using multi-sense LSTMs. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 1959–1970. External Links: Link, Document Cited by: Related Work, Adapting Kartsaklis Method (LSTM), Adapting Kartsaklis Method (LSTM).
  • [22] S. Larson, A. Mahendran, J. J. Peper, C. Clarke, A. Lee, P. Hill, J. K. Kummerfeld, K. Leach, M. A. Laurenzano, L. Tang, and J. Mars (2019-11) An evaluation dataset for intent classification and out-of-scope prediction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 1311–1316. External Links: Link, Document Cited by: The Larson and Telco Datasets.
  • [23] R. Manhaeve, S. Dumancic, A. Kimmig, T. Demeester, and L. De Raedt (2018) DeepProbLog: neural probabilistic logic programming. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 3749–3759. External Links: Link Cited by: Related Work.
  • [24] C. Manning and H. Schutze (1999) Foundations of statistical natural language processing. MIT press. Cited by: The Use of Intent Proto-Taxonomies in ChatWorks.
  • [25] J. Mao, C. Gan, P. Kohli, J. B. Tenenbaum, and J. Wu (2019) The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision. In International Conference on Learning Representations. External Links: Link Cited by: Introduction, Related Work.
  • [26] A. Oltramari, J. Francis, C. Henson, K. Ma, and R. Wickramarachchi (2020) Neuro-symbolic architectures for context understanding. External Links: 2003.04707 Cited by: Related Work, Related Work.
  • [27] E. Parisotto, A. Mohamed, R. Singh, L. Li, D. Zhou, and P. Kohli (2017) Neuro-symbolic program synthesis. In International Conference on Learning Representations (ICLR). Cited by: Introduction, Related Work.
  • [28] B. Perozzi, R. Al-Rfou, and S. Skiena (2014) Deepwalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 701–710. Cited by: Adapting Kartsaklis Method (LSTM).
  • [29] C. Pinhanez, H. Candello, P. Cavalin, M. Pichiliani, A. Appel, V. Ribeiro, J. Nogima, M. de Bayser, M. Guerra, H. Ferreira, and G. Malfatti (2021) Combining training data with informal knowledge from collaborative practices in neuro-symbolic conversational systems. Note: Submitted to the SIGCHI Conference on Human Factors in Computing Systems (CHI’21) Cited by: How Developers Use Intent Proto-Taxonomies.
  • [30] A. Rule, A. Birmingham, C. Zuniga, I. Altintas, S. Huang, R. Knight, N. Moshiri, M. H. Nguyen, S. B. Rosenthal, F. Pérez, et al. (2018) Ten simple rules for reproducible research in jupyter notebooks. arXiv preprint arXiv:1810.08055. Cited by: Related Work.
  • [31] P. Singh, T. Lin, E. T. Mueller, G. Lim, T. Perkins, and W. L. Zhu (2002) Open mind common sense: knowledge acquisition from the general public. In OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”, pp. 1223–1237. Cited by: Introduction.
  • [32] R. Studer, V. R. Benjamins, and D. Fensel (1998) Knowledge engineering: principles and methods. Data & knowledge engineering 25 (1-2), pp. 161–197. Cited by: Introduction.
  • [33] R. Studer, D. Fensel, S. Decker, and V. R. Benjamins (1999) Knowledge engineering: survey and future directions. In German Conference on Knowledge-Based Systems, pp. 1–23. Cited by: Introduction.
  • [34] M. Tan, Y. Yu, H. Wang, D. Wang, S. Potdar, S. Chang, and M. Yu (2019-11) Out-of-domain detection for low-resource text classification tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 3566–3572. External Links: Link, Document Cited by: Evaluation metrics, Evaluation metrics.
  • [35] J. B. Tenenbaum, C. Kemp, T. L. Griffiths, and N. D. Goodman (2011) How to grow a mind: statistics, structure, and abstraction. Science 331 (6022), pp. 1279–1285. External Links: Document, ISSN 0036-8075, Link Cited by: Introduction.
  • [36] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008. Cited by: Replacing the LSTM with USE.
  • [37] W. Wang, V. W. Zheng, H. Yu, and C. Miao (2019) A survey of zero-shot learning: settings, methods, and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 10 (2), pp. 1–37. Cited by: Related Work.
  • [38] S. Whittaker, T. Matthews, J. Cerruti, H. Badenes, and J. Tang (2011) Am I wasting my time organizing email? a study of email refinding. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’11, New York, NY, USA, pp. 3449–3458. External Links: ISBN 9781450302289, Link, Document Cited by: Related Work, Embedded Meta-Knowledge in Intents of Conversational Systems.
  • [39] B. Yang, Z. Liping, and Z. Fengrong (2019) A survey on research of code comment. In Proceedings of the 2019 3rd International Conference on Management Engineering, Software Engineering and Service Sciences, ICMSS 2019, New York, NY, USA, pp. 45–51. External Links: ISBN 9781450361897, Link, Document Cited by: Related Work, Embedded Meta-Knowledge in Intents of Conversational Systems.