In the literature, the concept of “Industry 4.0” is often linked to its nine enabling technologies, first defined by the Boston Consulting Group. However, many works underline that Industry 4.0 is not merely the application of these technologies, but also involves many organizational and management challenges that must be met to face the competition: technologies are meaningful only if the company is able to extract their real value, with workers able to use and maintain them. Thus, knowledge management becomes crucial for companies facing the Fourth Industrial Revolution.
Knowledge management could be supported by the implementation of dialog systems, which are able to spread knowledge within an organization. At present, however, dialog systems are hand-coded: the needed functions are identified first, and the interactions with the users are then planned. Therefore, building a chatbot is a time-consuming process that does not scale.
The research goal is to design a methodology for automatically building a human-machine conversational system able to interact in an industrial environment. The aim of this paper is to provide a first step towards this objective by automatically capturing the knowledge underpinning a maintenance process, in order to build a knowledge base that will be the input for the conversational system.
Since the methodology is based on text mining techniques, an initial maintenance taxonomy, containing entities (such as components, verbs etc.) likely to be found in a maintenance manual, is used to identify relevant sentences in a technical document provided by the company BOBST SA. Then, the taxonomy is automatically expanded using these sentences and the main result is a taxonomy network, representing the entities and their relations.
This paper is structured as follows: in section 2, a literature review helps the reader understand both the importance of knowledge in the digital age and the link between knowledge management and conversational systems. Moreover, an overview of the tools underpinning the mapping of the knowledge repository of a chatbot is provided. Then, the designed methodology and its application are explored in section 3, and the main results of the research are outlined in section 4. Finally, in section 5, the authors comment on the results and highlight the future developments of the work.
2 Literature Review
2.1 Knowledge Management in Industry 4.0
Knowledge in an organization is the collection of expertise, experience, and information that individuals and workgroups use during the execution of their tasks. Although the literature offers many taxonomies and models for describing knowledge from different perspectives, the most relevant distinction is between tacit and explicit knowledge. Tacit knowledge is embedded in people's minds and is difficult, if not impossible, to exploit. Explicit knowledge exists as text documents, structured databases, images and many other forms. For this reason, it is easier to formalize and, consequently, to share within an organization.
Although philosophers, scientists and writers have wondered for centuries about how to create, acquire and communicate knowledge and, in particular, how to re-use it, only in the last 25-30 years has knowledge management been universally recognized as a self-sustaining research topic. Knowledge management can currently be described using different models. In particular, the model adopted for this research consists of six core phases: generate, refine, store, transfer, share and use knowledge. The main objective of knowledge management is to improve each of these six phases.
Knowledge management becomes even more relevant in the context of Industry 4.0 (I4.0). I4.0 is defined as a trend of automation that differs from previous industrial paradigms in its global scope, its exponential growth and its still uncertain (but certainly powerful) impact. People discussing this new paradigm usually focus on its enabling technologies (such as 3D printing and cloud computing), leaving aside the role that data and information play in the digital age. In fact, data lie behind each of the I4.0 technologies: consider the already mentioned cloud, which allows the storage and transmission of data, or simulation, which is made possible only by the availability and modeling of data. This view is also confirmed by the acatech study on “Industrie 4.0”, which recognizes knowledge management as one of the missing building blocks of the Fourth Industrial Revolution.
Therefore, the most revolutionary aspect of I4.0 is not acquiring new machines but being able to manage the knowledge needed to take full advantage of them. The I4.0 phenomenon leads workers to solve non-standardized problems by using their knowledge: in this, workers could be supported by many I4.0 technologies for generating, refining, storing, transferring, sharing and using (and re-using) knowledge. The concept of the “knowledge worker”, the man or woman who applies to productive work ideas, concepts, and information rather than manual skill or brawn, and, consequently, the concept of “knowledge management” are clearly strengthened by the Fourth Industrial Revolution.
2.2 Conversational systems to manage knowledge
In the literature, it is recognized that knowledge sharing could be supported by implementing a query system that routes queries to a knowledge expert. However, the success of this kind of approach depends strongly on human intervention, and a bottleneck arises when the number of queries that a human being can process is reached. This problem could be solved by implementing a conversational system that is potentially able to answer an unlimited number of queries coming from any location. In fact, many studies on knowledge management, such as the one conducted by Schacht and Mädche, provide not only mechanisms for direct communication with experts (e.g. social networking sites, forums, chats), but also design principles mainly focused on the externalization of knowledge, its storage, retrieval and reuse.
Conversational systems can be classified into two categories: chatbots and dialog systems. The main differences between the two are that dialog systems (1) are usually built for a more specific domain than chatbots and (2) have a more complex architecture. In fact, while chatbots consist only of a set of predefined responses and a pattern matching module, dialog systems consist of four modules, each with a different function (pre-processing, natural language understanding, dialog management and response generation). However, even in the literature the two categories are not clearly distinct, so we treat the two expressions as largely interchangeable, both meaning machines able to hold a conversation with another agent or with a human.
Such systems are increasingly gaining ground in the consumer market (for example Amazon Echo, which incorporates the intelligence of the virtual assistant Alexa). Moreover, they are also disrupting the communication between companies and customers: many chatbot applications for customer care and e-commerce management can be identified. However, chatbots could be used in many other scenarios: one of the most interesting applications is supporting workers in their everyday tasks. At the same time, implementing chatbots in business scenarios represents a big challenge: they still encounter some resistance in industrial contexts, which suggests that the conversational systems designed for the workplace are not achieving the same success as the others.
Regardless of the application field (business or consumer), the brain of the chatbot is the knowledge base containing the possible responses: on the basis of the user input, the dialog management module generates the most likely matching output. The main issue with the knowledge bases currently used in chatbots is that they are hand-coded. Therefore, building a chatbot knowledge base is time-consuming, and the result is difficult to adapt to different cases and domains. Many works aim to build a chatbot knowledge base automatically; among these we can mention the one conducted by Shawar and Atwell. However, as far as we know, these works rely on human annotation as a key tool: therefore, they cannot solve the problem of time-consuming human intervention.
The intuition behind this research is that, conversely, the knowledge base of a dialog system for industrial applications can be built from scratch by simply extracting rules from technical documentation. Such rules will then be fed to an engine (expert system) suitable for managing the chatbot. This makes chatbot building independent of human intervention and, consequently, faster and more cost-efficient. The interest in this direction is confirmed by many previous works: Ramesh et al. highlight that chatbots have come a long way from simple retrieval-based pattern matching approaches to deep-learning-based neural network approaches.
The extraction rules could be applied to every kind of technical documentation: technical specifications, standards, quotations, user manuals etc. The focus of this research is on building a chatbot to support maintenance processes, so the work involves the extraction of information from a specific type of technical document: the maintenance manual. This choice reflects the priority given to maintenance, which assumes a key role in the I4.0 context: with the increasing complexity, scope, and organisational role of new I4.0 technologies, their maintenance is becoming a critical factor in determining the competitive success of an organization.
2.3 Knowledge Mapping for Conversational Systems
Building conversational systems able to interact in an industrial maintenance environment relies upon the retrieval of the value-added information stored in the process documents.
These chatbots must be able to connect human speech with the intents they are equipped to handle, helping an operator find answers about a particular process or machine. These systems owe their skill to Natural Language Processing (NLP), which examines human speech and makes use of knowledge about sentence structure, extracting entities about the maintenance process using machine-learned pattern recognition. Entities, which are nothing more than data buckets used in conversation, are thus the building blocks of the knowledge base on which the conversational system runs, and extracting them to feed this body of knowledge becomes crucially important.
In general, Named Entity Recognition (NER) is the task of identifying entity names such as people, organizations, places, temporal expressions or numerical expressions. Several methods and algorithms address the entity extraction task. Although the most effective ones are based on supervised methods, the very closed domain we deal with leads us towards an automatic extraction approach that focuses on evaluating sentence-oriented statistics of individual words. In this very first step of this line of research, we focus on identifying the entities that occur in the maintenance process, and the relations among them, as a representation of the knowledge that will be the input for designing and building our conversational system.
3 Materials and Methods
To reach this goal, a dedicated workflow has been designed (Figure 1).
Firstly, the maintenance manual is pre-processed, obtaining a list of sentences (1). Concurrently, a taxonomy of entities expected to be found in a maintenance manual (such as verbs, components, DPI, etc.) is automatically expanded (2) with synonyms extracted using MultiWordNet. The output is an expanded maintenance taxonomy. Then, regular expressions associated with the entities are set up, building the seed, a collection of elements used to run the extraction process (3). After that, the seed is used to extract the relevant sentences, namely those containing at least one of its elements (4). New entities are automatically extracted from the relevant sentences (5) using an automatic keyword extraction algorithm. The extracted entities are then manually checked for validation, and the taxonomy is expanded with the newly approved entities, improving the process in further applications. Finally, the entities and their relations are represented in a taxonomy network.
In this paper, we ran an experiment using as a source a maintenance manual provided by BOBST SA, a global company that produces presses for rotogravure printing and coating and laminating machines for the flexible materials industry. The manual describes the maintenance operations of a flexographic printing machine for labels and flexible packaging, and it is written in Italian. The results of applying the method are described in detail in the following sections.
The work started with the pre-processing of the maintenance manual. The manual was a PDF document: first, it was converted into a plain text file so that it could be processed. Then, the text was split into single sentences.
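As an illustration, the splitting step could be sketched as follows. This is a minimal sketch and not the implementation used in the study: the function name `split_sentences`, the splitting rule (line breaks plus sentence-final punctuation) and the sample text are all assumptions made for the example.

```python
import re

def split_sentences(text):
    """Split plain text into sentences on line breaks and on '.', '!' or '?'."""
    sentences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines left over from the PDF conversion
        # Split after sentence-final punctuation followed by whitespace.
        for part in re.split(r"(?<=[.!?])\s+", line):
            part = part.strip()
            if part:
                sentences.append(part)
    return sentences

# Hypothetical text resembling one converted maintenance card.
sample = (
    "Controllo impianto pneumatico\n"
    "Svitare la ghiera. Lubrificare il registro.\n"
    "\n"
    "© BOBST 2018"
)
sentences = split_sentences(sample)
for sentence in sentences:
    print(sentence)
```

Treating every line break as a boundary is a design choice that suits card-like pages, where titles and bullet points rarely end with a full stop.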
This manual consists of many pages, each one containing a maintenance card corresponding to a single maintenance operation (Figure 2).
This implies that all the pages have the same structure. This reasoning could be applied to every maintenance manual: each manual is semi-structured and, once its recurring structures have been identified, its recurring contents can easily be understood. For example, the procedures are always expressed using bullet points. Based on these criteria, the type of each sentence was automatically identified, among the following:
preliminary notes and warnings;
other (not relevant elements).
This assignment is relevant not only for the next phase of entity extraction, but also for setting up the interaction of the conversational system (in fact, the chatbot will be asked to “understand” whether it is communicating a procedure rather than a warning). Table 1 shows some examples of split sentences and the associated sentence types.
| sentence ID | sentence type | sentence |
| 00001OTH5 | other | © BOBST 2018 |
| 00025OTH1 | operation name | Controllo impianto pneumatico |
| 00035OTH1 | operation name | Controllo generale sistema di trasmissione |
We can state that, in the technical domain of maintenance, texts (such as manuals) follow not only the same page architecture, but also the same language formalism. This implies that different sentence types follow different recurring grammatical structures (for example, the procedures contain only verbs in the infinitive form). So, we can be confident that, once these recurring grammatical structures have been identified, the sentence extraction will not be subject to ambiguity issues and will be replicable without human intervention.
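A rule-based assignment of sentence types along these lines could look like the sketch below. The rules are hypothetical and only illustrate the idea: the bullet markers, the warning keywords and the infinitive test based on the Italian endings -are, -ere and -ire are assumptions, and the fallback label `unclassified` is not one of the types used in the study.

```python
import re

# Assumed bullet markers at the start of a procedure line.
BULLET = re.compile(r"^\s*[•\-\*]\s+")
# Assumed cue: the first word ends like an Italian infinitive.
INFINITIVE = re.compile(r"^\w+(?:are|ere|ire)\b", re.IGNORECASE)
# Assumed keywords opening a note or warning.
WARNING = re.compile(r"^(?:attenzione|nota|avvertenza)\b", re.IGNORECASE)

def classify_sentence(line):
    """Return a sentence type based on simple structural and lexical cues."""
    if "©" in line:
        return "other"
    if WARNING.match(line.strip()):
        return "preliminary notes and warnings"
    body = BULLET.sub("", line)
    if BULLET.match(line) and INFINITIVE.match(body):
        return "procedure"
    return "unclassified"

examples = [
    "• Svitare la ghiera",
    "ATTENZIONE: spegnere la macchina",
    "© BOBST 2018",
    "Controllo impianto pneumatico",
]
labels = [classify_sentence(e) for e in examples]
for example, label in zip(examples, labels):
    print(label, "|", example)
```

An ending-based infinitive test is deliberately crude; a part-of-speech tagger would be a more reliable replacement for that rule.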
3.2 Automatic taxonomy expansion
Concurrently, a maintenance taxonomy was manually generated. Because of the wide variety of entities expected to be found in a maintenance manual, it is crucial to organize them in classes. In the specific maintenance domain, the following classes were identified:
advantage: verbs, nouns and adjectives describing an advantage for the user;
alert: words giving an alert to the user;
chemistry: chemical elements;
component: machine components;
DPI: personal protective equipment (from the Italian acronym for dispositivi di protezione individuale);
drawback: verbs, nouns and adjectives describing a drawback for the user;
lifecycle: phases of machine life cycle;
math: mathematical symbols and expressions;
qualification: qualifications of the operators;
tool: tools to be used during operations;
The identification and population of the classes are specific to this manual and, of course, the analysis of additional maintenance manuals could lead to the identification of further classes and entities. These two phases were performed using both the expertise of domain technicians and the results of previous research works (such as advantages and drawbacks and functional verbs). In total, 1271 entities were identified, stored and classified in the maintenance taxonomy.
The taxonomy was then automatically expanded. Due to the large number of ways to express a concept in Italian, it proved necessary to identify synonyms of the entities. Synonyms were identified using MultiWordNet, reaching an expanded taxonomy of 3943 entities. Table 2 shows an extract from the expanded maintenance taxonomy, containing entities with their associated synonyms and classes.
| class | entity | synonyms |
| component | martinetto | cricco, binda, martinello |
| DPI | elmo | elmetto, cimiere, cimiero, casco |
| qualification | modellista | disegnatore_di_moda, couturier, stilista |
| tool | cesoie | trancia, forbici_da_tosatore, tranciatrice |
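The expansion step can be illustrated with a short sketch. Here a small in-memory dictionary stands in for the MultiWordNet lookup, so the code shows only the merging logic; `SYNONYMS`, `expand_taxonomy` and the two-class taxonomy are hypothetical names and data built from the extract in Table 2.

```python
# Hypothetical stand-in for a synonym lookup in MultiWordNet.
SYNONYMS = {
    "martinetto": ["cricco", "binda", "martinello"],
    "elmo": ["elmetto", "cimiere", "cimiero", "casco"],
}

# Tiny initial taxonomy: class name -> set of entities.
taxonomy = {
    "component": {"martinetto"},
    "DPI": {"elmo"},
}

def expand_taxonomy(taxonomy, synonyms):
    """Return a new taxonomy where each class also contains the synonyms of its entities."""
    expanded = {}
    for cls, entities in taxonomy.items():
        members = set(entities)
        for entity in entities:
            # Entities without a known synonym are simply kept as they are.
            members.update(synonyms.get(entity, []))
        expanded[cls] = members
    return expanded

expanded = expand_taxonomy(taxonomy, SYNONYMS)
for cls in sorted(expanded):
    print(cls, sorted(expanded[cls]))
```

Synonyms inherit the class of the entity they come from, which is why the qualification example in Table 2 (modellista and stilista) shows that an automatic expansion can also bring in terms that are out of domain.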
3.3 RegEx Building
In order to extract the sentences, it is necessary to translate human language into a form that the machine can use to perform the extraction task. This is possible using regular expressions (also called regex), sequences of characters that define a search pattern. One regular expression was built for each class. The output of this phase is the seed, a collection of elements used to run the extraction process, whose structure is shown in the extract in Table 3.
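One possible way to build a regular expression per class is sketched below. It is an assumption about how such a seed could be assembled, not a reproduction of the seed used in the study: the function `build_seed`, the word-boundary matching and the replacement of underscores with spaces in multi-word entries are choices made for this example.

```python
import re

def build_seed(taxonomy):
    """Compile one case-insensitive regular expression per taxonomy class."""
    seed = {}
    for cls, entities in taxonomy.items():
        # Longest entries first, so multi-word entities win over shorter ones.
        ordered = sorted(entities, key=lambda e: (-len(e), e))
        # Multi-word entries are written with underscores; match them with spaces.
        alternatives = "|".join(re.escape(e.replace("_", " ")) for e in ordered)
        seed[cls] = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return seed

seed = build_seed({"tool": {"cesoie", "forbici_da_tosatore"}})
match = seed["tool"].search("Usare le forbici da tosatore")
print(match.group(0) if match else None)
```

The word boundaries prevent an entity from matching inside a longer word, at the cost of missing inflected forms that are not listed in the taxonomy.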
3.4 Sentences Extraction
The seed was used to identify, among the sentences, only those containing at least one of its elements. The extracted sentences are those most relevant to the maintenance topic. This phase allowed the extraction of 151 relevant sentences. Table 4 shows 3 examples of extracted sentences with their IDs, along with the matching entity and its class.
| sentence ID | sentence | entity matched | entity class |
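The extraction can be sketched as a scan of every sentence against every regular expression of the seed, producing rows with the same four columns as Table 4. The seed, the sentence IDs and the sentences below are hypothetical and serve only to make the example self-contained.

```python
import re

# Hypothetical seed: one compiled pattern per class.
seed = {
    "component": re.compile(r"\b(?:ghiera|registro)\b", re.IGNORECASE),
    "DPI": re.compile(r"\b(?:elmo|casco)\b", re.IGNORECASE),
}

# Hypothetical pre-processed sentences, keyed by sentence ID.
sentences = {
    "S1": "Ruotare la ghiera in posizione manuale",
    "S2": "© BOBST 2018",
    "S3": "Indossare il casco",
}

def extract_relevant(sentences, seed):
    """Return (sentence ID, sentence, entity matched, entity class) for every match."""
    rows = []
    for sentence_id, text in sentences.items():
        for cls, pattern in seed.items():
            for match in pattern.finditer(text):
                rows.append((sentence_id, text, match.group(0).lower(), cls))
    return rows

rows = extract_relevant(sentences, seed)
for row in rows:
    print(row)
```

A sentence that matches several entities yields several rows, which keeps the information needed later to measure co-occurrence.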
3.5 New Entities Extraction
The results of the previous phase highlighted the need to further expand the taxonomy. In fact, as can be seen in Table 4, the relevant sentences contain, in addition to the entities of the seed, many others that were not considered but are relevant for building an expert system able to manage a conversation. The RAKE algorithm permits the automatic extraction of those new entities. The maximum number of words composing an entity was set to 2. After the extraction, the new expressions were manually checked in order to validate each of them before updating the taxonomy. If the check is positive, the entities are used to:
expand the existing classes;
create new classes, in case the entity does not match an already existing one.
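To make the keyword extraction step concrete, the following is a simplified sketch of the RAKE scoring scheme with the two-word limit applied. It is not the implementation used in the study: the stopword list is a tiny illustrative one, the sample text is made up, and phrases longer than the limit are simply discarded before scoring, which is a simplification.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, far from complete).
STOPWORDS = {"la", "il", "le", "in", "di", "e", "con"}

def candidate_phrases(text, stopwords, max_words=2):
    """Split text at punctuation and stopwords; keep phrases of at most max_words words."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"\w+", fragment):
            if word in stopwords:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return [p for p in phrases if len(p) <= max_words]

def rake(text, stopwords, max_words=2):
    """Rank candidate phrases by the sum of their word scores (degree / frequency)."""
    phrases = candidate_phrases(text, stopwords, max_words)
    frequency = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            frequency[word] += 1
            degree[word] += len(phrase)  # co-occurrences, the word itself included
    word_score = {w: degree[w] / frequency[w] for w in frequency}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

text = "Ruotare la ghiera in posizione manuale. Premere la ghiera con pressione."
ranked = rake(text, STOPWORDS)
for phrase, score in ranked:
    print(phrase, score)
```

Because the score rewards words that appear in longer phrases, two-word candidates such as "posizione manuale" rank above single words in this example.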
Since different manuals are likely to share a common list of entities rejected during the check, the rejected entities were included in a blacklist, namely a list of entities not to be considered for the taxonomy expansion. The blacklist could reduce the effort of future checking phases. However, this approach could introduce some bias, because the rejected entities are strongly linked not only to the domain (in this case maintenance), but also to other variables (for example the type of machine).
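The bookkeeping around the manual check can be sketched as follows. The function `apply_review` and the shape of the `decisions` mapping are hypothetical: the sketch assumes that the reviewer assigns each candidate either a class name or `None` when the candidate is rejected.

```python
def apply_review(candidates, decisions, taxonomy, blacklist):
    """Update taxonomy and blacklist from manual decisions on candidate entities."""
    for entity in candidates:
        if entity in blacklist:
            continue  # rejected in an earlier run: no manual check needed
        if entity not in decisions:
            continue  # not reviewed yet: leave it pending
        cls = decisions[entity]
        if cls is None:
            blacklist.add(entity)
        else:
            # setdefault creates the class when it does not exist yet.
            taxonomy.setdefault(cls, set()).add(entity)
    return taxonomy, blacklist

taxonomy = {"component": {"ghiera"}}
blacklist = set()
decisions = {"posizione manuale": "mode", "pagina": None}
taxonomy, blacklist = apply_review(
    ["posizione manuale", "pagina", "registro"], decisions, taxonomy, blacklist
)
print(taxonomy)
print(blacklist)
```

Keeping the blacklist separate from the taxonomy makes it easy to discard or revise it when moving to a different type of machine, which addresses the bias mentioned above.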
In total, 505 new entities were identified. During the checking process, besides the blacklist, 10 new classes were created:
operation: nouns (not verbs) describing operations;
material: materials to be used during maintenance that are not chemical elements;
mode: ways in which something could be performed;
measurement unit: measurement units;
machine: typologies of machine (not simple components);
time: adverbs or expressions indicating frequencies, periods etc.;
security: collective (not personal) security devices;
user: categories of users;
company: company names;
standard: laws, UNI, CEN and ISO standards, legislative decrees etc.
The distribution of the new entities among classes is shown in Table 5.
The new entities were also automatically associated with their frequency, representing the number of times the same entity was identified in the relevant sentences. Table 6 shows an extract of the new entities with their associated classes and frequencies.
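Counting these frequencies is straightforward; a minimal sketch, assuming whole-word, case-insensitive matching and using made-up sentences, is the following.

```python
import re
from collections import Counter

def entity_frequencies(entities, sentences):
    """Count how many times each entity occurs in the given sentences."""
    counts = Counter()
    for entity in entities:
        pattern = re.compile(r"\b" + re.escape(entity) + r"\b", re.IGNORECASE)
        counts[entity] = sum(len(pattern.findall(s)) for s in sentences)
    return counts

# Hypothetical relevant sentences and new entities.
relevant = [
    "Ruotare la ghiera",
    "Premere la ghiera in posizione manuale",
]
frequencies = entity_frequencies(["ghiera", "posizione manuale"], relevant)
print(frequencies.most_common())
```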
4 Final Results
The most relevant result of this research is not only the automatic extraction of relevant entities from the maintenance manual, but also the definition of their relations, because a chatbot engine requires them in order to manage a conversation.
The interconnection of the entities in the taxonomy is analysed through Network Analysis and quantified by measuring their co-occurrence in the sentences of the BOBST manual. Building a graph based only on the co-occurrence measure is the first step towards the construction of a more complex network of relations (such as lexical or semantic similarities). Furthermore, a graph offers an intuitive way of representing and visualizing the output of the whole process.
Once we collected the entities and quantified the relation among them, we represented this structure as a graph where we can find:
a set of nodes, representing the entities, whose attributes are:
size: the absolute frequency in the manual;
colour: the class of the entity;
label: the name of the entity;
a set of edges, representing the relation between entities: the higher the thickness, the stronger the relation.
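The graph structure described above can be derived from the entities found in each sentence. The sketch below uses only the Python standard library and made-up data; the input format (one list of entities per sentence) and the function name are assumptions. Node size is the number of occurrences, and edge weight is the number of sentences in which two entities appear together.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(sentence_entities):
    """Return node sizes (frequencies) and edge weights (sentence co-occurrences)."""
    node_size = Counter()
    edge_weight = Counter()
    for entities in sentence_entities:
        node_size.update(entities)
        # Sorting gives each undirected pair a single canonical key.
        for a, b in combinations(sorted(set(entities)), 2):
            edge_weight[(a, b)] += 1
    return node_size, edge_weight

# Hypothetical entities matched in three relevant sentences.
sentence_entities = [
    ["ruotare", "ghiera", "posizione manuale"],
    ["premere", "ghiera", "pressione"],
    ["ruotare", "ghiera"],
]
node_size, edge_weight = cooccurrence_graph(sentence_entities)
print(node_size.most_common(2))
print(edge_weight.most_common(2))
```

The class of each entity, used for the node colour, would be a plain lookup in the taxonomy, and the two counters can be passed to any graph library for drawing.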
The taxonomy network is shown in Figure 3.
As can be seen in Figure 3, many entities are strongly linked, which makes the results of the research interesting to interpret. By analyzing the network, the relations among the entities can be explored in depth. To help the reader understand the potential of the network, an extract of the graph is shown in detail in Figure 4.
The lower part of Figure 4 shows some strongly linked entities, all of which have in common the component ghiera (the ring). The 5 verbs in purple are linked to this component in the manual: this means that the ring could, for example, be rotated (ruotare) or pressed (premere). These operations should be performed using a certain pressure, pressione (the measurement unit, in pink), and in different ways (for example, the ring could be rotated in manual position, posizione manuale, colored in blue). The upper part of Figure 4 shows 3 verbs (svitare, to unscrew; lubrificare, to lubricate; and raccogliere, to pick up) representing actions performed on two very similar components (the valve, volantino, and the regulator, registro).
The strength of the system is that it can generate different graphs for different sources, and this result is a first attempt at representing the knowledge of the maintenance process. Although the network is specific to the BOBST case, it could capture a broader view of the domain we are dealing with.
5 Conclusions and Future Developments
This research highlights the first step towards automatic building of a chatbot to support maintenance operations by automatically extracting, from technical documents, entities and their relations and mapping them in a taxonomy network. The lack of human intervention makes the process scalable, enabling, for example, the support for other business functions. Further applications on a large number of maintenance manuals will enlarge the body of knowledge, improving the entity extraction and providing more accurate relations.
The commitment of a multinational company makes the contribution to practice relevant, since the methodology is applied to a real case, is based on practical needs and makes chatbot development faster and more cost-efficient. As for the contribution to scholarship, it lies mainly in the improvement of every phase of knowledge management, which has proved to be a crucial topic in the Industry 4.0 environment.
The application of the method led us to face different issues and, consequently, to reflect upon the weaknesses of the approach that need to be addressed. For this application, it was decided not to stem (i.e. reduce to the root form) the entities belonging to the expanded taxonomy. However, the peculiarities of the Italian language call for more complex processing in future works, such as stemming and PoS tagging. For example, although stemming is useful to avoid problems linked to plural, gender and verb inflections, the large number of derived forms in Italian means that it could wrongly merge two words that originally have different meanings (for example, vita (life) and vite (screw), whose common stem is “vit”). Furthermore, the classes of entities need to be organized in different hierarchical levels, and the use of lexical patterns will improve the extraction task: for example, UNI, CEN and ISO standards are always expressed following the same lexical pattern, in which the string UNI (or CEN or ISO) is followed by a string of three, four or five numeric characters; the two strings can be consecutive or separated by a blank space.
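The lexical pattern for standards translates directly into a regular expression. The following sketch encodes exactly the rule stated above (issuer, optional single blank space, three to five digits); the variable names and the sample sentence are assumptions.

```python
import re

# UNI, CEN or ISO, an optional blank space, then three to five digits.
STANDARD = re.compile(r"\b(UNI|CEN|ISO) ?(\d{3,5})\b")

text = "Conforme a ISO 12100 e UNI9001, non ISO 12 né ISO 123456."
found = STANDARD.findall(text)
print(found)
```

The trailing word boundary rejects identifiers with fewer than three or more than five digits, so "ISO 12" and "ISO 123456" are not matched in this example.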
Although the proposed methodology is a first approach to the automatic building of a chatbot mining information from technical documentation, it allowed us to achieve a good result. We are confident that the approach could be improved starting from the previous considerations. The next step of the research will be the building of a domain-specific knowledge base for BOBST maintenance operations.
The authors are very thankful to the company BOBST SA for the material provided and for its support to the research.
-  BCG Group. Report on Industry 4.0: The Future of Productivity and Growth in Manufacturing Industries. 2015
-  Nagy, J., Oláh, J., Erdei, E., Mate, D., Popp, J. The Role and Impact of Industry 4.0 and the Internet of Things on the Business Strategy of the Value Chain - The Case of Hungary. Sustainability. 2018
-  Wu, Y., Wang, G., Li, W., Li, Z. Automatic Chatbot Knowledge Acquisition from Online Forum via Rough Set and Ensemble Learning. IFIP International Conference on Network and Parallel Computing, 2008.
-  Ghahfarokhi, A. D., Zakaria, M. S. Knowledge retention in knowledge management system: Review 2009 International Conference on Electrical Engineering and Informatics, 2009
-  Polanyi, M. The tacit dimension New York: Doubleday, 1966.
-  King, W. R. (2009). Knowledge Management and Organizational Learning. Knowledge Management and Organizational Learning Annals of Information Systems, pp. 3–13.
-  Last, C. Global Commons in the Global Brain Technological Forecasting and Social Change, 114, 48-64., 2017
-  Schuh, G., Anderl, R., Gausemeier J., ten Hompel, M., Wahlster, W. Industrie 4.0 Maturity Index. Managing the Digital Transformation of Companies (acatech STUDY). Munich: Herbert Utz Verlag., 2017
-  Mittelmann, A. Personal Knowledge Management as Basis for Successful Organizational Knowledge Management in the Digital Age Procedia Computer Science,99, 117-124., 2016
-  Drucker, P. F. The age of discontinuity: Guidelines to our changing society. London: Transaction Publisher, p. 264, 1968
-  Khan, A. Z., Khader, S. A. An approach for externalization of expert tacit knowledge using a query management system in an e-learning environment The International Review of Research in Open and Distributed Learning., 2014
-  Narendra, U. P., Pradeep, B. S., Prabhakar, M. Externalization of tacit knowledge in a knowledge management system using chat bots 3rd International Conference on Science in Information Technology (ICSITech), 2017
-  Schacht, S., Mädche, A. How to Prevent Reinventing the Wheel? – Design Principles for Project Knowledge Management Systems. How Design Science at the Intersection of Physical and Virtual Design Lecture Notes in Computer Science, 1-17, 2013
-  McTear, M. F. Spoken Dialogue Technology-Toward the Conversational User Interface Springer, London, 2004
-  Lester, J., Branting, K., Mott, B. Conversational agents The Practical Handbook of Internet Computing, 2004
-  Masche, J., Le, N.-T. A Review of Technologies for Conversational Systems, 2018
-  Shawar, B. A., Atwell, E. Machine learning from dialogue corpora to generate chatbots Expert Update Journal, vol.6, no.3, pp.25-29., 2003
-  Ramesh, K., Ravishankaran, S., Joshi, A., Chandrasekaran, K. A survey of design techniques for conversational agents, 2017
-  Gola A., Swic A. Computer-Aided Machine Tool Selection for Focused Flexibility Manufacturing Systems Using Economical Criteria Actual Problems of Economics, 10 (124) 383-389, 2011
-  Nadeau, D., & Sekine, S. A survey of named entity recognition and classification, 2009
-  Rose, S., Engel, D., Cramer, N., & Cowley, W. RAKE algorithm: Automatic Keyword Extraction from Individual Documents John Wiley & Sons., 2010
-  Pianta, E., Bentivogli, L., Girardi, C. MultiWordNet: Developing an Aligned Multilingual Database. In Proceedings of the First International Conference on Global WordNet, Mysore, India, January 21-25, 2002, pp. 293-302.
-  Chiarello, F., Fantoni, G., Bonaccorsi, A. Product description in terms of advantages and drawbacks: Exploiting patent information in novel ways., 2017
-  Bonaccorsi, A., Apreda, R., Fantoni, G. A theory of the constituent elements of functions, 2009
-  Ruslan Mitkov The Oxford Handbook of Computational Linguistics, 2003