Automation systems in Industry 4.0 [1, 2] and the Internet-of-Things (IoT) are designed as networks of interacting elements, which can include thousands to hundreds of thousands of physical sensors and actuators. Efficient operation and flexible production require that physical and software components are well integrated, and increasingly that such complex automation systems can be swiftly reconfigured and optimized on demand using models, simulations and data analytics [4, 5]. Achieving this goal is a nontrivial task, because it requires interoperability of physical devices, software, simulation tools, data analytics tools and legacy systems from different vendors and across standards [6, 7, 8, 4]. Standardisation of machine-to-machine (M2M) communication, like the OPC Unified Architecture (OPC UA, https://opcfoundation.org/about/opc-technologies/opc-ua/), which offers scalable and secure communication across the automation pyramid, and the development of Service Oriented Architectures (SOA), like the Arrowhead Framework, are developments supporting the vision of interoperability in Industry 4.0 and the IoT.
However, in addition to data exchange enabled by protocol-level standardisation and translation, information models are required to correctly interpret and make use of the data. There are many different information models and standards defining the semantics of data and services, which are developed and customized to fit different industry segments, products, components and vendors. This implies that the problem of translating data representations between different domains is increasingly relevant for robust on-demand interoperability in large-scale automation systems. This capacity is referred to as dynamic interoperability, and as operational interoperability, meaning that systems are capable of accessing services from other systems and using those services to operate effectively together. Thus, focus needs to shift from computing and reasoning in accordance with one representational system to automatic translation and computing over multiple representational systems [11, 4], and engineers should operate at the levels where system-of-systems goals and constraints are defined.
In this paper, we outline a mathematical model of the problem of translating between representational systems in cyber-physical systems (CPS) with integrated physical and software components, and we map some alternative definitions of the translation problem in this model to machine learning tasks and the corresponding state-of-the-art methods. In this model, concepts like symbol grounding, semantics, translation and interpretation are mathematically formulated, and possibilities to create semantic translators more automatically with machine learning methods are outlined.
II. Interoperability model
When integrating SOA systems and services that are designed according to different standards and specifications, various interfaces that are also subject to domain-specific assumptions and implementation characteristics need to be interconnected. It is common practice to engineer the connections between such interfaces in the form of software adapters that make different components, data, services and systems semantically interoperable, so that functional and non-functional system requirements can be met. This way, a modular structure is maintained, which makes testing and the eventual replacement of a module and updates of the related adapters tractable in otherwise complex systems, at the cost of a number of adapters that grows quadratically with the number of interfaces.
In deterministic protocol translation, where representational and computational completeness allows for the use of an intermediate “pivot” representation of information, the quadratic complexity of the adapter concept can be reduced to linearity. However, in the case of semantic translation considered here, it is not clear that such universal intermediate representations exist and constitute a resource-efficient and feasible approach to translation. Furthermore, the research field of dynamic and operational interoperability in SOA lacks a precise mathematical formulation and consensus about the key problem(s). Therefore, we approach the translation problem by formulating it in precise mathematical terms that can be mapped to machine learning tasks.
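The scaling difference can be illustrated with a short calculation; the sketch below assumes one adapter per ordered pair of distinct interfaces for direct translation, versus one adapter to and one from the pivot per interface:

```python
def pairwise_adapters(n):
    """Direct translation: one adapter per ordered pair of distinct interfaces."""
    return n * (n - 1)

def pivot_adapters(n):
    """Pivot translation: one adapter to and one from the pivot per interface."""
    return 2 * n

# With 20 interfaces, direct adapters already outnumber pivot adapters roughly tenfold.
print(pairwise_adapters(20), pivot_adapters(20))  # 380 40
```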
We define the M2M interoperability problem in terms of translator functions, which map messages from one domain, named CPS A, to messages in another domain, named CPS B, see Figure 1. The translators can be arbitrarily complex functions that are generated as integrated parts of the overall SOA, thereby maintaining a modular architecture as in the case of engineered adapters. In general, the translated messages cannot be semantically and otherwise identical to the messages communicated within CPS B, but we can optimize the translator functions to make the error small with respect to an operational loss or utility function. In the following, we elaborate on the latter point and introduce the additional symbols and relationships of the model as the basis for defining translator learning tasks, which in principle can be addressed with machine learning methods.
The model is divided in three levels: the cyber level (white), the physical representation level (light gray) and the shared physical environment (gray), see Figure 1. At the cyber level, two metadata graphs define all discrete symbolic and sub-symbolic metadata that is specific to CPS A and CPS B, respectively. For example, the nodes and edges of these graphs can represent subject, predicate and object semantic triples defined in the Resource Description Framework (RDF). Each CPS also has discrete internal states, such as the computer program variables of all devices in a CPS, which are not directly readable or writable in the SOA but may be read and modified indirectly via the messages and services. The environment has inputs, which can be affected by actuator devices, and outputs, which can be measured with sensor devices. In CPS A, the outputs of the sensor devices are represented at the cyber level as discrete variables, and the actuators are controlled by discrete variables, and similarly for CPS B. From the viewpoint of causality, changes of the environment inputs may influence the values of the environment outputs, and thereby the cyber-level variables of both systems, and vice versa.
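The elements of the model can be summarized in a minimal sketch; the class and field names (`CPS`, `metadata`, `state`, `sensors`, `actuators`) are illustrative choices for this paper's concepts, not notation from a standard:

```python
from dataclasses import dataclass

@dataclass
class CPS:
    """One cyber-physical system in the interoperability model."""
    metadata: set    # graph of (subject, predicate, object) RDF-like triples
    state: dict      # internal states, not directly accessible in the SOA
    sensors: dict    # cyber-level variables representing environment outputs
    actuators: dict  # cyber-level variables driving environment inputs

# Two systems observing the same environment through different vocabularies.
cps_a = CPS(metadata={("tempSensor1", "hasUnit", "K")},
            state={"last_reading": 293.15},
            sensors={"tempSensor1": 293.15}, actuators={})
cps_b = CPS(metadata={("office_temp", "unit", "Cel")},
            state={}, sensors={"office_temp": 20.0}, actuators={})
```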
Messages are generated by encoder functions, which map the metadata and internal state of a CPS to messages and typically are implemented in the form of computer programs. Similarly, the internal states are updated by decoder functions, which are matched to the corresponding encoder functions. However, a decoder of one CPS can in general not be combined with an encoder of the other CPS, and vice versa.
Although some technical details and challenges are hidden in this abstract model, the model enables us to define concepts and relationships that otherwise are ambiguous and described differently in the literature depending on the context. The task to model dynamic relationships between the environment inputs and outputs in terms of the actuator and sensor variables of one CPS is the central problem of system identification. The task to model and control one CPS in terms of the relationships between its cyber-level variables, internal states and sometimes also its metadata is more complex and typically involves hybrid models with state-dependent dynamic descriptions. This is a central problem in automatic control and CPS engineering.
Symbol grounding refers to the relations between a symbol defined in the metadata graph of a CPS, the related discrete cyber-level values, and the property of the environment that the symbol represents. A grounding problem appears when a symbol defined in one metadata graph has an underfitted relationship to the referenced property of the environment, such that symbols in the two metadata graphs cannot be conclusively compared for similarity although both systems are defined in the same environment. Therefore, symbol grounding is just as relevant for translator learning as it is for reliable inference in cognitive science and artificial intelligence.
Listing 1 presents two examples of SenML messages that are constructed to illustrate the character of a semantic translation problem.
Both messages encode information about the temperature in one office at our university and thus represent related physical properties. A and B can for example refer to the heating and ventilation systems in the office, respectively, and thus the temperatures are not necessarily identical. The message from System A includes the service URI and the time, longitude and latitude of the temperature measurement, with unit ‘K’ for Kelvin and a numeric value. The message from System B includes the name of the temperature sensor, the unit “Cel” for Celsius, and the numeric value and time of the temperature measurement.
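Listing 1 is not reproduced here; the following sketch shows two SenML-inspired messages of the kind described above (all field values, names and URIs are invented for illustration):

```python
# System A (heating): service URI, time, position, temperature in Kelvin
msg_a = {"bn": "coap://heating.example.org/office101/temp",
         "bt": 1546300800, "u": "K", "v": 293.15,
         "lat": 65.6176, "lon": 22.1390}

# System B (ventilation): sensor name, temperature in Celsius, time
msg_b = {"n": "ventilation-office101-temperature",
         "u": "Cel", "v": 20.3, "t": 1546300800}
```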
A translator could in this scenario, for example, be used by an indoor climate and energy optimization service that is capable of interpreting messages of the second kind in Listing 1, but not of the first kind. By using the translator, this service could for example improve the quality of the indoor climate, further reduce the energy used, or be more fault resilient in case of a sensor fault. As outlined above, messages encoded by CPS A in Figure 1 can in general not be correctly interpreted by CPS B, and vice versa. How can a translator that solves this problem be automatically generated?
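A hand-engineered translator of the kind that we aim to generate automatically could look as follows; the SenML-style field names and the message content are illustrative assumptions, and the unit conversion is the only semantic mapping implemented:

```python
def translate_a_to_b(msg_a):
    """Map a Kelvin-valued System A message to a System B style message."""
    office = msg_a["bn"].rsplit("/", 2)[-2]        # extract "office101" from the URI
    return {"n": office + "-temperature",
            "u": "Cel",
            "v": round(msg_a["v"] - 273.15, 2),    # Kelvin -> Celsius
            "t": msg_a["bt"]}

msg_a = {"bn": "coap://heating.example.org/office101/temp",
         "bt": 1546300800, "u": "K", "v": 293.15}
print(translate_a_to_b(msg_a))
```

Engineering such adapters by hand for every interface pair is exactly the quadratic effort that translator learning is meant to avoid.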
We approach this problem by defining a computable function that determines to what extent the system-of-systems (SoS) formed by CPS A, CPS B etc. fulfils particular operational requirements and goals. This function could be formulated as a loss function in machine learning, or as a utility function of a multi-agent system, and the translator learning task is to minimize the loss or maximize the expected utility. Some possible definitions of the function are listed in Table I.
The key points are that engineering resources are focused on defining the loss in terms of SoS goals and requirements, and that it is possible to optimize the loss by defining and updating the translator using machine learning methods, and similarly for other choices of loss and translator model in the case of expected utility maximization.
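A minimal instance of this optimization, assuming an affine translator for the numeric value and a squared-error loss over a handful of aligned readings, can be solved in closed form; the data and the affine model are illustrative assumptions:

```python
def fit_affine_translator(xs, ys):
    """Least-squares fit of y = a*x + b, minimizing the squared translation loss."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Aligned temperature readings: System A in Kelvin, System B in Celsius.
kelvin = [283.15, 288.15, 293.15, 298.15]
celsius = [10.0, 15.0, 20.0, 25.0]
a, b = fit_affine_translator(kelvin, celsius)
# The learned parameters recover the K -> Cel conversion: a close to 1, b close to -273.15.
```

In realistic settings the translator is of course nonlinear and high-dimensional, but the principle of fitting its parameters against an operational loss is the same.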
For example, in the office example introduced above, the loss could be a causality-type loss and the translator could be a recurrent neural network, which is trained until the ventilation system decodes the translated messages such that the effects of varying the actuation inputs on the measured outputs are correctly predicted across instrumented offices.
In general, the translator function should depend on the symbols in the metadata graphs of both systems, and it can also depend on other information sources, like public datasets and historical CPS data used to fit sub-symbolic relationships more accurately. In principle, the translator can be considered to perform three tasks:
Estimate the decoder of domain A.
Map information from domain A to domain B.
Estimate the encoder of domain B.
Like in the field of machine translation of natural language, we can attempt to explicitly model these individual mappings, or we can model the overall mapping end-to-end. We elaborate on machine learning tasks and methods that may be useful to address the translator learning task outlined above after briefly introducing some related work in the next section.
III. Related work on interoperability solutions
To fully exploit the potential of the IoT and Industry 4.0, engineering resources should to a larger extent be focused on the high-level benefits of interoperability and system integration. Automated approaches to establish and maintain interoperability are needed to enable on-demand service composition and to meet the demands for flexible production and high efficiency, given the high complexity and diversity of automation systems driven by rapid technological development. Architectures similar to the model presented here have been independently developed by Maló, who describes architectures that allow for maximum interoperability. Our model describes the specific task of translating between services and data formats, but can in general be considered a special case of the architectures considered in that work. Concepts and methods developed for the semantic web, such as ontologies, ontology alignment and ontology-based reasoning engines, are widely used to integrate human- and machine-readable metadata to support the adapter engineering and system integration processes. The semantic web tags websites with ontological metadata, typically encoded in RDF or higher-level ontology languages like the Web Ontology Language (OWL). The Semantic Sensor Network (SSN), an ontology specialized for describing sensors, is one example of a domain-specific ontology. The Open Semantic Framework (OSF) combines many such specific ontologies into an extendable framework, fusing both general and domain-specific knowledge. Ontologies form the core of semantic technologies, but not all ontologies can be combined and function together. Ontologies that are based on different standards and definitions can model related physical and cyber entities in different ways, leading to contradictions and under-determined relationships between symbols when different technologies are combined.
In addition to semantic interoperability, which focuses on supporting the engineering process with such standardized metadata models, methods and tools for automatic on-demand dynamic interoperability and operational interoperability are being developed. Symbolic reasoners can be applied to create Web-like mashups in highly dynamic environments, but suffer from state-space explosion when physical states are included. This challenge is recognized also in the domain of symbolic artificial intelligence. Furthermore, automatic reasoning in terms of symbolic metadata is unreliable in complex and uncertain real-world environments, because symbolic data does not include all necessary information about the context, environment and system (cf. the comments on symbol grounding and underfitted symbol relationships in the former section). Therefore, ontology-based translation is extended with sub-symbolic mapping and reasoning mechanisms. A recent example in this direction is deep alignment of ontologies, which enables discovery of sub-symbolic mappings between elements of ontologies by a data-driven optimization method, where textual descriptions are represented by word vectors learned from an auxiliary data set, similar to techniques used in natural language processing.
The development of more potent interoperability methods and technologies is of central importance for modern SOAs, like the aforementioned Arrowhead Framework. For example, ontology-based XML-message translation has been extended with semantic annotations, building on earlier work in this direction. That translator can map elements, perform unit conversion, detect missing data and, in certain cases, find and add the missing data. Another example is the architecture for device management using autonomic computing, where a manager monitors and plans execution using ontologies and a reasoning engine.
Data lakes, like the Big Data Europe platform (https://www.big-data-europe.eu/platform/), are another approach, where heterogeneous data annotated with RDF metadata are combined to allow querying, machine learning and inference across different representational domains. The metadata model considered in this context is based on similar concepts, but the problem addressed is different from the problem of dynamic and operational interoperability of SOA services in CPS systems.
IV. Mapping to Natural Language Processing
Vector embedding of sub-symbolic relations is a powerful concept often applied in natural language processing (NLP). Vector embeddings of words, sentences and contexts enable mappings between words in terms of vector operations in an n-dimensional space, where n is typically in the hundreds. Initially, relatively simple vector space models were used, which can represent some important word and document relations. Lately, neural network based approaches like Word2Vec have shown great performance, and thereby the use of simpler embeddings like one-hot vectors has mostly disappeared. Word2Vec maps the words (symbols) to a manifold in a vector space, thereby creating a model with sub-symbolic representations. Recent advancements have been achieved using attention models to create embeddings that produce different mappings for the same word given different contexts. Sub-symbolic vector embeddings of this type have recently been used for ontology alignment purposes.
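The idea behind such vector space models can be sketched with a count-based embedding; Word2Vec itself learns dense vectors with a neural network, but the count-based variant below already supports similarity queries (the corpus and window size are toy assumptions):

```python
import math
from collections import defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Count-based word vectors: each word is represented by its context counts."""
    vecs = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vecs[word][tokens[j]] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

corpus = ["the sensor reports temperature", "the sensor reports humidity",
          "the actuator controls temperature", "the actuator controls valve",
          "warm temperature and warm humidity"]
vecs = cooccurrence_vectors(corpus)
# In this toy corpus, "temperature" is closer to "humidity" than to "valve".
```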
Work in the field of machine translation is another important source of examples and guidance. Translation based on traditional statistics has recently been superseded by neural machine translation (NMT) as the state of the art. This switch was exemplified by Google, who have been using NMT for their translation service since 2016. An upgrade to the translation system allowed them to translate between unseen language pairs, a process they call zero-shot translation. That translation system uses recurrent neural networks with attention. More recent translation systems use pure attention models based on the transformer model to achieve state-of-the-art results. All translation systems referenced above are based on word or sub-word input features. There are also examples of fully character-level convolutional approaches.
To achieve good results on NLP tasks, the training protocol is of key importance. A major recent advancement in NLP is the step to semi-supervised pre-training. With pre-training, a language model in the form of a neural network is created using a large dataset, and can subsequently be fine-tuned for other problems with little data and computational resources. One of the latest improvements in semi-supervised pre-training is BERT (see, e.g., http://jalammar.github.io/illustrated-bert/), which can be downloaded in pre-trained form. It is an exciting open problem to adapt these concepts and recent technological advancements to the problem of sub-symbolic ontology alignment and, more generally, to the M2M translator learning task introduced in Section II.
V. Mapping to Graph Neural Networks
Graph neural network (GNN) models are relatively new in the machine learning field. Conventional architectures like recurrent and convolutional neural networks are based on the assumption that there is repetitive structure in the input. In contrast, GNNs cannot be based on that assumption because graph data is more irregular in nature; see the recent surveys for an overview of the field. Several concepts from the image recognition and NLP fields have been adapted to GNNs, like graph convolution, graph attention and graph embeddings. The resulting methods have been successfully used, for example, to study molecule structures in chemistry and to perform traffic route planning.
An interesting development in the field of semantic technologies is RDF2Vec, which is an extension of the Word2Vec model to graph embeddings. Much like word embedding, graph embedding is a powerful tool to represent graphs in a metric space, where for example graph clustering and similarity tasks can be addressed. The Relational Graph Convolutional Network (R-GCN) is another interesting recent development for the processing of RDF graphs, which is based on a message-passing network architecture. By allowing for different convolution operators for different kinds of edges, the R-GCN represents RDF data more effectively than if all edges are treated the same. The R-GCN has been validated on entity classification and link prediction tasks.
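The first stage of RDF2Vec, extracting random walks from an RDF graph so that they can be fed to a Word2Vec-style model as "sentences", can be sketched as follows (the triples are invented for illustration):

```python
import random

def rdf_walks(triples, start, n_walks=4, depth=3, seed=0):
    """Extract random walks node -predicate-> node ... as token sequences."""
    out_edges = {}
    for s, p, o in triples:
        out_edges.setdefault(s, []).append((p, o))
    rng = random.Random(seed)  # fixed seed for reproducibility
    walks = []
    for _ in range(n_walks):
        node, walk = start, [start]
        for _ in range(depth):
            edges = out_edges.get(node)
            if not edges:
                break  # reached a node with no outgoing edges
            p, o = rng.choice(edges)
            walk += [p, o]
            node = o
        walks.append(walk)
    return walks

triples = [("office101", "hasSensor", "tempSensor1"),
           ("tempSensor1", "hasUnit", "K"),
           ("office101", "partOf", "buildingA")]
walks = rdf_walks(triples, "office101")
# Each walk is a token sequence such as
# ["office101", "hasSensor", "tempSensor1", "hasUnit", "K"],
# ready to be fed to a Word2Vec-style embedding model.
```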
GNNs are currently actively developed and offer interesting new possibilities to perform graph embeddings and data-driven ontology alignment, including the mappings between the metadata graphs of different systems that are needed to address the M2M translator learning task outlined in Section II.
VI. Discussion: Translator Learning Strategies
Inspired by the recent developments in NLP, it is tempting to adopt an encoder-decoder translation scheme similar to those used in NMT (see Figure 1(a)). These models are typically trained end-to-end (E2E) with pairs of known translations, using a message reconstruction loss. This is feasible in NLP, where large repositories of such translation pairs have been developed. The M2M translation case is challenging because the repertoire of input representations, “languages” and “dialects” is diverse and growing more quickly, and the data sets contain information about production and operations that typically cannot be publicly shared, which prevents the collection of such large data sets. Thus, we will likely have access to less data and relatively few known translation pairs, since identifying and tagging these pairs is costly and time consuming, which is challenging for obtaining a scalable on-demand interoperability solution.
Accurate one-to-one word translation is not always possible in NLP. For example, a round-trip translation of the Swedish word “Lagom” to English with Google Translate results in “Moderate”, followed by “Måttlig”, which is semantically related albeit different (the sub-symbolic representations differ for most native Swedish speakers). This is expected, because there is no one-to-one mapping between that Swedish word and a word in the domain of the English language. However, the meaning of the word “Lagom” can essentially be explained to a native English speaker by a longer description, with one or a few follow-up questions needed to validate and further align the interpretation of the concept. Similarly, some messages in CPS A might require several messages to be accurately represented in CPS B, and vice versa. That is why we define the translators as an integrated part of the overall SOA of the SoS, such as the aforementioned Arrowhead Framework. This way it is possible, in principle, for the translator to request or provide additional information needed to proceed with a translation. For example, although the messages in Listing 1 refer to the same location, an external information source is needed to identify this relationship. It should be noted that NLP translations of this type are currently challenging to learn.
Instead of learning the translators in an E2E fashion, it is possible to use the messages communicated within each system as a starting point. Even if we cannot expect to have access to large data sets of prealigned A–B message pairs, we do expect high rates of internal messages in each CPS. Thus, we can optimize the embeddings of the messages in each domain separately and make use of the common environmental degrees of freedom to learn relationships between such embeddings. For example, vector space embeddings can be learned in the form of latent representations of autoencoders as illustrated in Figure 2, and in this context methods and concepts that are successfully used for NLP can be reused and further developed. Translations between the latent spaces of the CPS A and CPS B encoders can, for example, be learned by solving Equation 4 using loss/utility functions of the type listed in Table I. Such an autoencoding scheme does not solve the problem of missing translation pairs. However, with sub-symbolic representations of symbols that are optimized with metadata and data from the physical domain, the problem to learn mappings between symbols is simplified, which enables faster convergence and learning with less data. It also enables clustering and classification of messages, which is useful to improve training and testing protocols. The use of auxiliary goals has, for example, helped when solving NLP tasks.
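Structurally, this scheme composes a domain A encoder, a mapping between latent spaces, and a domain B decoder; the hand-coded functions below are simplistic stand-ins for learned autoencoders and a learned latent map, chosen so that the composition is easy to follow:

```python
def encode_a(msg):
    """Stand-in for the CPS A autoencoder's encoder: message -> latent code."""
    return [msg["v"]]  # latent code: the raw Kelvin value

def latent_map(z):
    """Stand-in for the learned mapping between the two latent spaces."""
    return [z[0] - 273.15]

def decode_b(z):
    """Stand-in for the CPS B autoencoder's decoder: latent code -> message."""
    return {"u": "Cel", "v": round(z[0], 2)}

def translate(msg):
    """Translator composed as decode_B after latent map after encode_A."""
    return decode_b(latent_map(encode_a(msg)))

print(translate({"u": "K", "v": 293.15}))  # yields v = 20.0
```

The point of the decomposition is that the two autoencoders can be trained on abundant internal messages of their own domains, leaving only the latent map to be learned from scarce cross-domain information.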
A final remark in this discussion concerns the nature of the environment, which up to this point was considered to be physical reality, so that the mathematical relationships between the environment inputs and outputs can be described in terms of physical models. The translator learning task introduced in Section II is not limited to natural environments, because it only requires that the systems have related degrees of freedom in the environment. If the degrees of freedom observed and actuated by the two systems are unrelated, there is little that can be learned using the approach proposed here. However, in systems of our primary interest, correlations between some of the observed variables are expected, and causal relationships are expected between the actuated and observed variables of the two systems. This is the case also for interconnected simulation models like digital twins, and at higher levels and across levels of the (ISA-95) automation pyramid, because most systems and services do not function independently of the others.
VII. Concluding remarks
Industrial IoT and Industry 4.0 require adaptable solutions to deal with the high heterogeneity of systems and data. In this paper, we have presented a mathematical interoperability model in which we can describe data, dynamic and operational interoperability as machine learning tasks. Unlike previous works, which focus on interoperability as an engineering task, our model allows engineers to define operational goals that can be used for automatic optimization of translators. The model is flexible and can be used with a variety of machine learning tools and methods.
Using the model, we propose learning strategies based on advances in natural language processing and graph neural networks, allowing for grounded translators. Symbol grounding is achieved using sub-symbolic representations learned in a shared environment. In this paper we have mostly assumed that the shared environment is physical, but in principle the shared environment could be any environment suitable for fitting sub-symbolic relationships, for example simulations involving digital twins. Using digital twins, translators can be trained and tested virtually, potentially reducing the time-to-deployment and probability of errors.
While engineered adapters based on ontology alignment and proof engines are explainable, and eventual problems that occur at runtime can be analyzed and solved by the engineers, translators generated with machine learning methods can be more challenging to comprehend. This is something industry often finds undesirable. Data availability is also an issue, since, as far as we know, there are no large public data sets of semantically dissimilar M2M-type messages available.
In future work we aim to address these issues and provide a proof of concept of the model and the translator learning task using a simulated environment.
We thank Magnus Sahlgren for helpful advice in the area of natural language processing and Sergio Martin del Campo for helpful comments on an early version of the manuscript.
- Lasi et al.  H. Lasi, P. Fettke, H.-G. Kemper, T. Feld, and M. Hoffmann, “Industry 4.0,” Business & Information Systems Engineering, vol. 6, no. 4, pp. 239–242, Aug 2014.
- Hankel and Rexroth  M. Hankel and B. Rexroth, “The reference architectural model industrie 4.0 (RAMI 4.0),” ZVEI, vol. 2, p. 2, 2015.
- Borgia  E. Borgia, “The internet of things vision: Key features, applications and open issues,” Computer Communications, vol. 54, pp. 1–31, 2014.
- IERC, AC  IERC, AC, “Iot semantic interoperability: research challenges, best practices, recommendations and next steps,” European Commission Information Society and Media, Tech. Rep, vol. 8, 2013.
- Wang et al.  S. Wang, J. Wan, D. Zhang, D. Li, and C. Zhang, “Towards smart factory for industry 4.0: a self-organized multi-agent system with big data based feedback and coordination,” Computer Networks, vol. 101, pp. 158 – 168, 2016, industrial Technologies and Applications for the Internet of Things.
- Nilsson and Sandin  J. Nilsson and F. Sandin, “Semantic interoperability in industry 4.0: Survey of recent developments and outlook,” in 2018 IEEE 15th International Conference on Industrial Informatics (INDIN). IEEE, 2018, pp. 127–132.
- Gürdür and Asplund  D. Gürdür and F. Asplund, “A systematic review to merge discourses: Interoperability, integration and cyber-physical systems,” Journal of Industrial information integration, vol. 9, pp. 14–23, 2018.
- Derhamy et al.  H. Derhamy, J. Eliasson, and J. Delsing, “Iot interoperability—on-demand and low latency transparent multiprotocol translator,” IEEE Internet of Things Journal, vol. 4, no. 5, pp. 1754–1763, 2017.
- Leitner and Mahnke  S.-H. Leitner and W. Mahnke, “OPC UA–service-oriented architecture for industrial applications,” in Gemeinsamer Workshop der Fachgruppen Objektorientierte Softwareentwicklung (OOSE), Software-Reengineering (SRE) und Software-Architekturen (SW-Arch), October 2006.
- Arrowhead  “Arrowhead - ahead of the future,” 2019, accessed: 2019-03-11. [Online]. Available: http://www.arrowhead.eu
- Licato and Zhang  J. Licato and Z. Zhang, “Evaluating representational systems in artificial intelligence,” Artificial Intelligence Review, Dec 2017.
- Maló  P. M. N. Maló, “Hub-and-spoke interoperability: an out of the skies approach for large-scale data interoperability,” Ph.D. dissertation, Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, 2013.
- Ljung  L. Ljung, “Perspectives on system identification,” Annual Reviews in Control, vol. 34, no. 1, pp. 1–12, 2010.
- Graja et al.  I. Graja, S. Kallel, N. Guermouche, S. Cheikhrouhou, and A. Hadj Kacem, “A comprehensive survey on modeling of cyber-physical systems,” Concurrency and Computation: Practice and Experience, p. e4850.
- Cubek et al.  R. Cubek, W. Ertel, and G. Palm, “A critical review on the symbol grounding problem as an issue of autonomous agents,” in KI 2015: Advances in Artificial Intelligence, 2015, pp. 256–263.
- Kolyvakis et al.  P. Kolyvakis, A. Kalousis, and D. Kiritsis, “Deepalignment: Unsupervised ontology matching with refined word vectors,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), vol. 1, 2018, pp. 787–798.
- Shadbolt et al.  N. Shadbolt, T. Berners-Lee, and W. Hall, “The semantic web revisited,” IEEE intelligent systems, vol. 21, no. 3, pp. 96–101, 2006.
- Compton et al.  M. Compton, P. Barnaghi, L. Bermudez, R. GarcíA-Castro, O. Corcho, S. Cox, J. Graybeal, M. Hauswirth, C. Henson, A. Herzog et al., “The ssn ontology of the w3c semantic sensor network incubator group,” Web semantics: science, services and agents on the World Wide Web, vol. 17, pp. 25–32, 2012.
- Mayer et al.  S. Mayer, J. Hodges, D. Yu, M. Kritzler, and F. Michahelles, “An open semantic framework for the industrial internet of things,” IEEE Intelligent Systems, vol. 32, no. 1, pp. 96–101, 2017.
- Kovatsch et al.  M. Kovatsch, Y. N. Hassan, and S. Mayer, “Practical semantics for the internet of things: Physical states, device mashups, and open questions,” in Internet of Things (IOT), 2015 5th International Conference on the. IEEE, 2015, pp. 54–61.
- Moutinho et al.  F. Moutinho, L. Paiva, J. Köpke, and P. Maló, “Extended semantic annotations for generating translators in the arrowhead framework,” IEEE Transactions on Industrial Informatics, vol. 14, no. 6, pp. 2760–2769, 2018.
- Lam and Haugen  A. N. Lam and Ø. Haugen, “Supporting iot semantic interoperability with autonomic computing,” in 2018 IEEE Industrial Cyber-Physical Systems (ICPS). IEEE, 2018, pp. 761–767.
- Turney and Pantel  P. D. Turney and P. Pantel, “From frequency to meaning: Vector space models of semantics,” Journal of artificial intelligence research, vol. 37, pp. 141–188, 2010.
- Sahlgren  M. Sahlgren, “The word-space model,” Ph.D. dissertation, Stockholm University, 2006.
- Mikolov et al.  T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
- Vaswani et al.  A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
- Devlin et al.  J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
- Wu et al.  Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey et al., “Google’s neural machine translation system: Bridging the gap between human and machine translation,” arXiv preprint arXiv:1609.08144, 2016.
- Johnson et al.  M. Johnson, M. Schuster, Q. V. Le, M. Krikun, Y. Wu, Z. Chen, N. Thorat, F. Viégas, M. Wattenberg, G. Corrado et al., “Google’s multilingual neural machine translation system: Enabling zero-shot translation,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 339–351, 2017.
- Dehghani et al.  M. Dehghani, S. Gouws, O. Vinyals, J. Uszkoreit, and Ł. Kaiser, “Universal transformers,” arXiv preprint arXiv:1807.03819, 2018.
- Lee et al.  J. Lee, K. Cho, and T. Hofmann, “Fully character-level neural machine translation without explicit segmentation,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 365–378, 2017.
- Wu et al.  Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” arXiv preprint arXiv:1901.00596, 2019.
- Lee et al.  J. B. Lee, R. A. Rossi, S. Kim, N. K. Ahmed, and E. Koh, “Attention models in graphs: A survey,” arXiv preprint arXiv:1807.07984, 2018.
- Gilmer et al.  J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017, pp. 1263–1272.
- Veličković et al.  P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017.
- Ristoski and Paulheim  P. Ristoski and H. Paulheim, “Rdf2vec: Rdf graph embeddings for data mining,” in International Semantic Web Conference. Springer, 2016, pp. 498–514.
- Schlichtkrull et al.  M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, and M. Welling, “Modeling relational data with graph convolutional networks,” in European Semantic Web Conference. Springer, 2018, pp. 593–607.