Deep learning is a machine learning discipline that focuses on the specification and training of deep neural networks. It is currently the driving force behind a multitude of AI applications, such as speech recognition, computer vision, robotics and others [1, 2], with applications as diverse as the classification of exoplanets, inverse atmospheric dispersion, weather forecasting and information retrieval.
Data scientists, engineers and machine learning practitioners often tackle diverse problems via the use of increasingly complex configurations of deep artificial neural networks (ANNs). Such complex configurations comprise series of co-trained networks, where each network’s objective directly or indirectly affects the objectives of its sibling networks. This interplay contributes to the overall objective of the deep learning configuration. Examples of such multi-network configurations include the generative adversarial network, the adversarial autoencoder, and others.
Despite their effectiveness, working with deep neural networks has a number of shortcomings. Arguably the most important one is the difficulty for users to understand and reason about their internal operation, which is why they are often described as “black boxes”. Another disadvantage is that they can be difficult to train effectively, often requiring multiple trial-and-error iterations. Topological choices, such as depth and connectivity, as well as the choice of hyperparameters, are also often the result of trial-and-error. These difficulties become greater as the complexity of neural network configurations increases.
In this paper we present ANNETT-O, an ontology which can capture and link many of the topological, training and evaluation characteristics of existing and in-development ANN configurations, in order to create knowledge bases that could drive the design of deep learning solutions. As complexity increases, ubiquitous deep learning can only be maintained if the knowledge and intuition gained can be harvested and encoded in useful ways. The Semantic Web and the ontology specification language OWL (https://www.w3.org/OWL/) can enable the creation of abstractions and tools towards this direction. The expressiveness and extensibility of OWL and the distributed nature of the Semantic Web are important prerequisites for creating useful knowledge bases and resources for users.
This paper provides a generic-enough, usable and computer-actionable ontology for researchers and practitioners to describe their deep learning configurations, training procedures and experiments. The proposed ontology connects three aspects of deep learning design:
The connectivity of complex, multi-network, configurations;
The algorithms used for training complex ANN configurations;
Quantitative performance information.
In the next section we present relevant work. In Section 3 we present the main classes of the ontology for describing ANN configuration evaluation, topology and training. In Section 4 we present three exemplary use-cases of increasing complexity, while in Section 5 we provide sample queries in SPARQL. In Section 6 we provide pointers for future work.
2 Related Work
To the best of our knowledge, ANNETT-O is the first ontology suitable to describe complex, multi-network neural configurations with an emphasis on studying, understanding and improving future algorithms. The closest ontology to ANNETT-O is the Artificial Neural Network Ontology – ANNO (https://tw.rpi.edu/web/Courses/Ontologies/2016/projects/ArtificialNeuralNetworkOntology), whose primary purpose, according to its authors, is to recommend weight initialization for Keras (https://keras.io/, viewed March 2018) neural network models. Despite its seemingly narrow scope, ANNO includes many generally useful concepts, such as classes for algorithms and functions. ANNETT-O follows a similar class arrangement for the common concepts; however, it has an entirely different focus.
Another relevant resource is the Predictive Model Markup Language – PMML. PMML is a detailed and mature XML schema for describing predictive models, including neural networks. PMML is in use in many applications, such as the popular KNIME data analytics platform (https://www.knime.com/, viewed March 2018) [11, 12], in manufacturing, and elsewhere. One of the core objectives of PMML is to allow for cross-platform execution of pre-trained predictive models. This requires a high level of detail regarding the numerical aspect of the models (e.g. the learned weights in an ANN), while it focuses less on the methodology applied to training them. In addition, PMML being an XML schema makes it less flexible and extensible than required in the use-cases addressed by ANNETT-O. However, aspects of PMML could potentially complement ANNETT-O in certain application contexts; this is left as future work.
3 The Ontology
Figure 1 shows the main classes and interactions of the ontology. ANNETT-O defines 160 classes, 50 object properties and 32 data properties. The ontology is assigned the permanent identifier http://w3id.org/annett-o/, also acting as a URL redirecting to the resource’s current location: https://github.com/iaklampanos/annett-o. ANNETT-O is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
ANNETT-O is designed with multi-network, multi-objective configurations in mind. Such configurations are expressed through the instantiation of the ANNConfiguration class, which relates neural network individuals as part of an overall configuration. In the simplest case of describing a single-objective neural network, an ANNConfiguration individual is associated with a single Network individual. Each neural network can be described in terms of its layers, modeled via the Layer class or one of its subclasses. Layer subclasses allow for the description of different types of ANNs, such as feedforward and recurrent configurations. ANNETT-O does not model networks down to the node (or neuron) level since it is common practice that each layer has a uniform behavior (activation or other transformation) implemented by its constituent nodes.
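As an illustration, a minimal single-network configuration could be sketched in Turtle as follows. The class and property names are those introduced in this section; the individual names are illustrative, and the exact serialization may differ from the released ontology:

```turtle
@prefix : <http://w3id.org/annett-o/> .

# A configuration comprising a single network with three layers.
:simple_config a :ANNConfiguration ;
    :hasNetwork :simple_network .

:simple_network a :Network ;
    :hasLayer :in_layer , :hidden_layer , :out_layer .

:in_layer a :InputLayer .
:hidden_layer a :FullyConnectedLayer ;
    :hasActivationFunction :relu_fn .
:out_layer a :OutputLayer .

:relu_fn a :Relu .
```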
Multi-network configurations involve multiple objectives and complex training methodologies. Often constituent networks are trained one after another per batch iteration (ANNs are typically trained iteratively on data batches, i.e. small subsets of the complete dataset), while in other cases one or more networks might be pre-trained, followed by a more complete “fine-tuning” procedure. In more involved training procedures there may be multiple training iterations of a subset of networks before training another subset, within the same batch iteration. The link between a training sequence and the choice of layers and their connectivity is often unclear. ANNETT-O enables researchers to study and extract insights from both, thereby discovering best practices and improving their algorithms.
TrainingStrategy is the main training class. A training strategy is composed of a series of TrainingSessions, with each session interpreted as a complete training procedure over an entire training dataset. Each training session defines at least one TrainingStep, which denotes the training within batch iterations. Training steps may also form a sequence with each step denoting either the training of a single network, a pass through a network creating a new dataset to be used in a subsequent training session, or a loop of steps.
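For instance, a pre-training session followed by a fine-tuning session could be sketched in Turtle as follows (individual names are illustrative; the class and property names are those described above):

```turtle
@prefix : <http://w3id.org/annett-o/> .

:two_phase_strategy a :TrainingStrategy ;
    :hasTrainingSession :pretrain_session .

# The first session pre-trains over the dataset, then chains to fine-tuning.
:pretrain_session a :TrainingSession ;
    :hasTrainingStep :pretrain_step ;
    :nextTrainingSession :finetune_session .

:finetune_session a :TrainingSession ;
    :hasTrainingStep :finetune_step .

:pretrain_step a :TrainingStep .
:finetune_step a :TrainingStep .
```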
The choices for topology and for training strategies affect the performance of an ANN configuration. ANNETT-O supports describing network evaluation results via its NetworkEvaluation class, which is connected to both topological choices and to training strategies.
ANNETT-O allows for the description of, and reasoning over, topological, training and evaluation characteristics of complex, multi-ANN configurations. In the remainder of this section we introduce the main classes involved.
3.1 Topology
In ANNETT-O the topology of an ANN configuration is described via the description of its constituent neural networks, their corresponding layers and the way they connect, i.e. their activation paths.
ANNConfiguration A neural network configuration potentially comprises multiple networks. A neural network configuration has a well-defined purpose, while its constituent networks may be disjoint from one another. An example of such an ANN configuration may require a set of networks to be trained separately from another set, while collectively having a single purpose or a common, in the loose sense, objective.
Network Individuals of the Network class describe neural networks. Each network must be associated with at least one ANN configuration. Networks are described in terms of connected layers. A network may share certain layers with other networks. In ANNETT-O this is implemented via the object property sameLayerAs, discussed below. Each Network individual can only have a single objective, described by a single objective function.
Layer The class Layer and its subclasses describe various types of layer that may be present in a neural network. Immediate subclasses include HiddenLayer and InOutLayer. HiddenLayer has a number of subclasses, namely ActivationLayer, AggregationLayer, SeparationLayer and ModificationLayer. ActivationLayer individuals describe layers with trainable weights carrying activation functions. Activation layers are associated to activation functions via the object property hasActivationFunction. Modification layers modify their inputs in a static, non-trainable way, e.g. by introducing a fixed amount of noise. The rest are layers with special roles in the network topology, e.g. a separation layer may denote the cloning of the previous layer’s outputs into multiple following layers.
Connecting layers: Each network is associated with a number of layers via its object property hasLayer. The connectivity of layers within networks is described via the object property nextLayer and its inverse property previousLayer. Each Layer individual can connect to at most one layer following or coming before it, with the following exceptions: InputLayer may not have layers connecting into it; OutputLayer may not have layers connecting from it; SeparationLayer may have more than one layer coming after it; and AggregationLayer may have more than one layer connecting into it.
In ANNETT-O each Layer individual can be associated strictly with a single Network individual. This restriction allows the ontology to enforce the cardinality rules regarding nextLayer and previousLayer described above without increasing complexity – if Layer individuals belonged to more than one network, nextLayer would need to represent a three-party relation between one Network and two Layer individuals. Even though there are patterns to describe n-ary relations (https://www.w3.org/TR/swbp-n-aryRelations/, viewed March 2018), it was decided that this would increase the complexity of the ontology without users gaining in flexibility or descriptiveness. ANNETT-O describes the presence of common layers in different networks by introducing the sameLayerAs object property between two Layer individuals.
3.2 Training
TrainingStrategy A TrainingStrategy individual describes the steps taken to train a complex ANN configuration. This may involve one or more sequential TrainingSessions, pointed at via the hasTrainingSession object property.
TrainingSession Each TrainingSession individual represents a complete training session, i.e. a complete training process over a dataset. The dataset is expected to be used in batches, per standard practice. For instance, if the training of an ANN configuration does not depend on prior training of one of its constituent networks then a single TrainingSession suffices. On the other hand, the case of a network needing to be pre-trained on a dataset before it can be fine-tuned, would require the use of two TrainingSession individuals. A chain of training sessions can be formed via the nextTrainingSession object property, which may appear at most once per TrainingSession individual.
TrainingStep Each training session is composed of a sequence of TrainingSteps implemented by the property nextTrainingStep. The sequence of training steps is expected to be repeated for each batch of the training session dataset(s). Each training step is typically associated with a neural network, the sequence therefore allowing to model the simultaneous training of the ANN configuration’s neural networks (simultaneous training here refers to the alternate training of two or more networks within the same batch iteration) – see Section 4 for examples.
During simultaneous training, sometimes a network is trained a number of times, or until a condition on its performance has been satisfied, before other networks are trained. This is accomplished by the TrainingLoop class, which is a subclass of TrainingStep and can therefore be part of a training step sequence. A training loop itself contains a sequence of training steps performed for a number of repetitions or until a condition has been met. Using TrainingLoop individuals allows users to model training sessions such as “train one network five times and then train another, before proceeding to the next training iteration”.
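The “train one network five times, then another” pattern might be sketched in Turtle as follows. Note two assumptions: the repetition-count data property (shown here as loop_count) is hypothetical, as the text does not name it, and the linkage of the loop to its inner steps via hasTrainingStep is likewise assumed:

```turtle
@prefix : <http://w3id.org/annett-o/> .

:train_d_loop a :TrainingLoop ;
    :loop_count 5 ;                    # hypothetical property for the number of repetitions
    :hasTrainingStep :train_d_step ;   # assumed linkage to the step repeated inside the loop
    :nextTrainingStep :train_g_step .  # after the loop, the other network is trained

:train_d_step a :TrainingSingle .
:train_g_step a :TrainingSingle .
```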
3.3 Evaluation
In ANNETT-O evaluation results are modeled via the NetworkEvaluation class. A NetworkEvaluation individual associates Network, ANNConfiguration, TrainingStrategy, Metric and Dataset individuals. The dataset represents a training dataset, while the metric is the evaluation metric used. The evaluation score is recorded via the data property eval_score. NetworkEvaluation may optionally record a timestamp for the evaluation via the data property eval_date. It follows that an ANN configuration may involve multiple evaluations involving different neural networks, using different metrics and datasets, for various training strategies.
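An evaluation record could then be sketched as follows (Turtle; the individual names, score and date are illustrative, and the properties linking the evaluation to its Metric and Dataset individuals are not named in the text, so they are omitted here):

```turtle
@prefix : <http://w3id.org/annett-o/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:clf_eval a :NetworkEvaluation ;
    :evaluatesNetwork :simple_network ;
    :basedOnTrainingStrategy :two_phase_strategy ;
    :eval_score 0.92 ;                      # illustrative score
    :eval_date "2018-03-01"^^xsd:date .     # optional timestamp
```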
3.4 Auxiliary Classes
Function The Function class and its subclasses describe reusable mathematical functions and are associated with a number of the ANNETT-O concepts, e.g. activation layers, introduced above, are associated with ActivationFunctions via the hasActivationFunction property. Function individuals may provide the function’s mathematical form in some pre-agreed notation (e.g., in LaTeX) using the function_math data property.
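For example, a function individual carrying its mathematical form as a LaTeX string could be recorded as follows (Turtle, illustrative individual name):

```turtle
@prefix : <http://w3id.org/annett-o/> .

:softmax_fn a :ActivationFunction ;
    # LaTeX form of the softmax function (backslashes escaped per Turtle string syntax)
    :function_math "\\sigma(\\mathbf{z})_i = e^{z_i} / \\sum_j e^{z_j}" .
```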
Dataset Individuals of this class describe a homogeneous dataset used for training and evaluation purposes. A dataset may denote a well-known resource, such as MNIST999http://yann.lecun.com/exdb/mnist/, viewed March 2018, a private or application-specific dataset, or a transient dataset. For instance, transient or temporary datasets may be created to aid during the training of an ANN configuration. A Labelset (a subclass of Dataset) may be used to denote a set of labels useful in classification tasks.
DatasetPipe Individuals of this class associate InOutLayer to Dataset individuals. This allows for the description of multiple connections to and from datasets in a number of cases, such as in the definition of an evaluation strategy.
TrainedModel Individuals of this class describe a trained model, namely the result of training taking place over a complete neural network configuration. This information may be useful for studying distributions of weights or for replicating ANN configurations in different contexts. Especially for the latter, PMML descriptions may also be linked via subclassing.
TaskCharacterization Individuals of this class and its subclasses may be used to characterize the purpose of a neural Network. For instance, TaskCharacterization subclasses include Clustering, Classification, Generation, etc.
DataCharacterization Individuals of this class and its subclasses can be used to characterize a Dataset. For instance, DataCharacterization subclasses include NumericalDigits, Flora, People, etc.
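Task and data characterizations could be attached as follows (Turtle sketch; hasTaskType appears in the paper’s queries, while the data-characterization property, shown here as hasDataType, is a hypothetical mirror of it):

```turtle
@prefix : <http://w3id.org/annett-o/> .

:digit_classifier a :Network ;
    :hasTaskType :clf_task .
:clf_task a :Classification .

:mnist_data a :Dataset ;
    :hasDataType :digits .      # hypothetical property, by analogy with hasTaskType
:digits a :NumericalDigits .
```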
4 Example Use-Cases
In this section we present three exemplary ANNETT-O use-cases of increasing complexity: a simple ANN used in classification tasks [14, Ch. 6], a generative adversarial network (GAN), and an adversarial autoencoder (AAE). These examples are distributed along with the ontology.
4.1 Simple classification network
The simple classification network consists of one input layer, three stacked hidden layers (two of which carry rectified linear (ReLU) activations and one activated by SoftMax) and one output layer, as shown in Figure 2(a). The layers are characterized by using the subclasses of Layer: InputLayer, FullyConnectedLayer and OutputLayer. Activations are specified by the hasActivationFunction property. This ANN configuration is described by a single Network, with data flowing from the input to the output layer. The use of input training or evaluation data is described by associating corresponding DatasetPipes with the InputLayer.
This use-case represents a single-objective, single-network configuration. Training it therefore involves calculating a single loss based on the network’s output and updating all the weights so that this loss is minimized. This is described by defining a TrainingStrategy with a single TrainingSession. The TrainingSession performs a single TrainingStep in every iteration. A TrainingOptimizer is associated with the TrainingStep individual. The network’s ObjectiveFunction is described in terms of a CostFunction and is linked to the Network individual.
Classification effectiveness is measured in terms of the accuracy of predicted labels compared to the ground truth. We define a DatasetPipe individual to join the InputLayer with an evaluation dataset. An Accuracy individual (Accuracy being a subclass of Metric, itself a subclass of Function) describes the evaluation metric. This metric and the evaluation dataset are associated with the configuration’s NetworkEvaluation.
4.2 Generative Adversarial Network (GAN)
GANs (Figure 2(b)) are multi-network configurations consisting of two networks, the Generator and the Discriminator. Since in ANNETT-O every network must have a single objective, a GAN is described in terms of three Network individuals. These are named after the task that they perform, i.e. Generator, Discriminator and GAN. The Generator, Discriminator and GAN networks have task characterizations of Generation, Discrimination and Adversarial respectively. The GAN network describes the adversarial process between the Generator and the Discriminator and is composed of layers that are shared with the other two networks. Using the property sameLayerAs, we are able to express layer sharing between these networks. The individuals used to describe the overall topology are depicted in Figure 3.
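Layer sharing between the three networks can be sketched in Turtle as follows (individual names are illustrative):

```turtle
@prefix : <http://w3id.org/annett-o/> .

:gan_config a :ANNConfiguration ;
    :hasNetwork :Generator , :Discriminator , :GAN .

:GAN a :Network ;
    :hasLayer :gan_gen_hidden .
:Generator a :Network ;
    :hasLayer :gen_hidden .

# The GAN network's layer is declared identical to the Generator's own layer.
:gan_gen_hidden a :FullyConnectedLayer ;
    :sameLayerAs :gen_hidden .
:gen_hidden a :FullyConnectedLayer .
```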
GANs have multiple objectives, and ANNETT-O associates each objective with a Network individual. As the actual objective function is the same, it is associated with every network but with different Labelset individuals. While the Generator and the Discriminator are trained alternately, we describe the case where the Discriminator gets updated 5 times per 1 Generator update – such strategies often improve an adversarial network’s learning efficiency. This procedure is described by a TrainingLoop along with appropriate TrainingSingleForwardOnly and TrainingSingle individuals (Algorithm 1). Figure 5(a) shows the ANNETT-O equivalent.
GANs are evaluated in terms of generated sample quality, given random noise as input. In ANNETT-O we create a log-likelihood Parzen window estimate Metric individual associated with a NetworkEvaluation. The GAN is evaluated on the Generator network’s output. Optionally, we can describe the training outcome as a TrainedModel individual for storing purposes or further evaluation.
4.3 Adversarial Autoencoder (AAE)
Adversarial Autoencoders are multi-network configurations (Figure 2(c)). This example describes the clustering variant.
Clustering in AAEs is a result of an adversarial process over the output of the network’s encoder. ANNETT-O can express this multi-network configuration using 7 Network individuals, each of which performs a different task (Figure 4). The encoder part of the Autoencoder is split into two branches (Style and Label), with these branches clustering the data sample and generating synthetic samples respectively. The splitting of the dataflow in the Encoder is expressed by a SeparationLayer instance. Each of these two branches is also part of a Generator, as in a GAN. We describe the Discriminators using four Network individuals, plus an additional one for the generative adversarial network as a whole. The decoder, which performs input reconstruction, concatenates the branches using an AggregationLayer.
The training process involves multiple objectives that simultaneously update the shared layers. In AAEs the order in which we update the layers affects performance. We describe the training procedure via a sequence of TrainingStep individuals. Since there are multiple shared layers in each network, the property updatesLayer is used to define which layers are updated in every TrainingStep. Using DatasetPipes along with TrainingSteps and the updatesLayer property we can follow the dataflow in each individual network and observe the training process of the ANNConfiguration as a whole. The training process is described in Algorithm 2, while Figure 5(b) shows the ANNETT-O equivalent.
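A fragment of such a training sequence could look as follows (Turtle; individual names are illustrative):

```turtle
@prefix : <http://w3id.org/annett-o/> .

# One step updates only the reconstruction-related shared layers…
:train_reconstruction a :TrainingSingle ;
    :updatesLayer :encoder_hidden , :decoder_hidden ;
    :nextTrainingStep :train_discriminator .

# …while the next step updates only the discriminator's layers.
:train_discriminator a :TrainingSingle ;
    :updatesLayer :discriminator_hidden .
```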
Adversarial Autoencoders can be evaluated by measuring the classification accuracy on the predicted cluster membership. This can be modeled by creating an Accuracy metric and assigning it to a NetworkEvaluation. Similar to the GAN example, performance is evaluated based on the output of a single network. The property basedOnTrainingStrategy is used to associate a NetworkEvaluation to a TrainingStrategy individual.
5 Query Examples
The representation of deep learning configurations using ANNETT-O constructs allows for the retrieval of ANN configurations or their constituents using various criteria. This allows for configurations or networks to be discovered based on their topological characteristics, their evaluation outcomes, or even procedural characteristics, e.g. on features of the training process of a network. Such use-cases are highlighted by the following query examples.
The following query retrieves training strategies based on their evaluation outcomes. Specifically, it seeks training strategies yielding evaluation scores greater than 0.7 in the classification task. Over the knowledge base accompanying the paper, the query retrieves the simple_classification_Strategy configuration:

  select ?configuration ?evaluation_score
  where {
    ?configuration a :ANNConfiguration .
    ?configuration :hasTrainingStrategy ?tstrategy ;
                   :hasNetwork ?n .
    ?n :hasTaskType ?type .
    ?type a :Classification .
    ?evaluation a :NetworkEvaluation ;
                :evaluatesNetwork ?n ;
                :eval_score ?evaluation_score .
    FILTER (?evaluation_score > 0.7)
  }
The following two queries search for ANN configurations having specific topological characteristics. The next query retrieves the configurations comprising at least one network with a minimum of four hidden layers, of which at least one is a concatenation layer. In our example, it retrieves the AAE configuration.

  select distinct ?c
  where {
    ?c a :ANNConfiguration ;
       :hasNetwork ?n .
    {
      select ?n (count(?hl) as ?layers)
      where {
        ?n :hasLayer ?l ;
           :hasLayer ?hl .
        ?hl a :HiddenLayer .
        ?l a :ConcatLayer .
      }
      GROUP BY ?n
      HAVING (?layers > 3)
    }
  }
The following query retrieves configurations that include a network with at least one separation layer, whose branches lead to ReLU layers immediately before they merge again via concatenation. Executing the query over the demonstrative knowledge base returns the AAE_AE network.

  select distinct ?n
  where {
    ?n a :Network ;
       :hasLayer ?l .
    ?l a :SeparationLayer ;
       :nextLayer ?left ;
       :nextLayer ?right .
    FILTER (?left != ?right)
    ?left :nextLayer+ ?c .
    ?right :nextLayer+ ?c .
    ?c a :ConcatLayer ;
       :previousLayer ?cpl ;
       :previousLayer ?cpr .
    ?cpl :hasActivationFunction ?fcpl .
    ?fcpl a :Relu .
    ?cpr :hasActivationFunction ?fcpr .
    ?fcpr a :Relu .
  }
Last, the following query retrieves configurations comprising a clustering network that achieved an evaluation score greater than 0.5, with at least one training strategy involving at least one training session with more than two training steps. In our example, it retrieves the AAE configuration, which is associated with a training strategy satisfying the aforementioned constraints and an evaluation score equal to 0.68.

  select ?configuration ?evaluation_score
  where {
    ?configuration a :ANNConfiguration .
    ?configuration :hasTrainingStrategy ?tstrategy ;
                   :hasNetwork ?n .
    ?n :hasTaskType ?type .
    ?type a :Clustering .
    ?evaluation a :NetworkEvaluation ;
                :evaluatesNetwork ?n ;
                :eval_score ?evaluation_score .
    FILTER (?evaluation_score > 0.5)
    {
      select ?tstrategy (count(?step) as ?steps)
      where {
        ?tstrategy :hasTrainingSession ?tsession .
        ?tsession :hasTrainingStep ?step .
      }
      GROUP BY ?tstrategy
      HAVING (?steps > 2)
    }
  }
6 Conclusions and Future Work
Increasingly complex deep learning configurations are being researched and used in a multitude of applications. These configurations require involved training procedures, while their effectiveness typically varies depending on the task and the data size and type. This paper presented ANNETT-O, an OWL ontology able to encode topological, training and evaluation characteristics of complex ANN configurations. Its purpose is to drive the development of knowledge bases capturing current and best practices of deep learning, in order to enable researchers and practitioners to understand existing systems and make better-informed decisions when designing new ones. We have shown that ANNETT-O is expressive enough to capture the topological and training characteristics of ANN configurations of high complexity using a small number of meaningful classes and properties. It also allows for the specification of sophisticated queries bridging different aspects of deep learning design. To the best of our knowledge this is the first ontology able to describe such a wide variety of ANN configurations.
We believe that efforts that contribute to the gathering and systematization of useful knowledge for R&D in AI and data science can have tangible impact in these areas. However, for ANNETT-O to reach its uptake and impact goals, software tools must be created to automate knowledge extraction from source implementations. Furthermore, aside from providing SPARQL endpoints, repositories and higher-level query services would be needed for users to make the most of the existing knowledge bases with minimal further training and therefore minimal disruption to their work – designing new deep learning algorithms. Last, even though deep learning is often and increasingly used on its own, it is also used in conjunction with other machine learning algorithms. On the front of modeling AI semantics, ANNETT-O could be assimilated into future ontologies catering for the wider area of AI and machine learning. We anticipate that in the near future we will prototype accompanying tools and web services to allow users to make the most of ANNETT-O.
-  Bengio, Y.: Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning 2(1) (2009) 1–127
-  Lecun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553) (May 2015) 436–444
-  Shallue, C.J., Vanderburg, A.: Identifying Exoplanets with Deep Learning: A Five Planet Resonant Chain around Kepler-80 and an Eighth Planet around Kepler-90. arXiv:1712.05044 [astro-ph.EP] (December 2017)
-  Klampanos, I.A., Davvetas, A., Andronopoulos, S., Pappas, C., Ikonomopoulos, A., Karkaletsis, V.: Autoencoder-Driven Weather Clustering for Source Estimation during Nuclear Events. Environmental Modelling & Software 102 (April 2018) 84–93
-  Ghaderi, A., Sanandaji, B.M., Ghaderi, F.: Deep Forecast: Deep Learning-based Spatio-Temporal Forecasting. In: Time Series Workshop, ICML’17. (2017)
-  Mitra, B., Craswell, N.: Neural Models for Information Retrieval. arXiv:1705.01509 [cs.IR] (May 2017)
-  Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Nets. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q., eds.: Advances in Neural Information Processing Systems 27. Curran Associates, Inc. (2014) 2672–2680
-  Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., Frey, B.: Adversarial Autoencoders. arXiv:1511.05644 [cs.LG] (November 2015)
-  Berners-Lee, T., Hendler, J., Lassila, O., Others: The Semantic Web. Scientific American 284 (2001) 34–43
-  Grossman, R., Bailey, S., Ramu, A., Malhi, B., Hallstrom, P., Pulleyn, I., Qin, X.: The management and mining of multiple predictive models using the predictive modeling markup language. Information and Software Technology 41(9) (1999) 589–595
-  Berthold, M.R., Cebron, N., Dill, F., Gabriel, T.R., Kötter, T., Meinl, T., Ohl, P., Sieb, C., Thiel, K., Wiswedel, B.: Knime: The konstanz information miner. In Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R., eds.: Data Analysis, Machine Learning and Applications, Berlin, Heidelberg, Springer Berlin Heidelberg (2008) 319–326
-  Morent, D., Stathatos, K., Lin, W.c., Berthold, M.R.: Comprehensive PMML preprocessing in KNIME. Proceedings of the 2011 workshop on Predictive markup language modeling - PMML ’11 (2011) 28–31
-  Lee, Y.T., Lee, J.Y.: PMML in manufacturing applications. In: Fall Simulation Interoperability Workshop, 2014 Fall SIW. (2014) 113–117
-  Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)