The use of neural networks in language-processing applications is still an open research problem that needs to be addressed in more depth, especially regarding task-specific issues (which architecture corresponds to which task), more global optimizations, and transfer-learning techniques that would make it easy to reuse already-trained networks on new problems. One of these subjects is the application of neural language models combined with routing processes. In Infolegale’s production processes, the use of data science methods to process the natural-language text of documents is a necessity: they automatically extract the exploitable data and therefore meet the needs of the COMPLIFY project. The project is part of a global intensification of the fight against money laundering, corruption and fraud. The company wishes to accompany its clients in mapping their risks, including knowledge of the client and identification of the person(s) with control of the company (the beneficial owner). Scanning of legal announcements, prediction of events and extraction of named entities were put in place to answer the first part of the project. The goal is to make the process of acquiring legal announcements ever faster. The process is currently complicated and relies on 724 newspapers to be collected and about 1,200,000 announcements per year to dematerialize, codify, enter into a database and then sirenize.
2 Context of the project
The company in question is revolutionizing its acquisition process for daily articles and newspapers by applying state-of-the-art NLP models in order to automate the whole process. To achieve this goal, the mission was defined during my end-of-study project, where I worked on setting up a set of models that determine the events contained in legal announcements according to the event types standardized by the company, using Natural Language Processing techniques, and on optimizing the process by applying a Content-Based approach in order to route the events to their corresponding sphere.
3.1 Understanding the target output
Once the legal ads have been extracted from the logs and PDF articles, the second process begins: identifying the events described by the ad. The objective of this part is to set up a set of models that will determine the events contained in the legal announcements according to the event types standardized by Infolegale. To accomplish this, we will use NLP models with a Content-Based Routing process in order to classify each of the 204 events that could be found in an ad.
3.2 Convolutional neural network
Convolutional neural networks, or CNNs, can be defined as a class of deep feed-forward artificial neural networks. The algorithm stacks layers so that the input information is progressively refined through the layers, in a unidirectional order, until the desired result is achieved.
CNN architectures use the following layers:
Convolutional layers: used to preserve the spatial structure of the features in an image.
Pooling layers: used to downsample (reduce) an image.
Fully connected layers: used to flatten the output of the convolutional layers into a single-column matrix and compress it progressively to obtain a smaller but meaningful output.
Soft-max output: used to present the output generated by the fully connected layers as a probability distribution.
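The four building blocks listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the report's actual model; all shapes and sizes are invented for the example, and the "convolution" is the cross-correlation variant used by most deep-learning libraries.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation): slides the kernel over
    the image, preserving the spatial structure of the features."""
    h, w = kernel.shape
    out_h = image.shape[0] - h + 1
    out_w = image.shape[1] - w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+h, j:j+w] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Down-sample by keeping the maximum of each size x size block."""
    h, w = feature_map.shape
    out = feature_map[:h - h % size, :w - w % size]
    out = out.reshape(h // size, size, w // size, size)
    return out.max(axis=(1, 3))

def dense(x, weights, bias):
    """Fully connected layer applied to the flattened feature map."""
    return weights @ x.flatten() + bias

def softmax(z):
    """Soft-max output: turns raw scores into a probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
feat = max_pool(conv2d(image, rng.standard_normal((3, 3))))  # 6x6 -> 3x3
probs = softmax(dense(feat, rng.standard_normal((4, 9)), np.zeros(4)))
```

Chaining the four functions reproduces, in miniature, the unidirectional refinement described above: an 8x8 input becomes a 6x6 feature map, then a 3x3 pooled map, then a 4-class probability vector.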
In the next section, we define a subtype of convolutional neural network: the residual convolutional neural network.
3.3 Residual convolutional neural network
A residual convolutional neural network is a type of artificial neural network (ANN) that builds on known constructs from pyramidal cells of the cerebral cortex. To do this, residual neural networks use skip connections, or shortcuts, to jump over certain layers.
Figure 1 illustrates the skip step: the activation of layer l-2 is passed directly to layer l, bypassing layer l-1. These typical models are implemented with double- or triple-layer skips containing non-linearities and batch normalization in between.
Figure 2 describes the structure of a residual neural network. The main idea behind this network is the residual block. The network allows the development of extremely deep neural networks, which can contain 100 or more layers.
The equations that describe this structure are:

Plain network:
    z^[l] = W^[l] a^[l-1] + b^[l],    a^[l] = g(z^[l])

Residual network:
    a^[l] = g(z^[l] + W_s a^[l-2])

where a^[l] is the vector of activations (outputs) of the neurons in layer l, g is the activation function for layer l, and W^[l] is the weight matrix for the neurons of layer l. Absent an explicit projection matrix W_s (the identity shortcuts used in ResNets), forward propagation through the activation function simplifies to a^[l] = g(z^[l] + a^[l-2]).
With this extra connection, gradients can travel backwards more easily. The residual block becomes a flexible component that can increase the capacity of the network or simply turn into an identity function that does not affect training.
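The forward pass of one residual block can be sketched as follows, under the assumption of fully connected layers with ReLU activations and identity shortcuts; the dimensions are illustrative, not those of the report's model.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(a_prev, W1, b1, W2, b2):
    """a[l] = g(z[l] + a[l-2]): two weight layers plus the identity shortcut."""
    a1 = relu(W1 @ a_prev + b1)      # first layer inside the block
    z2 = W2 @ a1 + b2                # second layer, pre-activation
    return relu(z2 + a_prev)         # skip connection: add the block's input

rng = np.random.default_rng(1)
n = 5
a_in = rng.standard_normal(n)
W1, W2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
out = residual_block(a_in, W1, np.zeros(n), W2, np.zeros(n))

# With zero weights the block collapses to the identity (followed by ReLU),
# illustrating how a residual block can "turn into an identity function".
identity_out = residual_block(a_in, np.zeros((n, n)), np.zeros(n),
                              np.zeros((n, n)), np.zeros(n))
```

The last two lines show the property discussed above: when the learned weights contribute nothing, the shortcut alone carries the signal through, so adding the block cannot hurt the network's representational capacity.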
3.4 Content-Based Routing
The goal of this task is to set up a system for order processing. We first validate an incoming order, then check that the ordered item is available in the warehouse; the stock system performs this second task. This series of processing stages is a perfect candidate for the Pipes-and-Filters design. We create two filters, one for the validation phase and one for the inventory system, and route the incoming messages through both filters. Nevertheless, many business-integration scenarios involve more than one distribution system, where each system can manage only specific items.
The Content-Based Router checks the content of a message and routes it to another channel based on the information it carries. The routing can be based on a number of factors, such as the presence of a field, specific field values, and so on. Great care must be taken when installing a content-based router to keep the routing function easy to maintain, as the router may become a point of regular maintenance. In more complex integration scenarios, the Content-Based Router can take the form of a configurable rules engine that determines the destination channel based on a set of configurable rules.
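A minimal sketch of the pattern, assuming a message is a plain dict and routing rules inspect its fields; the channel names and fields here are hypothetical, not part of Infolegale's system.

```python
def route(message):
    """Content-Based Router: pick a destination channel from the
    message's own content, via an ordered list of (predicate, channel)
    rules -- the configurable-rules-engine form mentioned above."""
    rules = [
        (lambda m: m.get("item_type") == "widget", "widget_inventory"),
        (lambda m: m.get("item_type") == "gadget", "gadget_inventory"),
        (lambda m: "item_type" not in m,           "invalid_orders"),
    ]
    for predicate, channel in rules:
        if predicate(message):       # first matching rule wins
            return channel
    return "default_channel"         # fallback for unknown content

channel = route({"order_id": 42, "item_type": "widget"})
```

Keeping the rules in one ordered table is what makes the routing function easy to control: adding a distribution system is a one-line change, which addresses the maintenance concern raised above.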
3.5 Input Data Analysis
The data used throughout our study is a set of random legal ads taken from the company's local database. The data consists of raw French text with a virtually unlimited vocabulary and no clear string-format pattern. Each ad contains one or more event types that need to be detected. At Infolegale, there are 161 types of events across 7 family types. The volume and variety of legal-notice events prompted us to put in place a decision tree through which an ad has to pass to determine the events in question. To implement the decision tree, we use the content-based routing process described in the previous paragraph.
During this module, we faced many difficulties, namely:
Database problems: the existence of duplicate ads with different identifiers, and results polluted by input errors…
The analysis of false positives: this step required the expertise of the quality team; since they were not always available, it was difficult to find a slot to perform the analysis.
Grouping events: this step took us a long time in setting up the decision tree. Indeed, grouping ads is not easy and requires a lot of business expertise and performance testing for each attempt.
3.6 Multi-label Residual Convolutional Neural Network text classifier using a Content-Based Routing implementation process
3.6.1 Events grouping
In order to set up our decision tree, the different event types contained in the ads were analyzed so as to regroup them by contextual meaning. Even though the data presented a huge variety of content, a primary decision tree was put in place. Each of the 7 event families regrouped one or more sub-levels of groups and sub-groups, in order to ensure the best model-classification process and to predict the corresponding event type with the highest possible accuracy, precision and recall. The tree consists of 43 nodes over 5 levels of precision, with 161 events grouped by their meaning and text content. The tree was then implemented using nested dictionaries and sub-classes in Python. Each node consists of a group name, a cutoff, a group description and, most importantly, the children. As a matter of fact, the group's children are responsible for redirecting the original ad to the corresponding event. Due to the critical nature of this part of the project, a high cutoff value is selected for each event (greater than or equal to 0.999).
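One way the nested-dictionary representation could look is sketched below. The group names, descriptions and scores are invented for illustration; only the node fields (name, cutoff, description, children) and the 0.999 cutoff come from the description above.

```python
# Hypothetical two-level slice of the decision tree; not Infolegale's taxonomy.
tree = {
    "name": "legal_events",
    "cutoff": 0.999,
    "description": "root of the event taxonomy",
    "children": {
        "procedures": {
            "name": "procedures",
            "cutoff": 0.999,
            "description": "collective-procedures family",
            "children": {},
        },
        "creations": {
            "name": "creations",
            "cutoff": 0.999,
            "description": "company-creation family",
            "children": {},
        },
    },
}

def route_ad(node, scores):
    """Descend into a child only if its model score clears the child's
    cutoff; otherwise stop and return the current group's name."""
    for key, child in node["children"].items():
        if scores.get(key, 0.0) >= child["cutoff"]:
            return route_ad(child, scores)
    return node["name"]

leaf = route_ad(tree, {"procedures": 0.9995})
```

With the high 0.999 cutoff, an ad is only pushed deeper into the tree when the classifier is nearly certain; an ambiguous ad stays at a coarser group, which is the behavior the cutoff is meant to enforce.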
3.6.2 Model Building
In order to optimize our Multi-label Residual Convolutional Neural Network, we will use stochastic gradient descent (SGD) to minimize the sum of squared residuals. The algorithm is very useful in cases where the optimal points cannot be found by setting the slope of the function to 0.
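A tiny sketch of SGD minimizing a sum of squared residuals, reduced to a one-parameter linear model y = w * x so the whole update rule fits in a few lines. The data, learning rate and step count are illustrative assumptions, not the report's optimizer configuration.

```python
import random

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]           # generated by the true w = 2

w, lr = 0.0, 0.05
random.seed(0)
for _ in range(200):
    i = random.randrange(len(xs))   # "stochastic": one random sample per step
    residual = w * xs[i] - ys[i]    # prediction error on that sample
    w -= lr * 2 * residual * xs[i]  # step along -gradient of residual**2
```

Each step follows the gradient of a single sample's squared residual rather than the full sum, which is what makes the method cheap enough for the millions of ads used later in training.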
One other point: the data set we will use contains events and subgroups of different sizes, which means that our deep neural network model is likely to quickly over-fit a training data set with few examples, resulting in poor performance when the model is evaluated on new data. To overcome this problem, dropout has the effect of making the training process noisy, forcing nodes within a layer to probabilistically take on greater or less responsibility for the inputs. This conceptualization suggests that dropout breaks up situations where network layers co-adapt to correct mistakes from prior layers, in turn making the model more robust.
Dropout simulates a sparse activation from a given layer which, interestingly, in turn encourages the network to actually learn a sparse representation as a side effect. As such, it can be used as an alternative to activity regularization for encouraging sparse representations in auto-encoder models. A hyper-parameter is introduced that specifies the probability with which outputs of the layer are dropped out or, inversely, the probability with which they are retained. We set this dropout value to 0.35 for the output of each node in a hidden layer.
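The mechanism can be sketched in NumPy as "inverted" dropout, the variant most libraries use: units are zeroed at training time and the survivors are rescaled so the expected activation is unchanged. The example below assumes 0.35 is the drop probability; the layer contents are a stand-in.

```python
import numpy as np

def dropout(activations, rate=0.35, rng=None, training=True):
    """Zero each unit with probability `rate` and rescale the survivors
    by 1/(1-rate), so inference needs no extra scaling."""
    if not training:
        return activations           # dropout is a training-only noise source
    rng = rng or np.random.default_rng()
    keep = rng.random(activations.shape) >= rate   # True with prob 1-rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
hidden = np.ones(10_000)             # stand-in for a hidden layer's outputs
noisy = dropout(hidden, rate=0.35, rng=rng)
```

Because each forward pass drops a different random subset of units, no node can rely on a specific peer being present, which is exactly the co-adaptation-breaking effect described above.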
3.6.3 Model Training
A set of 2.3 million lines of ads was used to train the model, covering most of the ad types. The groups and subgroups vary in size, ranging from 30 to 400,000. The presented model is designed to be effective in this particular case: it shuffles our data set, divides it into multiple pre-sized batches and iterates over the whole data set. The batch sizes are taken from 512 to 1024 using a 1.001 compounding factor. Once the model has been trained over all batches, it retrains itself, iterating the same process 100 times so that the losses converge to the minimum. The values are selected to optimize the usage of our GPU machine with maximum efficiency.
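The shuffle-and-batch schedule described above can be sketched as follows: batch sizes grow from 512 toward 1024 by a 1.001 compounding factor, and the whole data set is reshuffled for each of the repeated passes. The data and epoch counts here are small stand-ins for the 2.3-million-line set and 100 epochs.

```python
import random

def batch_sizes(start=512, cap=1024, factor=1.001, n_batches=1000):
    """Pre-size the batches: start at 512, compound by 1.001, cap at 1024."""
    size, sizes = float(start), []
    for _ in range(n_batches):
        sizes.append(min(int(size), cap))
        size *= factor
    return sizes

def training_batches(dataset, n_epochs=100, seed=0):
    """Yield pre-sized batches, reshuffling the data set each epoch."""
    rng = random.Random(seed)
    for _ in range(n_epochs):
        data = dataset[:]
        rng.shuffle(data)            # fresh shuffle every pass
        i = 0
        for size in batch_sizes():
            if i >= len(data):
                break                # epoch exhausted
            yield data[i:i + size]   # one pre-sized batch
            i += size

n_batches = sum(1 for _ in training_batches(list(range(5000)), n_epochs=2))
```

Slowly growing the batch size lets early epochs take noisier, cheaper steps while later batches make fuller use of GPU memory, which matches the efficiency rationale stated above.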
3.7 Experimental Results and Performance Analysis
After the first model training and implementation, the results were quite impressive and high for the 3 metrics (accuracy, precision and recall). The first results per family are as follows:
As we can see, the results are quite promising, and further implementations were made for the rest of the events. What follows are the results for the subfamilies of the common procedures, which are the most sensitive legal ads to decode:
The results of the model are detailed in what follows. We randomly selected 60 of the events in question due to data-protection legislation.
3.8 Discussion
We propose a multi-label Residual Convolutional Neural Network text classifier using a Content-Based Routing process, which trains the model using the content-based routing method described above. The benefit of our model lies mainly in reducing the number of models that need to be trained later on. With this method, we have succeeded in developing a highly accurate multi-class classification model that classifies all of the 161 different events based on the text. One of the greatest difficulties was handling the variety and veracity of the available data, which came in very different sizes. To overcome this problem, we built a bigger dataset containing as many types of events as possible.
During this module, we developed a multi-label Residual Convolutional Neural Network text classifier using a Content-Based Routing process to predict the final event code for each ad. We developed a decision tree to predict the code with the fewest possible false positives and the most accurate result possible. The tree was implemented using a content-based routing process, which allowed us to validate the model.