Detecting Logical Relation In Contract Clauses

by Alexandre Yukio Ichida, et al.

Contracts underlie most modern commercial transactions, defining the duties and obligations of the related parties in an agreement. Ensuring such agreements are error free is crucial for modern society, and the analysis of a contract requires understanding the logical relations between clauses and identifying potential contradictions. This analysis depends on error-prone human effort to understand each contract clause. In this work, we develop an approach to automate the extraction of logical relations between clauses in a contract. We address this problem as a Natural Language Inference task to detect the entailment type between two clauses in a contract. The resulting approach should help contract authors detect potential logical conflicts between clauses.





Understanding the logical relations between sentences is a difficult task that requires an accurate understanding of natural language meaning. The ambiguity and variability of linguistic expression complicate the recognition of relations such as entailment and contradiction in text. The ability to classify these logical inferences among different texts is a significant feature of an intelligent system [4]. Detecting these logical relations can help humans interpret more complex texts, such as norms and contracts, where entailment and contradiction are crucial to full understanding. Contracts are documents that contain normative sentences formalizing agreements among the related parties, which involve people and companies. The normative sentences describe the duties that the related parties are subject to and the penalties in case of rule violation. In a contract, the norms may be logically related, so that some entail or contradict each other [6].

For instance, in a contract that contains the norms “All companies must pay the Y tax” and “The company X must pay the Y tax”, it is not possible for the first norm to be satisfied while the second is not. If company X does not pay the Y tax, it automatically violates both norms because of their compliance conditions. Since both norms are logically linked and share the same context, there is an entailment relation between them. By contrast, conflicts in a contract may emerge from a logical contradiction between norm clauses. In the example above, we obtain a contradiction relation if we change the second norm to “The company X must not pay the Y tax”, because the two compliance conditions contradict each other. Analyzing these conflicts in a contract demands careful analysis from both related parties. An automated way to detect conflicts between contract clauses supports the review of contract clauses, which is a long and complex task even for human experts.

Classifying the logical relation between norms is analogous to Natural Language Inference (NLI), which is the task of determining whether a natural language hypothesis h can be inferred from a natural language premise p [11]. In an entailment relation, if p is true then h cannot be false; otherwise, there is a contradiction. NLI is a broader task than conflict identification, and thus good models for classifying logical relations will naturally be applicable to detecting contract conflicts. Importantly, since NLI has seen a surge in research, including new machine learning models and dataset curation [5, 18], it offers labelled training data in much larger quantities than contract conflict datasets [3].

We develop an automated approach to detecting logical relations between norms in a contract, framed as a natural language inference problem. First, we develop a neural network that addresses the Natural Language Inference task to classify the logical relation between two sentences written in natural language. Second, we apply the trained neural network to conflicting norm pairs and report the logical relation that our model predicts for each conflict type. The resulting model can help identify potential inconsistencies between contract clauses by detecting logical relationships between natural language sentences.

Natural Language Inference

Automated reasoning and logical inference studied in natural logic are important topics of artificial intelligence. Natural language inference (NLI) is a widely-studied natural language processing task that is concerned with determining the inferential relation between a premise p and a hypothesis h [5]. In NLI, the entailment relation inferred is formulated based on the following representations: two-way classification and three-way classification [11].

Two-way classification is the simplest representation of NLI, which describes the task as a binary decision. The objective of this NLI task is to classify whether the hypothesis follows the premise (entailment) or does not (non-entailment). Alternatively, in three-way classification form, the relations are divided into three categories: entailment, contradiction and neutral. Given a pair of premise-hypothesis p and h, the entailment relation occurs when h can be inferred from p [5]. When h entails the negation of p, the pair results in a contradiction. Otherwise, if none of these relations can be inferred, the relation of p and h is neutral.

In NLI, both p and h are sentences written in natural language. The challenge of this task differs from formal deduction in logic because of its focus on informal reasoning [11]. The emphasis of NLI is on aspects of natural language such as lexical semantic knowledge and on dealing with the variability of linguistic expression. Consider the following premise p and hypothesis h as an instance of an NLI scenario [11]:

  • p: Several airlines polled saw costs grow more than expected, even after adjusting for inflation.

  • h: Some of the companies in the poll reported cost increases.

This example is considered a valid entailment inference in the NLI context because any person who interprets p would likely accept that h follows from the information in p. Although this is a valid NLI classification, h is not a strict logical consequence of p, because p states that airline companies saw their costs grow, not necessarily that they reported that growth. This example reflects the informal reasoning in the task definition, which has to deal with the ambiguity of natural language [11].

Natural Language Inference Classifier

In this section, we explain the NLI model that we use to make predictions over normative sentences. First, we explain the Transformer neural network architecture that we use to deal with the NLI task. Second, we detail the attention mechanism we use to help our classifier focus on key parts of normative sentences. Third, we describe the feed-forward neural network that refines the internal representation of words in sentences. Finally, we show how the model predicts a class given its output representation.


The Transformer is a type of neural network architecture that processes sequences based solely on attention mechanisms instead of using recurrent connections in the network. Vaswani et al. [17] developed this architecture to deal with the machine translation task, achieving state-of-the-art performance. This architecture uses an Encoder-Decoder approach based on other machine translation neural networks such as Sequence to Sequence learning [16]. Approaches that use Transformer variations have recently achieved state-of-the-art results on other natural language processing tasks, such as Question Answering [7] and Sentiment Classification [7].

Instead of using the entire Transformer architecture from machine translation tasks, we use only the Decoder part, based on the work by Radford et al. [14], which deals with the NLI classification task. The Transformer Decoder contains blocks that consist of a layer with a Self-Attention mechanism and a feed-forward neural network module. In the decoder block, we add a residual connection that sums the input of each layer with its output, followed by layer normalization. Figure 1 illustrates this overall architecture with details of the Transformer Decoder block of our model and how we adapt it to a classification task.

Figure 1: Diagram of a Transformer Decoder block; a single model may contain more than one stacked block.


Self-attention is a neural network mechanism that relates different positions of a single sequence to create a representation of the sequence. In natural language processing models, the objective of the self-attention mechanism is to focus on relevant words rather than on all words of the input sentence, giving an attention score for each word relation in the sequence. For example, the word “not” is more relevant when checking whether there is a contradiction between two sentences. The self-attention mechanism in a Transformer calculates the attention of each word using a query that maps to an output given a set of key-value pairs [17]. Initially, our model applies a fully connected layer to the input representation and then splits its output into three matrices, which represent the initial query, key and value matrices of our attention model.

Our neural network uses the Scaled Dot-Product attention model, which applies a matrix multiplication between the query and key matrices, followed by a softmax function whose output is multiplied by the value matrix. To obtain more stable gradients, the attention model scales the matrix multiplication by dividing it by $\sqrt{d_k}$, where $d_k$ represents the dimension of the key matrix. Equation 1 computes the Scaled Dot-Product attention model given query, key, and value matrices represented by Q, K and V respectively.

\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \qquad (1) \]
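The computation described above can be sketched in a few lines of plain Python. This is a minimal illustration of scaled dot-product attention as defined by Vaswani et al. [17], not the implementation used in this work; the function names and the toy matrices are assumptions chosen for the example.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q and K are lists of d_k-dimensional rows; V is a list of value rows.

    Returns the attended output and the attention weights."""
    d_k = len(K[0])
    output, weights = [], []
    for q in Q:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value rows.
        output.append([sum(w[j] * V[j][c] for j in range(len(V)))
                       for c in range(len(V[0]))])
    return output, weights

# Toy matrices (hypothetical values): two positions, d_k = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
output, weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` sums to one, and each query attends most strongly to the key it is aligned with.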


To prevent the self-attention mechanism from computing attention scores for positions that follow a given word, we apply a mask that removes the scores of subsequent words. Given a certain word, we set the softmax input of its subsequent words to $-\infty$, generating extremely low attention scores in these illegal positions [17]. This masking process ensures that a word can depend only on the words already seen [17], ignoring its subsequent words. Table 1 shows an example of attention masking while processing the word “being” in a sentence pair.

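Token: A | Car | is | being | driven | . | A | Car | is | stuck | [EOS]
Score: 0.5 | 0.78 | 0.1 | current | $-\infty$ | $-\infty$ | $-\infty$ | $-\infty$ | $-\infty$ | $-\infty$ | $-\infty$
Table 1: Example of a masked sentence pair that contains very low values ($-\infty$) on subsequent positions.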
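The masking step can be illustrated with a short sketch. It assumes the convention of Vaswani et al. [17], in which future positions receive a score of negative infinity before the softmax; the raw scores below are made-up numbers used only to show the effect.

```python
import math

def causal_mask(scores):
    """Replace scores[i][j] with -inf for every j > i (future positions)."""
    n = len(scores)
    return [[scores[i][j] if j <= i else float("-inf") for j in range(n)]
            for i in range(n)]

def softmax(row):
    m = max(row)  # finite, because position i can always attend to itself
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["A", "Car", "is", "being", "driven", "."]
# Hypothetical raw attention scores, one row per token.
raw = [[0.1 * (i + j) for j in range(len(tokens))] for i in range(len(tokens))]
weights = [softmax(row) for row in causal_mask(raw)]

being = tokens.index("being")
seen = weights[being][:being + 1]     # attention over "A Car is being"
future = weights[being][being + 1:]   # attention over "driven ."
```

After the softmax, every entry of `future` is exactly zero, so the representation of "being" depends only on the words up to and including itself.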

To learn diverse representations of attention, we use the Multi-Head attention approach [17], producing different attention scores for each word position. We initialize each head randomly to represent different attention projections with its respective Q, K and V matrices. Consequently, after training, the Multi-Head attention approach produces a different representation subspace for each head, projecting distinct attention scores for each word. To propagate the results to subsequent layers, we concatenate all attention head results and apply a fully connected layer to reshape the output to the original size. Figure 2 illustrates how the Multi-Head approach computes each head in parallel, receiving a word representation and producing a hidden representation.

Figure 2: Diagram of the attention model using the Multi-Head approach, given a word input representation and an output representation.
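A minimal sketch of the multi-head computation follows, assuming the layout described in the text: one fully connected projection split into query, key and value matrices, one slice per head, concatenation, and a final projection. The matrix sizes, the random initialization, and all names are assumptions for illustration; the causal mask is omitted to keep the sketch short.

```python
import math
import random

def matmul(A, B):
    """Plain-Python matrix product of A (n x k) and B (k x m)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)

def multi_head_attention(X, W_qkv, W_out, n_heads):
    d_model = len(X[0])
    d_head = d_model // n_heads
    qkv = matmul(X, W_qkv)                      # (seq, 3 * d_model)
    Q = [row[:d_model] for row in qkv]
    K = [row[d_model:2 * d_model] for row in qkv]
    V = [row[2 * d_model:] for row in qkv]
    heads = []
    for h in range(n_heads):
        lo, hi = h * d_head, (h + 1) * d_head   # slice owned by this head
        heads.append(attention([r[lo:hi] for r in Q],
                               [r[lo:hi] for r in K],
                               [r[lo:hi] for r in V]))
    # Concatenate the heads position by position, then project back to d_model.
    concat = [sum((head[t] for head in heads), []) for t in range(len(X))]
    return matmul(concat, W_out)

rng = random.Random(0)
def random_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

X = random_matrix(3, 4)                         # 3 tokens, d_model = 4
out = multi_head_attention(X, random_matrix(4, 12), random_matrix(4, 4), n_heads=2)
```

The output keeps the input shape (one `d_model`-sized vector per token), which is what allows blocks to be stacked.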

Position-Wise Feed Forward Network

After computing the self-attention layer, our decoder block uses a feed-forward neural network to process each word of the sentence separately and identically. Instead of using a ReLU activation function as in the original Transformer work [17], we follow Radford et al. [14] in using the Gaussian error linear unit (GELU) [8] as the activation function, computed in Equation 2. Equation 3 illustrates the feed-forward network (FFN) operations over each word $x$, where $W_i$ and $b_i$ represent the weight and bias of the $i$-th layer respectively.
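As a hedged illustration of these two operations, the sketch below uses the exact GELU definition, $\mathrm{GELU}(x) = x\,\Phi(x)$ with $\Phi$ the standard normal CDF, and a two-layer network of the form $\mathrm{FFN}(x) = \mathrm{GELU}(xW_1 + b_1)W_2 + b_2$. Whether the original equations use this exact form or a tanh approximation is not recoverable from the text, so treat both the formulas and the toy weights as assumptions.

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def linear(x, W, b):
    """x: input vector; W: (in x out) weight matrix; b: bias vector."""
    return [sum(xi * wi for xi, wi in zip(x, col)) + bj
            for col, bj in zip(zip(*W), b)]

def ffn(x, W1, b1, W2, b2):
    """Position-wise feed-forward network applied to one word vector."""
    hidden = [gelu(v) for v in linear(x, W1, b1)]
    return linear(hidden, W2, b2)

# Toy weights (hypothetical): 2 -> 2 -> 1.
W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0], [1.0]], [0.5]
y = ffn([1.0, -1.0], W1, b1, W2, b2)
```

Because the same weights are applied to every position independently, the network can be run over a sentence by calling `ffn` once per word vector.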


Output Representation

The output of a Transformer decoder is a sequence of learned embeddings of all tokens contained in the input sentence pair (i.e., the premise and hypothesis pair). However, we need to convert the decoder output into a single representation in order to predict the class of the whole sentence pair. We therefore include a special token at the end of the input pair to represent the whole pair: because the masking process defined in the attention model ignores subsequent positions, the Transformer decoder computes the embedding of this special token considering all previous words. On top of the special token embedding, we apply a fully connected layer that produces one score per class. Our model then uses a softmax activation function to generate a vector of probabilities containing a single value for each class.
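The prediction step can be sketched as follows, assuming the decoder output is a list of per-token vectors whose last element belongs to the special end token. The label order, the weight values, and the function names are hypothetical.

```python
import math

LABELS = ("entailment", "contradiction", "neutral")  # assumed order

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def classify(hidden_states, W, b):
    """Map the embedding of the final special token to class probabilities."""
    last = hidden_states[-1]  # embedding of the end-of-pair token
    logits = [sum(x * w for x, w in zip(last, col)) + bj
              for col, bj in zip(zip(*W), b)]
    return dict(zip(LABELS, softmax(logits)))

# Toy decoder output: three tokens with 2-dimensional embeddings.
hidden_states = [[0.1, 0.2], [0.3, 0.4], [1.0, -1.0]]
W = [[2.0, 0.0, 0.0],
     [0.0, 2.0, 0.0]]
b = [0.0, 0.0, 0.0]
probs = classify(hidden_states, W, b)
predicted = max(probs, key=probs.get)
```

Only the last position is read, which works because the causal mask lets that position see the whole premise and hypothesis.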

Implementation Details

In this section, we describe how we developed our NLI classifier. First, we detail the NLI corpus that we use to train our neural network. Second, we describe how we represent a premise-hypothesis pair to feed our model. Third, we detail how we implemented our model, describing the size of each layer and the hyperparameters used. Finally, we report how we optimize our model and measure its error, reporting training results on the NLI dataset.

Stanford Natural Language Inference Corpus

The Stanford Natural Language Inference (SNLI) corpus is a freely available dataset that contains 570 thousand sentence pairs written in English and manually labeled by humans. Each pair consists of a premise and a hypothesis sentence that may or may not follow from the premise. Each sentence pair carries a label that follows the three-way NLI classification: entailment, contradiction or neutral.

Bowman et al. [5] used Amazon Mechanical Turk for data collection by asking each worker to supply a hypothesis text based on a scene description from a pre-existing corpus [19]. After this collection step, each worker received a sentence pair and was asked to choose a single label (E for entailment, C for contradiction or N for neutral) for each pair. The dataset contains five judgments from different workers and a consensus judgment, as illustrated in Table 2. Since the SNLI dataset contains descriptive phrases, all sentences are in the present tense. This is a substantial limitation for the contract conflict task, whose sentences contain multiple modal verbs.

Premise text | Hypothesis text | Judgements
A soccer game with multiple males playing a sport. | Some men are playing a sport. |
A black race car starts up in front of a crowd of | A man is driving down a lonely |
Table 2: Examples of sentence pairs and judgements resulting from the data collection process [5].

Word Representation

Given a word vocabulary, our neural network receives as input the word indexes together with each word's relative position in the sentence. Since the Transformer architecture does not have any recurrent layer to capture word order, we use this position information to give our model a notion of order. We use the numeric position of each word to retrieve a positional embedding, in the same way that the word index retrieves a word embedding.

A positional embedding is a vector that encodes the position of each item in a sequence. Instead of using fixed embeddings for each position, our model learns positional embeddings via backpropagation. Finally, we sum both embeddings, resulting in a word representation that contains both semantic and order information and feeds the following layers.

Because our model input is composed of two sentences (premise and hypothesis), we concatenate the premise with its hypothesis, followed by a special token. This special token serves to retrieve the last hidden state, which we use as a representation of the entire pair. Table 3 shows an example of a contradictory premise-hypothesis pair followed by an end-of-sentence token (EOS), in which the first row contains the word indexes, the second the position indexes, the third the word embeddings, the fourth the position embeddings, and the last row the sum of the embeddings.

Sentence pair: A Car is being driven . A Car is stuck [EOS]
Word indexes: 4 5 1 7 10 11 4 5 1 9 20
Positional indexes: 1 2 3 4 5 6 7 8 9 10 11
Word embedding: [1.3, …] [2.5, …] [6.7, …]
Positional embedding: [1.5, …] [3.7, …] [0.2, …]
Input representation of pair: [2.8, …] [6.2, …] [6.9, …]
Table 3: Example of the input representation of our model, composed of the meaning of each word and its position in the sentence.
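The lookup-and-sum procedure in Table 3 can be sketched as below. The word indexes are taken from the table; the embedding dimension, the table sizes, and the randomly initialized values are assumptions standing in for learned parameters.

```python
import random

rng = random.Random(0)
DIM = 4            # toy embedding size (the model in this work uses 240)
VOCAB_SIZE = 21    # enough rows for the word indexes in Table 3
MAX_LEN = 12       # enough rows for positions 1..11

# Randomly initialized tables; in the model both are learned by backpropagation.
word_table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]
pos_table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(MAX_LEN)]

def input_representation(word_ids):
    """Sum the word embedding and the positional embedding of every token."""
    positions = range(1, len(word_ids) + 1)
    return [[w + p for w, p in zip(word_table[idx], pos_table[pos])]
            for idx, pos in zip(word_ids, positions)]

# "A Car is being driven . A Car is stuck [EOS]" as indexed in Table 3.
word_ids = [4, 5, 1, 7, 10, 11, 4, 5, 1, 9, 20]
rep = input_representation(word_ids)
```

The two occurrences of "A" (positions 1 and 7) share a word embedding but receive different input vectors, which is how the model distinguishes premise tokens from hypothesis tokens by order.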

Neural Network Architecture Details

We implemented our neural network model using 12 stacked blocks of Transformer decoders and included 12 heads in the attention layer of each decoder block, following Radford et al. [14]. Considering that the SNLI dataset has a vocabulary of 56220 distinct tokens, our embedding layer holds 56580 vectors with 240 dimensions each: the word embeddings plus 360 vectors for position embeddings. Although our model supports variable-length sentences, we need to define a maximum supported length because the embedding layer has a finite size. We set the maximum length to 360, considering that a contract clause contains more words than an SNLI sentence.
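The sizes above allow a rough back-of-the-envelope parameter count. The sketch below uses only the figures stated in the text for the embedding layer; the per-block count additionally assumes a feed-forward inner size of four times the model dimension (as in Radford et al. [14]) and ignores biases, layer normalization, and the output layer, so it is an estimate rather than the exact size of the model.

```python
VOCAB = 56220      # distinct SNLI tokens (from the text)
POSITIONS = 360    # maximum supported length (from the text)
D_MODEL = 240      # embedding dimension (from the text)
N_BLOCKS = 12      # stacked decoder blocks (from the text)
FFN_MULT = 4       # assumption: inner feed-forward size = 4 * D_MODEL

embedding_rows = VOCAB + POSITIONS
embedding_params = embedding_rows * D_MODEL

# Attention: one d x 3d projection for Q, K, V plus one d x d output projection.
attention_params = D_MODEL * 3 * D_MODEL + D_MODEL * D_MODEL
# Feed-forward: d x 4d followed by 4d x d.
ffn_params = 2 * D_MODEL * FFN_MULT * D_MODEL

block_params = attention_params + ffn_params
total = embedding_params + N_BLOCKS * block_params
```

Under these assumptions the embedding layer accounts for the larger share of the weights, which follows from the vocabulary size being much larger than the model dimension.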

Training Details

We train our neural network with the Adam algorithm [9] using an initial learning rate of 6.25e-5, which decays linearly with a scheduled warmup over 0.2% of training [10]. We then apply gradient clipping during optimization to avoid exploding gradients, restricting each parameter's gradient values to between -1 and 1.
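A sketch of this schedule and of clipping follows. It assumes a linear warmup from near zero to the peak rate over the first 0.2% of steps, a linear decay to zero afterwards, and element-wise clipping of gradient values to the range [-1, 1]; the function names and the step counts are illustrative.

```python
def learning_rate(step, total_steps, peak=6.25e-5, warmup_frac=0.002):
    """Linear warmup to `peak`, then linear decay to zero."""
    warmup_steps = max(1, round(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak * (step + 1) / warmup_steps
    return peak * (total_steps - step) / (total_steps - warmup_steps)

def clip_gradients(grads, limit=1.0):
    """Clamp every gradient value to the interval [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]

total_steps = 10000                       # hypothetical training length
schedule = [learning_rate(s, total_steps) for s in range(total_steps + 1)]
clipped = clip_gradients([-3.0, 0.5, 2.0])
```

With 10000 steps, the warmup covers the first 20 steps; the rate then falls linearly and reaches zero at the final step.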


To measure network error, we apply the negative log-likelihood (NLL) loss function to the output probabilities computed by the output layer [14]. Since we deal with a multi-class classification task, our loss function accumulates the log loss values of each class prediction. Given an output label $y_c$ for class $c$ and a premise-hypothesis pair $(p, h)$, the goal of our model is to minimize the function shown in Equation 4.
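One common way to write such a loss is $\mathcal{L} = -\sum_{c} y_c \log P(c \mid p, h)$, where $y_c$ is 1 for the correct class and 0 otherwise. The sketch below implements this standard form as an assumption about the shape of the loss; the probability values are invented for the example.

```python
import math

def nll_loss(probs, one_hot):
    """Negative log-likelihood of one prediction against a one-hot label."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

def batch_nll(batch_probs, batch_labels):
    """Mean NLL over a batch of predictions."""
    losses = [nll_loss(p, y) for p, y in zip(batch_probs, batch_labels)]
    return sum(losses) / len(losses)

# Classes ordered as (entailment, contradiction, neutral); hypothetical outputs.
confident = nll_loss([0.7, 0.2, 0.1], [1, 0, 0])
uncertain = nll_loss([0.4, 0.3, 0.3], [1, 0, 0])
mean_loss = batch_nll([[0.7, 0.2, 0.1], [0.4, 0.3, 0.3]],
                      [[1, 0, 0], [1, 0, 0]])
```

The loss shrinks toward zero as the probability assigned to the correct class approaches one, so a confident correct prediction is penalized less than an uncertain one.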


During training, we define batches containing 16 randomly sampled instances of the SNLI dataset. To prevent an excess of padding, we create batches of similar sizes by ordering samples by the sum of their premise and hypothesis lengths. We apply early stopping, using the validation set as reference, to avoid overtraining our neural network. As the early stopping metric, we selected the accuracy obtained by our model on the validation set, with a patience of 10 epochs. Our training stopped after 22 epochs, and we chose the parameters with the highest accuracy, obtained in epoch 12. We validate our training results using the accuracy and loss obtained on the training and validation sets throughout the epochs. Figure 3 shows the value of the training metrics obtained by our model up to epoch 12.

Figure 3: Accuracy and loss obtained on training and validation set throughout the epochs.
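The early stopping rule described above can be sketched as follows. The validation accuracies are invented so that the best value falls in epoch 12; they are not the measured values from this work.

```python
def early_stopping(val_accuracies, patience=10):
    """Return (best_epoch, stop_epoch) for a sequence of validation accuracies.

    Training stops once `patience` epochs pass without a new best accuracy."""
    best_acc, best_epoch = float("-inf"), 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_accuracies)

# Hypothetical history: accuracy improves until epoch 12, then plateaus lower.
history = [0.70 + 0.01 * e for e in range(1, 13)] + [0.80] * 20
best_epoch, stop_epoch = early_stopping(history, patience=10)
```

With a patience of 10 and a best epoch of 12, the rule stops at epoch 22, which matches the epoch counts reported in the text.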

Concerning the test set, our trained neural network achieved an accuracy of 82.1%. Although our work is similar to that of Radford et al. [14], their work uses a model pre-trained with unsupervised learning to improve language understanding. Table 4 shows details about the state-of-the-art model and our neural network concerning accuracy and number of parameters.

 | Radford et al. | Our work
Training Accuracy (%) | 96.6 | 90.5
Test Accuracy (%) | 89.9 | 82.1
Number of parameters | 80m | 20m
Table 4: Comparison of the results of our model with those of the model of Radford et al., which is improved with pre-training.

Model Application on Conflicting Norms

In this section, we describe the application of our trained model to conflicting norms stratified by conflict type. First, we introduce the norm conflict dataset and explain the conflict types. Second, we report on the results of norm pairs that contain conflicts based on conflicting modality. Third, we report on the results of norm pairs that contain conflicts based on norm structure. Finally, we discuss the limitations and issues of our model that concern normative actions in conflicting norm pairs.

Norm Conflict Classification Dataset

The Norm Conflict Dataset [1] is a corpus of clauses from existing contracts labeled with different conflict types. The source of these contract clauses is the Onecle site, which is a repository of business contracts. Aires et al. [1] labeled the normative sentence pairs manually using a web-based tool that randomly selects a contract clause and asks a human to create a second norm that conflicts with the selected norm. This dataset contains the following conflict types: deontic-modality, deontic-structure, deontic-object and object-conditional.

The deontic-modality conflict type indicates conflicts originating from the deontic statement of each clause, i.e., prohibition versus obligation, obligation versus permission, and permission versus prohibition. The deontic-structure conflict type involves different deontic meanings together with different sentence structures. A deontic-object conflict occurs when the norm actions of the pair, which represent the object of the normative sentences, are conflicting. The object-conditional conflict occurs when the conditions of the norm actions are conflicting. Table 5 shows examples of norm pairs contained in the norm conflict dataset with their respective conflict types.

Norm Pair, followed by its Conflict Type
- The Specifications may be amended by the NCR design release process.
- The Specifications shall not be amended by the NCR design release process.
Conflict type: deontic modality
- All inquiries that Seller receives on a worldwide basis relative to Buyer’s air chamber “Products” as specified in Exhibit III, shall be directed to Buyer.
- Seller may not redirect inquiries concerning Buyer’s air chamber “Products”.
Conflict type: deontic structure
- Autotote shall make available to Sisal one (1) working prototype of the Terminal by May 1, 1998.
- Autotote shall make available to Sisal one (1) working prototype of the Terminal by June 12, 1998.
Conflict type: deontic object
- The Facility shall meet all legal and administrative code standards applicable to the conduct of the Principal Activity thereat.
- Only if previously agreed, the Facility ought to follow legal and administrative code standards.
Conflict type: object conditional
Table 5: Examples of norm pairs with the respective conflict type.

In this work, we apply our trained model to conflicting pairs of this dataset to find logical relations between contract clauses across the different conflict types. We execute the model twice over each pair, swapping the roles of premise and hypothesis.
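The two-direction procedure can be sketched as a small wrapper around any classifier that takes a premise and a hypothesis. The `uniform_stub` function below is a placeholder that stands in for the trained network and returns fixed, meaningless probabilities; all names are hypothetical.

```python
def classify_both_directions(norm_a, norm_b, predict):
    """Run an NLI classifier twice, swapping premise and hypothesis."""
    return {
        "a->b": predict(premise=norm_a, hypothesis=norm_b),
        "b->a": predict(premise=norm_b, hypothesis=norm_a),
    }

calls = []

def uniform_stub(premise, hypothesis):
    """Placeholder for the trained model: records the call, returns uniform scores."""
    calls.append((premise, hypothesis))
    return {"entailment": 1 / 3, "contradiction": 1 / 3, "neutral": 1 / 3}

norm_a = "The Specifications may be amended by the NCR design release process."
norm_b = "The Specifications shall not be amended by the NCR design release process."
results = classify_both_directions(norm_a, norm_b, uniform_stub)
```

Keeping both directions matters because entailment is not symmetric: an obligation may entail a permission without the reverse holding.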

Deontic Modal Conflicts

In executions on deontic-modal conflict pairs, we note that our model could detect the intensity of modal verbs. Given two sentences with similar intensity, our model yields high scores for entailment and neutrality depending on the inference direction. For example, our model outputs the entailment relation, in both directions, for a considerable number of pairs that have the modal verbs shall and may. However, when the premise contains the verb shall, the entailment score is greater than when the premise contains may.

These scores show that our model increases the entailment probability when the premise contains an obligation and the hypothesis norm contains a permissible action. On the other hand, the opposite does not hold: the model increases the neutrality score when the obligation comes from the hypothesis norm. However, our model fails to infer these relations when the hypothesis or premise has the modal verb will. Since our training dataset contains only sentences in the present tense, we suspect that our model does not recognize the word will accurately. Table 6 shows examples of pairs with these modal verbs, with the softmax scores of our neural network output layer for the neutral and entailment classes.

Norm Pair, followed by softmax scores for entailment (E) and neutral (N) in each direction
(a) Purchaser shall also be responsible for all property taxes on the equipment.
(b) Purchaser may also be responsible for all property taxes on the equipment.
(a) as premise: E 0.96, N 0.03; (b) as premise: E 0.88, N 0.08
(a) CoPacker shall deliver all the Products that WWI purchases under this Agreement to
(b) CoPacker may deliver all the Products that WWI purchases under this Agreement to
(a) as premise: E 0.71, N 0.20; (b) as premise: E 0.30, N 0.53
(a) CBSI will retain the originals in its archives.
(b) CBSI may retain the originals in its archives.
(a) as premise: E 0.59, N 0.35; (b) as premise: E 0.91, N 0.07
Table 6: Examples of norm pairs with deontic-modal conflicts, describing the model results given two norms with different modal verbs and deontic meaning.

In norms that contain negation between deontic meanings, our model had problems in bidirectional executions. The trained neural network does not classify contradictions with reasonable accuracy when the negation comes from the premise norm. We believe this issue is related to our training dataset (SNLI), which may be unbalanced concerning which side carries the negation.

Deontic Structure

In the deontic-structure conflict type, our model has the same problems as in deontic-modality conflicts. However, in this conflict case, we note that our neural network could generalize contradiction and entailment regarding modal verbs across different sentence structures. This shows that our model can infer a logical relation when norm pairs contain different words with similar meanings. Table 7 shows unidirectional results of our model where sentence (a) is the premise and sentence (b) is the hypothesis.

Norm Pair, followed by softmax scores (E, C, N)
(a) Autotote will own the Intellectual Property Rights to all said prototypes.
(b) Autotote shall not own the Intellectual Property Rights to prototypes.
E 0.04, C 0.94, N 0.02
(a) Medica shall also maintain records with respect to its costs, obligations, and performance under this agreement.
(b) With respect to its costs, obligations, and performance under this agreement, Medica is not obliged to maintain records.
E 0.11, C 0.81, N 0.08
(a) Teknika will notify LSI that it considers that a Triggering Event has occurred.
(b) Teknika shall not notify LSI of any regular event.
E 0.01, C 0.98, N 0.01
(a) Customer will notify USF at least [***] days in advance of special promotions that may cause unusual or excessive demand on inventory.
(b) Customer should notify USF of special promotions that may cause unusual and excessive demand on inventory.
E 0.91, C 0.00, N 0.09
Table 7: Norm pairs with distinct sentence structures and their softmax scores for the entailment (E), contradiction (C) and neutral (N) classes generated by our model.

Deontic-Object and Object-Conditional

Given conflicts that involve a difference between the norms' objects, our model fails to generate reasonable classifications. These results indicate that our neural network does not detect the object context of a normative sentence in a conflict setting. Based on the results for both conflict types that concern norm actions, our neural network bases its prediction on keywords such as modal verbs and negation words (not). Therefore, our model tends to disregard norm actions when the words of both norms are different or the norm action structure is composed of different words.

Table 8 shows examples of norm pairs that contain conflicts related to the action of the clauses. The first example shows that our model does not accurately recognize the sentence action, ignoring the associated objects. The second example illustrates a contradiction instance that our model could capture because of the similar word structure of the pair. Finally, the third example describes an instance of a conflict that concerns the conditional definition of the norms.

Norm Pair, followed by softmax scores (E, C, N) and conflict type
(a) The arbitration shall be conducted in Tampa, Florida.
(b) The arbitration shall be conducted in St. Petersburg, Florida.
E 0.92, C 0.02, N 0.06; conflict type: deontic-object
(a) Hershey will cooperate in no shipping procedures.
(b) Hershey will cooperate in all shipping procedures.
E 0.05, C 0.88, N 0.07; conflict type: deontic-object
(a) Where applicable, Taxes shall appear as separate items on Adaptec’s
(b) If shipping products, the Taxes shall appear along the other items on Adaptec’s invoice.
E 0.91, C 0.04, N 0.05; conflict type: object-conditional
Table 8: Examples of norm pairs with conflicts that concern the norm action, with their respective entailment (E), contradiction (C) and neutral (N) softmax scores.

Related Work

In this section, we present related work that analyzes normative sentences and contract clauses. We describe each related work by explaining the problem it addresses, its objectives, and how it represents a normative sentence. We then compare the objective of this work with the related work and discuss the differences.

Aires et al. [2] develop an approach that identifies potential conflicts between norms in contracts. First, they focus on norm identification, which results in a formal representation of a norm. Second, they use the formal representation to detect and classify potential conflicts between norms using techniques from formal logic. This approach assumes that a norm follows a well-defined 4-component structure: an indexing number or letter, one or more named parties, a modal verb, and a behavior description. Given this structure, they apply a regular expression to decide whether a sentence is a norm sentence or not. After identifying norms, they create a formal representation of the norm sentence by extracting three components: party name, deontic meaning, and the norm action. With the formal representation, they detect potential conflicts following three relations between deontic meanings in norm pairs [15]:

  • Permission and Prohibition

  • Permission and Obligation

  • Obligation and Prohibition

Instead of using a formal representation with a strict logic approach, we explore techniques that deal with the informal reasoning of natural language through the application of neural networks. We train a neural network on the SNLI dataset to deal with the challenges of NLI, such as lexical semantic knowledge and the variability of linguistic expression [11].

Aires et al. [1] introduce a typology of conflicts in normative sentences and present machine learning methods that classify these conflict types. These learning methods rely on the semantic representation of norms, using Sent2Vec [12] to create embedding vectors that represent the norms. First, they describe an extension of the Aires Norm Dataset to include the conflict typology, which introduces 228 new conflicting norms including the existing 111 from the previous dataset. Second, they present an unsupervised learning method to detect the presence or absence of norm conflicts. Finally, they present a supervised learning method that handles binary classification (i.e., conflict and non-conflict) and multi-class classification of the conflict types created. In this work, we use the dataset made by the authors to validate logical relations. Instead of classifying conflicts, we use the conflict types to illustrate how our model can help in contract analysis by showing potential conflicts that concern logical problems. These conflicts help us detect points where our model can improve regarding logical inference.


In this work, we present an approach to identify logical relations between contract clauses. At this point, we have trained a neural network on the SNLI dataset to classify the inference relation between a premise and a hypothesis. We apply this trained network to a corpus that contains a set of conflicting norms to validate whether our model can help in contract analysis. Applying the network to conflicting norms also helps us identify some issues with our approach. Although our model has issues involving the direction of NLI inference, we show that it can detect potential contradictions in contract clauses regardless of their structure. Furthermore, we report that our model can identify deontic meaning between norms, assigning an entailment score based on modal verb intensity.
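Because the inference direction matters, one way to apply an NLI classifier to a contract is to score every clause pair in both directions. The sketch below is an illustration under our own assumptions rather than the procedure used in this work: `Scorer` is a hypothetical interface returning one logit per SNLI label, `toy_score` is a trivial stand-in for a trained network, and the 0.5 threshold is arbitrary.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

LABELS = ("entailment", "contradiction", "neutral")  # the three SNLI classes

# Hypothetical interface: (premise, hypothesis) -> one logit per label in LABELS.
Scorer = Callable[[str, str], Sequence[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(score: Scorer, premise: str, hypothesis: str) -> Dict[str, float]:
    """Map the scorer's logits to a probability per SNLI label."""
    return dict(zip(LABELS, softmax(score(premise, hypothesis))))


def flag_contradictions(
    score: Scorer, clauses: Sequence[str], threshold: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Flag clause pairs whose contradiction probability reaches the threshold
    in at least one direction (premise/hypothesis roles swapped)."""
    flagged = []
    for i, first in enumerate(clauses):
        for second in clauses[i + 1:]:
            forward = classify(score, first, second)
            backward = classify(score, second, first)
            p = max(forward["contradiction"], backward["contradiction"])
            if p >= threshold:
                flagged.append((first, second, p))
    return flagged


def toy_score(premise: str, hypothesis: str) -> List[float]:
    """Stand-in scorer, NOT a trained model: favors 'contradiction'
    when exactly one of the two sentences contains ' not '."""
    if (" not " in premise) != (" not " in hypothesis):
        return [0.0, 3.0, 0.0]
    return [1.0, 0.0, 1.0]


example = [
    "The Supplier shall deliver the goods.",
    "The Supplier shall not deliver the goods.",
    "The Buyer may inspect the goods.",
]
for first, second, p in flag_contradictions(toy_score, example):
    print(f"{p:.2f}  {first!r} <-> {second!r}")
```

Taking the maximum over both directions is one possible design choice; it trades precision for recall, which may suit a tool whose output is reviewed by a contract author.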

As future work, we intend to improve our neural network to address the gaps identified during the analysis of contract clauses. First, we intend to explore other training datasets such as the Multi-Genre NLI Corpus (MNLI) [18], which was modeled on SNLI but differs in that it covers a range of genres of spoken and written text. Second, we aim to use pre-trained models such as BERT [7], which are state of the art in a wide variety of Natural Language Processing tasks, including Natural Language Inference.


  • [1] J. P. Aires, R. Granada, J. Monteiro, R. C. Barros, and F. Meneguzzi (2019) Classification of contractual conflicts via learning of semantic representations. In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), to appear.
  • [2] J. P. Aires, D. Pinheiro, V. S. d. Lima, and F. Meneguzzi (2017) Norm conflict identification in contracts. Artificial Intelligence and Law 25 (4), pp. 397–428. ISSN 1572-8382.
  • [3] J. P. Aires, D. Pinheiro, and F. Meneguzzi (2017) Norm Dataset: Dataset with Norms and Norm Conflicts.
  • [4] J. Bos and K. Markert (2005) Recognising textual entailment with logical inference. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 628–635.
  • [5] S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning (2015) A large annotated corpus for learning natural language inference. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing, pp. 632–642.
  • [6] J. P. de Souza Aires (2015) Identifying potential conflicts between norms in contracts. Master’s Thesis, Faculdade de Informática – PUCRS, Porto Alegre, RS, Brasil.
  • [7] J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • [8] D. Hendrycks and K. Gimpel (2016) Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415.
  • [9] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [10] I. Loshchilov and F. Hutter (2017) Fixing weight decay regularization in Adam. arXiv preprint arXiv:1711.05101.
  • [11] B. MacCartney (2009) Natural language inference. Ph.D. Thesis, Stanford University, Stanford, CA, USA. Note: AAI3364139. ISBN 978-1-109-24088-7.
  • [12] M. Pagliardini, P. Gupta, and M. Jaggi (2018) Unsupervised learning of sentence embeddings using compositional n-gram features. In NAACL 2018 - Conference of the North American Chapter of the Association for Computational Linguistics.
  • [13] R. Pascanu, T. Mikolov, and Y. Bengio (2013) On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pp. 1310–1318.
  • [14] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever (2018) Improving language understanding by generative pre-training. URL https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
  • [15] A. Sadat-Akhavi (2003) Methods of resolving conflicts between treaties. Vol. 3, Martinus Nijhoff Publishers.
  • [16] I. Sutskever, O. Vinyals, and Q. V. Le (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pp. 3104–3112.
  • [17] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), pp. 5998–6008.
  • [18] A. Williams, N. Nangia, and S. Bowman (2018) A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1112–1122.
  • [19] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier (2014) From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics 2, pp. 67–78.