PunFields at SemEval-2017 Task 7: Employing Roget's Thesaurus in Automatic Pun Recognition and Interpretation

by Elena Mikhalkova, et al.

The article describes a model for the automatic interpretation of English puns, based on Roget's Thesaurus, and its implementation, PunFields. In a pun, the algorithm discovers two groups of words that belong to two main semantic fields. The fields are encoded as a semantic vector, on which an SVM classifier learns to recognize puns. A rule-based model is then applied to identify the intentionally ambiguous (target) words and their definitions. In SemEval Task 7, PunFields achieves fairly good results in pun classification but requires improvement in finding the target word and its definition.




1 Introduction

The following terminology is basic to our research on puns. A pun is a) a short humorous genre in which a word or phrase is intentionally used in two meanings, or b) a means of expression whose essence is to use a word or phrase so that, in the given context, it can be understood in two meanings simultaneously. A target word is a word that appears in two meanings. A homographic pun is a pun that “exploits distinct meanings of the same written word” (Miller and Gurevych, 2015); these can be meanings of a polysemantic word, or homonyms, including homonymic word forms. A heterographic pun is a pun in which the target word resembles another word or phrase in spelling; we will call the latter the second target word. Consider the following example (the Banker joke):

“I used to be a banker, but I lost interest.”

The Banker joke is a homographic pun; interest is the target word. By contrast, the Church joke below is a heterographic pun; propane is the target word, and profane is the second target word:

“When the church bought gas for their annual barbecue, proceeds went from the sacred to the propane.”

Our model of automatic pun analysis is based on the following premise: in a pun, there are two groups of words, with their meanings, that indicate the two meanings in which the target word is used. These groups can overlap, i.e., contain the same polysemantic words used in different meanings.

In the Banker joke, the words and collocations banker and lost interest point at the professional status of the narrator and his or her career failure. At the same time, used to and lost interest tell a story of losing emotional attachment to the profession: the narrator became disinterested. The algorithm of pun recognition that we suggest discovers these two groups of words based on common semes [1] (Subtask 1), finds the words that belong to both groups and chooses the target word (Subtask 2), and, based on the common semes, picks the most suitable meaning that the target word exploits (Subtask 3). In the case of heterographic puns, in Subtask 2, the algorithm looks for the word or phrase that appears in one group and not in the other.

[1] Bits of meaning. Semes are parts of meaning present both in a word and in its hypernym. Moving up a taxonomy such as Thesaurus or WordNet, hypernyms become more general, and the seme connecting them to the word becomes more general, too.

2 Subtask 1: Mining Semantic Fields

We will call a semantic field a group of words and collocations that share a common seme. In taxonomies like WordNet (Kilgarriff and Fellbaum, 2000) and Roget’s Thesaurus (Roget, 2004; further referred to as Thesaurus), semes appear as hierarchies of word meanings. Top levels attract words with more general meanings (hypernyms). For example, Thesaurus has six top-level Classes, which divide into Divisions, which divide into Sections, and so on, down to the fifth, lowest level. WordNet’s structure is not so transparent. Applying such dictionaries to get semantic fields (the groups of words mentioned above) in a pun is, therefore, the task of finding the two most general hypernyms in WordNet, or two relevant Classes among the six Classes in Thesaurus. We chose Thesaurus because its structure is only five levels deep, its Class labels are not lemmas themselves but arbitrary names (we used numbers instead), and it allows parsing at a chosen level and inserting corrections (adding lemmas, merging subsections, etc. [2]). After some experimentation, instead of Classes, we chose to search for relevant Sections, which are 34 subdivisions of the six Classes [3].

[2] For example, we edited Thesaurus, adding words that were absent from it. If a word in a pun was missing from Thesaurus, the system looked up its hypernyms in WordNet and added the word to those Sections of Thesaurus that contained the hypernyms.

[3] Sections are not always immediate subdivisions of a Class. Some Sections are grouped in Divisions.

After normalization (conversion to lowercase; part-of-speech tagging, tokenization, and lemmatization with NLTK tools (Bird et al., 2009); collocation extraction [4]; stop-word removal [5]), the algorithm collects Section numbers for every word and collocation and removes duplicates (in Thesaurus, homonyms proper can belong to different subdivisions in the same or different Sections). Table 1 shows which Sections the words of the Banker joke belong to.

[4] To extract collocations and search for them in Thesaurus, we applied our own procedure, based on a part-of-speech analysis.

[5] After lemmatization, all words are analyzed in collocations, but only nouns, adjectives, and verbs compose the list of separate words.

| Word | Section No., Section name in Thesaurus |
|---|---|
| I | - |
| use | 24, Volition In General; 30, Possessive Relations |
| to | - |
| be | 0, Existence; 19, Results Of Reasoning |
| a | - |
| banker | 31, Affections In General; 30, Possessive Relations |
| but | - |
| lose | 21, Nature Of Ideas Communicated; 26, Results Of Voluntary Action; 30, Possessive Relations; 19, Results Of Reasoning |
| interest | 30, Possessive Relations; 25, Antagonism; 24, Volition In General; 7, Causation; 31, Affections In General; 16, Precursory Conditions And Operations; 1, Relation |

Table 1: Semantic fields in the Banker joke

Then the semantic vector of a pun is calculated. Every pun $p$ is a vector in a 34-dimensional space:

$$p = (s_1, s_2, \ldots, s_{34})$$

The value of every element $s_k$ equals the number of words in the pun that belong to Section $S_k$. The algorithm passes from Section to Section, each time checking every word $w_j$ in the list of $l$ extracted words. If a word belongs to Section $S_k$, the value of $s_k$ increases by 1:

$$s_k = \sum_{j=1}^{l} \mathbf{1}[w_j \in S_k], \quad k = 1, 2, \ldots, 34$$

For example, the semantic vector of the Banker joke is shown in Table 2.

Table 2: Semantic vector of the Banker joke
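The counting step can be illustrated with a minimal Python sketch. It uses only the Section numbers listed in Table 1; the function name `semantic_vector` and the zero-based indexing of Sections (0 to 33) are assumptions made for the illustration, not details confirmed by the paper.

```python
from collections import Counter

# Section numbers per lemma, copied from Table 1 (function words have none).
SECTIONS_BY_WORD = {
    "use": [24, 30],
    "be": [0, 19],
    "banker": [31, 30],
    "lose": [21, 26, 30, 19],
    "interest": [30, 25, 24, 7, 31, 16, 1],
}

N_SECTIONS = 34  # assumption: Sections are indexed 0..33


def semantic_vector(sections_by_word, n_sections=N_SECTIONS):
    """Count, for every Section, how many words of the pun belong to it."""
    counts = Counter()
    for sections in sections_by_word.values():
        counts.update(set(sections))  # a word counts once per Section
    return [counts.get(k, 0) for k in range(n_sections)]


vector = semantic_vector(SECTIONS_BY_WORD)
largest = max(range(N_SECTIONS), key=vector.__getitem__)
print(largest, vector[largest])
```

With the Table 1 data, the largest element belongs to Section 30 (Possessive Relations), which four words share: use, banker, lose, interest.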

To test the algorithm, we first collected 2484 puns from different Internet resources and then built a corpus of 2484 random sentences, 5 to 25 words long, from different NLTK corpora (Bird et al., 2009), plus several hundred aphorisms and proverbs from different Internet sites. We shuffled the puns and the random sentences and split each collection into two equal halves; the two first halves formed the training set, and the two second halves the test set. The classification was conducted using different Scikit-learn (Pedregosa et al., 2011) algorithms. We also singled out 191 homographic puns and 198 heterographic puns and tested them against the same number of random sentences. In all the tests [6], the Scikit-learn implementation of SVM with the Radial Basis Function (RBF) kernel produced the highest average F-measure (the mean over the two classes, puns and random sentences). In addition, its results are smoother: the difference between precision and recall is smaller (which leads to the highest F-measure scores), both within the two classes and between the classes (average scores). Table 3 shows the results of different algorithms for the class “Puns” (not the average over puns and non-puns). The results were higher for the split selection, reaching F-measures of 0.79 (homographic) and 0.78 (heterographic). The common selection reached a maximum average F-measure of 0.7 in several tests. The higher results of the split selection may be due to a larger training set.

[6] The tests were run before the competition. Results of the competition for our system are given in Table 6.

| Selection | Method | Precision | Recall | F-measure |
|---|---|---|---|---|
| Common selection | SVM with linear kernel | 0.67 | 0.68 | 0.67 |
| Common selection | SVM with polynomial kernel | 0.65 | 0.79 | 0.72 |
| Common selection | SVM with Radial Basis Function (RBF) kernel | 0.70 | 0.70 | 0.70 |
| Common selection | SVM with linear kernel, normalized data | 0.62 | 0.74 | 0.67 |
| Homographic puns | SVM with RBF kernel | 0.79 | 0.80 | 0.79 |
| Homographic puns | Multinomial Naive Bayes | 0.71 | 0.80 | 0.76 |
| Homographic puns | Logistic Regression, standardized data | 0.77 | 0.71 | 0.74 |
| Heterographic puns | SVM with RBF kernel | 0.77 | 0.79 | 0.78 |
| Heterographic puns | Logistic Regression | 0.74 | 0.75 | 0.74 |

Table 3: Tests for pun recognition.
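As a rough illustration of this classification setup, the sketch below trains a Scikit-learn SVM with an RBF kernel on 34-dimensional count vectors and reports precision, recall, and F-measure for the class “Puns”. The data are synthetic stand-ins generated for the demo (the assumption that puns concentrate words in two Sections is ours), so the scores say nothing about the results in Table 3.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for semantic vectors (NOT the paper's data).
# Demo assumption: "puns" put extra words into two Sections.
n_per_class = 200
random_sentences = rng.poisson(0.5, size=(n_per_class, 34))
puns = rng.poisson(0.5, size=(n_per_class, 34))
puns[:, [19, 30]] += rng.poisson(2.0, size=(n_per_class, 2))

X = np.vstack([puns, random_sentences]).astype(float)
y = np.array([1] * n_per_class + [0] * n_per_class)  # 1 = pun

# Half of each class for training, half for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Scores for the positive class only (class "Puns").
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary", pos_label=1
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Swapping `kernel="rbf"` for `"linear"` or `"poly"` gives the other SVM variants listed in Table 3.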

3 Subtask 2: Hitting the Target Word

We suggest that, in a homographic pun, the target word is a word that belongs to two semantic fields at once; in a heterographic pun, the target word belongs to at least one discovered semantic field and does not belong to the other. In reality, however, words in a sentence tend to belong to too many fields, which creates noise in the search. To reduce the influence of noisy fields, we included two non-semantic features in the model: the tendency of the target word to occur at the end of a sentence, and the part-of-speech distribution given in Miller and Gurevych (2015). A-group ($W_A$) and B-group ($W_B$) are the groups of words in a pun that belong to the two semantic fields sharing the target word. Thus, for some Section count $s_k$ (the number of words of the pun in Section $k$), $k$ becomes $A$ or $B$ [7]. A-group attracts the maximum number of words in a pun:

$$s_A = \max_k s_k, \quad k = 1, 2, \ldots, 34$$

[7] $s_k$ is always an integer; $W_A$ and $W_B$ are always lists of words; $A$ is always an integer; $B$ is a list of one or more integers.

In the Banker joke, $s_A = 4$ and $A = 30$ (Possessive Relations); the words that belong to this group are use, lose, banker, interest. B-group is the second largest group in a pun:

$$s_B = \max_{k \neq A} s_k$$

In the Banker joke, $s_B = 2$. There are three groups that contain two words each: $B_1 = 19$, Results Of Reasoning: be, lose; $B_2 = 24$, Volition In General: use, interest; $B_3 = 31$, Affections In General: banker, interest. Ideally, there should be a group of about three words and collocations describing a person’s inner state (used to be, lose, interest), and two words (lose, interest) in $W_A$ are a target phrase. However, due to the shortage of data about collocations in dictionaries, $W_B$ is split into several smaller groups. Consequently, to find the target word, we have to appeal to other word features. In testing the system on homographic puns, we relied on the polysemantic character of words. If a joke has more than one value of $B$, the candidate groups merge into one, with duplicates removed, and every word in $W_B$ becomes a target word candidate: $c \in W_B$. In the Banker joke, $W_B$ is the list be, lose, use, interest, banker; $B = (19, 24, 31)$. Based on the definition of the target word in a homographic pun, words from $W_B$ that are also found in $W_A$ should have a privilege. Therefore, the first value $v_\alpha$ that each word gets is the output of the Boolean function:

$$v_\alpha(c) = \begin{cases} 2 & \text{if } c \in W_A \text{ and } c \in W_B \\ 1 & \text{if } c \notin W_A \text{ and } c \in W_B \end{cases}$$

The second value $v_\beta$ is the absolute frequency of a word in the union of $W_{B_1}$, $W_{B_2}$, etc., including duplicates:

$$v_\beta(c) = f_c\left(W_{B_1} \uplus W_{B_2} \uplus \ldots\right)$$

The third value $v_\gamma$ is the word’s position in the sentence: the closer the word is to the end, the bigger this value is. If the word occurs several times, the algorithm takes the average of its position numbers.

The fourth value $v_\delta$ is the part-of-speech probability. Depending on the part of speech the word belongs to, it gets a rate taken from the distribution in Miller and Gurevych (2015); the rates that occur in Tables 4 and 5 are:

$$v_\delta(c) = \begin{cases} 0.502 & \text{if } c \text{ is a noun} \\ 0.338 & \text{if } c \text{ is a verb} \\ 0.131 & \text{if } c \text{ is an adjective} \end{cases}$$

The final step is to combine the rates using multiplicative convolution and to choose the word with the maximum rate:

$$z_1(W_B) = \max_{c \in W_B} \left(v_\alpha \cdot v_\beta \cdot v_\gamma \cdot v_\delta\right)$$

Values of the Banker joke are illustrated in Table 4.

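| Word form | $v_\alpha$ (group membership) | $v_\beta$ (frequency) | $v_\gamma$ (position) | $v_\delta$ (part of speech) | Rate |
|---|---|---|---|---|---|
| be | 1 | 1 | 4 | 0.338 | 1.352 |
| lose | 2 | 1 | 9 | 0.338 | 6.084 |
| use | 2 | 1 | 2 | 0.338 | 1.352 |
| interest | 2 | 2 | 10 | 0.502 | 20.08 |
| banker | 2 | 1 | 6 | 0.502 | 6.024 |

Table 4: Values of the Banker joke.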
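The rating of candidates in a homographic pun can be retraced with a short Python sketch. The groups come from Table 1, and the positions and parts of speech from Table 4; the function name `homographic_rates` and the data layout are assumptions made for the illustration, not the authors' code.

```python
# Part-of-speech rates as they appear in Table 4 (other tags omitted here).
POS_RATE = {"noun": 0.502, "verb": 0.338}

# Groups of the Banker joke, read off Table 1.
A_GROUP = ["use", "lose", "banker", "interest"]  # Section 30
B_GROUPS = {
    19: ["be", "lose"],
    24: ["use", "interest"],
    31: ["banker", "interest"],
}

# Position and part of speech per candidate, copied from Table 4.
POSITION = {"be": 4, "lose": 9, "use": 2, "interest": 10, "banker": 6}
POS = {"be": "verb", "lose": "verb", "use": "verb",
       "interest": "noun", "banker": "noun"}


def homographic_rates(a_group, b_groups, position, pos):
    """Multiply the four values for every candidate in the merged B-group."""
    merged = [w for words in b_groups.values() for w in words]  # duplicates kept
    rates = {}
    for word in dict.fromkeys(merged):  # unique candidates, order preserved
        membership = 2 if word in a_group else 1  # in A-group as well as B-group
        frequency = merged.count(word)  # occurrences across the B-groups
        rates[word] = membership * frequency * position[word] * POS_RATE[pos[word]]
    return rates


rates = homographic_rates(A_GROUP, B_GROUPS, POSITION, POS)
target = max(rates, key=rates.get)
print(target, round(rates[target], 3))
```

On this input the products agree with the last column of Table 4, and interest receives the highest rate.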

For heterographic puns, we built a different model of B-group. Unlike in homographic puns, here the target word is missing from the B-group $W_B$ (the reader has to guess the word or phrase homonymous to the target word). Accordingly, we rely on the completeness of the union of the A-group $W_A$ and $W_B$: among the candidates for $W_B$ (the second largest groups), the relevant groups are those that form the longest list with $W_A$ (duplicates removed). In Ex. 2 (the Church joke), two groups form the largest union with $W_A$. Every word in $W_A$ and $W_B$ can be the target word. The privilege passes to words used in only one of the groups. Ergo, the first value is:

$$v_\alpha(c) = \begin{cases} 2 & \text{if } c \text{ belongs to exactly one of } W_A, W_B \\ 1 & \text{otherwise} \end{cases}$$

Frequencies are of no value here; the position in the sentence ($v_\gamma$) and the part-of-speech rate ($v_\delta$) remain the same. The function output is:

$$z_2(W_A \cup W_B) = \max_{c \in W_A \cup W_B} \left(v_\alpha \cdot v_\gamma \cdot v_\delta\right)$$

Values of the Church joke are illustrated in Table 5.

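| Word form | $v_\alpha$ (group membership) | $v_\gamma$ (position) | $v_\delta$ (part of speech) | Rate |
|---|---|---|---|---|
| propane | 2 | 18 | 0.502 | 18.072 |
| annual | 2 | 8 | 0.131 | 2.096 |
| gas | 2 | 5 | 0.502 | 5.02 |
| sacred | 2 | 15 | 0.338 | 10.14 |
| church | 2 | 3 | 0.502 | 3.012 |
| barbecue | 2 | 9 | 0.502 | 9.036 |
| go | 2 | 12 | 0.338 | 8.112 |
| proceeds | 2 | 11 | 0.502 | 11.044 |
| buy | 2 | 4 | 0.338 | 2.704 |

Table 5: Values of the Church joke.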
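The heterographic variant of the rating can be sketched in the same way. Positions and part-of-speech rates below are copied from Table 5. The split of words into `GROUP_A` and `GROUP_B` is hypothetical: it is chosen only so that no word falls into both groups, which is consistent with the value 2 that every word has in the table.

```python
POS_RATE = {"noun": 0.502, "verb": 0.338, "adjective": 0.131}

# (position, part of speech as rated in Table 5) per candidate.
CANDIDATES = {
    "propane": (18, "noun"), "annual": (8, "adjective"), "gas": (5, "noun"),
    "sacred": (15, "verb"), "church": (3, "noun"), "barbecue": (9, "noun"),
    "go": (12, "verb"), "proceeds": (11, "noun"), "buy": (4, "verb"),
}

# Hypothetical group membership (an assumption for the demo).
GROUP_A = ["church", "sacred", "annual", "barbecue", "proceeds"]
GROUP_B = ["gas", "propane", "buy", "go"]


def heterographic_rates(group_a, group_b, candidates):
    """Rate = membership privilege * position * part-of-speech rate."""
    rates = {}
    for word, (position, pos) in candidates.items():
        in_one_only = (word in group_a) != (word in group_b)
        privilege = 2 if in_one_only else 1
        rates[word] = privilege * position * POS_RATE[pos]
    return rates


rates = heterographic_rates(GROUP_A, GROUP_B, CANDIDATES)
target = max(rates, key=rates.get)
print(target, round(rates[target], 3))
```

Because frequency is dropped, the position value dominates, and the sentence-final noun propane gets the highest rate.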

4 Subtask 3: Mapping Roget’s Thesaurus to WordNet

In the last phase, we implemented an algorithm that maps Roget’s Sections to synsets in WordNet. In homographic puns, the definitions of a word in WordNet are analyzed in the same way as the words of a pun when searching for the semantic fields they belong to. For example, words from the definitions of the synsets of interest belong to the following Roget’s Sections: Synset(interest.n.01) = “a sense of concern with and curiosity about someone or something”: 21, 19, 31, 24, 1, 30, 6, 16, 3, 31, 19, 12, 2, 0; Synset(sake.n.01) = “a reason for wanting something done”: 15, 24, 18, 7, 19, 11, 2, 31, 24, 30, 12, 2, 0, 26, 24; etc. When the A-Section is discovered (for example, in the Banker joke, A = 30 (Possessive Relations)), the synset with the maximum number of words in its definition that belong to the A-Section becomes the A-synset. The B-synset is found likewise for the B-group, with the exception that it should not coincide with the A-synset.

In heterographic puns, the B-group is also a marker of the second target word. Every word in the index of Roget’s Thesaurus is compared to the known target word using Damerau-Levenshtein distance. The list is sorted in increasing order of distance, and the algorithm checks which Roget’s Sections every word belongs to until it finds a word that belongs to a Section (or the Section, if there is only one) in the B-group. This word becomes the second target word.
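The search for the second target word can be sketched as follows. The distance function implements the restricted variant of Damerau-Levenshtein distance (optimal string alignment), which may differ from the exact variant used in PunFields. The index `TOY_INDEX`, its Section numbers, and the B-group Section 33 are invented for the illustration.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


# Toy thesaurus index: word -> Section numbers (all values are invented).
TOY_INDEX = {
    "propane": {9},
    "profane": {33},
    "prophet": {33},
    "methane": {9},
    "propagate": {12},
}


def second_target_word(target, index, b_sections):
    """Return the closest index word that belongs to a B-group Section."""
    ranked = sorted(
        (w for w in index if w != target),
        key=lambda w: (osa_distance(target, w), w),
    )
    for word in ranked:
        if index[word] & b_sections:
            return word
    return None


print(second_target_word("propane", TOY_INDEX, {33}))
```

In this toy index, profane is one substitution away from propane and belongs to the assumed B-group Section, so it is returned first.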

However, as we had no trial data apart from the four examples released before the competition, the first trials of the program on a large collection returned many errors, so we changed the algorithm for the B-group as follows.

Homographic puns, first run. The B-synset is calculated on the basis of sense frequencies (the output is the most frequent sense). If it coincides with the A-synset, the program returns the second most frequent synset.

Homographic puns, second run. The B-synset is calculated on the basis of Lesk distance, using the built-in NLTK Lesk function (Bird et al., 2009). If it coincides with the A-synset, the program returns another synset on the basis of sense frequencies, as in the first run.

Heterographic puns, first run. The second target word is calculated based on Thesaurus and Damerau-Levenshtein distance; words missing from Thesaurus are analyzed as their WordNet hypernyms. In both runs for heterographic puns, synsets are calculated using the Lesk distance.

Heterographic puns, second run. The second target word is calculated on the basis of the Brown corpus (via NLTK; Bird et al., 2009): if a word stands in the same context in Brown as the target word does in the pun, it becomes the second target word. The size of the context window is (0; +3) for verbs, (0; +2) for adjectives, and (-2; +2) for nouns, adverbs, and other parts of speech, within the sentence where the word is used.
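One possible reading of the context-window check is sketched below. The window sizes follow the description of the second run; everything else is an assumption made for the illustration: exact matching of neighbouring tokens, the function names, and the two-sentence toy corpus that stands in for Brown.

```python
# Window sizes (words to the left, words to the right) per part of speech.
WINDOWS = {"verb": (0, 3), "adjective": (0, 2)}
DEFAULT_WINDOW = (2, 2)  # nouns, adverbs, and other parts of speech


def context(tokens, index, pos):
    """Neighbouring tokens around tokens[index], cut at sentence borders."""
    left, right = WINDOWS.get(pos, DEFAULT_WINDOW)
    before = tuple(tokens[max(0, index - left):index])
    after = tuple(tokens[index + 1:index + 1 + right])
    return before, after


def words_in_same_context(pun_tokens, target_index, pos, corpus):
    """Corpus words whose context equals the context of the pun's target word."""
    wanted = context(pun_tokens, target_index, pos)
    matches = set()
    for sentence in corpus:
        for i, word in enumerate(sentence):
            if context(sentence, i, pos) == wanted:
                matches.add(word)
    matches.discard(pun_tokens[target_index])
    return matches


pun = "proceeds went from the sacred to the propane".split()
toy_corpus = [  # invented sentences, not taken from Brown
    "he moved from the sacred to the profane".split(),
    "she walked to the store today".split(),
]
print(words_in_same_context(pun, pun.index("propane"), "noun", toy_corpus))
```

The sketch ignores the part of speech of corpus words and requires identical neighbours, which is stricter than a practical system would probably need.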

Table 6 illustrates competition results of our system.

| Task | Precision | Recall | Accuracy | F1 |
|---|---|---|---|---|
| 1, Ho. [8] | 0.8019 | 0.7785 | 0.7044 | 0.7900 |
| 1, He. [9] | 0.7585 | 0.6326 | 0.5938 | 0.6898 |

| Task | Coverage | Precision | Recall | F1 |
|---|---|---|---|---|
| 2, Ho., run 1 | 1.0000 | 0.3279 | 0.3279 | 0.3279 |
| 2, Ho., run 2 | 1.0000 | 0.3167 | 0.3167 | 0.3167 |
| 2, He., run 1 | 1.0000 | 0.3029 | 0.3029 | 0.3029 |
| 2, He., run 2 | 1.0000 | 0.3501 | 0.3501 | 0.3501 |
| 3, Ho., run 1 | 0.8760 | 0.0484 | 0.0424 | 0.0452 |
| 3, Ho., run 2 | 1.0000 | 0.0331 | 0.0331 | 0.0331 |
| 3, He., run 1 | 0.9709 | 0.0169 | 0.0164 | 0.0166 |
| 3, He., run 2 | 1.0000 | 0.0118 | 0.0118 | 0.0118 |

[8] Homographic.

[9] Heterographic.

Table 6: Competition results.
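The F1 column in Table 6 can be related to the precision and recall columns with a small Python check. It assumes that F1 is the usual harmonic mean of precision and recall; small differences in the last digit are expected, because the table values are rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (task, precision, recall, reported F1) copied from Table 6.
ROWS = [
    ("1, Ho.", 0.8019, 0.7785, 0.7900),
    ("1, He.", 0.7585, 0.6326, 0.6898),
    ("3, Ho., run 1", 0.0484, 0.0424, 0.0452),
    ("3, He., run 1", 0.0169, 0.0164, 0.0166),
]

for task, precision, recall, reported in ROWS:
    computed = f1_score(precision, recall)
    print(f"{task}: computed {computed:.4f}, reported {reported:.4f}")
```

Where coverage is 1.0000, precision and recall coincide, so F1 equals both; this is why three columns repeat the same number in most rows of Subtasks 2 and 3.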

5 Conclusion

The system that we introduced is based on one general supposition about the semantic structure of puns and combines two types of algorithms: supervised learning and rule-based. Not surprisingly, the supervised learning algorithm showed better results in solving an NLP task than the rule-based one. In this implementation, we also tried to combine two very different dictionaries (Roget’s Thesaurus and WordNet). Although the reliability of Thesaurus in reproducing a universal semantic map can be doubted, it still appears to be quite an effective source of data when used in Subtask 1. The attempts to map it to WordNet seem rather weak so far, judging by the test results, which also raises a question: if different dictionaries treat the meanings of words differently, can there be an objective and/or universal semantic map to serve as the foundation for any WSD task?


  • Bird et al. (2009) Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc.
  • Kilgarriff and Fellbaum (2000) Adam Kilgarriff and Christiane Fellbaum. 2000. Wordnet: An electronic lexical database.
  • Miller and Gurevych (2015) Tristan Miller and Iryna Gurevych. 2015. Automatic disambiguation of English puns. In ACL (1). pages 719–729.
  • Pedregosa et al. (2011) Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12(Oct):2825–2830.
  • Roget (2004) Peter Mark Roget. 2004. Roget’s thesaurus of English words and phrases. Project Gutenberg.