Representing the semantics of linguistic items in a machine-interpretable form has been a major goal of Natural Language Processing since its earliest days. Among the range of different linguistic items, words have attracted the most research attention. However, word representations have an important limitation: they conflate different meanings of a word into a single vector. Representations of word senses have the potential to overcome this inherent limitation. Indeed, the representation of individual word senses and concepts has recently gained in popularity with several experimental results showing that a considerable performance improvement can be achieved across different NLP applications upon moving from word level to the deeper sense and concept levels. Another interesting point regarding the representation of concepts and word senses is that these models can be seamlessly applied to other linguistic items, such as words, phrases and sentences.
This tutorial (slides available at http://goo.gl/az7tBD) will first provide a brief overview of the recent literature on word representation (both count-based and neural network-based). It will then describe the advantages of moving from the word level to the deeper level of word senses and concepts, providing an extensive review of state-of-the-art systems. The approaches covered will include not only those which draw upon knowledge resources such as WordNet, Wikipedia, BabelNet or Freebase as reference, but also the so-called multi-prototype approaches, which learn sense distinctions by using different clustering techniques. Our tutorial will discuss the advantages and potential limitations of all approaches, showing their most successful applications to date. We will conclude by presenting current open problems and lines of future work.
2.1 Semantic Representation: Foundations
This session provides the necessary background for semantic representation. We will briefly cover the traditional vector space model [Turney and Pantel2010], followed by more recent approaches based on neural networks [Mikolov et al.2013a]. We then motivate the need for semantic representations at the deeper level of word senses, focusing on the main limitation of word-based approaches: their inherent ambiguity, since all the meanings of a word are conflated into a single vector. Finally, we show how sense-based representations can overcome this limitation, providing improvements across several tasks.
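To make the conflation problem concrete, the following is a minimal count-based sketch in the spirit of the traditional vector space model. The toy corpus, the window size and the stopword list are illustrative assumptions, not taken from any cited system: the point is only that a single vector for *bank* accumulates both financial and river contexts.

```python
import math
from collections import Counter, defaultdict

STOPWORDS = {"the", "we", "along"}  # illustrative stopword list (assumption)

def cooccurrence_vectors(sentences, window=2):
    """Sparse count vectors: vectors[w][c] = times c occurs within `window` tokens of w."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        tokens = [t for t in sent.lower().split() if t not in STOPWORDS]
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical toy corpus mixing two meanings of "bank".
corpus = [
    "the bank approved the loan",
    "the bank raised the interest rate",
    "we walked along the river bank",
    "the river flooded the bank",
]
vecs = cooccurrence_vectors(corpus)
# vecs["bank"] now holds financial contexts (approved, loan, raised, interest)
# and river contexts (walked, river, flooded) in one vector.
```

Because both meanings share one vector, *bank* ends up similar to both *loan* and *river*; a sense-level model would instead keep a separate vector per meaning.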
2.2 Knowledge-based sense representations
We start this session by briefly introducing some of the most popular lexical knowledge resources used by sense representation techniques. We put emphasis on WordNet [Miller et al.1990], the de facto standard sense inventory in the community, and Wikipedia, the largest collaboratively-constructed resource of its kind, both of which have been used extensively in the area. We discuss the advantages of each of these resources and show how they are usually viewed as semantic networks and exploited for representation purposes.
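One way a resource viewed as a semantic network can yield a representation is through random walks. The sketch below computes a personalized PageRank vector for a seed concept, similar in spirit to the random-walk-based semantic signatures of [Pilehvar and Navigli2015]; the toy graph, the node names and the parameter values are hypothetical and chosen only for illustration.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iterations=50):
    """Power-iteration personalized PageRank.

    graph: dict mapping each node to its list of neighbours (undirected
    edges listed in both directions; every node needs at least one edge).
    seeds: set of nodes the random walker jumps back to with prob. 1 - alpha.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        # Teleport mass goes back to the seed node(s) only.
        new = {n: (1 - alpha) * restart[n] for n in nodes}
        # Each node spreads alpha of its current mass evenly over its neighbours.
        for n in nodes:
            share = alpha * scores[n] / len(graph[n])
            for m in graph[n]:
                new[m] += share
        scores = new
    return scores

# Hypothetical toy semantic network with two senses of "bank".
GRAPH = {
    "bank_finance": ["money", "loan"],
    "money": ["bank_finance", "loan"],
    "loan": ["bank_finance", "money"],
    "bank_river": ["river", "water"],
    "river": ["bank_river", "water"],
    "water": ["bank_river", "river"],
}

# The resulting distribution over nodes acts as the concept's "signature".
signature = personalized_pagerank(GRAPH, {"bank_finance"})
```

Two such signatures can then be compared with a vector or rank-based similarity measure to estimate how related the corresponding concepts are.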
Then, we provide an in-depth review of techniques that learn representations for individual concepts in a target sense inventory. We cover the main approaches that model concepts in WordNet [Pilehvar and Navigli2015], articles in Wikipedia [Hassan and Mihalcea2011], or concepts in larger sense inventories such as BabelNet [Iacobacci et al.2015, Camacho-Collados et al.2016b] or Freebase [Bordes et al.2011, Bordes et al.2013]. We will also cover approaches that exploit external corpora (or word representations learned from corpus statistics) in addition to the target knowledge resource [Chen et al.2014, Chen et al.2015, Johansson and Piña Nieto2015, Jauhar et al.2015, Rothe and Schütze2015, Pilehvar and Collier2016]. We discuss the advantages of these knowledge-based representations and focus on the role that neural network-based learning has played in this area over the past few years.
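As a simplified illustration of combining a sense inventory with pre-trained word vectors, the sketch below places each sense at the centroid of the vectors of the words in its gloss and then disambiguates a context by cosine similarity. This is only loosely inspired by gloss-based initialisation such as in [Chen et al.2014]; the three-dimensional word vectors, the shortened glosses and the sense labels are all made up for illustration.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical 3-d word vectors (dimensions loosely: finance, nature, motion).
WORD_VECS = {
    "money": [0.9, 0.0, 0.1], "deposit": [0.8, 0.1, 0.0],
    "institution": [0.7, 0.0, 0.0], "sloping": [0.0, 0.6, 0.3],
    "land": [0.1, 0.9, 0.0], "river": [0.0, 0.9, 0.2],
    "loan": [0.9, 0.0, 0.0], "fishing": [0.0, 0.8, 0.3],
}

# Shortened, paraphrased glosses (not verbatim WordNet glosses).
GLOSSES = {
    "bank#finance": ["institution", "deposit", "money"],
    "bank#river": ["sloping", "land", "river"],
}

def sense_vectors(glosses, word_vecs):
    """Sense vector = centroid of known gloss-word vectors (assumes at least one is known)."""
    return {sense: centroid([word_vecs[w] for w in words if w in word_vecs])
            for sense, words in glosses.items()}

def disambiguate(context, senses, word_vecs):
    """Pick the sense whose vector is closest to the centroid of the context words."""
    ctx = centroid([word_vecs[w] for w in context if w in word_vecs])
    return max(senses, key=lambda s: cosine(senses[s], ctx))

SENSES = sense_vectors(GLOSSES, WORD_VECS)
```

For instance, `disambiguate(["loan", "money"], SENSES, WORD_VECS)` selects `bank#finance`, while a context such as `["fishing", "river"]` selects `bank#river`.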
2.3 Unsupervised sense representations
In this session we cover the so-called multi-prototype techniques, which learn multiple representations per word, each corresponding to a specific meaning of the word. We will illustrate how these approaches leverage clustering algorithms to partition the contexts in which a word occurs into groups, each corresponding to one of its meanings [Reisinger and Mooney2010, Huang et al.2012, Neelakantan et al.2014, Tian et al.2014, Wu and Giles2015, Li and Jurafsky2015, Liu et al.2015, Vu and Parker2016, Šuster et al.2016].
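The clustering step behind multi-prototype models can be sketched with plain k-means over context vectors of a single word, where each resulting centroid serves as one sense prototype. We assume a fixed number of senses k, deterministic initialisation and hypothetical two-dimensional context vectors; the cited systems use richer context representations and other clustering methods, and some (e.g. [Neelakantan et al.2014]) do not fix the number of senses in advance.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centroids (an assumption
    made for determinism). Returns (centroids, cluster index of each point)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assignment[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

# Hypothetical context vectors for six occurrences of "bank"
# (dimensions loosely: finance-related, nature-related context).
CONTEXTS = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.85, 0.0], [0.0, 0.95]]

prototypes, labels = kmeans(CONTEXTS, k=2)
# Each prototype acts as one sense-specific vector for "bank".
```

A new occurrence of the word can then be assigned to its nearest prototype, which is how such models pick a sense-specific vector at test time.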
2.4 Advantages and limitations
This session reviews some of the advantages and limitations of the knowledge-based and unsupervised techniques, describing the applications for which they are suitable and mentioning some issues such as the knowledge acquisition bottleneck.
2.5 Applications
This session focuses on different applications of sense representations. We briefly mention some of the main tasks to which sense representations can be applied. Sense representations may be used in virtually every task in which word representations have traditionally been applied. Examples of such tasks include automatic thesaurus generation [Crouch1988, Curran and Moens2002], information extraction [Laender et al.2002], semantic role labelling [Erk2007, Pennacchiotti et al.2008], word similarity [Deerwester et al.1990, Turney et al.2003, Radinsky et al.2011, Mikolov et al.2013b] and word clustering [Pantel and Lin2002]. We will compare the performance of word and sense representations, discussing the advantages and limitations of each approach. Moreover, we will show how sense representations can also be applied to a wide variety of additional tasks such as entity linking and word sense disambiguation [Navigli2009, Chen et al.2014, Camacho-Collados et al.2015b, Rothe and Schütze2015, Camacho-Collados et al.2016a], sense clustering [Snow et al.2007, Camacho-Collados et al.2015a], alignment of lexical resources [Niemann and Gurevych2011, Pilehvar and Navigli2014], taxonomy learning [Espinosa-Anke et al.2016], knowledge-base completion [Bordes et al.2013], information extraction [Delli Bovi et al.2015], and sense-based semantic similarity [Budanitsky and Hirst2006, Pilehvar et al.2013, Iacobacci et al.2015], to name a few.
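When sense vectors are used for word similarity, the similarity of two words must be derived from the similarities of their senses. A common choice is to take the maximum over all sense pairs (often called MaxSim) or their average (AvgSim), as in [Reisinger and Mooney2010]. The sketch below implements both; the two-dimensional sense vectors are hypothetical.

```python
import math
from itertools import product

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def max_sim(senses_a, senses_b):
    """Word similarity = highest cosine over all pairs of senses (MaxSim)."""
    return max(cosine(u, v) for u, v in product(senses_a, senses_b))

def avg_sim(senses_a, senses_b):
    """Word similarity = mean cosine over all pairs of senses (AvgSim)."""
    pairs = [cosine(u, v) for u, v in product(senses_a, senses_b)]
    return sum(pairs) / len(pairs)

# Hypothetical sense vectors (dimensions: finance, nature).
SENSE_VECS = {
    "bank": [[1.0, 0.0], [0.0, 1.0]],  # financial sense, river sense
    "river": [[0.0, 1.0]],
}

print(max_sim(SENSE_VECS["bank"], SENSE_VECS["river"]))
print(avg_sim(SENSE_VECS["bank"], SENSE_VECS["river"]))
```

MaxSim rewards the single best-matching pair of senses (here the river sense of *bank*), whereas AvgSim is dragged down by the unrelated financial sense; a single averaged vector for *bank* would sit in between, at a cosine of about 0.71 with *river*.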
2.6 Open problems and future work
This last session provides a summary of possible directions of future work on semantic sense representation. We discuss various problems associated with the current representation approaches and propose lines of research in order to effectively apply sense representations in natural language understanding tasks.
José Camacho Collados
is a Google Doctoral Fellow and PhD student at the Sapienza University of Rome (http://wwwusers.di.uniroma1.it/~collados/), working under the supervision of Prof. Roberto Navigli. His research focuses on Natural Language Processing and on the area of lexical semantics in particular. He has developed Nasari (http://lcl.uniroma1.it/nasari/) [Camacho-Collados et al.2016b], a novel semantic vector representation for concepts based on Wikipedia that features a high coverage of named entities and has been successfully used on different NLP tasks. José has also worked on the development of evaluation benchmarks for word and sense representations [Camacho-Collados et al.2015c, Camacho-Collados and Navigli2016] and is the co-organizer of a SemEval 2017 task on multilingual and cross-lingual semantic similarity. His background education includes an Erasmus Mundus Master in Natural Language Processing and Human Language Technology and a 5-year BSc degree in Mathematics.
Ignacio Iacobacci
is a PhD student at the Sapienza University of Rome (https://iiacobac.wordpress.com/), working under the supervision of Prof. Roberto Navigli. His research interests lie in the fields of Machine Learning, Natural Language Processing and Neural Networks. He is currently working on Word Sense Disambiguation and Distributional Semantics. Ignacio presented SensEmbed (http://lcl.uniroma1.it/sensembed/) at ACL 2015 [Iacobacci et al.2015], a novel approach to word and relational similarity that exploits semantic knowledge to model arbitrary word senses in a large sense inventory. His background includes an MSc in Computer Science and eight years as a developer, including four years as a Machine Learning and NLP specialist.
Roberto Navigli
is an Associate Professor in the Department of Computer Science at La Sapienza University of Rome and a member of the Linguistic Computing Laboratory (http://wwwusers.di.uniroma1.it/~navigli/). His research interests lie in the field of Natural Language Processing, including Word Sense Disambiguation and Induction, Ontology Learning, Knowledge Representation and Acquisition, and multilinguality. In 2007 he received a Ph.D. in Computer Science from La Sapienza and was awarded the Marco Cadoli 2007 AI*IA national prize for the Best Ph.D. Thesis in Artificial Intelligence. In 2013 he received the Marco Somalvico AI*IA prize, awarded every two years to the best young Italian researcher in Artificial Intelligence. He is the creator and founder of BabelNet (http://www.babelnet.org) [Navigli and Ponzetto2012], both a multilingual encyclopedic dictionary and a semantic network, and its related project Babelfy (http://www.babelfy.org) [Moro et al.2014], a state-of-the-art multilingual disambiguation and entity linking system. He is also the Principal Investigator of MultiJEDI (http://multijedi.org/), a 1.3M euro 5-year Starting Grant funded by the European Research Council, and the person responsible for the Sapienza unit in LIDER, an EU project on content analytics and language technologies. Moreover, he is the Co-PI of "Language Understanding cum Knowledge Yield" (LUcKY), a Google Focused Research Award on Natural Language Understanding.
Mohammad Taher Pilehvar
is a Research Associate in the Language Technology Lab of the University of Cambridge (http://www.pilevar.com/taher/), where he is currently working on NLP in the biomedical domain. Taher completed his PhD in 2015 under the supervision of Prof. Roberto Navigli. Taher's research lies in lexical semantics, mainly focusing on semantic representation, semantic similarity, and Word Sense Disambiguation. He has co-organized three SemEval tasks [Jurgens et al.2014, Jurgens and Pilehvar2016] and has authored multiple conference and journal papers on semantic representation and similarity in top-tier venues. He is the first author of a paper on semantic similarity that was nominated for the best paper award at ACL 2013 [Pilehvar et al.2013].
The authors gratefully acknowledge the support of the ERC Starting Grant MultiJEDI No. 259234.
- [Bordes et al.2011] Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. 2011. Learning structured embeddings of knowledge bases. In Twenty-Fifth AAAI Conference on Artificial Intelligence.
- [Bordes et al.2013] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pages 2787–2795.
- [Budanitsky and Hirst2006] Alexander Budanitsky and Graeme Hirst. 2006. Evaluating WordNet-based measures of Lexical Semantic Relatedness. Computational Linguistics, 32(1):13–47.
- [Camacho-Collados and Navigli2016] José Camacho-Collados and Roberto Navigli. 2016. Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations. In Proceedings of the ACL Workshop on Evaluating Vector Space Representations for NLP, Berlin, Germany.
- [Camacho-Collados et al.2015a] José Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2015a. NASARI: a Novel Approach to a Semantically-Aware Representation of Items. In Proceedings of NAACL, pages 567–577.
- [Camacho-Collados et al.2015b] José Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2015b. A Unified Multilingual Semantic Representation of Concepts. In Proceedings of ACL, pages 741–751, Beijing, China.
- [Camacho-Collados et al.2015c] José Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2015c. A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics – Short Papers, pages 1–7, Beijing, China.
- [Camacho-Collados et al.2016a] José Camacho-Collados, Claudio Delli Bovi, Alessandro Raganato, and Roberto Navigli. 2016a. A large-scale multilingual disambiguation of glosses. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pages 1701–1708, Portoroz, Slovenia, May. European Language Resources Association (ELRA).
- [Camacho-Collados et al.2016b] José Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2016b. NASARI: Integrating Explicit Knowledge and Corpus Statistics for a Multilingual Representation of Concepts and Entities. Artificial Intelligence.
- [Chen et al.2014] Xinxiong Chen, Zhiyuan Liu, and Maosong Sun. 2014. A unified model for word sense representation and disambiguation. In Proceedings of EMNLP, pages 1025–1035, Doha, Qatar.
- [Chen et al.2015] Tao Chen, Ruifeng Xu, Yulan He, and Xuan Wang. 2015. Improving distributed representation of word sense via WordNet gloss composition and context clustering. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing – Short Papers, pages 15–20, Beijing, China.
- [Crouch1988] C. J. Crouch. 1988. A cluster-based approach to thesaurus construction. In Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’88, pages 309–320.
- [Curran and Moens2002] James R. Curran and Marc Moens. 2002. Improvements in automatic thesaurus extraction. In Proceedings of the ACL-02 Workshop on Unsupervised Lexical Acquisition - Volume 9, ULA ’02, pages 59–66.
- [Deerwester et al.1990] Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407.
- [Delli Bovi et al.2015] Claudio Delli Bovi, Luis Espinosa Anke, and Roberto Navigli. 2015. Knowledge base unification via sense embeddings and disambiguation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 726–736, Lisbon, Portugal, September. Association for Computational Linguistics.
- [Erk2007] Katrin Erk. 2007. A simple, similarity-based model for selectional preferences. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic.
- [Espinosa-Anke et al.2016] Luis Espinosa-Anke, Horacio Saggion, Francesco Ronzano, and Roberto Navigli. 2016. ExTaSem! Extending, Taxonomizing and Semantifying Domain Terminologies. In Proceedings of the 30th Conference on Artificial Intelligence (AAAI’16).
- [Hassan and Mihalcea2011] Samer Hassan and Rada Mihalcea. 2011. Semantic relatedness using salient semantic analysis. In Proceedings of AAAI, pages 884–889.
- [Huang et al.2012] Eric H. Huang, Richard Socher, Christopher D. Manning, and Andrew Y. Ng. 2012. Improving word representations via global context and multiple word prototypes. In Proceedings of ACL, pages 873–882, Jeju Island, Korea.
- [Iacobacci et al.2015] Ignacio Iacobacci, Mohammad Taher Pilehvar, and Roberto Navigli. 2015. SensEmbed: Learning sense embeddings for word and relational similarity. In Proceedings of ACL, pages 95–105, Beijing, China.
- [Jauhar et al.2015] Sujay Kumar Jauhar, Chris Dyer, and Eduard Hovy. 2015. Ontologically grounded multi-sense representation learning for semantic vector space models. In Proceedings of NAACL.
- [Johansson and Piña Nieto2015] Richard Johansson and Luis Piña Nieto. 2015. Embedding a semantic network in a word space. In Proceedings of NAACL, pages 1428–1433.
- [Jurgens and Pilehvar2016] David Jurgens and Mohammad Taher Pilehvar. 2016. SemEval-2016 task 14: Semantic taxonomy enrichment. In Proceedings of the 10th International Workshop on Semantic Evaluation, pages 1092–1102, San Diego, California, June.
- [Jurgens et al.2014] David Jurgens, Mohammad Taher Pilehvar, and Roberto Navigli. 2014. SemEval-2014 task 3: Cross-level semantic similarity. SemEval 2014, page 17.
- [Laender et al.2002] Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Altigran S. da Silva, and Juliana S. Teixeira. 2002. A brief survey of web data extraction tools. SIGMOD Rec., 31(2):84–93.
- [Li and Jurafsky2015] Jiwei Li and Dan Jurafsky. 2015. Do multi-sense embeddings improve natural language understanding? In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1722–1732, Lisbon, Portugal, September.
- [Liu et al.2015] Yang Liu, Zhiyuan Liu, Tat-Seng Chua, and Maosong Sun. 2015. Topical word embeddings. In AAAI, pages 2418–2424.
- [Mikolov et al.2013a] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.
- [Mikolov et al.2013b] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119.
- [Miller et al.1990] George A. Miller, R.T. Beckwith, Christiane D. Fellbaum, D. Gross, and K. Miller. 1990. WordNet: an online lexical database. International Journal of Lexicography, 3(4):235–244.
- [Moro et al.2014] Andrea Moro, Alessandro Raganato, and Roberto Navigli. 2014. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics (TACL), 2:231–244.
- [Navigli and Ponzetto2012] Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193:217–250.
- [Navigli2009] Roberto Navigli. 2009. Word Sense Disambiguation: A survey. ACM Computing Surveys, 41(2):1–69.
- [Neelakantan et al.2014] Arvind Neelakantan, Jeevan Shankar, Alexandre Passos, and Andrew McCallum. 2014. Efficient non-parametric estimation of multiple embeddings per word in vector space. In Proceedings of EMNLP, pages 1059–1069, Doha, Qatar.
- [Niemann and Gurevych2011] Elisabeth Niemann and Iryna Gurevych. 2011. The people's web meets linguistic knowledge: automatic sense alignment of Wikipedia and WordNet. In Proceedings of the Ninth International Conference on Computational Semantics, pages 205–214.
- [Pantel and Lin2002] Patrick Pantel and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of KDD, pages 613–619.
- [Pennacchiotti et al.2008] Marco Pennacchiotti, Diego De Cao, Roberto Basili, Danilo Croce, and Michael Roth. 2008. Automatic induction of FrameNet lexical units. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pages 457–465.
- [Pilehvar and Collier2016] Mohammad Taher Pilehvar and Nigel Collier. 2016. De-conflated semantic representations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA.
- [Pilehvar and Navigli2013] Mohammad Taher Pilehvar and Roberto Navigli. 2013. Paving the way to a large-scale pseudosense-annotated dataset. In Proceedings of NAACL-HLT, Atlanta, USA.
- [Pilehvar and Navigli2014] Mohammad Taher Pilehvar and Roberto Navigli. 2014. A robust approach to aligning heterogeneous lexical resources. In Proceedings of ACL, pages 468–478.
- [Pilehvar and Navigli2015] Mohammad Taher Pilehvar and Roberto Navigli. 2015. From senses to texts: An all-in-one graph-based approach for measuring semantic similarity. Artificial Intelligence, 228:95–128.
- [Pilehvar et al.2013] Mohammad Taher Pilehvar, David Jurgens, and Roberto Navigli. 2013. Align, Disambiguate and Walk: a Unified Approach for Measuring Semantic Similarity. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1341–1351, Sofia, Bulgaria.
- [Radinsky et al.2011] Kira Radinsky, Eugene Agichtein, Evgeniy Gabrilovich, and Shaul Markovitch. 2011. A word at a time: computing word relatedness using temporal semantic analysis. In Proceedings of WWW, pages 337–346, Hyderabad, India.
- [Reisinger and Mooney2010] Joseph Reisinger and Raymond J. Mooney. 2010. Multi-prototype vector-space models of word meaning. In Proceedings of ACL, pages 109–117.
- [Rothe and Schütze2015] Sascha Rothe and Hinrich Schütze. 2015. AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes. In Proceedings of ACL, pages 1793–1803, Beijing, China, July. Association for Computational Linguistics.
- [Snow et al.2007] Rion Snow, Sushant Prakash, Daniel Jurafsky, and Andrew Y. Ng. 2007. Learning to merge word senses. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 1005–1014, Prague, Czech Republic.
- [Šuster et al.2016] Simon Šuster, Ivan Titov, and Gertjan van Noord. 2016. Bilingual learning of multi-sense embeddings with discrete autoencoders. In Proceedings of NAACL-HLT.
- [Tian et al.2014] Fei Tian, Hanjun Dai, Jiang Bian, Bin Gao, Rui Zhang, Enhong Chen, and Tie-Yan Liu. 2014. A probabilistic model for learning multi-prototype word embeddings. In COLING, pages 151–160.
- [Turney and Pantel2010] Peter D. Turney and Patrick Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188.
- [Turney et al.2003] Peter D. Turney, Michael L. Littman, Jeffrey Bigham, and Victor Shnayder. 2003. Combining independent modules to solve multiple-choice synonym and analogy problems. In Proceedings of Recent Advances in Natural Language Processing, pages 482–489, Borovets, Bulgaria.
- [Vu and Parker2016] Thuy Vu and D Stott Parker. 2016. K-embeddings: Learning conceptual embeddings for words using context. In Proceedings of NAACL-HLT, pages 1262–1267.
- [Wu and Giles2015] Zhaohui Wu and C Lee Giles. 2015. Sense-aware semantic analysis: A multi-prototype word representation model using Wikipedia. In AAAI, pages 2188–2194.